feat(dc/tool-c): Add missing value imputation feature with 6 methods and MICE

Major features:
1. Missing value imputation (6 simple methods + MICE):
   - Mean/Median/Mode/Constant imputation
   - Forward fill (ffill) and Backward fill (bfill) for time series
   - MICE multivariate imputation (in progress, shape issue to fix)

2. Auto precision detection:
   - Automatically match decimal places of original data
   - Prevent false precision (e.g. 13.57 instead of 13.566716417910449)

3. Categorical variable detection:
   - Auto-detect and skip categorical columns in MICE
   - Show warnings for unsuitable columns
   - Suggest mode imputation for categorical data

4. UI improvements:
   - Rename button: "Delete Missing" to "Missing Value Handling"
   - Remove standalone "Dedup" and "MICE" buttons
   - 3-tab dialog: Delete / Fill / Advanced Fill
   - Display column statistics and recommended methods
   - Longer-lived warning messages (shown for 8 seconds when columns are skipped)

5. Bug fixes:
   - Replace sessionService.updateSessionData call with saveProcessedData
   - Fix OperationResult interface (add message and stats fields)
   - Fix Toolbar button labels and remove obsolete buttons
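
A minimal, dependency-free sketch of the six simple methods and the precision matching described in items 1 and 2 above. It is not the code in operations/fillna.py; the names is_missing, detect_precision and fill_missing are hypothetical, and the sketch assumes a column is a plain Python list in which None or NaN marks a missing value.

```python
import math
from decimal import Decimal
from statistics import mean, median, mode


def is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def detect_precision(values):
    """Return the largest number of decimal places among observed values."""
    places = 0
    for v in values:
        if not is_missing(v):
            exponent = Decimal(str(v)).as_tuple().exponent
            places = max(places, -exponent if exponent < 0 else 0)
    return places


def fill_missing(values, method, constant=None):
    """Fill missing entries with one of six simple methods (hypothetical API)."""
    observed = [v for v in values if not is_missing(v)]
    if method in ("mean", "median", "mode", "constant"):
        if method == "constant":
            fill = constant
        elif method == "mode":
            fill = mode(observed)
        else:
            stat = mean(observed) if method == "mean" else median(observed)
            # Round to the precision of the original data to avoid false precision.
            fill = round(stat, detect_precision(observed))
        return [fill if is_missing(v) else v for v in values]
    if method in ("ffill", "bfill"):
        # bfill is ffill applied to the reversed sequence.
        seq = values if method == "ffill" else values[::-1]
        out, last = [], None
        for v in seq:
            if not is_missing(v):
                last = v
            out.append(last)  # leading gaps stay missing (returned as None)
        return out if method == "ffill" else out[::-1]
    raise ValueError(f"unknown method: {method}")


print(fill_missing([13.5, None, 14.25, 12.95], "mean"))
# → [13.5, 13.57, 14.25, 12.95]
```

Rounding only the computed statistic, rather than the whole column, leaves the observed values untouched.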

Modified files:
Python: operations/fillna.py (new, 556 lines), main.py (3 new endpoints)
Backend: QuickActionService.ts, QuickActionController.ts, routes/index.ts
Frontend: MissingValueDialog.tsx (new, 437 lines), Toolbar.tsx, index.tsx
Tests: test_fillna_operations.py (774 lines), test scripts and docs
Docs: 5 documentation files updated

Known issues:
- MICE imputation has DataFrame shape mismatch issue (under debugging)
- Workaround: Use 6 simple imputation methods first
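
While MICE is being debugged, the workaround above comes down to choosing a simple method per column. The sketch below illustrates one way the categorical detection from item 3 could feed that choice. It is an assumption-laden illustration, not the committed logic: is_categorical, recommend_method and split_columns_for_mice are hypothetical names, and "categorical" is simplified here to mean "contains any non-numeric observed value".

```python
import math
from numbers import Number


def is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def is_categorical(values):
    """Simplified rule (assumption): any non-numeric observed value makes the column categorical."""
    observed = [v for v in values if not is_missing(v)]
    return any(isinstance(v, bool) or not isinstance(v, Number) for v in observed)


def recommend_method(values):
    """Suggest mode for categorical columns and mean for numeric ones."""
    return "mode" if is_categorical(values) else "mean"


def split_columns_for_mice(table):
    """Return (eligible column names, warnings for skipped columns)."""
    eligible, warnings = [], []
    for name, values in table.items():
        if is_categorical(values):
            warnings.append(
                f"Skipped '{name}': categorical column, consider mode imputation"
            )
        else:
            eligible.append(name)
    return eligible, warnings


table = {"age": [34, None, 51], "sex": ["F", None, "M"]}
eligible, warnings = split_columns_for_mice(table)
print(eligible)   # columns that could be passed to MICE
print(warnings)   # messages a dialog could show for skipped columns
```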

Status: Development complete, MICE debugging in progress
Lines added: ~2000 across 3 tiers (Python, backend, frontend)
2025-12-10 13:06:00 +08:00
parent f4f1d09837
commit 74cf346453
102 changed files with 3806 additions and 181 deletions

@@ -1,10 +1,10 @@
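 # Tool C Feature Button Development Plan V1.0
-**Document version**: V1.2 (Phase 2 complete)
+**Document version**: V1.4 (Phase 2+, missing value imputation development release)
 **Created**: 2025-12-08
-**Last updated**: 2025-12-08
+**Last updated**: 2025-12-10
 **Owner**: AI development team
-**Project status**: ✅ Phase 1-2 complete, 7 core features available
+**Project status**: ✅ Phase 1-2 complete (7 core features + NA handling improvements + Pivot improvements + missing value imputation developed, MICE pending debugging)
 ---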
@@ -109,16 +109,15 @@
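 | Group | Feature | Priority | Status |
 |------|------|--------|---------|
-| **Sample filtering** | Advanced filter | P0 | ✅ Done |
-| **Variable transformation** | Value mapping (recode) | P0 | ✅ Done |
-| | Generate categorical variable (binning) | P0 | ✅ Done |
-| | Conditional column generation | P0 | ✅ Done |
-| | Generate new variable (computed column) | P1 | ✅ Done |
-| **Data cleaning** | Delete missing values | P0 | ✅ Done |
-| | Deduplication | P1 | ⏸️ Deferred |
-| **Data reshaping** | Long to wide (Pivot) | P1 | ✅ Done |
-| **Advanced features** | Missing value imputation | P1 | To do |
-| | Multiple imputation (MICE) | P0 | To do |
+| **Sample filtering** | Advanced filter | P0 | ✅ Done (+ is-empty / is-not-empty conditions) |
+| **Variable transformation** | Value mapping (recode) | P0 | ✅ Done (+ NA handling option) |
+| | Generate categorical variable (binning) | P0 | ✅ Done (+ NA handling option) |
+| | Conditional column generation | P0 | ✅ Done (+ is-empty / is-not-empty conditions) |
+| | Generate new variable (computed column) | P1 | ✅ Done (Option B: safe column-name mapping) |
+| **Data cleaning** | Missing value handling (delete + impute) | P0 | ✅ Done (6 simple imputation methods + MICE; MICE pending debugging) |
+| | Deduplication | P1 | ⏸️ Removed (user request) |
+| **Data reshaping** | Long to wide (Pivot) | P1 | ✅ Done (+ keep unselected columns + original column order) |
+| **Advanced features** | Multiple imputation (MICE) | P0 | 🚧 Integrated into missing value handling (pending debugging) |
 **Priority legend**:
 - **P0**: core features, must be completed in Phase 1-2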
@@ -952,6 +951,8 @@ print(f'Imputation complete, remaining missing values: {df[cols_to_impute].isna().sum().sum()}'
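 | V1.0 | 2025-12-08 | Initial plan: Phase 1-4 features |
 | V1.1 | 2025-12-08 | Architecture refactor: switched to pre-written Python functions |
 | V1.2 | 2025-12-08 | Phase 1-2 complete: 7 core features released |
+| V1.3 | 2025-12-10 | NA handling improvements (4 features support null handling); Pivot improvements (keep unselected columns + original column order); computed column Option B implemented (safe column-name mapping); UX improvements (column header tooltip + dismissible preview hint + scrollbar improvements) |
+| V1.4 | 2025-12-10 | Missing value imputation developed: 6 simple imputation methods (mean/median/mode/constant/forward fill/backward fill) + MICE multiple imputation; automatic precision detection; categorical column detection; button changes (removed "Dedup" and "Multiple Imputation", renamed "Delete Missing Values" to "Missing Value Handling"); status: development complete, MICE DataFrame shape issue pending debugging |
 ---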