feat(dc/tool-c): Add missing value imputation feature with 6 methods and MICE
Major features:

1. Missing value imputation (6 simple methods + MICE):
   - Mean/Median/Mode/Constant imputation
   - Forward fill (ffill) and Backward fill (bfill) for time series
   - MICE multivariate imputation (in progress, shape issue to fix)
2. Auto precision detection:
   - Automatically match decimal places of original data
   - Prevent false precision (e.g. 13.57 instead of 13.566716417910449)
3. Categorical variable detection:
   - Auto-detect and skip categorical columns in MICE
   - Show warnings for unsuitable columns
   - Suggest mode imputation for categorical data
4. UI improvements:
   - Rename button: "Delete Missing" to "Missing Value Handling"
   - Remove standalone "Dedup" and "MICE" buttons
   - 3-tab dialog: Delete / Fill / Advanced Fill
   - Display column statistics and recommended methods
   - Extended warning messages (8 seconds for skipped columns)
5. Bug fixes:
   - Fix sessionService.updateSessionData -> saveProcessedData
   - Fix OperationResult interface (add message and stats)
   - Fix Toolbar button labels and removal

Modified files:

Python: operations/fillna.py (new, 556 lines), main.py (3 new endpoints)
Backend: QuickActionService.ts, QuickActionController.ts, routes/index.ts
Frontend: MissingValueDialog.tsx (new, 437 lines), Toolbar.tsx, index.tsx
Tests: test_fillna_operations.py (774 lines), test scripts and docs
Docs: 5 documentation files updated

Known issues:

- MICE imputation has DataFrame shape mismatch issue (under debugging)
- Workaround: Use 6 simple imputation methods first

Status: Development complete, MICE debugging in progress
Lines added: ~2000 lines across 3 tiers
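To illustrate the six simple methods and the precision matching described in the commit message, here is a minimal standard-library sketch. It is not the code in `operations/fillna.py`, which is not shown here. The function names `detect_decimals` and `fill_missing`, and the use of `None` to mark a missing value, are assumptions made for this example.

```python
from decimal import Decimal
from statistics import mean, median, mode


def detect_decimals(values):
    """Return the largest number of decimal places among non-missing values."""
    places = 0
    for v in values:
        if v is None:
            continue
        # str() gives the shortest repr, e.g. 13.57 -> "13.57" -> exponent -2.
        exponent = Decimal(str(v)).as_tuple().exponent
        places = max(places, -exponent if exponent < 0 else 0)
    return places


def fill_missing(values, method="mean", fill_value=None):
    """Fill None entries with one of six simple strategies (hypothetical API)."""
    observed = [v for v in values if v is not None]
    if method in ("mean", "median"):
        stat = mean(observed) if method == "mean" else median(observed)
        # Round to the precision of the original data to avoid false precision.
        fill = round(stat, detect_decimals(observed))
        return [fill if v is None else v for v in values]
    if method == "mode":
        fill = mode(observed)
        return [fill if v is None else v for v in values]
    if method == "constant":
        return [fill_value if v is None else v for v in values]
    if method in ("ffill", "bfill"):
        # bfill is ffill applied to the reversed sequence.
        seq = values if method == "ffill" else values[::-1]
        out, last = [], None
        for v in seq:
            if v is not None:
                last = v
            out.append(last)
        return out if method == "ffill" else out[::-1]
    raise ValueError(f"unknown method: {method}")


column = [1.2, None, 2.5, 4.0]
print(fill_missing(column, "mean"))    # mean is 2.5666..., rounded to 1 decimal
print(fill_missing(column, "ffill"))
```

Rounding the fill value to the widest precision seen in the column is one possible reading of "match decimal places of original data"; the actual rule used by the commit may differ.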
@@ -1,10 +1,10 @@
 # Tool C Feature Button Development Plan V1.0

-**Document version**: V1.2 (Phase 2 complete)
+**Document version**: V1.4 (Phase 2+, missing value imputation development)
 **Created**: 2025-12-08
-**Last updated**: 2025-12-08
+**Last updated**: 2025-12-10
 **Owner**: AI development team
-**Project status**: ✅ Phase 1-2 complete, 7 core features available
+**Project status**: ✅ Phase 1-2 complete; 7 core features + NA handling improvements + Pivot improvements + missing value imputation (development complete, MICE pending debugging)

 ---

@@ -109,16 +109,15 @@
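
 | Group | Feature | Priority | Status |
 |------|------|--------|---------|
-| **Sample filtering** | Advanced filter | P0 | ✅ Done |
-| **Variable transformation** | Value mapping (recode) | P0 | ✅ Done |
-| | Generate categorical variable (binning) | P0 | ✅ Done |
-| | Conditional column | P0 | ✅ Done |
-| | Generate new variable (computed column) | P1 | ✅ Done |
-| **Data cleaning** | Delete missing values | P0 | ✅ Done |
-| | Deduplication | P1 | ⏸️ Not planned for now |
-| **Data reshaping** | Long to wide (Pivot) | P1 | ✅ Done |
-| **Advanced features** | Missing value imputation | P1 | To do |
-| | Multiple imputation (MICE) | P0 | To do |
+| **Sample filtering** | Advanced filter | P0 | ✅ Done (+ is-empty / is-not-empty conditions) |
+| **Variable transformation** | Value mapping (recode) | P0 | ✅ Done (+ NA handling option) |
+| | Generate categorical variable (binning) | P0 | ✅ Done (+ NA handling option) |
+| | Conditional column | P0 | ✅ Done (+ is-empty / is-not-empty conditions) |
+| | Generate new variable (computed column) | P1 | ✅ Done (Option B: safe column-name mapping) |
+| **Data cleaning** | Missing value handling (delete + fill) | P0 | ✅ Done (6 simple imputation methods + MICE; MICE pending debugging) |
+| | Deduplication | P1 | ⏸️ Removed (user request) |
+| **Data reshaping** | Long to wide (Pivot) | P1 | ✅ Done (+ keep unselected columns + original column order) |
+| **Advanced features** | Multiple imputation (MICE) | P0 | 🚧 Integrated into missing value handling (pending debugging) |

 **Priority levels**:
 - **P0**: core feature, must be completed in Phase 1-2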
@@ -952,6 +951,8 @@ print(f'插补完成,剩余缺失值: {df[cols_to_impute].isna().sum().sum()}'
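 | V1.0 | 2025-12-08 | Initial version; planned Phase 1-4 features |
 | V1.1 | 2025-12-08 | Architecture refactor: switched to pre-written Python functions |
 | V1.2 | 2025-12-08 | Phase 1-2 complete: 7 core features released |
+| V1.3 | 2025-12-10 | NA handling improvements: 4 features support null handling; Pivot improvements: keep unselected columns + original column order; computed column Option B implemented: safe column-name mapping; UX improvements: column header tooltips + dismissible preview hint + scrollbar improvements |
+| V1.4 | 2025-12-10 | Missing value imputation development: 6 simple methods (mean/median/mode/constant/forward fill/backward fill) + MICE multiple imputation; automatic precision detection; categorical column detection; button changes (removed "Dedup" and "Multiple Imputation", renamed "Delete Missing" to "Missing Value Handling"); status: development complete, MICE DataFrame shape issue pending debugging |

 ---
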
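The categorical column detection noted for V1.4 (skip such columns in MICE, warn, and suggest mode imputation) might look roughly like the sketch below. The commit does not say how detection is done, so the rule used here (a column is imputable only if every observed value is a non-boolean number) is an assumption, as are the function name `split_columns_for_mice` and the dict-of-lists table layout with `None` for missing values.

```python
from numbers import Number


def split_columns_for_mice(table):
    """Partition columns into numeric (imputable) and skipped (categorical).

    `table` maps column names to lists of values; None marks a missing value.
    Hypothetical helper, not the detection logic used by the commit.
    """
    numeric, skipped, warnings = [], [], []
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        # Treat a column as numeric only if it has data and all of it is numeric.
        is_numeric = bool(observed) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in observed
        )
        if is_numeric:
            numeric.append(name)
        else:
            skipped.append(name)
            warnings.append(
                f"Column '{name}' skipped: not numeric; consider mode imputation."
            )
    return numeric, skipped, warnings


table = {"age": [30, None, 41.5], "sex": ["M", None, "F"]}
numeric, skipped, warnings = split_columns_for_mice(table)
print(numeric, skipped)
for message in warnings:
    print(message)
```

Returning the warnings alongside the column lists, rather than printing them inside the helper, leaves it to the caller to decide how long to display them in the UI.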