feat(dc/tool-c): Add missing value imputation feature with 6 methods and MICE

Major features:
1. Missing value imputation (6 simple methods + MICE):
   - Mean/Median/Mode/Constant imputation
   - Forward fill (ffill) and Backward fill (bfill) for time series
   - MICE multivariate imputation (in progress, shape issue to fix)

2. Auto precision detection:
   - Automatically match the decimal places of the original data
   - Prevent false precision (e.g., 13.57 instead of 13.566716417910449)

3. Categorical variable detection:
   - Auto-detect and skip categorical columns in MICE
   - Show warnings for unsuitable columns
   - Suggest mode imputation for categorical data

4. UI improvements:
   - Rename button: "Delete Missing" to "Missing Value Handling"
   - Remove standalone "Dedup" and "MICE" buttons
   - 3-tab dialog: Delete / Fill / Advanced Fill
   - Display column statistics and recommended methods
   - Extend the display time of skipped-column warnings to 8 seconds

5. Bug fixes:
   - Fix sessionService.updateSessionData -> saveProcessedData
   - Fix OperationResult interface (add message and stats)
   - Fix Toolbar button labels and button removal
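The six simple methods (item 1) and the precision matching (item 2) could look roughly like the pandas sketch below. It is an illustration under assumptions, not the code in `operations/fillna.py`, which is not shown. The following are all hypothetical:

- the helper names `decimal_places`, `detect_precision`, and `fill_column`;
- the method strings;
- the rule "round mean/median fills to the largest number of decimal places seen in the column".

MICE is not covered.

```python
import math
from decimal import Decimal

import pandas as pd

# Hypothetical helpers for illustration; not the actual fillna.py API.


def decimal_places(x: float) -> int:
    """Digits after the decimal point in the shortest repr of x (12.0 -> 0)."""
    exponent = Decimal(repr(float(x))).normalize().as_tuple().exponent
    return max(0, -exponent)


def detect_precision(s: pd.Series) -> int:
    """Largest number of decimal places among the observed (finite) values."""
    places = [decimal_places(v) for v in s.dropna() if math.isfinite(v)]
    return max(places, default=0)


def fill_column(s: pd.Series, method: str, value=None) -> pd.Series:
    """Return a copy of `s` with missing values filled by `method`."""
    if method in ("mean", "median"):
        fill = s.mean() if method == "mean" else s.median()
        # Round to the precision of the observed data to avoid false precision.
        return s.fillna(round(fill, detect_precision(s)))
    if method == "mode":
        modes = s.mode(dropna=True)
        # An all-missing column has no mode; leave it unchanged.
        return s if modes.empty else s.fillna(modes.iloc[0])
    if method == "constant":
        if value is None:
            raise ValueError("constant fill needs a value")
        return s.fillna(value)
    if method == "ffill":
        return s.ffill()  # carry the last valid observation forward
    if method == "bfill":
        return s.bfill()  # pull the next valid observation backward
    raise ValueError(f"unknown method: {method!r}")
```

Take `[1.2, NaN, 3.45, 2.0]` as an example. The raw mean is 2.2166..., which this sketch writes back as 2.22. It uses the maximum observed precision rather than the most common one, so that no observed digit is dropped. The real implementation may choose differently.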

Modified files:
Python: operations/fillna.py (new, 556 lines), main.py (3 new endpoints)
Backend: QuickActionService.ts, QuickActionController.ts, routes/index.ts
Frontend: MissingValueDialog.tsx (new, 437 lines), Toolbar.tsx, index.tsx
Tests: test_fillna_operations.py (774 lines), test scripts and docs
Docs: 5 documentation files updated

Known issues:
- MICE imputation fails with a DataFrame shape mismatch (still being debugged)
- Workaround: use one of the 6 simple imputation methods until MICE is fixed
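The commit does not say what causes the shape mismatch. Here is one common way it happens. Suppose MICE runs through scikit-learn's `IterativeImputer` (this commit does not confirm that). That imputer drops columns that are entirely missing by default (`keep_empty_features=False`, scikit-learn 1.2+). Skipping categorical columns also narrows the matrix. Either way, assigning the imputed array back to the full frame by position then fails.

The sketch below avoids positional reassembly by writing results back by column name. It also illustrates the categorical skipping from item 3. Two caveats apply:

- It swaps in a simplified, deterministic chained-regression loop. This is not full MICE: there are no random draws and only one imputed dataset.
- The names `numeric_for_mice` and `chained_impute`, and the `max_unique=10` threshold for "likely categorical", are hypothetical.

```python
import numpy as np
import pandas as pd


def numeric_for_mice(df: pd.DataFrame, max_unique: int = 10):
    """Split columns into (imputable numeric, skipped) lists.

    Skipped: non-numeric or boolean columns, all-missing columns, and numeric
    columns with at most `max_unique` distinct values (assumed to be coded
    categories). The threshold is an assumption, not taken from fillna.py.
    """
    usable, skipped = [], []
    for col in df.columns:
        s = df[col]
        numeric = (pd.api.types.is_numeric_dtype(s)
                   and not pd.api.types.is_bool_dtype(s))
        if numeric and s.notna().any() and s.nunique() > max_unique:
            usable.append(col)
        else:
            skipped.append(col)
    return usable, skipped


def chained_impute(df: pd.DataFrame, max_iter: int = 10, max_unique: int = 10):
    """Simplified chained-equations imputation (deterministic, single dataset).

    Returns (result, skipped_columns). The result always has df's shape,
    index and column order because imputed values are written back by name.
    """
    out = df.copy()
    cols, skipped = numeric_for_mice(df, max_unique)
    if len(cols) < 2:
        return out, skipped  # each column needs at least one predictor
    X = df[cols].to_numpy(dtype=float)
    missing = np.isnan(X)
    # Initial guess: each column's observed mean.
    X[missing] = np.take(np.nanmean(X, axis=0), np.where(missing)[1])
    for _ in range(max_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            # Regress column j on the others, using rows where j was observed.
            coef, *_ = np.linalg.lstsq(design[~rows], X[~rows, j], rcond=None)
            X[rows, j] = design[rows] @ coef
    out[cols] = X  # assignment by column name keeps the original shape
    return out, skipped
```

The returned `skipped` list is what a warning message for unsuitable columns could be built from.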

Status: development complete except for MICE (debugging in progress)
Lines added: ~2000 across 3 tiers (Python, backend, frontend)
2025-12-10 13:06:00 +08:00
parent f4f1d09837
commit 74cf346453
102 changed files with 3806 additions and 181 deletions

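# 🚀 Quick Start - Run the Tests in 1 Minute
## Windows Users
### Option 1: Double-click (simplest)
1. Double-click `run_tests.bat`
2. Wait for the tests to finish
### Option 2: Command line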
```cmd
cd AIclinicalresearch\tests
run_tests.bat
```
---
## Linux/Mac Users
```bash
cd AIclinicalresearch/tests
chmod +x run_tests.sh
./run_tests.sh
```
---
## ⚠️ Prerequisites
**The Python service must be started first!**
```bash
# Open a new terminal
cd AIclinicalresearch/extraction_service
python main.py
```
When you see these lines, the service has started successfully:
```
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8001
```
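Before running the tests, you can confirm that something is listening on port 8001, the port shown in the startup log above. This minimal stdlib sketch only checks that a TCP connection succeeds. It does not call any API endpoint, and the helper name `service_is_up` is hypothetical.

```python
import socket


def service_is_up(host: str = "127.0.0.1", port: int = 8001,
                  timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Opening and immediately closing a connection is enough for this check.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

A `False` result corresponds to the "cannot connect to the service" problem in the troubleshooting section.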
---
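## 📊 Expected Results
**All tests pass**
```
Total tests: 18
✅ Passed: 18
❌ Failed: 0
Pass rate: 100.0%
🎉 All tests passed!
```
⚠️ **Some tests fail**
- Read the error messages shown in red
- Identify which specific tests failed
- Check the Python service logs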
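The pass rate in the summary is simply passed / total × 100, printed with one decimal place. Below is a hypothetical `summarize` helper that reproduces the summary with English labels. The test scripts' own reporting code is not shown and may be structured differently.

```python
def summarize(results: dict[str, bool]) -> str:
    """Format a test summary from a {test_name: passed} mapping."""
    total = len(results)
    passed = sum(results.values())  # True counts as 1
    failed = total - passed
    rate = 100.0 * passed / total if total else 0.0
    lines = [
        f"Total tests: {total}",
        f"✅ Passed: {passed}",
        f"❌ Failed: {failed}",
        f"Pass rate: {rate:.1f}%",
    ]
    if total and failed == 0:
        lines.append("🎉 All tests passed!")
    return "\n".join(lines)
```

With 17 of 18 tests passing, the rate line reads `Pass rate: 94.4%`.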
---
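## 🎯 What Is Tested
- ✅ 6 simple imputation methods (mean, median, mode, constant, forward fill, backward fill)
- ✅ MICE multiple imputation (single column, multiple columns)
- ✅ Edge cases (100% missing, 0% missing, special characters)
- ✅ Various data types (numeric, categorical, mixed)
- ✅ Performance test (1000 rows)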
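The 0%-missing and 100%-missing edge cases could be pinned down with pytest as in this sketch. The stand-in `mean_fill` is hypothetical. The actual `test_fillna_operations.py` may instead send requests to the Python service, and its test names and assertions are not reproduced here.

```python
import pandas as pd


def mean_fill(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical in-process stand-in for the service's mean imputation."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].mean())
    return out


def test_no_missing_is_unchanged():
    df = pd.DataFrame({"v": [1.0, 2.0, 3.0]})
    pd.testing.assert_frame_equal(mean_fill(df, "v"), df)


def test_all_missing_stays_missing():
    # The mean of an all-NaN column is NaN, so there is nothing to fill with.
    df = pd.DataFrame({"v": [float("nan")] * 3})
    assert mean_fill(df, "v")["v"].isna().all()


def test_shape_is_preserved():
    df = pd.DataFrame({"v": [1.0, None, 3.0], "label": ["a", "b", None]})
    assert mean_fill(df, "v").shape == df.shape
```

Saved as a `test_*.py` file, these run with `pytest`.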
---
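## 💡 Tips
- **First run** automatically installs the dependencies (pandas, numpy, requests)
- **Test duration**: about 45-60 seconds
- **Test data** is generated automatically; no manual preparation is needed
- **Color output**: green = pass, red = fail, yellow = warning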
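Auto-generated test data with missing values can be produced reproducibly along these lines. The function `make_test_frame`, its column names, and its distributions are illustrative assumptions. So are the 20% default missing rate and the fixed seed. The real test scripts may generate different data.

```python
import numpy as np
import pandas as pd


def make_test_frame(n_rows: int = 1000, missing_rate: float = 0.2,
                    seed: int = 42) -> pd.DataFrame:
    """Mixed numeric/categorical frame with randomly injected NaNs."""
    rng = np.random.default_rng(seed)  # fixed seed -> identical data each run
    df = pd.DataFrame({
        "age": rng.normal(50, 12, n_rows).round(1),
        "bmi": rng.normal(24, 3.5, n_rows).round(2),
        "sex": rng.choice(["M", "F"], n_rows),
    })
    # Blank out each cell independently with probability `missing_rate`.
    mask = rng.random(df.shape) < missing_rate
    return df.mask(mask)
```

The default of 1000 rows matches the size of the performance test described above.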
---
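## 🆘 Having Problems?
### Problem 1: Cannot connect to the service
**Solution**: make sure the Python service is running (`python main.py`)
### Problem 2: Dependency installation fails
**Solution**: install the dependencies manually with `pip install pandas numpy requests`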
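If the automatic install fails, a small stdlib-only check can report which of the three dependencies are still missing before you rerun the tests. `missing_packages` is a hypothetical helper, not part of the test scripts.

```python
import importlib.util

REQUIRED = ("pandas", "numpy", "requests")


def missing_packages(names=REQUIRED) -> list[str]:
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All test dependencies are installed.")
```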
### Problem 3: Tests fail
**Solution**: read the error messages and check the code logic
---
**Ready? Start the service and run the tests!** 🚀