AIclinicalresearch/backend/migrations/add_data_stats_to_tool_c_session.sql
HaHafeng 74cf346453 feat(dc/tool-c): Add missing value imputation feature with 6 methods and MICE
Major features:
1. Missing value imputation (6 simple methods + MICE):
   - Mean/Median/Mode/Constant imputation
   - Forward fill (ffill) and Backward fill (bfill) for time series
   - MICE multivariate imputation (in progress, shape issue to fix)
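The six simple methods listed above can be sketched in plain Python. This is only an illustration using the standard library, not the code in operations/fillna.py; the function names `fill_simple` and `_carry` are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode
from typing import List, Optional

Value = Optional[float]


def _carry(values: List[Value]) -> List[Value]:
    """Propagate the last observed value forward over None entries."""
    out: List[Value] = []
    last: Value = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def fill_simple(values: List[Value], method: str, constant: Value = None) -> List[Value]:
    """Fill None entries with one of six simple strategies (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if method in ("mean", "median", "mode"):
        if not observed:
            return list(values)  # nothing to estimate from; leave the gaps
        stat = {"mean": mean, "median": median, "mode": mode}[method](observed)
        return [stat if v is None else v for v in values]
    if method == "constant":
        return [constant if v is None else v for v in values]
    if method == "ffill":
        return _carry(list(values))
    if method == "bfill":
        # Backward fill is forward fill applied to the reversed series.
        return _carry(list(values)[::-1])[::-1]
    raise ValueError(f"unknown method: {method}")


series = [1.0, None, 3.0, None, 3.0]
print(fill_simple(series, "ffill"))  # [1.0, 1.0, 3.0, 3.0, 3.0]
print(fill_simple(series, "bfill"))  # [1.0, 3.0, 3.0, 3.0, 3.0]
```

Note that forward fill leaves a leading gap unfilled (and backward fill a trailing one), since there is no neighbouring observation to copy from.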

2. Auto precision detection:
   - Automatically match decimal places of original data
   - Prevent false precision (e.g. 13.57 instead of 13.566716417910449)
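One way to match the precision of the original data is to take the largest number of decimal places among the observed values and round the fill value to it. The sketch below assumes that rule; the commit does not state the exact rule used, and the names `detect_decimal_places` and `fill_mean_matching_precision` are hypothetical.

```python
from decimal import Decimal
from statistics import mean
from typing import List, Optional


def detect_decimal_places(values: List[Optional[float]]) -> int:
    """Return the largest number of decimal places among observed values."""
    places = 0
    for v in values:
        if v is None:
            continue
        # str() gives the shortest repr of a float, e.g. 14.25 -> "14.25".
        exponent = Decimal(str(v)).as_tuple().exponent
        if isinstance(exponent, int) and exponent < 0:
            places = max(places, -exponent)
    return places


def fill_mean_matching_precision(values: List[Optional[float]]) -> List[Optional[float]]:
    """Mean-impute None entries, rounded to the precision of the data."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = round(mean(observed), detect_decimal_places(values))
    return [fill if v is None else v for v in values]


column = [12.5, 14.25, None, 13.1]
print(detect_decimal_places(column))         # 2
print(fill_mean_matching_precision(column))  # [12.5, 14.25, 13.28, 13.1]
```

A caveat of this assumption: a whole-number float such as `13.0` prints as "13.0" and therefore counts as one decimal place.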

3. Categorical variable detection:
   - Auto-detect and skip categorical columns in MICE
   - Show warnings for unsuitable columns
   - Suggest mode imputation for categorical data
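The commit does not say how categorical columns are detected. A minimal sketch, assuming a column counts as categorical when it holds non-numeric values or has only a few distinct values, might look like this. The threshold `max_unique` and the names `is_categorical` and `split_columns_for_mice` are assumptions made for illustration.

```python
from numbers import Real
from typing import Dict, List, Tuple


def is_categorical(values: list, max_unique: int = 5) -> bool:
    """Heuristic: non-numeric values, or few distinct values (assumed rule)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return False  # an all-missing column carries no evidence either way
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in observed):
        return True
    return len(set(observed)) <= max_unique


def split_columns_for_mice(
    table: Dict[str, list], max_unique: int = 5
) -> Tuple[List[str], List[str]]:
    """Return (columns to impute with MICE, warnings for skipped columns)."""
    numeric: List[str] = []
    warnings: List[str] = []
    for name, values in table.items():
        if is_categorical(values, max_unique):
            warnings.append(
                f"Column '{name}' looks categorical and was skipped; "
                "consider mode imputation."
            )
        else:
            numeric.append(name)
    return numeric, warnings


table = {
    "age": [34, 51, None, 47, 29, 62, 38],
    "sex": ["M", "F", None, "F", "M", "M", "F"],
    "stage": [1, 2, 2, None, 3, 1, 2],
}
numeric, warnings = split_columns_for_mice(table)
print(numeric)        # ['age']
print(len(warnings))  # 2
```

A distinct-value threshold misclassifies short numeric columns, so a real implementation would likely also consider the column's declared type or its row count.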

4. UI improvements:
   - Rename button: "Delete Missing" to "Missing Value Handling"
   - Remove standalone "Dedup" and "MICE" buttons
   - 3-tab dialog: Delete / Fill / Advanced Fill
   - Display column statistics and recommended methods
   - Extended warning messages (8 seconds for skipped columns)

5. Bug fixes:
   - Fix sessionService.updateSessionData -> saveProcessedData
   - Fix OperationResult interface (add message and stats)
   - Fix Toolbar button labels and removal

Modified files:
Python: operations/fillna.py (new, 556 lines), main.py (3 new endpoints)
Backend: QuickActionService.ts, QuickActionController.ts, routes/index.ts
Frontend: MissingValueDialog.tsx (new, 437 lines), Toolbar.tsx, index.tsx
Tests: test_fillna_operations.py (774 lines), test scripts and docs
Docs: 5 documentation files updated

Known issues:
- MICE imputation has DataFrame shape mismatch issue (under debugging)
- Workaround: Use 6 simple imputation methods first

Status: Development complete, MICE debugging in progress
Lines added: ~2000 across 3 tiers (Python, backend, frontend)
2025-12-10 13:06:00 +08:00

-- Add the dataStats field to DcToolCSession
-- Caches data statistics to support the data exploration Q&A feature
--
-- Usage: psql -d airesearch_v2 -f add_data_stats_to_tool_c_session.sql
\c airesearch_v2
-- Add the column
ALTER TABLE dc_schema.dc_tool_c_sessions
ADD COLUMN IF NOT EXISTS data_stats JSONB NULL;
-- Add a column comment
COMMENT ON COLUMN dc_schema.dc_tool_c_sessions.data_stats IS 'Cached data statistics (for data exploration Q&A): {totalRows, columnStats}';
-- Verify
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'dc_schema'
AND table_name = 'dc_tool_c_sessions'
AND column_name = 'data_stats';
\echo '✅ Column data_stats has been added to table dc_tool_c_sessions'
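
The column comment names two top-level keys, totalRows and columnStats, but does not describe what columnStats contains. As a rough sketch, a payload for data_stats could be built as follows. The per-column fields `missingCount` and `uniqueCount` and the function name `build_data_stats` are assumptions, not taken from the project.

```python
import json
from typing import Any, Dict, List


def build_data_stats(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a {totalRows, columnStats} payload from a list of row dicts."""
    # Collect column names in first-seen order.
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    column_stats: Dict[str, Dict[str, int]] = {}
    for name in names:
        observed = [row.get(name) for row in rows if row.get(name) is not None]
        column_stats[name] = {
            "missingCount": len(rows) - len(observed),  # assumed field
            "uniqueCount": len(set(observed)),          # assumed field
        }
    return {"totalRows": len(rows), "columnStats": column_stats}


rows = [
    {"age": 34, "sex": "M"},
    {"age": None, "sex": "F"},
    {"age": 34, "sex": None},
]
# The JSON text is what would be written to the JSONB column.
payload = json.dumps(build_data_stats(rows))
print(payload)
```

Because the column is nullable, sessions created before the migration simply have no cached statistics until they are computed.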