Commit Graph

8 Commits

Author SHA1 Message Date
19f9c5ea93 docs(deployment): Fix 8 critical deployment issues and enhance documentation
Summary of fixes:
- Fix service discovery address (change .sae domain to internal IP)
- Unify timezone configuration (Asia/Shanghai for all services)
- Enhance ECS security group configuration (Redis/Weaviate port binding)
- Add image pull strategy best practices
- Add Python service memory management guidelines
- Update Dify API Key deployment strategy (avoid deadlock)
- Add SSH tunnel for RDS database access
- Add NAT gateway cost optimization explanation

Modified files (7 docs):
- 00-部署架构总览.md (enhanced with 7 sections)
- 03-Dify-ECS部署完全指南.md (security hardening)
- 04-Python微服务-SAE容器部署指南.md (timezone + service discovery)
- 05-Node.js后端-SAE容器部署指南.md (timezone configuration)
- PostgreSQL部署策略-摸底报告.md (timezone best practice)
- 07-关键配置补充说明.md (3 new sections)
- 08-部署检查清单.md (service address fix)

New files:
- 文档修正报告-20251214.md (comprehensive fix report)
- Review documents from technical team

Impact:
- Fixed 3 P0/P1 critical issues (100% connection failure risk)
- Fixed 3 P2 important issues (stability and maintainability)
- Added 2 P3 best practices (developer convenience)

Status: All deployment documents reviewed and corrected, ready for production deployment
2025-12-14 13:25:28 +08:00
fa72beea6c feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)
Major Changes:
- Implement Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job queue management (platform_schema.job)
- Implement CheckpointService using job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add task splitting mechanism (auto chunk size recommendation)
- Refactor ASL screening service with smart mode selection
- Refactor DC extraction service with smart mode selection
- Register workers for ASL and DC modules

Technical Highlights:
- All task management data stored in platform_schema.job.data (JSONB)
- Business tables remain clean (no task management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows 3-layer architecture principle
- Zero additional cost (no Redis needed, save 8400 CNY/year)

Code Statistics:
- New code: ~1750 lines
- Modified code: ~500 lines
- Test code: ~1800 lines
- Documentation: ~3000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: Updated 4 key documents

Status: Phase 1-7 completed, Phase 8-9 pending
2025-12-13 16:10:04 +08:00
200eab5c2e feat(dc-tool-c): Tool C UX重大改进 - 列头筛选/行号/滚动条/全量数据
新功能
- 列头筛选:Excel风格筛选功能(Community版本,中文本地化,显示唯一值及计数)
- 行号列:添加固定行号列(#列头,灰色背景,左侧固定)
- 全量数据加载:不再限制50行预览,Session加载全量数据
- 全量数据返回:所有快速操作(筛选/映射/分箱/条件/删NA/计算/Pivot)全量返回结果

 Bug修复
- 滚动条终极修复:修改MainLayout为固定高度(h-screen + overflow-hidden),整个浏览器窗口无滚动条,只有AG Grid内部滚动
- 计算列全角字符修复:自动转换中文括号等全角字符为半角
- 计算列特殊字符列名修复:完善列别名机制,支持任意特殊字符列名

 UI优化
- 删除'表格仅展示前50行'提示条,减少干扰
- 筛选对话框美化:白色背景,圆角,阴影
- 列头筛选图标优化:清晰可见,易于点击

 文档更新
- 工具C_功能按钮开发计划_V1.0.md:添加V1.5版本记录
- 工具C_MVP开发_TODO清单.md:添加Day 8 UX优化内容
- 00-工具C当前状态与开发指南.md:更新进度为98%
- 00-模块当前状态与开发指南.md:更新DC模块状态
- 00-系统当前状态与开发指南.md:更新系统整体状态

 影响范围
- Python微服务:无修改
- Node.js后端:5处代码修改(SessionService + QuickActionController + AICodeService)
- 前端:MainLayout + DataGrid + ag-grid-custom.css + index.tsx
- 完成度:Tool C整体完成度提升至98%

 代码统计
- 修改文件:~15个文件
- 新增行数:~200行
- 修改行数:~150行

Co-authored-by: AI Assistant <assistant@example.com>
2025-12-10 18:02:42 +08:00
74cf346453 feat(dc/tool-c): Add missing value imputation feature with 6 methods and MICE
Major features:
1. Missing value imputation (6 simple methods + MICE):
   - Mean/Median/Mode/Constant imputation
   - Forward fill (ffill) and Backward fill (bfill) for time series
   - MICE multivariate imputation (in progress, shape issue to fix)

2. Auto precision detection:
   - Automatically match decimal places of original data
   - Prevent false precision (e.g. 13.57 instead of 13.566716417910449)

3. Categorical variable detection:
   - Auto-detect and skip categorical columns in MICE
   - Show warnings for unsuitable columns
   - Suggest mode imputation for categorical data

4. UI improvements:
   - Rename button: "Delete Missing" to "Missing Value Handling"
   - Remove standalone "Dedup" and "MICE" buttons
   - 3-tab dialog: Delete / Fill / Advanced Fill
   - Display column statistics and recommended methods
   - Extended warning messages (8 seconds for skipped columns)

5. Bug fixes:
   - Fix sessionService.updateSessionData -> saveProcessedData
   - Fix OperationResult interface (add message and stats)
   - Fix Toolbar button labels and removal

Modified files:
Python: operations/fillna.py (new, 556 lines), main.py (3 new endpoints)
Backend: QuickActionService.ts, QuickActionController.ts, routes/index.ts
Frontend: MissingValueDialog.tsx (new, 437 lines), Toolbar.tsx, index.tsx
Tests: test_fillna_operations.py (774 lines), test scripts and docs
Docs: 5 documentation files updated

Known issues:
- MICE imputation has DataFrame shape mismatch issue (under debugging)
- Workaround: Use 6 simple imputation methods first

Status: Development complete, MICE debugging in progress
Lines added: ~2000 lines across 3 tiers
2025-12-10 13:06:00 +08:00
f4f1d09837 feat(dc/tool-c): Add pivot column ordering and NA handling features
Major features:
1. Pivot transformation enhancements:
   - Add option to keep unselected columns with 3 aggregation methods
   - Maintain original column order after pivot (aligned with source file)
   - Preserve pivot value order (first appearance order)

2. NA handling across 4 core functions:
   - Recode: Support keep/map/drop for NA values
   - Filter: Already supports is_null/not_null operators
   - Binning: Support keep/label/assign for NA values (fix nan display)
   - Conditional: Add is_null/not_null operators

3. UI improvements:
   - Enable column header tooltips with custom header component
   - Add closeable alert for 50-row preview
   - Fix page scrollbar issues

Modified files:
Python: pivot.py, recode.py, binning.py, conditional.py, main.py
Backend: SessionController, QuickActionController, QuickActionService
Frontend: PivotDialog, RecodeDialog, BinningDialog, ConditionalDialog, DataGrid, index

Status: Ready for testing
2025-12-09 14:40:14 +08:00
75ceeb0653 hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization
Critical fixes:
1. Compute column: Add Chinese comma support in formula validation
   - Problem: Formula with Chinese comma failed validation
   - Fix: Add Chinese comma character to allowed_chars regex
   - Example: Support formulas like 'col1(kg)+ col2,col3'

2. Binning operation: Fix NaN serialization error
   - Problem: 'Out of range float values are not JSON compliant: nan'
   - Fix: Enhanced NaN/inf handling in binning endpoint
   - Added np.inf/-np.inf replacement before JSON serialization
   - Added manual JSON serialization with NaN->null conversion

3. Enhanced all operation endpoints for consistency
   - Updated conditional, dropna endpoints with same NaN/inf handling
   - Ensures all operations return JSON-compliant data

Modified files:
- extraction_service/operations/compute.py: Add Chinese comma to regex
- extraction_service/main.py: Enhanced NaN handling in binning/conditional/dropna

Status: Hotfix complete, ready for testing
2025-12-09 08:45:27 +08:00
91cab452d1 fix(dc/tool-c): Fix special character handling and improve UX
Major fixes:
- Fix pivot transformation with special characters in column names
- Fix compute column validation for Chinese punctuation
- Fix recode dialog to fetch unique values from full dataset via new API
- Add column mapping mechanism to handle special characters

Database migration:
- Add column_mapping field to dc_tool_c_sessions table
- Migration file: 20251208_add_column_mapping

UX improvements:
- Darken table grid lines for better visibility
- Reduce column width by 40% with tooltip support
- Insert new columns next to source columns
- Preserve original row order after operations
- Add notice about 50-row preview limit

Modified files:
- Backend: SessionService, SessionController, QuickActionService, routes
- Python: pivot.py, compute.py, recode.py, binning.py, conditional.py
- Frontend: DataGrid, RecodeDialog, index.tsx, ag-grid-custom.css
- Database: schema.prisma, migration SQL

Status: Code complete, database migrated, ready for testing
2025-12-08 23:20:55 +08:00
f729699510 feat(dc): Complete Tool C quick action buttons Phase 1-2 - 7 functions
Summary:
- Implement 7 quick action functions (filter, recode, binning, conditional, dropna, compute, pivot)
- Refactor to pre-written Python functions architecture (stable and secure)
- Add 7 Python operations modules with full type hints
- Add 7 frontend Dialog components with user-friendly UI
- Fix NaN serialization issues and auto type conversion
- Update all related documentation

Technical Details:
- Python: operations/ module (filter.py, recode.py, binning.py, conditional.py, dropna.py, compute.py, pivot.py)
- Backend: QuickActionService.ts with 7 execute methods
- Frontend: 7 Dialog components with complete validation
- Toolbar: Enable 7 quick action buttons

Status: Phase 1-2 completed, basic testing passed, ready for further testing
2025-12-08 17:38:08 +08:00