Files
AIclinicalresearch/tests/run_tests.bat
HaHafeng 4c6eaaecbf feat(dc): Implement Postgres-Only async architecture and performance optimization
Summary:
- Implement async file upload processing (Platform-Only pattern)
- Add parseExcelWorker with pg-boss queue
- Implement React Query polling mechanism
- Add clean data caching (avoid duplicate parsing)
- Fix pivot single-value column tuple issue
- Optimize performance by 99 percent

Technical Details:

1. Async Architecture (Postgres-Only):
   - SessionService.createSession: Fast upload + push to queue (3s)
   - parseExcelWorker: Background parsing + save clean data (53s)
   - SessionController.getSessionStatus: Status query API for polling
   - React Query Hook: useSessionStatus (auto-serial polling)
   - Frontend progress bar with real-time feedback

2. Performance Optimization:
   - Clean data caching: Worker saves processed data to OSS
   - getPreviewData: Read from clean data cache (0.5s vs 43s, -99 percent)
   - getFullData: Read from clean data cache (0.5s vs 43s, -99 percent)
   - Intelligent cleaning: Boundary detection + ghost column/row removal
   - Safety valve: Max 3000 columns, 5M cells

3. Bug Fixes:
   - Fix pivot column name tuple issue for single value column
   - Fix queue name format (colon to underscore: asl:screening -> asl_screening)
   - Fix polling storm (15+ concurrent requests -> 1 serial request)
   - Fix QUEUE_TYPE environment variable (memory -> pgboss)
   - Fix logger import in PgBossQueue
   - Fix formatSession to return cleanDataKey
   - Fix saveProcessedData to update clean data synchronously

4. Database Changes:
   - ALTER TABLE dc_tool_c_sessions ADD COLUMN clean_data_key VARCHAR(1000)
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN total_rows DROP NOT NULL
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN total_cols DROP NOT NULL
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN columns DROP NOT NULL

5. Documentation:
   - Create Postgres-Only async task processing guide (588 lines)
   - Update Tool C status document (Day 10 summary)
   - Update DC module status document
   - Update system overview document
   - Update cloud-native development guide

Performance Improvements:
- Upload + preview: 96s -> 53.5s (-44 percent)
- Filter operation: 44s -> 2.5s (-94 percent)
- Pivot operation: 45s -> 2.5s (-94 percent)
- Concurrent requests: 15+ -> 1 (-93 percent)
- Complete workflow (upload + 7 ops): 404s -> 70.5s (-83 percent)

Files Changed:
- Backend: 15 files (Worker, Service, Controller, Schema, Config)
- Frontend: 4 files (Hook, Component, API)
- Docs: 4 files (Guide, Status, Overview, Spec)
- Database: 4 column modifications
- Total: ~1388 lines of new/modified code

Status: Fully tested and verified, production ready
2025-12-22 21:30:31 +08:00

64 lines
1.1 KiB
Batchfile
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
@echo off
REM Windows批处理脚本 - 运行缺失值处理功能测试
echo ========================================
echo 缺失值处理功能 - 自动化测试
echo ========================================
echo.
REM 检查Python是否安装
python --version >nul 2>&1
if %errorlevel% neq 0 (
echo [错误] Python未安装或不在PATH中
pause
exit /b 1
)
echo [1/3] 检查Python服务状态...
curl -s http://localhost:8001/health >nul 2>&1
if %errorlevel% neq 0 (
echo [警告] Python服务未运行请先启动服务
echo cd extraction_service
echo python main.py
echo.
pause
exit /b 1
)
echo [OK] Python服务运行正常
echo.
echo [2/3] 检查依赖...
python -c "import pandas, numpy, requests" >nul 2>&1
if %errorlevel% neq 0 (
echo [警告] 缺少依赖,正在安装...
pip install pandas numpy requests
)
echo [OK] 依赖检查完成
echo.
echo [3/3] 运行测试...
echo.
python test_fillna_operations.py
echo.
echo ========================================
echo 测试完成
echo ========================================
pause