Files
AIclinicalresearch/backend/prisma/manual-migrations/001_add_postgres_cache_and_checkpoint.sql
HaHafeng 4c6eaaecbf feat(dc): Implement Postgres-Only async architecture and performance optimization
Summary:
- Implement async file upload processing (Platform-Only pattern)
- Add parseExcelWorker with pg-boss queue
- Implement React Query polling mechanism
- Add clean data caching (avoid duplicate parsing)
- Fix pivot single-value column tuple issue
- Optimize performance by 99 percent

Technical Details:

1. Async Architecture (Postgres-Only):
   - SessionService.createSession: Fast upload + push to queue (3s)
   - parseExcelWorker: Background parsing + save clean data (53s)
   - SessionController.getSessionStatus: Status query API for polling
   - React Query Hook: useSessionStatus (auto-serial polling)
   - Frontend progress bar with real-time feedback

2. Performance Optimization:
   - Clean data caching: Worker saves processed data to OSS
   - getPreviewData: Read from clean data cache (0.5s vs 43s, -99 percent)
   - getFullData: Read from clean data cache (0.5s vs 43s, -99 percent)
   - Intelligent cleaning: Boundary detection + ghost column/row removal
   - Safety valve: Max 3000 columns, 5M cells

3. Bug Fixes:
   - Fix pivot column name tuple issue for single value column
   - Fix queue name format (colon to underscore: asl:screening -> asl_screening)
   - Fix polling storm (15+ concurrent requests -> 1 serial request)
   - Fix QUEUE_TYPE environment variable (memory -> pgboss)
   - Fix logger import in PgBossQueue
   - Fix formatSession to return cleanDataKey
   - Fix saveProcessedData to update clean data synchronously

4. Database Changes:
   - ALTER TABLE dc_tool_c_sessions ADD COLUMN clean_data_key VARCHAR(1000)
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN total_rows DROP NOT NULL
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN total_cols DROP NOT NULL
   - ALTER TABLE dc_tool_c_sessions ALTER COLUMN columns DROP NOT NULL

5. Documentation:
   - Create Postgres-Only async task processing guide (588 lines)
   - Update Tool C status document (Day 10 summary)
   - Update DC module status document
   - Update system overview document
   - Update cloud-native development guide

Performance Improvements:
- Upload + preview: 96s -> 53.5s (-44 percent)
- Filter operation: 44s -> 2.5s (-94 percent)
- Pivot operation: 45s -> 2.5s (-94 percent)
- Concurrent requests: 15+ -> 1 (-93 percent)
- Complete workflow (upload + 7 ops): 404s -> 70.5s (-83 percent)

Files Changed:
- Backend: 15 files (Worker, Service, Controller, Schema, Config)
- Frontend: 4 files (Hook, Component, API)
- Docs: 4 files (Guide, Status, Overview, Spec)
- Database: 4 column modifications
- Total: ~1388 lines of new/modified code

Status: Fully tested and verified, production ready
2025-12-22 21:30:31 +08:00

81 lines
2.2 KiB
SQL
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- ==================== Postgres-Only 改造:手动迁移 ====================
-- 文件: 001_add_postgres_cache_and_checkpoint.sql
-- 目的: 添加缓存表和断点续传字段
-- 日期: 2025-12-13
-- 说明: 避免Prisma migrate的shadow database问题手动添加所需表和字段
-- ==================== 1. 创建缓存表 (AppCache) ====================
CREATE TABLE IF NOT EXISTS platform_schema.app_cache (
id SERIAL PRIMARY KEY,
key VARCHAR(500) UNIQUE NOT NULL,
value JSONB NOT NULL,
expires_at TIMESTAMP NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- 创建索引优化过期查询和key查询
CREATE INDEX IF NOT EXISTS idx_app_cache_expires
ON platform_schema.app_cache(expires_at);
CREATE INDEX IF NOT EXISTS idx_app_cache_key_expires
ON platform_schema.app_cache(key, expires_at);
-- ==================== 2. 为AslScreeningTask添加新字段 ====================
-- 任务拆分支持字段
ALTER TABLE asl_schema.screening_tasks
ADD COLUMN IF NOT EXISTS total_batches INTEGER DEFAULT 1,
ADD COLUMN IF NOT EXISTS processed_batches INTEGER DEFAULT 0,
ADD COLUMN IF NOT EXISTS current_batch_index INTEGER DEFAULT 0;
-- 断点续传支持字段
ALTER TABLE asl_schema.screening_tasks
ADD COLUMN IF NOT EXISTS current_index INTEGER DEFAULT 0,
ADD COLUMN IF NOT EXISTS last_checkpoint TIMESTAMP,
ADD COLUMN IF NOT EXISTS checkpoint_data JSONB;
-- ==================== 3. 验证创建结果 ====================
-- 查看app_cache表结构
SELECT
column_name,
data_type,
is_nullable,
column_default
FROM information_schema.columns
WHERE table_schema = 'platform_schema'
AND table_name = 'app_cache'
ORDER BY ordinal_position;
-- 查看screening_tasks新增字段
SELECT
column_name,
data_type,
is_nullable,
column_default
FROM information_schema.columns
WHERE table_schema = 'asl_schema'
AND table_name = 'screening_tasks'
AND column_name IN (
'total_batches', 'processed_batches', 'current_batch_index',
'current_index', 'last_checkpoint', 'checkpoint_data'
)
ORDER BY ordinal_position;
-- ==================== 完成 ====================
-- ✅ 缓存表已创建
-- ✅ 任务拆分字段已添加
-- ✅ 断点续传字段已添加