Technical Design Document: Tool A - Medical Data Super Merger (The Super Merger)

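| Document type | Technical Design Document (TDD) |
| :---- | :---- |
| Corresponding PRD | PRD_工具A_超级合并器_V2.md |
| Version | V2.0 (architecture upgrade: visit anchor + time window) |
| Status | Draft |
| Core goal | Build a web-based ETL tool that solves the "one-to-many" data alignment problem in clinical research by merging records precisely within a time window. |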

1. Architecture Overview

Processing Excel files (parsing, merging, writing) is CPU-intensive and memory-sensitive. To avoid blocking the Node.js main thread, we adopt an **"asynchronous task queue + streaming"** architecture.

1.1 System Architecture Diagram

graph TD
Client["React frontend (Wizard UI)"]

subgraph API_Server ["Fastify API service"]
    UploadAPI["Upload endpoint"]
    TaskAPI["Task status endpoint"]
    ConfigAPI["Configuration endpoint"]
end

subgraph Async_Worker ["Background worker"]
    BullMQ["BullMQ queue"]
    Merger["Smart merge engine (Time-Window Joiner)"]
    ExcelParser["ExcelJS parser"]
    DateEngine["Date normalization engine"]
end

subgraph Storage ["Data storage"]
    PG[("PostgreSQL business database")]
    FileSys["Temporary file storage (Local/S3)"]
    Redis[("Redis cache/queue")]
end

Client -- "1: Upload files" --> UploadAPI
UploadAPI -- "Save temporary files" --> FileSys
Client -- "2: Submit anchor and time-window config" --> ConfigAPI
ConfigAPI -- "Create task" --> PG
ConfigAPI -- "Push to queue" --> BullMQ
BullMQ -- "Consume task" --> Merger
Merger -- "Read auxiliary tables (full load)" --> FileSys
Merger -- "Read main table (streaming)" --> FileSys
Merger -- "Stream merge and write" --> FileSys
Merger -- "Update status" --> PG
Client -- "3: Poll / WS progress" --> TaskAPI
Client -- "4: Download result" --> API_Server
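The Merger's flow in the diagram (load the auxiliary table in full, stream the main table, merge row by row) can be sketched roughly as follows. This is a minimal illustration, not the actual engine: the `JoinSpec` shape, the helper names, and the "closest record inside the window wins" rule are all assumptions, and time cells are assumed to be `Date` objects already produced by the date normalization step.

```typescript
type Row = Record<string, unknown>;

// Hypothetical join settings, loosely mirroring the task config in this document.
interface JoinSpec {
  idCol: string;         // patient key present in both tables
  anchorTimeCol: string; // time column of the anchor (main) table
  auxTimeCol: string;    // time column of the auxiliary table
  auxColumns: string[];  // auxiliary columns copied into the output
  daysBefore: number;
  daysAfter: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns epoch milliseconds, or NaN when the cell is not a valid Date.
function timeOf(row: Row, col: string): number {
  const v = row[col];
  return v instanceof Date ? v.getTime() : NaN;
}

// Step 1: load the auxiliary table in full and index it by patient ID.
function buildIndex(auxRows: Iterable<Row>, idCol: string): Map<string, Row[]> {
  const index = new Map<string, Row[]>();
  for (const row of auxRows) {
    const key = String(row[idCol]);
    const bucket = index.get(key);
    if (bucket) bucket.push(row);
    else index.set(key, [row]);
  }
  return index;
}

// Step 2 (assumed rule): among one patient's records, pick the one closest to the
// anchor time inside [anchor - daysBefore, anchor + daysAfter]. Ties keep the first seen.
function pickClosest(anchorMs: number, candidates: Row[], spec: JoinSpec): Row | undefined {
  const lo = anchorMs - spec.daysBefore * DAY_MS;
  const hi = anchorMs + spec.daysAfter * DAY_MS;
  let best: Row | undefined;
  let bestGap = Infinity;
  for (const c of candidates) {
    const t = timeOf(c, spec.auxTimeCol);
    if (Number.isNaN(t) || t < lo || t > hi) continue;
    const gap = Math.abs(t - anchorMs);
    if (gap < bestGap) {
      best = c;
      bestGap = gap;
    }
  }
  return best;
}

// Step 3: stream anchor rows one at a time. Each anchor row yields exactly one
// output row, so the output never has more rows than the anchor table.
function* mergeByTimeWindow(
  anchors: Iterable<Row>,
  index: Map<string, Row[]>,
  spec: JoinSpec,
): Generator<{ row: Row; matched: boolean }> {
  for (const anchor of anchors) {
    const anchorMs = timeOf(anchor, spec.anchorTimeCol);
    const candidates = index.get(String(anchor[spec.idCol])) ?? [];
    const match = Number.isNaN(anchorMs) ? undefined : pickClosest(anchorMs, candidates, spec);
    const row: Row = { ...anchor };
    for (const col of spec.auxColumns) row[col] = match ? match[col] : null;
    yield { row, matched: match !== undefined };
  }
}

// Usage with made-up data: one patient has a lab result inside the window, one has none.
const spec: JoinSpec = {
  idCol: "id", anchorTimeCol: "admitted", auxTimeCol: "reported",
  auxColumns: ["wbc"], daysBefore: 7, daysAfter: 7,
};
const index = buildIndex([
  { id: "P001", reported: new Date("2024-03-01"), wbc: 6.1 },
  { id: "P001", reported: new Date("2024-03-20"), wbc: 9.8 },
], spec.idCol);
const merged = [...mergeByTimeWindow([
  { id: "P001", admitted: new Date("2024-03-03") },
  { id: "P002", admitted: new Date("2024-03-03") },
], index, spec)];
const matchRate = merged.filter((m) => m.matched).length / merged.length;
console.log(merged.map((m) => m.row), matchRate);
```

Keeping one output row per anchor row is what prevents the "one-to-many" row explosion; a different selection rule (earliest, latest, aggregate) would only change `pickClosest`.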

2. Technology Selection (Tech Stack)

Targeted choices based on the existing tech stack:

| Layer | Technology | Rationale |
| :---- | :---- | :---- |
| Frontend | React 19 + Ant Design 5 | Use AntD's Steps, Upload, and Tree (tree selector) components to build the UI quickly. |
| Backend framework | Fastify 5.x | High-performance HTTP framework, well suited to highly concurrent I/O. |
| Excel processing | ExcelJS | **Core component.** Supports streaming reads and writes (Streaming I/O), which is the key to handling large data volumes without crashing. |
| Date handling | Day.js + CustomParseFormat | **New.** The core library for dealing with "time hell"; it needs highly fault-tolerant parsing. |
| Task queue | BullMQ + Redis | **Asynchronous processing is mandatory.** The merge logic is complex and long-running, so a queue is required. |
| Database | PostgreSQL 15 + Prisma | Stores task status and file metadata. **Storing raw Excel data in PG is not recommended.** |
| Validation | Zod | Validates the complex mapping configuration submitted by the frontend. |
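To make the "fault-tolerant date parsing" requirement concrete, here is a small sketch of what a date normalization function might accept. It deliberately uses only the built-in `Date` and regular expressions so it runs without dependencies; the real engine is expected to use Day.js with CustomParseFormat instead. The function names, the list of accepted formats, and the decision to treat numeric cells as Excel serial dates (1900 date system) are assumptions for illustration.

```typescript
const EXCEL_SERIAL_OF_UNIX_EPOCH = 25569; // Excel serial number of 1970-01-01 (1900 date system)
const DAY_MS = 24 * 60 * 60 * 1000;

// Builds "YYYY-MM-DD" and rejects overflowed dates such as 2024-02-30,
// which Date would otherwise silently roll over into March.
function toIso(y: number, m: number, d: number): string | null {
  const date = new Date(Date.UTC(y, m - 1, d));
  if (date.getUTCFullYear() !== y || date.getUTCMonth() !== m - 1 || date.getUTCDate() !== d) {
    return null;
  }
  return date.toISOString().slice(0, 10);
}

// Converts a heterogeneous cell value to "YYYY-MM-DD", or null when it cannot be parsed.
function normalizeDate(value: unknown): string | null {
  if (value instanceof Date) {
    // Uses the UTC calendar day of the Date object.
    return Number.isNaN(value.getTime()) ? null : value.toISOString().slice(0, 10);
  }
  if (typeof value === "number") {
    // Assumption: numbers are Excel serial dates. Serials below 61 are skipped
    // because of Excel's 1900 leap-year quirk; 2958465 is 9999-12-31.
    if (!Number.isFinite(value) || value < 61 || value > 2958465) return null;
    const ms = (Math.floor(value) - EXCEL_SERIAL_OF_UNIX_EPOCH) * DAY_MS;
    return new Date(ms).toISOString().slice(0, 10);
  }
  if (typeof value !== "string") return null;
  const text = value.trim();
  // 2024-03-01, 2024/3/1, 2024.03.01, 2024年3月1日, optionally followed by a time part.
  const separated = /^(\d{4})[-\/.年](\d{1,2})[-\/.月](\d{1,2})日?(?:[ T].*)?$/.exec(text);
  if (separated) {
    return toIso(Number(separated[1]), Number(separated[2]), Number(separated[3]));
  }
  // Compact form: 20240301
  const compact = /^(\d{4})(\d{2})(\d{2})$/.exec(text);
  if (compact) {
    return toIso(Number(compact[1]), Number(compact[2]), Number(compact[3]));
  }
  return null;
}

// Usage: mixed inputs that all refer to the same day, plus two unparseable values.
const samples: unknown[] = ["2024/3/1", "2024年3月1日", "20240301", 45352, "2024-02-30", "n/a"];
console.log(samples.map(normalizeDate));
```

Returning `null` rather than throwing lets the merge step count unparseable rows in the quality report instead of aborting the whole task.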

2.1 Key Technical Decision (ADR): Why Not Python (Pandas)?

Although Python Pandas offers more concise code for merging data, for **this tool's** scenario we decided to stay with Node.js, for the following reasons:

  1. **Streaming advantage:** Pandas tends to load the full dataset into memory, which easily leads to OOM. Node.js's Stream API natively supports backpressure and can handle the "data expansion" problem reliably.
  2. **Architectural consistency:** Avoids the operational cost and IPC overhead of introducing a Python runtime.
  3. **Conclusion:** For exact matching and rule-based cleaning, Node.js performance is sufficient and easier to control.
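The backpressure argument in point 1 can be illustrated with Node's built-in `node:stream` module. The sketch below is not the tool's pipeline: the row source, the merge stage, and the slow sink are hypothetical stand-ins for the ExcelJS reader, the merge engine, and the output writer. It counts how many rows are ever "in flight" (produced but not yet written) to show that a slow writer throttles the reader instead of letting rows pile up in memory.

```typescript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

interface Row {
  id: number;
  merged?: boolean;
}

// Runs a three-stage pipeline and reports the peak number of rows in flight.
async function runPipeline(total: number): Promise<{ written: number; maxInFlight: number }> {
  let produced = 0;
  let written = 0;
  let maxInFlight = 0;

  // Source: stands in for a streaming Excel reader; rows are created lazily,
  // only when the downstream stages ask for more.
  function* readRows(): Generator<Row> {
    for (let id = 1; id <= total; id++) {
      produced++;
      maxInFlight = Math.max(maxInFlight, produced - written);
      yield { id };
    }
  }

  // Middle stage: stands in for the merge step.
  const merge = new Transform({
    objectMode: true,
    highWaterMark: 4,
    transform(row: Row, _encoding, callback) {
      callback(null, { ...row, merged: true });
    },
  });

  // Sink: a deliberately slow writer, standing in for the Excel output stream.
  const sink = new Writable({
    objectMode: true,
    highWaterMark: 4,
    write(_row: Row, _encoding, callback) {
      setImmediate(() => {
        written++;
        callback();
      });
    },
  });

  // pipeline() connects the stages and propagates both backpressure and errors.
  await pipeline(Readable.from(readRows()), merge, sink);
  return { written, maxInFlight };
}

// Usage: the peak in-flight count stays near the small buffer sizes above,
// regardless of how many rows pass through.
runPipeline(300).then(({ written, maxInFlight }) => {
  console.log(`written=${written}, peak rows in flight=${maxInFlight}`);
});
```

The `highWaterMark` values are arbitrary small numbers chosen to make the effect visible; real stages would tune them to row size and available memory.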

3. Database Design (Database Schema)

Prisma Schema Definition

// Task status enum
enum TaskStatus {
  PENDING
  PROCESSING
  COMPLETED
  FAILED
}

// Merge task table
model MergeTask {
  id       String     @id @default(uuid())
  userId   String
  status   TaskStatus @default(PENDING)
  progress Int        @default(0)

  // Core configuration field (updated in V2)
  // Structure: {
  //   anchorFileId: string,
  //   anchorKeys: { id: "Admission No.", time: "Admission Date" },
  //   window: { daysBefore: 7, daysAfter: 7 },
  //   files: [{ id: "f2", timeCol: "Report Time", columns: ["WBC"] }]
  // }
  config Json?

  resultUrl String?
  report    Json? // Quality report, e.g. { totalRows: 1000, dropped: 50, matchRate: "95%" }
  errorMsg  String?
  createdAt DateTime @default(now())

  files SourceFile[]
}

// Source file table
model SourceFile {
  id         String    @id @default(uuid())
  taskId     String
  task       MergeTask @relation(fields: [taskId], references: [id])
  filename   String
  filepath   String
  headers    Json // e.g. ["Admission No.", "Name", "Admission Date"]
  rowCount   Int
  fileSize   Int
  uploadedAt DateTime  @default(now())
}