# **Technical Design Document: Tool A \- Medical Data Super Merger (The Super Merger)**
| Document type | Technical Design Document (TDD) |
| :---- | :---- |
| **Corresponding PRD** | **PRD\_工具A\_超级合并器\_V2.md** |
| **Version** | **V2.0** (architecture upgrade: visit anchor \+ time window) |
| **Status** | Draft |
| **Core goal** | Build a web-based ETL tool that solves the "one-to-many" data alignment problem in clinical research and performs precise merging based on a time window. |
## **1\. Architecture Overview**
Processing Excel files (parsing, merging, writing) is CPU-intensive and memory-sensitive. To avoid blocking the Node.js main thread, we adopt an **"asynchronous task queue \+ streaming"** architecture.
### **1.1 System Architecture Diagram**
graph TD
Client\["React Frontend (Wizard UI)"\]
subgraph API\_Server \[Fastify API Service\]
UploadAPI\[Upload API\]
TaskAPI\[Task Status API\]
ConfigAPI\[Config API\]
end
subgraph Async\_Worker \[Background Worker\]
BullMQ\[BullMQ Queue\]
Merger\["Smart Merge Engine (Time-Window Joiner)"\]
ExcelParser\[ExcelJS Parser\]
DateEngine\[Date Normalization Engine\]
end
subgraph Storage \[Data Storage\]
PG\[(PostgreSQL Business DB)\]
FileSys\["Temporary File Storage (Local/S3)"\]
Redis\[(Redis Cache/Queue)\]
end
Client \--1.Upload files--\> UploadAPI
UploadAPI \--Save temporary files--\> FileSys
Client \--2.Submit anchor and time-window config--\> ConfigAPI
ConfigAPI \--Create task--\> PG
ConfigAPI \--Push to queue--\> BullMQ
BullMQ \--Consume task--\> Merger
Merger \--Read auxiliary tables, full load--\> FileSys
Merger \--Read main table, streaming--\> FileSys
Merger \--Stream merge and write--\> FileSys
Merger \--Update status--\> PG
Client \--3.Poll/WS progress--\> TaskAPI
Client \--4.Download result--\> API\_Server
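The diagram has the Merger load the auxiliary tables in full and stream the main (anchor) table past them. A minimal sketch of that time-window join follows. The row shapes, the function names `indexByPatient` and `matchInWindow`, and the sample values are all hypothetical. Choosing the record nearest to the anchor time is also an assumption; the PRD may define a different selection rule.

```typescript
// Hypothetical row shapes; real column names come from the user's mapping config.
interface AnchorRow { patientId: string; anchorTime: Date }
interface AuxRow { patientId: string; time: Date; values: Record<string, unknown> }
interface TimeWindow { daysBefore: number; daysAfter: number }

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Index the fully loaded auxiliary table by patient ID so each anchor row
// only scans that patient's records.
function indexByPatient(rows: AuxRow[]): Map<string, AuxRow[]> {
  const index = new Map<string, AuxRow[]>();
  for (const row of rows) {
    const bucket = index.get(row.patientId);
    if (bucket) bucket.push(row);
    else index.set(row.patientId, [row]);
  }
  return index;
}

// Return the auxiliary record closest in time to the anchor, restricted to
// [anchor - daysBefore, anchor + daysAfter]; undefined when nothing falls inside.
// "Nearest wins" is an assumed rule, not one stated in the design document.
function matchInWindow(
  anchor: AnchorRow,
  index: Map<string, AuxRow[]>,
  window: TimeWindow,
): AuxRow | undefined {
  const anchorMs = anchor.anchorTime.getTime();
  const lower = anchorMs - window.daysBefore * MS_PER_DAY;
  const upper = anchorMs + window.daysAfter * MS_PER_DAY;
  let best: AuxRow | undefined;
  let bestDistance = Infinity;
  for (const candidate of index.get(anchor.patientId) ?? []) {
    const t = candidate.time.getTime();
    if (t < lower || t > upper) continue;
    const distance = Math.abs(t - anchorMs);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

// Usage with made-up sample data: only the first two records are inside the 7-day window.
const auxIndex = indexByPatient([
  { patientId: "P001", time: new Date("2024-03-03T00:00:00Z"), values: { wbc: 6.1 } },
  { patientId: "P001", time: new Date("2024-03-09T00:00:00Z"), values: { wbc: 7.4 } },
  { patientId: "P001", time: new Date("2024-04-20T00:00:00Z"), values: { wbc: 5.0 } },
]);
const anchorRow: AnchorRow = { patientId: "P001", anchorTime: new Date("2024-03-05T00:00:00Z") };
const match = matchInWindow(anchorRow, auxIndex, { daysBefore: 7, daysAfter: 7 });
```

Indexing the auxiliary table once keeps the per-row cost proportional to the number of records for that patient, which is what makes streaming the main table practical.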
## **2\. Tech Stack**
Targeted choices based on the existing technology stack:
| Layer | Component | Rationale |
| :---- | :---- | :---- |
| **Frontend** | **React 19 \+ Ant Design 5** | Use AntD's Steps, Upload, and Tree (tree selector) components to build the UI quickly. |
| **Backend framework** | **Fastify 5.x** | High-performance HTTP framework, well suited to highly concurrent I/O. |
| **Excel processing** | **ExcelJS** | **Core component**. Supports streaming reads and writes (Streaming I/O), which is the key to handling large data volumes without crashing. |
| **Date handling** | **Day.js \+ CustomParseFormat** | **New**. The core library for dealing with "date-time hell"; it needs highly fault-tolerant parsing. |
| **Task queue** | **BullMQ \+ Redis** | Processing must be asynchronous. The merge logic is complex and long-running, so a queue is required. |
| **Database** | **PostgreSQL 15 \+ Prisma** | Stores task status and file metadata. **Storing raw Excel data in PG is not recommended**. |
| **Validation library** | **Zod** | Validates the complex mapping configuration structure submitted by the frontend. |
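The date-handling row calls for fault-tolerant parsing. The sketch below illustrates one possible strategy: try a list of known formats in order and reject impossible dates. It is a dependency-free stand-in written in plain TypeScript, not the Day.js `customParseFormat` plugin the table selects. The format list and the name `normalizeDate` are assumptions.

```typescript
// Formats tried in order. This list is an assumption, not taken from the PRD.
const DATE_PATTERNS: RegExp[] = [
  /^(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})(?:[ T].*)?$/, // 2024-03-05, 2024/3/5, 2024.03.05 (time part ignored)
  /^(\d{4})年(\d{1,2})月(\d{1,2})日?$/, // 2024年3月5日
  /^(\d{4})(\d{2})(\d{2})$/, // 20240305
];

// Normalize a raw cell value to an ISO date string (YYYY-MM-DD).
// Returns null when the value cannot be interpreted as a valid calendar date.
function normalizeDate(raw: unknown): string | null {
  if (raw instanceof Date) {
    // Date objects are interpreted in UTC here.
    return Number.isNaN(raw.getTime()) ? null : raw.toISOString().slice(0, 10);
  }
  if (typeof raw !== "string") return null;
  const text = raw.trim();
  for (const pattern of DATE_PATTERNS) {
    const m = pattern.exec(text);
    if (!m) continue;
    const year = Number(m[1]);
    const month = Number(m[2]);
    const day = Number(m[3]);
    const date = new Date(Date.UTC(year, month - 1, day));
    // Reject impossible dates such as 2024-02-30, which Date would silently roll over.
    if (
      date.getUTCFullYear() !== year ||
      date.getUTCMonth() !== month - 1 ||
      date.getUTCDate() !== day
    ) {
      return null;
    }
    return date.toISOString().slice(0, 10);
  }
  return null;
}

// Usage: mixed input styles collapse to one canonical form; bad input yields null.
const normalized = ["2024/3/5", "2024年3月5日", "20240305", "2024-02-30", "n/a"].map(normalizeDate);
```

Returning `null` instead of throwing lets the caller count unparseable cells and surface them in a quality report rather than aborting the whole task.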
### **2.1 Key Technical Decision (ADR): Why Not Python (Pandas)?**
Although Python Pandas makes data-merging code more concise, for **this tool's** scenario we decided to stay with **Node.js**, for the following reasons:
1. **Streaming advantage:** Pandas tends to load everything into memory, which easily leads to OOM. The Node.js Stream API natively supports backpressure and can handle the "data expansion" problem stably.
2. **Architectural consistency:** Avoids the operational cost and IPC overhead of introducing a Python runtime.
3. **Conclusion:** For exact matching and rule-based cleaning, Node.js performance is sufficient and more controllable.
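The "data expansion" concern in point 1 can be illustrated with a pull-based pipeline. The sketch below uses a synchronous generator as a simplified stand-in for the Node.js Stream API; it is not the Stream API itself. The names `mergeLazily` and `mainTable`, and the choice to keep unmatched rows (left-join behaviour), are assumptions.

```typescript
type Row = Record<string, unknown>;

// Lazily expand each main-table row into its merged output rows.
// Nothing is computed until the consumer asks for the next row, so a slow
// consumer (for example a file writer) throttles the producer and the full
// expanded result is never held in memory.
function* mergeLazily(mainRows: Iterable<Row>, lookup: (row: Row) => Row[]): Generator<Row> {
  for (const main of mainRows) {
    const matches = lookup(main);
    if (matches.length === 0) {
      yield main; // assumed left-join behaviour: keep unmatched anchor rows
      continue;
    }
    for (const aux of matches) {
      yield { ...main, ...aux }; // one output row per match (one-to-many expansion)
    }
  }
}

// Usage: a hypothetical 1000-row source where every row matches two auxiliary records.
let rowsRead = 0;
function* mainTable(): Generator<Row> {
  for (let i = 1; i <= 1000; i++) {
    rowsRead++;
    yield { id: i };
  }
}

const output = mergeLazily(mainTable(), () => [{ lab: "a" }, { lab: "b" }]);
const firstRow = output.next().value;
const secondRow = output.next().value;
// After pulling two output rows, only one source row has been read.
```

In a real worker the same idea would be expressed with Node.js streams, where backpressure is signalled by the writable side instead of by explicit `next()` calls.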
## **3\. Database Schema**
### **Prisma Schema Definition**
// Task status enum
enum TaskStatus {
PENDING
PROCESSING
COMPLETED
FAILED
}
// Merge task table
model MergeTask {
id String @id @default(uuid())
userId String
status TaskStatus @default(PENDING)
progress Int @default(0)
// Core config field (updated in V2)
// Structure: {
// anchorFileId: string,
// anchorKeys: { id: "Admission No", time: "Admission Date" },
// window: { daysBefore: 7, daysAfter: 7 },
// files: \[{ id: "f2", timeCol: "Report Time", columns: \["WBC"\] }\]
// }
config Json?
resultUrl String?
report Json? // Quality report { totalRows: 1000, dropped: 50, matchRate: "95%" }
errorMsg String?
createdAt DateTime @default(now())
files SourceFile\[\]
}
// Source file table
model SourceFile {
id String @id @default(uuid())
taskId String
task MergeTask @relation(fields: \[taskId\], references: \[id\])
filename String
filepath String
headers Json // \["Admission No", "Name", "Admission Date"\]
rowCount Int
fileSize Int
uploadedAt DateTime @default(now())
}
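Because `config` is stored as untyped JSON, its structure has to be checked before a task is created. The sketch below mirrors the structure described in the `config` comment and checks it with a hand-written guard. The design names Zod for this job; plain TypeScript is used here only to keep the example dependency-free. The names `MergeConfig` and `validateMergeConfig`, the non-negative-integer rule for the window, and the sample values are assumptions.

```typescript
// Mirrors the structure described in the `config` field comment.
interface MergeConfig {
  anchorFileId: string;
  anchorKeys: { id: string; time: string };
  window: { daysBefore: number; daysAfter: number };
  files: { id: string; timeCol: string; columns: string[] }[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumed rule: window sizes are whole, non-negative day counts.
function isNonNegativeInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}

// Returns a list of problems; an empty list means the config is structurally valid.
function validateMergeConfig(input: unknown): string[] {
  if (!isRecord(input)) return ["config must be an object"];
  const errors: string[] = [];

  const anchorFileId = input["anchorFileId"];
  if (typeof anchorFileId !== "string" || anchorFileId === "") {
    errors.push("anchorFileId must be a non-empty string");
  }

  const keys = input["anchorKeys"];
  if (!isRecord(keys) || typeof keys["id"] !== "string" || typeof keys["time"] !== "string") {
    errors.push("anchorKeys must contain string fields id and time");
  }

  const win = input["window"];
  if (!isRecord(win) || !isNonNegativeInt(win["daysBefore"]) || !isNonNegativeInt(win["daysAfter"])) {
    errors.push("window.daysBefore and window.daysAfter must be non-negative integers");
  }

  const files = input["files"];
  if (!Array.isArray(files)) {
    errors.push("files must be an array");
  } else {
    files.forEach((file: unknown, i: number) => {
      const columns = isRecord(file) ? file["columns"] : undefined;
      const valid =
        isRecord(file) &&
        typeof file["id"] === "string" &&
        typeof file["timeCol"] === "string" &&
        Array.isArray(columns) &&
        columns.every((c: unknown) => typeof c === "string");
      if (!valid) errors.push(`files[${i}] must have string id, timeCol, and string[] columns`);
    });
  }
  return errors;
}

// Usage with illustrative values.
const sampleConfig: MergeConfig = {
  anchorFileId: "f1",
  anchorKeys: { id: "Admission No", time: "Admission Date" },
  window: { daysBefore: 7, daysAfter: 7 },
  files: [{ id: "f2", timeCol: "Report Time", columns: ["WBC"] }],
};
const okErrors = validateMergeConfig(sampleConfig);
const badErrors = validateMergeConfig({ ...sampleConfig, window: { daysBefore: -1, daysAfter: 7 } });
```

Collecting every problem rather than stopping at the first lets the API return one complete error response to the wizard UI.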