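# DC - Data Cleaning and Organization

> **Module code:** DC (Data Cleaning)
> **Development status:** ⏳ Planning
> **Commercial value:** ⭐⭐⭐⭐⭐ Can be sold as a standalone product
> **Independence:** ⭐⭐⭐⭐⭐
> **Priority:** P1

---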
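## 📋 Module Overview

The Data Cleaning and Organization module provides specialized tools for processing multi-table Excel data exported from hospital systems, at the scale of millions of rows.

**Core value:** A key differentiating feature that addresses a major pain point in medical research.

---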
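## 🎯 Core Features

### 1. Table ETL (key focus)

- Import multiple Excel tables
- Automatically JOIN on "patient ID" and "time"
- Reshape the result into a clean, analysis-ready wide table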
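
The join step above can be illustrated with a minimal sketch. It assumes each imported table has already been read into a list of dictionaries, and that the key columns are named `patient_id` and `visit_date`; these names, the `build_wide_table` helper, and the sample values are hypothetical and not part of the module design. Rows are matched on an exact (patient ID, time) pair, and keys that appear in only one table are kept, which corresponds to a full outer join.

```python
from collections import defaultdict


def build_wide_table(tables, id_key="patient_id", time_key="visit_date"):
    """Merge rows that share (patient ID, time) across tables into one wide row."""
    wide = defaultdict(dict)
    for table_name, rows in tables.items():
        for row in rows:
            key = (row[id_key], row[time_key])
            for column, value in row.items():
                if column in (id_key, time_key):
                    continue
                # Prefix with the table name so same-named columns do not collide.
                wide[key][f"{table_name}.{column}"] = value

    result = []
    for pid, when in sorted(wide):
        merged = {id_key: pid, time_key: when}
        merged.update(wide[(pid, when)])
        result.append(merged)
    return result


# Hypothetical sample data standing in for two imported Excel sheets.
tables = {
    "labs": [
        {"patient_id": "P001", "visit_date": "2024-01-05", "wbc": 6.1},
        {"patient_id": "P002", "visit_date": "2024-01-07", "wbc": 9.4},
    ],
    "vitals": [
        {"patient_id": "P001", "visit_date": "2024-01-05", "sbp": 128},
    ],
}
wide_rows = build_wide_table(tables)
```

A real engine would likely also need tolerance-based time matching (for example, nearest visit within a window), which this sketch does not attempt.
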
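### 2. Text Extraction / NER (key focus)

- Extract structured fields from pathology reports
- Extract key information from inpatient summaries
- Automatically recognize TNM staging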
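
As a rough illustration of TNM recognition, the sketch below uses a regular expression as a rule-based baseline. This is an assumption for demonstration only: the document plans LLM-based and NLP-engine extraction, and the pattern, the `extract_tnm` helper, and the sample sentences are hypothetical. The pattern covers only common compact forms such as `pT2N1M0` or `cT1c N0 M0`.

```python
import re

# Optional prefix (c/p/y/r), then T, N, and M categories in that order.
TNM_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])"
    r"(?P<prefix>[cpyr]{0,2})"
    r"T(?P<t>[0-4][a-d]?|is|x)"
    r"\s*N(?P<n>[0-3][a-c]?|x)"
    r"\s*M(?P<m>[01][a-c]?|x)",
    re.IGNORECASE,
)


def extract_tnm(text):
    """Return the first TNM stage found in the text as a dict, or None."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix").lower(),
        "T": match.group("t").lower(),
        "N": match.group("n").lower(),
        "M": match.group("m").lower(),
    }


pathology = extract_tnm("Pathological stage: pT2N1M0, margins negative.")
clinical = extract_tnm("Clinical staging cT1c N0 M0.")
missing = extract_tnm("No staging recorded.")
```

A rule like this can serve as a cheap first pass or as a cross-check on model output; free-text staging descriptions would still need the NER approach.
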
### 3. Data Quality Report

- Missing-value statistics
- Outlier detection
- Data quality score
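
The three report items can be sketched together as follows. The document does not specify the detection method or the scoring formula, so this example makes two assumptions: outliers are flagged with the 1.5 × IQR rule, and the score is the fraction of cells that are neither missing nor flagged. The `quality_report` helper, the `MISSING` markers, and the sample rows are all hypothetical.

```python
import statistics

# Values treated as missing in this sketch (an assumption).
MISSING = (None, "", "NA")


def quality_report(rows, numeric_columns):
    """Count missing cells, flag IQR outliers, and compute a simple score."""
    columns = sorted({c for row in rows for c in row})
    total_cells = len(rows) * len(columns)
    missing = {c: sum(1 for row in rows if row.get(c) in MISSING) for c in columns}

    outliers = {}
    for c in numeric_columns:
        values = [row[c] for row in rows if row.get(c) not in MISSING]
        if len(values) < 4:
            outliers[c] = []  # too few values to estimate quartiles
            continue
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers[c] = [v for v in values if v < low or v > high]

    flagged = sum(missing.values()) + sum(len(v) for v in outliers.values())
    score = 1.0 if total_cells == 0 else 1 - flagged / total_cells
    return {"missing": missing, "outliers": outliers, "score": round(score, 3)}


# Hypothetical sample: one missing age, one missing sex, one implausible age.
ages = [34, 36, 38, None, 39, 41, 43, 45, 400]
sexes = ["F", "M", "F", "M", "", "F", "M", "F", "M"]
rows = [{"age": a, "sex": s} for a, s in zip(ages, sexes)]
report = quality_report(rows, numeric_columns=["age"])
```
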
### 4. Standardized Data Export

- Excel export
- SPSS format
- R format

---
## 📂 Documentation Structure

```
DC-数据清洗整理/
├── [AI对接] DC快速上下文.md # ⏳ To be created
├── 00-项目概述/
│ └── 01-产品需求文档(PRD).md # ⏳ To be created
├── 01-设计文档/
│ ├── 01-ETL引擎设计.md # ⏳ To be created
│ └── 02-医学NLP设计.md # ⏳ To be created
└── README.md # ✅ This document
```

---
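## 🔗 Shared Capabilities This Module Depends On

- **LLM gateway** - medical NER extraction (cloud edition)
- **Document processing engine** - reading Excel/Docx files
- **ETL engine** - data cleaning and transformation
- **Medical NLP engine** - entity recognition (standalone edition)

---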
## 🎯 Business Model

**Target customers:** Clinical departments, data managers
**Sales model:** Standalone product
**Pricing strategy:** Per project, or a one-time license

---
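## ⚠️ Technical Challenges

1. **Large-scale data processing** - memory management for million-row datasets
2. **Privacy protection** - the standalone edition must run 100% locally
3. **NER accuracy** - medical terminology is complex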
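
For the first challenge, one common way to keep memory bounded is to stream rows instead of loading a whole table at once. The sketch below assumes the data has first been converted to CSV, since the Python standard library cannot read Excel files; the `count_rows_per_patient` helper, the `patient_id` column name, and the sample data are hypothetical. Memory use grows with the number of distinct patients, not with the number of rows.

```python
import csv
import io
from collections import Counter


def count_rows_per_patient(lines, id_column="patient_id"):
    """Stream CSV rows one at a time and count records per patient."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[id_column]] += 1
    return counts


# An in-memory file stands in for a large export; a real file handle works the same way.
sample = io.StringIO("patient_id,wbc\nP001,6.1\nP002,9.4\nP001,5.8\n")
visit_counts = count_rows_per_patient(sample)
```

The same pattern applies to any per-row aggregation or validation pass. Operations that need all rows for a patient at once, such as the JOIN step, would require sorting or partitioning by patient ID first.
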

---

**Last updated:** 2025-11-06
**Maintainer:** Technical Architect