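# AI Smart Literature (ASL) Module - Database Design

> **Document version:** v2.2
> **Created:** 2025-10-29
> **Maintainer:** AI Smart Literature development team
> **Last updated:** 2025-11-21 (Week 4 completed)
> **Update notes:** Week 4 statistics feature completed; hybrid solution implemented; exclusion-reason field documented

---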
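## 📋 About This Document

This document describes the database design of the AI Smart Literature module, including table structures, relationships, and indexes.

**Tech stack**:
- Database: PostgreSQL 16+
- ORM: Prisma
- Schema isolation: `asl_schema`
- Referenced user table: `platform_schema.users`

---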
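## 🏗️ Schema Architecture

The ASL module keeps its data in a dedicated `asl_schema`, which isolates it from other modules and protects both module independence and data security.

```
platform_schema
└── users (users)
asl_schema
├── screening_projects (screening projects)
├── literatures (literature records)
├── screening_results (screening results)
└── screening_tasks (screening tasks)
```

---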
## 🗄️ Core Tables

### 1. Screening projects table (screening_projects)

**Prisma model**: `AslScreeningProject`
**Table**: `asl_schema.screening_projects`

```prisma
model AslScreeningProject {
  id String @id @default(uuid())
  userId String @map("user_id")
  user User @relation("AslProjects", fields: [userId], references: [id], onDelete: Cascade)
  projectName String @map("project_name")
  // PICO criteria
  picoCriteria Json @map("pico_criteria")
  // ⚠️ Format compatibility:
  // Frontend sends: { P, I, C, O, S }
  // Backend accepts: { P, I, C, O, S } or { population, intervention, comparison, outcome, studyDesign }
  // screeningService.ts contains the field-mapping logic
  // Screening criteria
  inclusionCriteria String @map("inclusion_criteria") @db.Text
  exclusionCriteria String @map("exclusion_criteria") @db.Text
  // Status
  status String @default("draft")
  // Allowed values: draft, screening, completed
  // Screening configuration
  screeningConfig Json? @map("screening_config")
  // Shape: { models: ["DeepSeek-V3", "Qwen-Max"], style: "standard" }
  // ⚠️ Model name mapping:
  // Display name DeepSeek-V3 → API name deepseek-chat
  // Display name Qwen-Max → API name qwen-max
  // screeningService.ts contains the model-name mapping logic
  // Relations
  literatures AslLiterature[]
  screeningTasks AslScreeningTask[]
  screeningResults AslScreeningResult[]
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")
  @@map("screening_projects")
  @@schema("asl_schema")
  @@index([userId])
  @@index([status])
}
```
**SQL table structure**:

```sql
CREATE TABLE asl_schema.screening_projects (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  project_name TEXT NOT NULL,
  pico_criteria JSONB NOT NULL,
  inclusion_criteria TEXT NOT NULL,
  exclusion_criteria TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',
  screening_config JSONB,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_user FOREIGN KEY (user_id)
    REFERENCES platform_schema.users(id) ON DELETE CASCADE
);
CREATE INDEX idx_screening_projects_user_id ON asl_schema.screening_projects(user_id);
CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(status);
```

---
### 2. Literature records table (literatures)

**Prisma model**: `AslLiterature`
**Table**: `asl_schema.literatures`

```prisma
model AslLiterature {
  id String @id @default(uuid())
  projectId String @map("project_id")
  project AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  // Basic bibliographic fields
  pmid String?
  title String @db.Text
  abstract String @db.Text
  authors String?
  journal String?
  publicationYear Int? @map("publication_year")
  doi String?
  // Cloud-native storage fields (used from V1.0; reserved during the MVP)
  pdfUrl String? @map("pdf_url") // PDF access URL
  pdfOssKey String? @map("pdf_oss_key") // OSS object key (used for deletion)
  pdfFileSize Int? @map("pdf_file_size") // File size in bytes
  // Relations
  screeningResults AslScreeningResult[]
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")
  @@map("literatures")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([doi])
  @@unique([projectId, pmid]) // PMID is unique within a project
}
```
**SQL table structure**:

```sql
CREATE TABLE asl_schema.literatures (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  pmid TEXT,
  title TEXT NOT NULL,
  abstract TEXT NOT NULL,
  authors TEXT,
  journal TEXT,
  publication_year INTEGER,
  doi TEXT,
  pdf_url TEXT,
  pdf_oss_key TEXT,
  pdf_file_size INTEGER,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
);
CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
```

---
### 3. Screening results table (screening_results)

**Prisma model**: `AslScreeningResult`
**Table**: `asl_schema.screening_results`

**Design highlight**: two models (DeepSeek + Qwen) screen each record in parallel, and each row stores their full judgments, the supporting evidence, and the detected conflicts.

```prisma
model AslScreeningResult {
  id String @id @default(uuid())
  projectId String @map("project_id")
  project AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId String @map("literature_id")
  literature AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)
  // DeepSeek judgments
  dsModelName String @map("ds_model_name") // "deepseek-chat"
  dsPJudgment String? @map("ds_p_judgment") // "match" | "partial" | "mismatch"
  dsIJudgment String? @map("ds_i_judgment")
  dsCJudgment String? @map("ds_c_judgment")
  dsSJudgment String? @map("ds_s_judgment")
  dsConclusion String? @map("ds_conclusion") // "include" | "exclude" | "uncertain"
  dsConfidence Float? @map("ds_confidence") // 0-1
  // DeepSeek evidence
  dsPEvidence String? @map("ds_p_evidence") @db.Text
  dsIEvidence String? @map("ds_i_evidence") @db.Text
  dsCEvidence String? @map("ds_c_evidence") @db.Text
  dsSEvidence String? @map("ds_s_evidence") @db.Text
  dsReason String? @map("ds_reason") @db.Text
  // Qwen judgments
  qwenModelName String @map("qwen_model_name") // "qwen-max"
  qwenPJudgment String? @map("qwen_p_judgment")
  qwenIJudgment String? @map("qwen_i_judgment")
  qwenCJudgment String? @map("qwen_c_judgment")
  qwenSJudgment String? @map("qwen_s_judgment")
  qwenConclusion String? @map("qwen_conclusion")
  qwenConfidence Float? @map("qwen_confidence")
  // Qwen evidence
  qwenPEvidence String? @map("qwen_p_evidence") @db.Text
  qwenIEvidence String? @map("qwen_i_evidence") @db.Text
  qwenCEvidence String? @map("qwen_c_evidence") @db.Text
  qwenSEvidence String? @map("qwen_s_evidence") @db.Text
  qwenReason String? @map("qwen_reason") @db.Text
  // Conflict status
  conflictStatus String @default("none") @map("conflict_status")
  // Allowed values: none, conflict, resolved
  conflictFields Json? @map("conflict_fields")
  // Example: ["P", "I", "conclusion"]
  // Final decision (used by the Week 4 hybrid solution)
  finalDecision String? @map("final_decision") // "include" | "exclude" | null
  // ⭐ Week 4 note: set after human review; this value is the final decision
  // - include: the reviewer decided to include (may override the AI suggestion)
  // - exclude: the reviewer decided to exclude (may override the AI suggestion)
  // - null: not reviewed yet; the AI decision applies
  finalDecisionBy String? @map("final_decision_by") // userId
  finalDecisionAt DateTime? @map("final_decision_at")
  exclusionReason String? @map("exclusion_reason") @db.Text
  // ⭐ Week 4 note: a manually entered exclusion reason takes priority over the AI-derived one
  // - when finalDecision = exclude, this field stores the reviewer's reason
  // - when null, the frontend derives the reason from the AI judgments (dsPJudgment, dsIJudgment, ...)
  // - the Week 4 title/abstract screening results page displays this field as the exclusion reason
  // AI processing status
  aiProcessingStatus String @default("pending") @map("ai_processing_status")
  // Allowed values: pending, processing, completed, failed
  aiProcessedAt DateTime? @map("ai_processed_at")
  aiErrorMessage String? @map("ai_error_message") @db.Text
  // Traceability
  promptVersion String @default("v1.0.0") @map("prompt_version")
  rawOutput Json? @map("raw_output") // Backup of the raw LLM output
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")
  @@map("screening_results")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([literatureId])
  @@index([conflictStatus])
  @@index([finalDecision])
  @@unique([projectId, literatureId]) // One screening result per literature record per project
}
```
**SQL table structure** (simplified):

```sql
CREATE TABLE asl_schema.screening_results (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  literature_id TEXT NOT NULL,
  -- DeepSeek judgments and evidence
  ds_model_name TEXT NOT NULL,
  ds_p_judgment TEXT,
  ds_i_judgment TEXT,
  ds_c_judgment TEXT,
  ds_s_judgment TEXT,
  ds_conclusion TEXT,
  ds_confidence DOUBLE PRECISION,
  ds_p_evidence TEXT,
  ds_i_evidence TEXT,
  ds_c_evidence TEXT,
  ds_s_evidence TEXT,
  ds_reason TEXT,
  -- Qwen judgments and evidence
  qwen_model_name TEXT NOT NULL,
  qwen_p_judgment TEXT,
  qwen_i_judgment TEXT,
  qwen_c_judgment TEXT,
  qwen_s_judgment TEXT,
  qwen_conclusion TEXT,
  qwen_confidence DOUBLE PRECISION,
  qwen_p_evidence TEXT,
  qwen_i_evidence TEXT,
  qwen_c_evidence TEXT,
  qwen_s_evidence TEXT,
  qwen_reason TEXT,
  -- Conflict status
  conflict_status TEXT NOT NULL DEFAULT 'none',
  conflict_fields JSONB,
  -- Final decision
  final_decision TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason TEXT,
  -- AI processing status
  ai_processing_status TEXT NOT NULL DEFAULT 'pending',
  ai_processed_at TIMESTAMP(3),
  ai_error_message TEXT,
  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output JSONB,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project_result FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature FOREIGN KEY (literature_id)
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature UNIQUE (project_id, literature_id)
);
CREATE INDEX idx_screening_results_project_id ON asl_schema.screening_results(project_id);
CREATE INDEX idx_screening_results_literature_id ON asl_schema.screening_results(literature_id);
CREATE INDEX idx_screening_results_conflict_status ON asl_schema.screening_results(conflict_status);
CREATE INDEX idx_screening_results_final_decision ON asl_schema.screening_results(final_decision);
```
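
The Week 4 field notes describe a precedence rule: a human `finalDecision` overrides the AI, and a manually entered `exclusionReason` overrides one derived from the AI judgments. The TypeScript sketch below shows one way to resolve both for display. It is a minimal illustration, not the project's code: `ResultRow`, `aiConsensus`, `effectiveDecision`, and `displayedExclusionReason` are hypothetical names, the consensus rule (agreeing models win, anything else is `uncertain`) is an assumption the schema does not spell out, and the wording of the derived reason is invented for the example.

```typescript
type Judgment = "match" | "partial" | "mismatch";
type Conclusion = "include" | "exclude" | "uncertain";

// Subset of the AslScreeningResult fields that the rule needs.
interface ResultRow {
  dsPJudgment: Judgment | null;
  dsIJudgment: Judgment | null;
  dsCJudgment: Judgment | null;
  dsSJudgment: Judgment | null;
  dsConclusion: Conclusion | null;
  qwenConclusion: Conclusion | null;
  finalDecision: "include" | "exclude" | null;
  exclusionReason: string | null;
}

// Assumption: the AI decision is the shared conclusion when both models agree,
// and "uncertain" otherwise.
function aiConsensus(row: ResultRow): Conclusion {
  if (row.dsConclusion !== null && row.dsConclusion === row.qwenConclusion) {
    return row.dsConclusion;
  }
  return "uncertain";
}

// A human decision, when present, always wins over the AI decision.
function effectiveDecision(row: ResultRow): Conclusion {
  return row.finalDecision !== null ? row.finalDecision : aiConsensus(row);
}

// Only excluded records show a reason; a manual reason beats a derived one.
function displayedExclusionReason(row: ResultRow): string | null {
  if (effectiveDecision(row) !== "exclude") return null;
  if (row.exclusionReason !== null && row.exclusionReason !== "") {
    return row.exclusionReason;
  }
  const mismatched: string[] = [];
  if (row.dsPJudgment === "mismatch") mismatched.push("P");
  if (row.dsIJudgment === "mismatch") mismatched.push("I");
  if (row.dsCJudgment === "mismatch") mismatched.push("C");
  if (row.dsSJudgment === "mismatch") mismatched.push("S");
  return mismatched.length > 0 ? "Mismatch on: " + mismatched.join(", ") : null;
}
```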
---
### 4. Screening tasks table (screening_tasks)

**Prisma model**: `AslScreeningTask`
**Table**: `asl_schema.screening_tasks`

```prisma
model AslScreeningTask {
  id String @id @default(uuid())
  projectId String @map("project_id")
  project AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  taskType String @map("task_type") // "title_abstract" | "full_text"
  status String @default("pending")
  // Allowed values: pending, running, completed, failed
  // Progress counters
  totalItems Int @map("total_items")
  processedItems Int @default(0) @map("processed_items")
  successItems Int @default(0) @map("success_items")
  failedItems Int @default(0) @map("failed_items")
  conflictItems Int @default(0) @map("conflict_items")
  // Timing
  startedAt DateTime? @map("started_at")
  completedAt DateTime? @map("completed_at")
  estimatedEndAt DateTime? @map("estimated_end_at")
  // Error information
  errorMessage String? @map("error_message") @db.Text
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")
  @@map("screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
}
```
**SQL table structure**:

```sql
CREATE TABLE asl_schema.screening_tasks (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  task_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  total_items INTEGER NOT NULL,
  processed_items INTEGER NOT NULL DEFAULT 0,
  success_items INTEGER NOT NULL DEFAULT 0,
  failed_items INTEGER NOT NULL DEFAULT 0,
  conflict_items INTEGER NOT NULL DEFAULT 0,
  started_at TIMESTAMP(3),
  completed_at TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  error_message TEXT,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project_task FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);
CREATE INDEX idx_screening_tasks_project_id ON asl_schema.screening_tasks(project_id);
CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);
```
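
The progress counters and `estimated_end_at` suggest that a task reports a completion percentage and an expected finish time. The schema does not say how either is computed, so the sketch below is only one plausible approach: `TaskProgress`, `progressPercent`, and `estimateEndAt` are hypothetical names, and the estimate uses plain linear extrapolation, which assumes the remaining items take as long on average as the ones already processed.

```typescript
// Subset of the AslScreeningTask fields used for progress reporting.
interface TaskProgress {
  totalItems: number;
  processedItems: number;
  startedAt: Date | null;
}

// Percentage of processed items, clamped to 0-100 and rounded to an integer.
function progressPercent(task: TaskProgress): number {
  if (task.totalItems <= 0) return 0;
  const ratio = task.processedItems / task.totalItems;
  return Math.round(Math.min(1, Math.max(0, ratio)) * 100);
}

// Linear extrapolation of the finish time; returns null until there is
// at least one processed item to measure throughput from.
function estimateEndAt(task: TaskProgress, now: Date): Date | null {
  if (task.startedAt === null || task.processedItems <= 0) return null;
  const elapsedMs = now.getTime() - task.startedAt.getTime();
  const msPerItem = elapsedMs / task.processedItems;
  const remaining = Math.max(0, task.totalItems - task.processedItems);
  return new Date(now.getTime() + msPerItem * remaining);
}
```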
---
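## 📊 Entity Relationships

```
platform_schema.users (1)
└─→ asl_schema.screening_projects (N)
    ├─→ literatures (N)
    │   └─→ screening_results (1)
    ├─→ screening_results (N)
    └─→ screening_tasks (N)
```

**Relationships**:
- One user can own many screening projects (1:N)
- One project can contain many literature records (1:N)
- Each literature record has one screening result (1:1)
- One project can have many screening results (1:N)
- One project can have many screening tasks (1:N)
- Cascading deletes keep the data consistent

---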
## 🔍 Index Summary

| Table | Indexed column(s) | Index type | Purpose |
|------|---------|---------|------|
| screening_projects | user_id | B-tree | Look up a user's projects |
| screening_projects | status | B-tree | Filter by status |
| literatures | project_id | B-tree | Look up a project's literature |
| literatures | doi | B-tree | Detect duplicates by DOI |
| literatures | (project_id, pmid) | Unique | Prevent duplicate imports |
| screening_results | project_id | B-tree | Look up a project's results |
| screening_results | literature_id | B-tree | Look up a record's result |
| screening_results | conflict_status | B-tree | Filter by conflict status |
| screening_results | final_decision | B-tree | Filter by final decision |
| screening_results | (project_id, literature_id) | Unique | Enforce one result per record |
| screening_tasks | project_id | B-tree | Look up a project's tasks |
| screening_tasks | status | B-tree | Filter by task status |

**Total indexes**: 12 (10 B-tree and 2 unique)
**Unique constraints**: 2 (primary keys not counted)

---
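## 💾 Data Dictionary

### PICO criteria (picoCriteria JSON)

```json
{
  "population": "Study population, e.g. adults with type 2 diabetes",
  "intervention": "Intervention, e.g. SGLT2 inhibitors",
  "comparison": "Comparator, e.g. placebo or standard care",
  "outcome": "Outcome measures, e.g. cardiovascular outcomes",
  "studyDesign": "Study design, e.g. randomized controlled trial (RCT)"
}
```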
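
The schema comments note that the frontend sends `{ P, I, C, O, S }` while the backend also accepts the long-form keys shown above, with the mapping living in `screeningService.ts`. That file is not reproduced here, so the following is only a sketch of what such a mapping could look like. `RawPico`, `PicoCriteria`, and `normalizePico` are hypothetical names, and preferring the long-form key when both are present is an assumption.

```typescript
// Canonical long-form shape used in the data dictionary.
interface PicoCriteria {
  population: string;
  intervention: string;
  comparison: string;
  outcome: string;
  studyDesign: string;
}

// Stored JSON may use either key style, so every key is optional here.
interface RawPico {
  P?: string;
  I?: string;
  C?: string;
  O?: string;
  S?: string;
  population?: string;
  intervention?: string;
  comparison?: string;
  outcome?: string;
  studyDesign?: string;
}

// Maps either key style to the long form; missing values become "".
function normalizePico(raw: RawPico): PicoCriteria {
  return {
    population: raw.population ?? raw.P ?? "",
    intervention: raw.intervention ?? raw.I ?? "",
    comparison: raw.comparison ?? raw.C ?? "",
    outcome: raw.outcome ?? raw.O ?? "",
    studyDesign: raw.studyDesign ?? raw.S ?? "",
  };
}
```

For example, `normalizePico({ P: "adults", S: "RCT" })` yields a long-form object whose `population` is `"adults"`, whose `studyDesign` is `"RCT"`, and whose other fields are empty strings.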
### Screening configuration (screeningConfig JSON)
```json
{
"models": ["deepseek-chat", "qwen-max"],
"temperature": 0,
"maxRetries": 3
}
```
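
The project model's comments list two display-name to API-name pairs (`DeepSeek-V3` → `deepseek-chat`, `Qwen-Max` → `qwen-max`) and say the mapping lives in `screeningService.ts`. A minimal sketch of such a lookup follows. `MODEL_API_NAMES` and `toApiModelName` are hypothetical names, and passing unknown names through unchanged is an assumption rather than documented behavior.

```typescript
// Display name → API name, taken from the schema comments.
const MODEL_API_NAMES: { [displayName: string]: string } = {
  "DeepSeek-V3": "deepseek-chat",
  "Qwen-Max": "qwen-max",
};

// Returns the API name for a display name. Names that are not in the table
// (including names that are already API names) are returned unchanged.
function toApiModelName(modelName: string): string {
  const mapped = MODEL_API_NAMES[modelName];
  return typeof mapped === "string" ? mapped : modelName;
}

// Normalizes the `models` array of a screeningConfig object.
function toApiModelNames(models: string[]): string[] {
  return models.map(toApiModelName);
}
```

With this lookup, a config stored with display names and one stored with API names both resolve to `["deepseek-chat", "qwen-max"]`.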
### Conflict fields (conflictFields JSON)
```json
["P", "I", "C", "S", "conclusion"]
```
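
The array lists the dimensions on which the two models disagreed. The document does not define the comparison rule, so the sketch below assumes the simplest one: any difference between the DeepSeek and Qwen values counts as a conflict, including `match` versus `partial`. `ModelJudgments`, `detectConflicts`, and `conflictStatusFor` are hypothetical names.

```typescript
// One model's judgments, keyed the same way as the conflictFields entries.
interface ModelJudgments {
  P: string | null;
  I: string | null;
  C: string | null;
  S: string | null;
  conclusion: string | null;
}

const CONFLICT_KEYS: Array<keyof ModelJudgments> = ["P", "I", "C", "S", "conclusion"];

// Returns the keys on which the two models differ, in a fixed order.
function detectConflicts(ds: ModelJudgments, qwen: ModelJudgments): string[] {
  return CONFLICT_KEYS.filter(function (key) {
    return ds[key] !== qwen[key];
  });
}

// Maps the conflict list to a conflictStatus value ("resolved" is set later by review).
function conflictStatusFor(conflictFields: string[]): "none" | "conflict" {
  return conflictFields.length > 0 ? "conflict" : "none";
}
```

Two models that agree on everything except `P` and the overall conclusion would produce `["P", "conclusion"]` and a status of `conflict`.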
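### Raw output (rawOutput JSON)

```json
{
  "deepseek": { "判断": {...}, "证据": {...} },
  "qwen": { "判断": {...}, "证据": {...} }
}
```

The keys `判断` (judgments) and `证据` (evidence) are shown as they appear in the stored output; `{...}` marks elided content.

---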
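## 🔒 Data Security

### Schema isolation
- `asl_schema` isolates ASL data from other modules
- The user table lives in `platform_schema` and is managed centrally

### Cascading deletes
- Deleting a user → deletes all of that user's screening projects and their related data
- Deleting a project → deletes its literature records, results, and tasks
- Deleting a literature record → deletes its screening result

### Unique constraints
- PMID is unique within a project (records without a PMID are allowed)
- Each literature record has only one screening result per project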
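
Records without a PMID are allowed because PostgreSQL treats NULL values as distinct in a `UNIQUE` constraint by default, so many rows with a NULL `pmid` can share one `project_id`. An import routine may still want to drop duplicate PMIDs before inserting, so that the constraint never fires. The sketch below shows that pre-check for a single project's batch. `ImportRow` and `dedupeByPmid` are hypothetical names, and keeping the first occurrence is an assumption.

```typescript
// Minimal shape of a literature row about to be imported into one project.
interface ImportRow {
  pmid: string | null;
  title: string;
}

// Keeps the first row for each PMID. Rows without a PMID are always kept,
// mirroring how the UNIQUE (project_id, pmid) constraint ignores NULLs.
function dedupeByPmid(rows: ImportRow[]): ImportRow[] {
  const seen: { [pmid: string]: boolean } = {};
  const kept: ImportRow[] = [];
  rows.forEach(function (row) {
    if (row.pmid === null) {
      kept.push(row);
      return;
    }
    if (seen[row.pmid] === true) return;
    seen[row.pmid] = true;
    kept.push(row);
  });
  return kept;
}
```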
---
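## 📈 Data Volume Estimates

| Project size | Literature records | Screening results | Storage |
|---------|--------|---------|----------|
| Small | 100-500 | 100-500 | < 10 MB |
| Medium | 500-2000 | 500-2000 | 10-50 MB |
| Large | 2000-5000 | 2000-5000 | 50-200 MB |
| Very large | 5000+ | 5000+ | 200 MB+ |

**Estimated size per record**:
- Literature record: ~2-5 KB
- Screening result: ~5-10 KB (including both models' judgments and evidence)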
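
The per-record figures can be turned into a quick back-of-the-envelope estimate: with one screening result per literature record, each record costs roughly 7-15 KB of row data. The sketch below does that arithmetic. `estimateStorageMb` is a hypothetical helper, and it counts row data only, so index and other storage overhead (which the table's totals may include) is not reflected.

```typescript
// Per-record size ranges in KB, taken from the estimates above.
const LITERATURE_KB = { min: 2, max: 5 };
const RESULT_KB = { min: 5, max: 10 };

// Row-data estimate in MB, assuming exactly one result per literature record.
function estimateStorageMb(literatureCount: number): { minMb: number; maxMb: number } {
  const minKb = literatureCount * (LITERATURE_KB.min + RESULT_KB.min);
  const maxKb = literatureCount * (LITERATURE_KB.max + RESULT_KB.max);
  return { minMb: minKb / 1024, maxMb: maxKb / 1024 };
}

// Example: 2000 records come to roughly 13.7-29.3 MB of row data.
const mediumProject = estimateStorageMb(2000);
```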
---
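## ⏳ Roadmap

### Phase 2 (full-text screening)
- [ ] Add a full-text screening results table
- [ ] PDF file metadata table
- [ ] Full-text parsing results table

### Phase 3 (data extraction)
- [ ] Data extraction template table
- [ ] Extraction results table
- [ ] Quality assessment table

---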
**Document version:** v2.2
**Last updated:** 2025-11-21
**Maintainer:** AI Smart Literature development team