
AI Smart Literature (ASL) Module - Database Design

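Document version: v2.0
Created: 2025-10-29
Maintainer: AI Smart Literature development team
Last updated: 2025-11-18
Update notes: Updated to match the implemented code; uses the asl_schema isolation architecture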


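📋 About This Document

This document describes the database design of the AI Smart Literature (ASL) module, including table structures, relationships, and indexes.

Tech stack:

  • Database: PostgreSQL 16+
  • ORM: Prisma
  • Schema isolation: asl_schema
  • Related user table: platform_schema.users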

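🏗️ Schema Architecture

The ASL module keeps its data in a dedicated asl_schema, which isolates it from other modules and supports module independence and data security.

platform_schema
  └── users (user table)
       ↓
asl_schema
  ├── screening_projects (screening projects)
  ├── literatures (literature records)
  ├── screening_results (screening results)
  └── screening_tasks (screening tasks)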

🗄️ Core Tables

1. Screening projects table (screening_projects)

Prisma model: AslScreeningProject
Table: asl_schema.screening_projects

model AslScreeningProject {
  id                String   @id @default(uuid())
  userId            String   @map("user_id")
  user              User     @relation("AslProjects", fields: [userId], references: [id], onDelete: Cascade)
  
  projectName       String   @map("project_name")
  
  // PICO criteria
  picoCriteria      Json     @map("pico_criteria")
  // Shape: { population, intervention, comparison, outcome, studyDesign }
  
  // Screening criteria
  inclusionCriteria String   @map("inclusion_criteria") @db.Text
  exclusionCriteria String   @map("exclusion_criteria") @db.Text
  
  // Status
  status            String   @default("draft")
  // Allowed values: draft, screening, completed
  
  // Screening configuration
  screeningConfig   Json?    @map("screening_config")
  // Shape: { models: ["deepseek-chat", "qwen-max"], temperature: 0 }
  
  // Relations
  literatures       AslLiterature[]
  screeningTasks    AslScreeningTask[]
  screeningResults  AslScreeningResult[]
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_projects")
  @@schema("asl_schema")
  @@index([userId])
  @@index([status])
}

SQL table definition:

CREATE TABLE asl_schema.screening_projects (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  project_name TEXT NOT NULL,
  pico_criteria JSONB NOT NULL,
  inclusion_criteria TEXT NOT NULL,
  exclusion_criteria TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',
  screening_config JSONB,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_user FOREIGN KEY (user_id) 
    REFERENCES platform_schema.users(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_projects_user_id ON asl_schema.screening_projects(user_id);
CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(status);

2. Literature records table (literatures)

Prisma model: AslLiterature
Table: asl_schema.literatures

model AslLiterature {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  
  // Basic bibliographic information
  pmid              String?
  title             String   @db.Text
  abstract          String   @db.Text
  authors           String?
  journal           String?
  publicationYear   Int?     @map("publication_year")
  doi               String?
  
  // Cloud-native storage fields (used from V1.0; reserved during the MVP phase)
  pdfUrl            String?  @map("pdf_url")        // PDF access URL
  pdfOssKey         String?  @map("pdf_oss_key")    // OSS object key, used for deletion
  pdfFileSize       Int?     @map("pdf_file_size")  // File size in bytes
  
  // Relations
  screeningResults  AslScreeningResult[]
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("literatures")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([doi])
  @@unique([projectId, pmid])  // PMID is unique within a project
}

SQL table definition:

CREATE TABLE asl_schema.literatures (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  pmid TEXT,
  title TEXT NOT NULL,
  abstract TEXT NOT NULL,
  authors TEXT,
  journal TEXT,
  publication_year INTEGER,
  doi TEXT,
  pdf_url TEXT,
  pdf_oss_key TEXT,
  pdf_file_size INTEGER,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
);

CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
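
The unique constraint on (project_id, pmid) rejects a second row with the same PMID in a project, while rows whose pmid is NULL never collide with each other under PostgreSQL's default NULL handling. A minimal sketch of a pre-import check that mirrors this behavior is shown below; `ImportRow` and `splitByPmid` are hypothetical names used for illustration and are not part of the implemented code.

```typescript
// Hypothetical import row; only the fields needed for the duplicate check.
interface ImportRow {
  pmid: string | null;
  title: string;
}

// Splits incoming rows into those that can be inserted and those that would
// violate UNIQUE (project_id, pmid) for a single project.
function splitByPmid(
  existingPmids: Set<string>,
  rows: ImportRow[],
): { fresh: ImportRow[]; duplicates: ImportRow[] } {
  const seen = new Set<string>(existingPmids);
  const fresh: ImportRow[] = [];
  const duplicates: ImportRow[] = [];
  for (const row of rows) {
    if (row.pmid === null) {
      // NULL PMIDs are treated as distinct, so they never conflict.
      fresh.push(row);
    } else if (seen.has(row.pmid)) {
      duplicates.push(row);
    } else {
      seen.add(row.pmid);
      fresh.push(row);
    }
  }
  return { fresh, duplicates };
}
```

Checking in application code only reduces failed inserts; the database constraint remains the source of truth.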

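3. Screening results table (screening_results)

Prisma model: AslScreeningResult
Table: asl_schema.screening_results

Design highlight: supports parallel verification by two models (DeepSeek + Qwen) and stores each model's judgments and evidence together with the conflict-detection status.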

model AslScreeningResult {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId      String   @map("literature_id")
  literature        AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)
  
  // DeepSeek model judgments
  dsModelName       String   @map("ds_model_name") // "deepseek-chat"
  dsPJudgment       String?  @map("ds_p_judgment") // "match" | "partial" | "mismatch"
  dsIJudgment       String?  @map("ds_i_judgment")
  dsCJudgment       String?  @map("ds_c_judgment")
  dsSJudgment       String?  @map("ds_s_judgment")
  dsConclusion      String?  @map("ds_conclusion") // "include" | "exclude" | "uncertain"
  dsConfidence      Float?   @map("ds_confidence") // 0-1
  
  // DeepSeek model evidence
  dsPEvidence       String?  @map("ds_p_evidence") @db.Text
  dsIEvidence       String?  @map("ds_i_evidence") @db.Text
  dsCEvidence       String?  @map("ds_c_evidence") @db.Text
  dsSEvidence       String?  @map("ds_s_evidence") @db.Text
  dsReason          String?  @map("ds_reason") @db.Text
  
  // Qwen model judgments
  qwenModelName     String   @map("qwen_model_name") // "qwen-max"
  qwenPJudgment     String?  @map("qwen_p_judgment")
  qwenIJudgment     String?  @map("qwen_i_judgment")
  qwenCJudgment     String?  @map("qwen_c_judgment")
  qwenSJudgment     String?  @map("qwen_s_judgment")
  qwenConclusion    String?  @map("qwen_conclusion")
  qwenConfidence    Float?   @map("qwen_confidence")
  
  // Qwen model evidence
  qwenPEvidence     String?  @map("qwen_p_evidence") @db.Text
  qwenIEvidence     String?  @map("qwen_i_evidence") @db.Text
  qwenCEvidence     String?  @map("qwen_c_evidence") @db.Text
  qwenSEvidence     String?  @map("qwen_s_evidence") @db.Text
  qwenReason        String?  @map("qwen_reason") @db.Text
  
  // Conflict status
  conflictStatus    String   @default("none") @map("conflict_status")
  // Allowed values: none, conflict, resolved
  conflictFields    Json?    @map("conflict_fields")
  // Example: ["P", "I", "conclusion"]
  
  // Final decision
  finalDecision     String?  @map("final_decision") // "include" | "exclude" | "pending"
  finalDecisionBy   String?  @map("final_decision_by") // userId
  finalDecisionAt   DateTime? @map("final_decision_at")
  exclusionReason   String?  @map("exclusion_reason") @db.Text
  
  // AI processing status
  aiProcessingStatus String  @default("pending") @map("ai_processing_status")
  // Allowed values: pending, processing, completed, failed
  aiProcessedAt     DateTime? @map("ai_processed_at")
  aiErrorMessage    String?  @map("ai_error_message") @db.Text
  
  // Traceability
  promptVersion     String   @default("v1.0.0") @map("prompt_version")
  rawOutput         Json?    @map("raw_output") // Backup of the raw LLM output
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_results")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([literatureId])
  @@index([conflictStatus])
  @@index([finalDecision])
  @@unique([projectId, literatureId])  // One screening result per literature record per project
}

SQL table definition (simplified):

CREATE TABLE asl_schema.screening_results (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  literature_id TEXT NOT NULL,
  
  -- DeepSeek judgments
  ds_model_name TEXT NOT NULL,
  ds_p_judgment TEXT,
  ds_i_judgment TEXT,
  ds_c_judgment TEXT,
  ds_s_judgment TEXT,
  ds_conclusion TEXT,
  ds_confidence DOUBLE PRECISION,
  ds_p_evidence TEXT,
  ds_i_evidence TEXT,
  ds_c_evidence TEXT,
  ds_s_evidence TEXT,
  ds_reason TEXT,
  
  -- Qwen judgments
  qwen_model_name TEXT NOT NULL,
  qwen_p_judgment TEXT,
  qwen_i_judgment TEXT,
  qwen_c_judgment TEXT,
  qwen_s_judgment TEXT,
  qwen_conclusion TEXT,
  qwen_confidence DOUBLE PRECISION,
  qwen_p_evidence TEXT,
  qwen_i_evidence TEXT,
  qwen_c_evidence TEXT,
  qwen_s_evidence TEXT,
  qwen_reason TEXT,
  
  -- Conflict status
  conflict_status TEXT NOT NULL DEFAULT 'none',
  conflict_fields JSONB,
  
  -- Final decision
  final_decision TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason TEXT,
  
  -- AI processing status
  ai_processing_status TEXT NOT NULL DEFAULT 'pending',
  ai_processed_at TIMESTAMP(3),
  ai_error_message TEXT,
  
  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output JSONB,
  
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  CONSTRAINT fk_project_result FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature FOREIGN KEY (literature_id) 
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_screening_results_project_id ON asl_schema.screening_results(project_id);
CREATE INDEX idx_screening_results_literature_id ON asl_schema.screening_results(literature_id);
CREATE INDEX idx_screening_results_conflict_status ON asl_schema.screening_results(conflict_status);
CREATE INDEX idx_screening_results_final_decision ON asl_schema.screening_results(final_decision);
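
The conflict_status and conflict_fields columns record where the two models disagree. The sketch below shows one way such a comparison could be computed; it assumes that any difference in a P/I/C/S judgment or in the conclusion counts as a conflict, which this document does not specify. `ModelVerdict` and `detectConflict` are hypothetical names.

```typescript
type Judgment = "match" | "partial" | "mismatch";
type Conclusion = "include" | "exclude" | "uncertain";
type ConflictField = "P" | "I" | "C" | "S" | "conclusion";

// Hypothetical per-model view of the ds_* / qwen_* judgment columns.
interface ModelVerdict {
  P: Judgment | null;
  I: Judgment | null;
  C: Judgment | null;
  S: Judgment | null;
  conclusion: Conclusion | null;
}

const CONFLICT_FIELDS: ConflictField[] = ["P", "I", "C", "S", "conclusion"];

// Assumption: a field is in conflict whenever the two models' values differ.
function detectConflict(
  ds: ModelVerdict,
  qwen: ModelVerdict,
): { conflictStatus: "none" | "conflict"; conflictFields: ConflictField[] } {
  const conflictFields = CONFLICT_FIELDS.filter((f) => ds[f] !== qwen[f]);
  return {
    conflictStatus: conflictFields.length > 0 ? "conflict" : "none",
    conflictFields,
  };
}
```

The third status value, resolved, would be set later by a human reviewer and is therefore not produced by this function.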

4. Screening tasks table (screening_tasks)

Prisma model: AslScreeningTask
Table: asl_schema.screening_tasks

model AslScreeningTask {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  
  taskType          String   @map("task_type") // "title_abstract" | "full_text"
  status            String   @default("pending")
  // Allowed values: pending, running, completed, failed
  
  // Progress counters
  totalItems        Int      @map("total_items")
  processedItems    Int      @default(0) @map("processed_items")
  successItems      Int      @default(0) @map("success_items")
  failedItems       Int      @default(0) @map("failed_items")
  conflictItems     Int      @default(0) @map("conflict_items")
  
  // Timing
  startedAt         DateTime? @map("started_at")
  completedAt       DateTime? @map("completed_at")
  estimatedEndAt    DateTime? @map("estimated_end_at")
  
  // Error information
  errorMessage      String?  @map("error_message") @db.Text
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
}

SQL table definition:

CREATE TABLE asl_schema.screening_tasks (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  task_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  total_items INTEGER NOT NULL,
  processed_items INTEGER NOT NULL DEFAULT 0,
  success_items INTEGER NOT NULL DEFAULT 0,
  failed_items INTEGER NOT NULL DEFAULT 0,
  conflict_items INTEGER NOT NULL DEFAULT 0,
  started_at TIMESTAMP(3),
  completed_at TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  error_message TEXT,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project_task FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_tasks_project_id ON asl_schema.screening_tasks(project_id);
CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);
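
The progress counters and estimated_end_at suggest that a task reports percentage progress and a projected finish time. The sketch below is one possible calculation, assuming a constant average time per processed item; the document does not describe how the actual estimate is derived, and `TaskProgress`, `progressPercent`, and `estimateEndAt` are hypothetical names.

```typescript
// Hypothetical subset of a screening_tasks row.
interface TaskProgress {
  totalItems: number;
  processedItems: number;
  startedAt: Date;
}

// Percentage of items processed, clamped to the range 0..100.
function progressPercent(t: TaskProgress): number {
  if (t.totalItems <= 0) return 0;
  return Math.min(100, Math.max(0, (t.processedItems * 100) / t.totalItems));
}

// Linear extrapolation: average time per processed item times remaining items.
// Returns null until at least one item has been processed.
function estimateEndAt(t: TaskProgress, now: Date): Date | null {
  if (t.processedItems <= 0 || t.totalItems <= 0) return null;
  const elapsedMs = now.getTime() - t.startedAt.getTime();
  const msPerItem = elapsedMs / t.processedItems;
  const remaining = Math.max(t.totalItems - t.processedItems, 0);
  return new Date(now.getTime() + msPerItem * remaining);
}
```

For example, a task that has processed 20 of 100 items in 60 seconds averages 3 seconds per item, so the remaining 80 items project to another 240 seconds.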

📊 Entity Relationship Diagram

platform_schema.users (1)
  ↓
asl_schema.screening_projects (N)
  ├─→ literatures (N)
  │     └─→ screening_results (1)
  ├─→ screening_results (N)
  └─→ screening_tasks (N)

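Relationship notes:

  • One user can have many screening projects (1:N)
  • One project can have many literature records (1:N)
  • Each literature record has one screening result (1:1)
  • One project can have many screening tasks (1:N)
  • Cascading deletes keep the data consistent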

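🔍 Index Summary

| Table | Indexed column(s) | Index type | Purpose |
| --- | --- | --- | --- |
| screening_projects | user_id | B-tree | Look up a user's projects |
| screening_projects | status | B-tree | Filter by status |
| literatures | project_id | B-tree | Look up a project's literature records |
| literatures | doi | B-tree | DOI duplicate check |
| literatures | (project_id, pmid) | Unique | Prevent duplicate imports |
| screening_results | project_id | B-tree | Look up a project's results |
| screening_results | literature_id | B-tree | Look up a literature record's result |
| screening_results | conflict_status | B-tree | Filter by conflict status |
| screening_results | final_decision | B-tree | Filter by final decision |
| screening_results | (project_id, literature_id) | Unique | Uniqueness constraint |
| screening_tasks | project_id | B-tree | Look up a project's tasks |
| screening_tasks | status | B-tree | Filter by task status |

Total indexes: 12 (including the unique indexes)
Unique constraints: 2 (excluding primary keys)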


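💾 Data Dictionary

PICO criteria (picoCriteria JSON)

{
  "population": "Study population, e.g., adults with type 2 diabetes",
  "intervention": "Intervention, e.g., SGLT2 inhibitors",
  "comparison": "Comparator, e.g., placebo or standard therapy",
  "outcome": "Outcome measures, e.g., cardiovascular outcomes",
  "studyDesign": "Study design, e.g., randomized controlled trial (RCT)"
}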
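
Because pico_criteria is stored as untyped JSONB, the database does not enforce the five keys shown above. A minimal sketch of an application-side check follows; it assumes every key must be a non-empty string, which the document does not state, and `missingPicoKeys` is a hypothetical helper.

```typescript
const PICO_KEYS = ["population", "intervention", "comparison", "outcome", "studyDesign"] as const;
type PicoKey = (typeof PICO_KEYS)[number];
type PicoCriteria = Record<PicoKey, string>;

// Returns the keys that are absent, not strings, or blank.
// Assumption: all five keys are required.
function missingPicoKeys(value: unknown): PicoKey[] {
  if (typeof value !== "object" || value === null) return [...PICO_KEYS];
  const v = value as Record<string, unknown>;
  return PICO_KEYS.filter((k) => {
    const field = v[k];
    return typeof field !== "string" || field.trim() === "";
  });
}
```

An empty result means the value can be treated as a `PicoCriteria` before it is written to the pico_criteria column.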

Screening configuration (screeningConfig JSON)

{
  "models": ["deepseek-chat", "qwen-max"],
  "temperature": 0,
  "maxRetries": 3
}
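
Since screening_config is nullable, readers of the column need a fallback. The sketch below fills in missing or malformed fields; it assumes the example values above double as defaults, which is an illustration choice rather than documented behavior. `resolveScreeningConfig` and `DEFAULT_CONFIG` are hypothetical names.

```typescript
interface ScreeningConfig {
  models: string[];
  temperature: number;
  maxRetries: number;
}

// Assumption: the example values in the data dictionary serve as defaults.
const DEFAULT_CONFIG: ScreeningConfig = {
  models: ["deepseek-chat", "qwen-max"],
  temperature: 0,
  maxRetries: 3,
};

// Accepts the raw JSON value (possibly null) and returns a complete config.
function resolveScreeningConfig(raw: unknown): ScreeningConfig {
  const r = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const models =
    Array.isArray(r.models) && r.models.length > 0 && r.models.every((m) => typeof m === "string")
      ? (r.models as string[])
      : [...DEFAULT_CONFIG.models];
  const temperature = typeof r.temperature === "number" ? r.temperature : DEFAULT_CONFIG.temperature;
  const maxRetries =
    typeof r.maxRetries === "number" && Number.isInteger(r.maxRetries) && r.maxRetries >= 0
      ? r.maxRetries
      : DEFAULT_CONFIG.maxRetries;
  return { models, temperature, maxRetries };
}
```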

Conflict fields (conflictFields JSON)

["P", "I", "C", "S", "conclusion"]

Raw output (rawOutput JSON)

{
  "deepseek": { "判断": {...}, "证据": {...} },
  "qwen": { "判断": {...}, "证据": {...} }
}

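🔒 Data Security

Schema isolation

  • asl_schema isolates this module's data from other modules
  • The user table lives in platform_schema and is managed centrally

Cascading deletes

  • Deleting a user → automatically deletes all of that user's screening projects and their related data
  • Deleting a project → automatically deletes its literature records, results, and tasks
  • Deleting a literature record → automatically deletes its screening result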
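
The cascade chain can be pictured with a small in-memory model. The sketch below only illustrates which rows disappear when a user is deleted; in the real system PostgreSQL performs this through the ON DELETE CASCADE foreign keys, and the `Store` shape and `deleteUser` function are hypothetical.

```typescript
interface ProjectRow { id: string; userId: string }
interface LiteratureRow { id: string; projectId: string }
interface ResultRow { id: string; projectId: string; literatureId: string }
interface TaskRow { id: string; projectId: string }

// Hypothetical in-memory stand-in for the four asl_schema tables.
interface Store {
  projects: ProjectRow[];
  literatures: LiteratureRow[];
  results: ResultRow[];
  tasks: TaskRow[];
}

// Mimics ON DELETE CASCADE from platform_schema.users down to every child table.
function deleteUser(store: Store, userId: string): Store {
  const doomed = new Set<string>(
    store.projects.filter((p) => p.userId === userId).map((p) => p.id),
  );
  return {
    projects: store.projects.filter((p) => !doomed.has(p.id)),
    literatures: store.literatures.filter((l) => !doomed.has(l.projectId)),
    results: store.results.filter((r) => !doomed.has(r.projectId)),
    tasks: store.tasks.filter((t) => !doomed.has(t.projectId)),
  };
}
```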

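Unique constraints

  • PMID is unique within a project (records without a PMID are allowed)
  • Each literature record has only one screening result per project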
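📈 Data Volume Estimates

| Project size | Literature records | Screening results | Storage |
| --- | --- | --- | --- |
| Small | 100-500 | 100-500 | < 10 MB |
| Medium | 500-2000 | 500-2000 | 10-50 MB |
| Large | 2000-5000 | 2000-5000 | 50-200 MB |
| Very large | 5000+ | 5000+ | 200 MB+ |

Estimated size per record:

  • Literature record: ~2-5 KB
  • Screening result: ~5-10 KB (including both models' judgments and evidence)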
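
The per-record figures can be turned into a rough range for a given project. The sketch below multiplies the record count by the low and high per-record sizes, assuming one screening result per literature record; it ignores indexes, rawOutput growth, and PostgreSQL storage overhead, so real usage may be higher. `estimateStorageMB` is a hypothetical helper.

```typescript
// Per-record size bounds in KB, taken from the estimates above.
const LITERATURE_KB = { min: 2, max: 5 };
const RESULT_KB = { min: 5, max: 10 };

// Rough row-data estimate in MB for a project with the given number of
// literature records, assuming one screening result per record.
function estimateStorageMB(literatureCount: number): { minMB: number; maxMB: number } {
  const minKB = literatureCount * (LITERATURE_KB.min + RESULT_KB.min);
  const maxKB = literatureCount * (LITERATURE_KB.max + RESULT_KB.max);
  return { minMB: minKB / 1024, maxMB: maxKB / 1024 };
}
```

For 2000 records this gives 2000 × 7 KB = 14000 KB (about 13.7 MB) up to 2000 × 15 KB = 30000 KB (about 29.3 MB), which falls inside the 10-50 MB band for medium projects.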

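Future Plans

Phase 2 (full-text screening)

  • Add a full-text screening results table
  • PDF file metadata table
  • Full-text parsing results table

Phase 3 (data extraction)

  • Data extraction template table
  • Extraction results table
  • Quality assessment table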
