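AI Smart Literature (ASL) Module - Database Design

Document version: v3.0
Created: 2025-10-29
Maintainer: AI Smart Literature development team
Last updated: 2025-11-22 (Day 4: full-text screening database design)
Update notes: added tables for full-text screening (AslLiterature extension, AslFulltextScreeningTask, AslFulltextScreeningResult)


📋 About This Document

This document describes the database design of the AI Smart Literature (ASL) module, including table structures, relationships, and indexes.

Tech stack:

  • Database: PostgreSQL 16+
  • ORM: Prisma
  • Schema isolation: asl_schema
  • Linked user table: platform_schema.users

🏗️ Schema Architecture

The ASL module uses a dedicated asl_schema for data isolation, which keeps the module independent and its data secure.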

platform_schema
  └── users (user table)
       ↓
asl_schema
  ├── screening_projects (screening projects)
  ├── literatures (literature entries)
  ├── screening_results (title screening results)
  ├── screening_tasks (title screening tasks)
  ├── fulltext_screening_tasks (full-text screening tasks) ⭐ added on Day 4
  └── fulltext_screening_results (full-text screening results) ⭐ added on Day 4

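v3.0 update notes (2025-11-22):

  • Extended the literatures table: full-text lifecycle management, PDF storage, and full-text content references
  • Added the fulltext_screening_tasks table: manages full-text screening batch tasks
  • Added the fulltext_screening_results table: stores the 12-field assessment results
  • Follows cloud-native conventions: full-text content is stored by reference rather than inline

🗄️ Core Tables

1. Screening Projects Table (screening_projects)

Prisma model name: AslScreeningProject
Table name: asl_schema.screening_projects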

model AslScreeningProject {
  id                String   @id @default(uuid())
  userId            String   @map("user_id")
  user              User     @relation("AslProjects", fields: [userId], references: [id], onDelete: Cascade)
  
  projectName       String   @map("project_name")
  
  // PICO criteria
  picoCriteria      Json     @map("pico_criteria")
  // ⚠️ Format compatibility note:
  // Frontend uses: { P, I, C, O, S }
  // Backend accepts: { P, I, C, O, S } or { population, intervention, comparison, outcome, studyDesign }
  // screeningService.ts contains the field-mapping logic
  
  // Screening criteria
  inclusionCriteria String   @map("inclusion_criteria") @db.Text
  exclusionCriteria String   @map("exclusion_criteria") @db.Text
  
  // Status
  status            String   @default("draft")
  // Allowed values: draft, screening, completed
  
  // Screening configuration
  screeningConfig   Json?    @map("screening_config")
  // Structure: { models: ["DeepSeek-V3", "Qwen-Max"], style: "standard" }
  // ⚠️ Model name mapping:
  // Frontend display name: DeepSeek-V3 → API name: deepseek-chat
  // Frontend display name: Qwen-Max → API name: qwen-max
  // screeningService.ts contains the model-name mapping logic
  
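  // Relations
  literatures       AslLiterature[]
  screeningTasks    AslScreeningTask[]
  screeningResults  AslScreeningResult[]
  // Prisma requires an opposite relation field for each v3.0 model that references this one;
  // the two field names below are assumed
  fulltextScreeningTasks   AslFulltextScreeningTask[]
  fulltextScreeningResults AslFulltextScreeningResult[]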
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_projects")
  @@schema("asl_schema")
  @@index([userId])
  @@index([status])
}
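
The picoCriteria comments say the backend accepts either the short keys or the long keys and maps between them in screeningService.ts. That mapping is not shown in this document, so the following is only a minimal sketch of what such a normalization could look like. The names PicoShort, PicoLong, and normalizePico are hypothetical.

```typescript
// Short-key shape used by the frontend: { P, I, C, O, S }.
interface PicoShort {
  P: string;
  I: string;
  C: string;
  O: string;
  S: string;
}

// Long-key shape that the backend also accepts.
interface PicoLong {
  population: string;
  intervention: string;
  comparison: string;
  outcome: string;
  studyDesign: string;
}

// Either key style may be present, possibly mixed.
type PicoInput = Partial<PicoShort> & Partial<PicoLong>;

// Hypothetical helper: collapse either key style into the short-key shape.
// Short keys win when both are present; missing values become "".
function normalizePico(input: PicoInput): PicoShort {
  return {
    P: input.P ?? input.population ?? "",
    I: input.I ?? input.intervention ?? "",
    C: input.C ?? input.comparison ?? "",
    O: input.O ?? input.outcome ?? "",
    S: input.S ?? input.studyDesign ?? "",
  };
}
```

Normalizing once at the service boundary means the rest of the code only ever sees one shape.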

SQL table structure:

CREATE TABLE asl_schema.screening_projects (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  project_name TEXT NOT NULL,
  pico_criteria JSONB NOT NULL,
  inclusion_criteria TEXT NOT NULL,
  exclusion_criteria TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',
  screening_config JSONB,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_user FOREIGN KEY (user_id) 
    REFERENCES platform_schema.users(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_projects_user_id ON asl_schema.screening_projects(user_id);
CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(status);
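
The screeningConfig comments in the Prisma model list two display-name-to-API-name pairs. A lookup like the one below is one way to apply that mapping. The constant and function names are hypothetical, and the real logic in screeningService.ts may differ.

```typescript
// Display name -> API name pairs, taken from the screeningConfig comments.
const MODEL_API_NAMES: Record<string, string> = {
  "DeepSeek-V3": "deepseek-chat",
  "Qwen-Max": "qwen-max",
};

// Hypothetical helper: resolve a display name, failing loudly on unknown models
// so that a typo in screeningConfig.models is caught before any LLM call.
function toApiModelName(displayName: string): string {
  const apiName = MODEL_API_NAMES[displayName];
  if (apiName === undefined) {
    throw new Error(`Unknown model display name: ${displayName}`);
  }
  return apiName;
}
```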

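2. Literature Entries Table (literatures) (updated in v3.0)

Prisma model name: AslLiterature
Table name: asl_schema.literatures

v3.0 update notes:

  • Added the stage field to track the literature lifecycle: imported → title_screened → pdf_acquired → fulltext_screened → data_extracted
  • Added PDF storage fields that support both the Dify and OSS adapters: pdfStorageType, pdfStorageRef, pdfStatus
  • Added full-text storage fields that follow cloud-native conventions by storing a reference rather than the content: fullTextStorageRef, fullTextUrl
  • Added indexes on stage, hasPdf, and pdfStatus to improve query performance
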
model AslLiterature {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  
  // Basic literature information
  pmid              String?
  title             String   @db.Text
  abstract          String   @db.Text
  authors           String?
  journal           String?
  publicationYear   Int?     @map("publication_year")
  doi               String?
  
  // ⭐ Added in v3.0: literature stage (lifecycle management)
  stage             String   @default("imported") @map("stage")
  // imported | title_screened | title_included | pdf_acquired | fulltext_screened | data_extracted
  
  // Cloud-native storage fields (used in the V1.0 phase; reserved during the MVP phase)
  pdfUrl            String?  @map("pdf_url")        // PDF access URL
  pdfOssKey         String?  @map("pdf_oss_key")    // OSS storage key (used for deletion)
  pdfFileSize       Int?     @map("pdf_file_size")  // File size in bytes
  
  // ⭐ Added in v3.0: PDF storage (Dify/OSS dual adapter)
  hasPdf            Boolean  @default(false) @map("has_pdf")
  pdfStorageType    String?  @map("pdf_storage_type")  // "dify" | "oss"
  pdfStorageRef     String?  @map("pdf_storage_ref")   // Dify: document_id, OSS: object_key
  pdfStatus         String?  @map("pdf_status")        // "uploading" | "ready" | "failed"
  pdfUploadedAt     DateTime? @map("pdf_uploaded_at")
  
  // ⭐ Added in v3.0: full-text content storage (cloud-native: store a reference, not the content)
  fullTextStorageType String?  @map("full_text_storage_type")  // "dify" | "oss"
  fullTextStorageRef  String?  @map("full_text_storage_ref")   // document_id or object_key
  fullTextUrl         String?  @map("full_text_url")           // Access URL
  fullTextFormat      String?  @map("full_text_format")        // "markdown" | "plaintext"
  fullTextSource      String?  @map("full_text_source")        // "nougat" | "pymupdf"
  fullTextTokenCount  Int?     @map("full_text_token_count")
  fullTextExtractedAt DateTime? @map("full_text_extracted_at")
  
  // Relations
  screeningResults  AslScreeningResult[]
  fulltextScreeningResults AslFulltextScreeningResult[] // ⭐ Added in v3.0
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("literatures")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([doi])
  @@index([stage])          // ⭐ Added in v3.0
  @@index([hasPdf])         // ⭐ Added in v3.0
  @@index([pdfStatus])      // ⭐ Added in v3.0
  @@unique([projectId, pmid])
}
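
Because stage is a plain String column, the database does not enforce the allowed values or their order. The sketch below shows one way application code could guard stage changes. It assumes stages only move forward in the order listed in the model comment and that skipping a stage is allowed; the source states neither rule. The names STAGES, LiteratureStage, and canAdvance are hypothetical.

```typescript
// Stage values in the order given by the comment on AslLiterature.stage.
const STAGES = [
  "imported",
  "title_screened",
  "title_included",
  "pdf_acquired",
  "fulltext_screened",
  "data_extracted",
] as const;

type LiteratureStage = (typeof STAGES)[number];

// Assumption: a record may only move to a later stage, never backwards
// and never to the stage it is already in.
function canAdvance(from: LiteratureStage, to: LiteratureStage): boolean {
  return STAGES.indexOf(to) > STAGES.indexOf(from);
}
```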

SQL table structure (v3.0):

CREATE TABLE asl_schema.literatures (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  
  -- Basic literature information
  pmid TEXT,
  title TEXT NOT NULL,
  abstract TEXT NOT NULL,
  authors TEXT,
  journal TEXT,
  publication_year INTEGER,
  doi TEXT,
  
  -- Literature stage
  stage TEXT NOT NULL DEFAULT 'imported',
  
  -- Legacy PDF storage fields (reserved for V1.0)
  pdf_url TEXT,
  pdf_oss_key TEXT,
  pdf_file_size INTEGER,
  
  -- New PDF storage fields (Dify/OSS dual adapter)
  has_pdf BOOLEAN NOT NULL DEFAULT false,
  pdf_storage_type TEXT,
  pdf_storage_ref TEXT,
  pdf_status TEXT,
  pdf_uploaded_at TIMESTAMP(3),
  
  -- Full-text content storage (by reference)
  full_text_storage_type TEXT,
  full_text_storage_ref TEXT,
  full_text_url TEXT,
  full_text_format TEXT,
  full_text_source TEXT,
  full_text_token_count INTEGER,
  full_text_extracted_at TIMESTAMP(3),
  
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  CONSTRAINT fk_project FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
);

CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
CREATE INDEX idx_literatures_stage ON asl_schema.literatures(stage);
CREATE INDEX idx_literatures_has_pdf ON asl_schema.literatures(has_pdf);
CREATE INDEX idx_literatures_pdf_status ON asl_schema.literatures(pdf_status);

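Field notes:

| Field | Type | Description | Design rationale |
| --- | --- | --- | --- |
| stage | String | Literature stage | Tracks where the record sits in the overall workflow |
| pdfStorageType | String | PDF storage type | "dify" or "oss"; supports both adapters |
| pdfStorageRef | String | PDF storage reference | The Dify document_id or the OSS object_key |
| fullTextStorageType | String | Full-text storage type | Cloud-native: stores a reference rather than the full text |
| fullTextStorageRef | String | Full-text storage reference | Points to the full-text document in Dify or OSS |
| fullTextUrl | String | Full-text access URL | URL for accessing the full text directly |
| fullTextTokenCount | Int | Token count | Used for cost estimation and for optimizing LLM calls |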

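Cloud-native design highlights:

  • Full-text content lives in OSS/Dify and the database stores only a reference, in line with cloud-native conventions
  • Supports migrating from Dify to OSS by switching storageType
  • Keeps the database lightweight by avoiding many large TEXT columns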
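
Since the row holds only a storage type and a reference, reading the full text starts with deciding which backend to call. The sketch below turns the two columns into a typed location. The FullTextLocation shape and the function name are hypothetical, and the actual Dify or OSS client calls are deliberately left out.

```typescript
// The two reference columns as they come back from the literatures table.
interface FullTextPointer {
  fullTextStorageType: string | null;
  fullTextStorageRef: string | null;
}

// Hypothetical typed result: which backend to call and with which identifier.
type FullTextLocation =
  | { backend: "dify"; documentId: string }
  | { backend: "oss"; objectKey: string };

function resolveFullTextLocation(row: FullTextPointer): FullTextLocation | null {
  const { fullTextStorageType: type, fullTextStorageRef: ref } = row;
  if (type === null || ref === null) {
    return null; // full text has not been extracted yet
  }
  switch (type) {
    case "dify":
      return { backend: "dify", documentId: ref };
    case "oss":
      return { backend: "oss", objectKey: ref };
    default:
      throw new Error(`Unsupported storage type: ${type}`);
  }
}
```

With this split, migrating a record from Dify to OSS changes the two column values while callers of resolveFullTextLocation stay the same.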

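3. Screening Results Table (screening_results)

Prisma model name: AslScreeningResult
Table name: asl_schema.screening_results

Design highlight: supports parallel validation by two models (DeepSeek + Qwen) and records complete judgments, evidence, and conflict detection.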

model AslScreeningResult {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId      String   @map("literature_id")
  literature        AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)
  
  // DeepSeek model judgments
  dsModelName       String   @map("ds_model_name") // "deepseek-chat"
  dsPJudgment       String?  @map("ds_p_judgment") // "match" | "partial" | "mismatch"
  dsIJudgment       String?  @map("ds_i_judgment")
  dsCJudgment       String?  @map("ds_c_judgment")
  dsSJudgment       String?  @map("ds_s_judgment")
  dsConclusion      String?  @map("ds_conclusion") // "include" | "exclude" | "uncertain"
  dsConfidence      Float?   @map("ds_confidence") // 0-1
  
  // DeepSeek model evidence
  dsPEvidence       String?  @map("ds_p_evidence") @db.Text
  dsIEvidence       String?  @map("ds_i_evidence") @db.Text
  dsCEvidence       String?  @map("ds_c_evidence") @db.Text
  dsSEvidence       String?  @map("ds_s_evidence") @db.Text
  dsReason          String?  @map("ds_reason") @db.Text
  
  // Qwen model judgments
  qwenModelName     String   @map("qwen_model_name") // "qwen-max"
  qwenPJudgment     String?  @map("qwen_p_judgment")
  qwenIJudgment     String?  @map("qwen_i_judgment")
  qwenCJudgment     String?  @map("qwen_c_judgment")
  qwenSJudgment     String?  @map("qwen_s_judgment")
  qwenConclusion    String?  @map("qwen_conclusion")
  qwenConfidence    Float?   @map("qwen_confidence")
  
  // Qwen model evidence
  qwenPEvidence     String?  @map("qwen_p_evidence") @db.Text
  qwenIEvidence     String?  @map("qwen_i_evidence") @db.Text
  qwenCEvidence     String?  @map("qwen_c_evidence") @db.Text
  qwenSEvidence     String?  @map("qwen_s_evidence") @db.Text
  qwenReason        String?  @map("qwen_reason") @db.Text
  
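  // Conflict status
  conflictStatus    String   @default("none") @map("conflict_status")
  // Allowed values: none, conflict, resolved
  conflictFields    Json?    @map("conflict_fields")
  // Example: ["P", "I", "conclusion"]
  
  // Final decision (used by the Week 4 hybrid approach)
  finalDecision     String?  @map("final_decision") // "include" | "exclude" | null
  // ⭐ Week 4 note: set after manual review and treated as the final decision
  // - include: the reviewer decided to include (may override the AI suggestion)
  // - exclude: the reviewer decided to exclude (may override the AI suggestion)
  // - null: not reviewed; the AI decision is used
  
  finalDecisionBy   String?  @map("final_decision_by") // userId
  finalDecisionAt   DateTime? @map("final_decision_at")
  
  exclusionReason   String?  @map("exclusion_reason") @db.Text
  // ⭐ Week 4 note: a manually entered exclusion reason takes precedence over the AI-extracted one
  // - If finalDecision=exclude, this field stores the manually entered reason
  // - If null, the frontend derives the reason from the AI judgments (dsPJudgment/dsIJudgment, etc.)
  // - The Week 4 title screening results page uses this field to display the exclusion reason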
  
  // AI processing status
  aiProcessingStatus String  @default("pending") @map("ai_processing_status")
  // Allowed values: pending, processing, completed, failed
  aiProcessedAt     DateTime? @map("ai_processed_at")
  aiErrorMessage    String?  @map("ai_error_message") @db.Text
  
  // Traceability
  promptVersion     String   @default("v1.0.0") @map("prompt_version")
  rawOutput         Json?    @map("raw_output") // Backup of the raw LLM output
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_results")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([literatureId])
  @@index([conflictStatus])
  @@index([finalDecision])
  @@unique([projectId, literatureId])  // One screening result per literature entry within a project
}
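
The conflictStatus and conflictFields columns record where the two models disagree, but the comparison rule itself is not described here. The sketch below assumes the simplest rule: any difference in a P/I/C/S judgment or in the conclusion counts as a conflict. ModelVerdict and detectConflicts are hypothetical names.

```typescript
type Judgment = "match" | "partial" | "mismatch";
type Conclusion = "include" | "exclude" | "uncertain";

// One model's verdict, mirroring the ds*/qwen* judgment columns.
interface ModelVerdict {
  P: Judgment | null;
  I: Judgment | null;
  C: Judgment | null;
  S: Judgment | null;
  conclusion: Conclusion | null;
}

const COMPARED_KEYS = ["P", "I", "C", "S", "conclusion"] as const;

// Assumption: any disagreement on a compared key is a conflict.
function detectConflicts(
  ds: ModelVerdict,
  qwen: ModelVerdict,
): { conflictStatus: "none" | "conflict"; conflictFields: string[] } {
  const conflictFields = COMPARED_KEYS.filter((key) => ds[key] !== qwen[key]);
  return {
    conflictStatus: conflictFields.length > 0 ? "conflict" : "none",
    conflictFields,
  };
}
```

The returned conflictFields array has the same shape as the example in the model comment, such as ["P", "I", "conclusion"].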

SQL table structure (simplified):

CREATE TABLE asl_schema.screening_results (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  literature_id TEXT NOT NULL,
  
  -- DeepSeek judgments
  ds_model_name TEXT NOT NULL,
  ds_p_judgment TEXT,
  ds_i_judgment TEXT,
  ds_c_judgment TEXT,
  ds_s_judgment TEXT,
  ds_conclusion TEXT,
  ds_confidence DOUBLE PRECISION,
  ds_p_evidence TEXT,
  ds_i_evidence TEXT,
  ds_c_evidence TEXT,
  ds_s_evidence TEXT,
  ds_reason TEXT,
  
  -- Qwen judgments
  qwen_model_name TEXT NOT NULL,
  qwen_p_judgment TEXT,
  qwen_i_judgment TEXT,
  qwen_c_judgment TEXT,
  qwen_s_judgment TEXT,
  qwen_conclusion TEXT,
  qwen_confidence DOUBLE PRECISION,
  qwen_p_evidence TEXT,
  qwen_i_evidence TEXT,
  qwen_c_evidence TEXT,
  qwen_s_evidence TEXT,
  qwen_reason TEXT,
  
  -- Conflict status
  conflict_status TEXT NOT NULL DEFAULT 'none',
  conflict_fields JSONB,
  
  -- Final decision
  final_decision TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason TEXT,
  
  -- AI processing status
  ai_processing_status TEXT NOT NULL DEFAULT 'pending',
  ai_processed_at TIMESTAMP(3),
  ai_error_message TEXT,
  
  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output JSONB,
  
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  CONSTRAINT fk_project_result FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature FOREIGN KEY (literature_id) 
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_screening_results_project_id ON asl_schema.screening_results(project_id);
CREATE INDEX idx_screening_results_literature_id ON asl_schema.screening_results(literature_id);
CREATE INDEX idx_screening_results_conflict_status ON asl_schema.screening_results(conflict_status);
CREATE INDEX idx_screening_results_final_decision ON asl_schema.screening_results(final_decision);
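
The final_decision and exclusion_reason columns both follow a "manual value first, AI fallback second" rule. The sketch below illustrates that precedence. How the two model conclusions combine into a single AI decision is not specified in this document, so the sketch assumes they count only when they agree. The fallback message format is also an assumption.

```typescript
type Decision = "include" | "exclude" | "uncertain";

// Subset of screening_results columns needed for the fallback logic.
interface TitleScreeningRow {
  finalDecision: "include" | "exclude" | null;
  exclusionReason: string | null;
  dsConclusion: Decision | null;
  qwenConclusion: Decision | null;
  dsPJudgment: string | null;
  dsIJudgment: string | null;
  dsCJudgment: string | null;
  dsSJudgment: string | null;
}

// The manual decision wins; otherwise fall back to the AI decision.
// Assumption: the AI decision is the shared conclusion when both models agree.
function effectiveDecision(row: TitleScreeningRow): Decision {
  if (row.finalDecision !== null) return row.finalDecision;
  if (row.dsConclusion !== null && row.dsConclusion === row.qwenConclusion) {
    return row.dsConclusion;
  }
  return "uncertain";
}

// The manual reason wins; otherwise derive one from the DeepSeek judgments.
function displayExclusionReason(row: TitleScreeningRow): string | null {
  if (row.exclusionReason !== null) return row.exclusionReason;
  const judgments: Array<[string, string | null]> = [
    ["P", row.dsPJudgment],
    ["I", row.dsIJudgment],
    ["C", row.dsCJudgment],
    ["S", row.dsSJudgment],
  ];
  const mismatched = judgments
    .filter(([, judgment]) => judgment === "mismatch")
    .map(([name]) => name);
  return mismatched.length > 0 ? `AI-detected mismatch: ${mismatched.join(", ")}` : null;
}
```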

4. Screening Tasks Table (screening_tasks)

Prisma model name: AslScreeningTask
Table name: asl_schema.screening_tasks

model AslScreeningTask {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  
  taskType          String   @map("task_type") // "title_abstract" | "full_text"
  status            String   @default("pending")
  // Allowed values: pending, running, completed, failed
  
  // Progress counters
  totalItems        Int      @map("total_items")
  processedItems    Int      @default(0) @map("processed_items")
  successItems      Int      @default(0) @map("success_items")
  failedItems       Int      @default(0) @map("failed_items")
  conflictItems     Int      @default(0) @map("conflict_items")
  
  // Timing
  startedAt         DateTime? @map("started_at")
  completedAt       DateTime? @map("completed_at")
  estimatedEndAt    DateTime? @map("estimated_end_at")
  
  // Error information
  errorMessage      String?  @map("error_message") @db.Text
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
}

SQL table structure:

CREATE TABLE asl_schema.screening_tasks (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  task_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  total_items INTEGER NOT NULL,
  processed_items INTEGER NOT NULL DEFAULT 0,
  success_items INTEGER NOT NULL DEFAULT 0,
  failed_items INTEGER NOT NULL DEFAULT 0,
  conflict_items INTEGER NOT NULL DEFAULT 0,
  started_at TIMESTAMP(3),
  completed_at TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  error_message TEXT,
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT fk_project_task FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_tasks_project_id ON asl_schema.screening_tasks(project_id);
CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);
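
The progress counters and estimated_end_at support a progress display. The source does not say how the estimate is computed. The sketch below uses a plain linear extrapolation from the average time per processed item, which is only one possible approach. TaskProgress, percentDone, and estimateEnd are hypothetical names.

```typescript
// Subset of screening_tasks columns used for progress reporting.
interface TaskProgress {
  totalItems: number;
  processedItems: number;
  startedAt: Date | null;
}

// Whole-number completion percentage; an empty task counts as done.
function percentDone(task: TaskProgress): number {
  if (task.totalItems === 0) return 100;
  return Math.round((task.processedItems / task.totalItems) * 100);
}

// Linear extrapolation: remaining items times the average time per item so far.
// Returns null until the task has started and processed at least one item.
function estimateEnd(task: TaskProgress, now: Date): Date | null {
  if (task.startedAt === null || task.processedItems === 0) return null;
  const elapsedMs = now.getTime() - task.startedAt.getTime();
  const msPerItem = elapsedMs / task.processedItems;
  const remaining = task.totalItems - task.processedItems;
  return new Date(now.getTime() + msPerItem * remaining);
}
```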

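5. Full-Text Screening Tasks Table (fulltext_screening_tasks) (new in v3.0)

Prisma model name: AslFulltextScreeningTask
Table name: asl_schema.fulltext_screening_tasks

Design goal: manage full-text screening batch tasks, with support for parallel dual-model calls, cost tracking, and a degraded mode.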

model AslFulltextScreeningTask {
  id                String   @id @default(uuid())
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  
  // Task configuration
  modelA            String   @map("model_a")          // "deepseek-v3"
  modelB            String   @map("model_b")          // "qwen-max"
  promptVersion     String   @default("v1.0.0") @map("prompt_version")
  
  // Task status
  status            String   @default("pending")
  // "pending" | "running" | "completed" | "failed" | "cancelled"
  
  // Progress counters
  totalCount        Int      @map("total_count")
  processedCount    Int      @default(0) @map("processed_count")
  successCount      Int      @default(0) @map("success_count")
  failedCount       Int      @default(0) @map("failed_count")
  degradedCount     Int      @default(0) @map("degraded_count")  // Only one model succeeded
  
  // Cost tracking
  totalTokens       Int      @default(0) @map("total_tokens")
  totalCost         Float    @default(0) @map("total_cost")
  
  // Timing
  startedAt         DateTime? @map("started_at")
  completedAt       DateTime? @map("completed_at")
  estimatedEndAt    DateTime? @map("estimated_end_at")
  
  // Error information
  errorMessage      String?  @map("error_message") @db.Text
  errorStack        String?  @map("error_stack") @db.Text
  
  // Relations
  results           AslFulltextScreeningResult[]
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("fulltext_screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
  @@index([createdAt])
}

SQL table structure:

CREATE TABLE asl_schema.fulltext_screening_tasks (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  
  -- Task configuration
  model_a TEXT NOT NULL,
  model_b TEXT NOT NULL,
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  
  -- Task status
  status TEXT NOT NULL DEFAULT 'pending',
  
  -- Progress counters
  total_count INTEGER NOT NULL,
  processed_count INTEGER NOT NULL DEFAULT 0,
  success_count INTEGER NOT NULL DEFAULT 0,
  failed_count INTEGER NOT NULL DEFAULT 0,
  degraded_count INTEGER NOT NULL DEFAULT 0,
  
  -- Cost tracking
  total_tokens INTEGER NOT NULL DEFAULT 0,
  total_cost DOUBLE PRECISION NOT NULL DEFAULT 0,
  
  -- Timing
  started_at TIMESTAMP(3),
  completed_at TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  
  -- Error information
  error_message TEXT,
  error_stack TEXT,
  
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  CONSTRAINT fk_project_fulltext_task FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_fulltext_screening_tasks_project_id ON asl_schema.fulltext_screening_tasks(project_id);
CREATE INDEX idx_fulltext_screening_tasks_status ON asl_schema.fulltext_screening_tasks(status);
CREATE INDEX idx_fulltext_screening_tasks_created_at ON asl_schema.fulltext_screening_tasks(created_at);

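Field notes:

| Field | Type | Description |
| --- | --- | --- |
| modelA / modelB | String | Names of the two models (deepseek-v3 + qwen-max) |
| degradedCount | Int | Number of items for which only one model succeeded (fault tolerance) |
| totalTokens | Int | Cumulative token usage |
| totalCost | Float | Cumulative cost (CNY) |
| promptVersion | String | Prompt version, for traceability |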
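
The counters above are updated once per processed literature entry. The sketch below shows one possible update rule. It assumes a degraded item (exactly one model succeeded) is counted in degradedCount only and not in successCount, which the source leaves open. FulltextTaskCounters, ModelRun, and recordOutcome are hypothetical names.

```typescript
// Counter columns of fulltext_screening_tasks.
interface FulltextTaskCounters {
  processedCount: number;
  successCount: number;
  failedCount: number;
  degradedCount: number;
  totalTokens: number;
  totalCost: number;
}

// Outcome of one model call for one literature entry.
interface ModelRun {
  ok: boolean;
  tokens: number;
  cost: number;
}

// Returns new counters after one entry has been processed by both models.
// Assumption: both ok = success, exactly one ok = degraded, none ok = failed.
function recordOutcome(
  counters: FulltextTaskCounters,
  a: ModelRun,
  b: ModelRun,
): FulltextTaskCounters {
  const okCount = (a.ok ? 1 : 0) + (b.ok ? 1 : 0);
  return {
    processedCount: counters.processedCount + 1,
    successCount: counters.successCount + (okCount === 2 ? 1 : 0),
    degradedCount: counters.degradedCount + (okCount === 1 ? 1 : 0),
    failedCount: counters.failedCount + (okCount === 0 ? 1 : 0),
    totalTokens: counters.totalTokens + a.tokens + b.tokens,
    totalCost: counters.totalCost + a.cost + b.cost,
  };
}
```

Tokens and cost are added even for failed calls in this sketch, on the assumption that a failed call may still have been billed.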

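6. Full-Text Screening Results Table (fulltext_screening_results) (new in v3.0)

Prisma model name: AslFulltextScreeningResult
Table name: asl_schema.fulltext_screening_results

Design goal: store the detailed 12-field assessment results, with support for dual-model comparison, validation results, and conflict detection.

Design highlights:

  • Complete results from both models (fields + overall + logs)
  • Medical-logic validation and evidence-chain validation results
  • Conflict detection and review priority
  • Degraded-mode support (only one model succeeded)
  • The 12-field assessment is stored as JSON, in line with cloud-native conventions
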
model AslFulltextScreeningResult {
  id                String   @id @default(uuid())
  taskId            String   @map("task_id")
  task              AslFulltextScreeningTask @relation(fields: [taskId], references: [id], onDelete: Cascade)
  projectId         String   @map("project_id")
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId      String   @map("literature_id")
  literature        AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)
  
  // ====== Model A results (DeepSeek-V3) ======
  modelAName        String   @map("model_a_name")
  modelAStatus      String   @map("model_a_status")      // "success" | "failed"
  modelAFields      Json     @map("model_a_fields")      // 12-field assessment { field1: {...}, field2: {...}, ... }
  modelAOverall     Json     @map("model_a_overall")     // Overall assessment { decision, confidence, keyIssues }
  modelAProcessingLog Json?  @map("model_a_processing_log")
  modelAVerification Json?   @map("model_a_verification")
  modelATokens      Int?     @map("model_a_tokens")
  modelACost        Float?   @map("model_a_cost")
  modelAError       String?  @map("model_a_error") @db.Text
  
  // ====== Model B results (Qwen-Max) ======
  modelBName        String   @map("model_b_name")
  modelBStatus      String   @map("model_b_status")
  modelBFields      Json     @map("model_b_fields")
  modelBOverall     Json     @map("model_b_overall")
  modelBProcessingLog Json?  @map("model_b_processing_log")
  modelBVerification Json?   @map("model_b_verification")
  modelBTokens      Int?     @map("model_b_tokens")
  modelBCost        Float?   @map("model_b_cost")
  modelBError       String?  @map("model_b_error") @db.Text
  
  // ====== Validation results ======
  medicalLogicIssues Json?   @map("medical_logic_issues")  // Output of MedicalLogicValidator
  evidenceChainIssues Json?  @map("evidence_chain_issues") // Output of EvidenceChainValidator
  
  // ====== Conflict detection ======
  isConflict        Boolean  @default(false) @map("is_conflict")
  conflictSeverity  String?  @map("conflict_severity")        // "high" | "medium" | "low"
  conflictFields    String[] @map("conflict_fields")          // ["field1", "field9", "overall"]
  conflictDetails   Json?    @map("conflict_details")
  reviewPriority    Int?     @map("review_priority")          // 0-100, review priority
  reviewDeadline    DateTime? @map("review_deadline")
  
  // ====== Final decision ======
  finalDecision     String?  @map("final_decision")           // "include" | "exclude" | null
  finalDecisionBy   String?  @map("final_decision_by")
  finalDecisionAt   DateTime? @map("final_decision_at")
  exclusionReason   String?  @map("exclusion_reason") @db.Text
  reviewNotes       String?  @map("review_notes") @db.Text
  
  // ====== Processing status ======
  processingStatus  String   @default("pending") @map("processing_status")
  // "pending" | "processing" | "completed" | "failed" | "degraded"
  isDegraded        Boolean  @default(false) @map("is_degraded")
  degradedModel     String?  @map("degraded_model")               // "modelA" | "modelB"
  
  processedAt       DateTime? @map("processed_at")
  
  // ====== Traceability ======
  promptVersion     String   @default("v1.0.0") @map("prompt_version")
  rawOutputA        Json?    @map("raw_output_a")
  rawOutputB        Json?    @map("raw_output_b")
  
  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")
  
  @@map("fulltext_screening_results")
  @@schema("asl_schema")
  @@index([taskId])
  @@index([projectId])
  @@index([literatureId])
  @@index([isConflict])
  @@index([finalDecision])
  @@index([reviewPriority])
  @@unique([projectId, literatureId])  // One full-text screening result per literature entry
}

SQL table structure (simplified; the actual table contains all fields):

CREATE TABLE asl_schema.fulltext_screening_results (
  id TEXT PRIMARY KEY,
  task_id TEXT NOT NULL,
  project_id TEXT NOT NULL,
  literature_id TEXT NOT NULL,
  
  -- Model A results
  model_a_name TEXT NOT NULL,
  model_a_status TEXT NOT NULL,
  model_a_fields JSONB NOT NULL,
  model_a_overall JSONB NOT NULL,
  model_a_processing_log JSONB,
  model_a_verification JSONB,
  model_a_tokens INTEGER,
  model_a_cost DOUBLE PRECISION,
  model_a_error TEXT,
  
  -- Model B results (same structure as above)
  model_b_name TEXT NOT NULL,
  model_b_status TEXT NOT NULL,
  model_b_fields JSONB NOT NULL,
  model_b_overall JSONB NOT NULL,
  model_b_processing_log JSONB,
  model_b_verification JSONB,
  model_b_tokens INTEGER,
  model_b_cost DOUBLE PRECISION,
  model_b_error TEXT,
  
  -- Validation results
  medical_logic_issues JSONB,
  evidence_chain_issues JSONB,
  
  -- Conflict detection
  is_conflict BOOLEAN NOT NULL DEFAULT false,
  conflict_severity TEXT,
  conflict_fields TEXT[],
  conflict_details JSONB,
  review_priority INTEGER,
  review_deadline TIMESTAMP(3),
  
  -- Final decision
  final_decision TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason TEXT,
  review_notes TEXT,
  
  -- Processing status
  processing_status TEXT NOT NULL DEFAULT 'pending',
  is_degraded BOOLEAN NOT NULL DEFAULT false,
  degraded_model TEXT,
  processed_at TIMESTAMP(3),
  
  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output_a JSONB,
  raw_output_b JSONB,
  
  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  CONSTRAINT fk_task FOREIGN KEY (task_id) 
    REFERENCES asl_schema.fulltext_screening_tasks(id) ON DELETE CASCADE,
  CONSTRAINT fk_project_fulltext_result FOREIGN KEY (project_id) 
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature_fulltext FOREIGN KEY (literature_id) 
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature_fulltext UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_fulltext_screening_results_task_id ON asl_schema.fulltext_screening_results(task_id);
CREATE INDEX idx_fulltext_screening_results_project_id ON asl_schema.fulltext_screening_results(project_id);
CREATE INDEX idx_fulltext_screening_results_literature_id ON asl_schema.fulltext_screening_results(literature_id);
CREATE INDEX idx_fulltext_screening_results_is_conflict ON asl_schema.fulltext_screening_results(is_conflict);
CREATE INDEX idx_fulltext_screening_results_final_decision ON asl_schema.fulltext_screening_results(final_decision);
CREATE INDEX idx_fulltext_screening_results_review_priority ON asl_schema.fulltext_screening_results(review_priority);

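JSON field examples

modelAFields (12-field assessment; the completeness value "完整" means "complete" and is kept verbatim):

{
  "field1": {
    "present": true,
    "completeness": "完整",
    "extractable": true,
    "quote": "First author Zhang et al., published in JAMA 2023...",
    "location": "Title page, Methods section",
    "note": "Source information for the article is complete"
  },
  "field2": { ... },
  // ... field3-field12
}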

modelAOverall (overall assessment):

{
  "decision": "include",
  "confidence": 0.92,
  "keyIssues": [
    "Randomization method is fully described",
    "Blinding is clearly implemented",
    "Outcome measures are extractable"
  ]
}
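
Because modelAOverall and modelBOverall are untyped Json columns, code that reads them may want a runtime check before trusting the shape. The sketch below is a hand-written type guard for the structure shown above. The 0 to 1 range for confidence is an assumption carried over from the title screening dsConfidence comment, and the names are hypothetical.

```typescript
// Shape of the modelAOverall / modelBOverall JSON shown in the example.
interface OverallAssessment {
  decision: string;
  confidence: number;
  keyIssues: string[];
}

// Runtime check for values read back from the Json column.
function isOverallAssessment(value: unknown): value is OverallAssessment {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const decision = record["decision"];
  const confidence = record["confidence"];
  const keyIssues = record["keyIssues"];
  return (
    typeof decision === "string" &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 && // assumed range
    Array.isArray(keyIssues) &&
    keyIssues.every((item) => typeof item === "string")
  );
}
```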

medicalLogicIssues (medical-logic validation):

{
  "hasIssues": false,
  "issues": []
}

conflictDetails (conflict details; "完整" means "complete" and "不完整" means "incomplete"):

{
  "field9": {
    "modelA": "完整",
    "modelB": "不完整",
    "severity": "high"
  }
}
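
A conflictDetails object like the one above can be produced by comparing the two models' field maps. The sketch below compares only the completeness value of each field and records both sides where they differ. How severity is assigned is not described in this document, so it is left out. FieldAssessment, FieldConflict, and diffCompleteness are hypothetical names.

```typescript
// Only the part of each 12-field entry that this comparison needs.
interface FieldAssessment {
  completeness: string;
}

type FieldMap = Record<string, FieldAssessment>;

interface FieldConflict {
  modelA: string;
  modelB: string;
}

// Returns one entry per field whose completeness differs between the models.
// Fields missing from model B are skipped rather than reported.
function diffCompleteness(a: FieldMap, b: FieldMap): Record<string, FieldConflict> {
  const details: Record<string, FieldConflict> = {};
  for (const [key, fieldA] of Object.entries(a)) {
    const fieldB = b[key];
    if (fieldB !== undefined && fieldB.completeness !== fieldA.completeness) {
      details[key] = { modelA: fieldA.completeness, modelB: fieldB.completeness };
    }
  }
  return details;
}
```

The keys of the returned object can be stored in conflictFields, and isConflict is true whenever the object is non-empty.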

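📊 Entity Relationship Diagram (updated in v3.0)

screening_projects (1) ──< (N) literatures
screening_projects (1) ──< (N) screening_results
literatures (1) ──< (1) screening_results
screening_projects (1) ──< (N) screening_tasks
screening_projects (1) ──< (N) fulltext_screening_tasks
fulltext_screening_tasks (1) ──< (N) fulltext_screening_results
screening_projects (1) ──< (N) fulltext_screening_results
literatures (1) ──< (1) fulltext_screening_results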

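🔍 Index Summary (updated in v3.0)

| Table | Indexed column(s) | Index type | Purpose |
| --- | --- | --- | --- |
| screening_projects | user_id | B-tree | Look up a user's projects |
| screening_projects | status | B-tree | Filter by status |
| literatures | project_id | B-tree | Look up a project's literature |
| literatures | doi | B-tree | Duplicate check by DOI |
| literatures | stage | B-tree | Filter by literature stage (v3.0) |
| literatures | has_pdf | B-tree | PDF availability (v3.0) |
| literatures | pdf_status | B-tree | PDF upload status (v3.0) |
| literatures | (project_id, pmid) | Unique | Prevents duplicate imports |
| screening_results | project_id | B-tree | Look up a project's results |
| screening_results | literature_id | B-tree | Look up a literature entry's results |
| screening_results | conflict_status | B-tree | Filter by conflict status |
| screening_results | final_decision | B-tree | Filter by decision |
| screening_results | (project_id, literature_id) | Unique | Uniqueness constraint |
| screening_tasks | project_id | B-tree | Look up a project's tasks |
| screening_tasks | status | B-tree | Filter by task status |
| fulltext_screening_tasks | project_id | B-tree | Look up full-text tasks (v3.0) |
| fulltext_screening_tasks | status | B-tree | Filter by task status (v3.0) |
| fulltext_screening_tasks | created_at | B-tree | Sort by time (v3.0) |
| fulltext_screening_results | task_id | B-tree | Look up a task's results (v3.0) |
| fulltext_screening_results | project_id | B-tree | Look up a project's results (v3.0) |
| fulltext_screening_results | literature_id | B-tree | Look up a literature entry's results (v3.0) |
| fulltext_screening_results | is_conflict | B-tree | Filter by conflict (v3.0) |
| fulltext_screening_results | final_decision | B-tree | Filter by decision (v3.0) |
| fulltext_screening_results | review_priority | B-tree | Review priority (v3.0) |
| fulltext_screening_results | (project_id, literature_id) | Unique | Uniqueness constraint (v3.0) |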

Total index entries: 25 (13 added in v3.0), counting the unique constraints
Unique constraints: 3 (1 added in v3.0)

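v3.0 index notes:

  • literatures.stage: quickly finds literature at a given stage (for example, "pdf_acquired" records awaiting full-text screening)
  • fulltext_screening_results.review_priority: speeds up ordering of the manual review queue
  • fulltext_screening_tasks.created_at: speeds up task-history queries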
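
The review_priority index exists to serve the manual review queue. The in-memory sketch below shows the ordering such a queue would use. It assumes a higher reviewPriority means more urgent and that a missing priority sorts last; the source states neither. ReviewItem and buildReviewQueue are hypothetical names.

```typescript
// Subset of fulltext_screening_results columns relevant to the review queue.
interface ReviewItem {
  id: string;
  isConflict: boolean;
  reviewPriority: number | null;
}

// Keeps only conflicting results and orders them by descending priority.
// filter() returns a new array, so the caller's array is not reordered.
function buildReviewQueue(items: ReviewItem[]): ReviewItem[] {
  return items
    .filter((item) => item.isConflict)
    .sort((x, y) => (y.reviewPriority ?? 0) - (x.reviewPriority ?? 0));
}
```

In the database, the same ordering would be expressed as a filter on is_conflict with an ORDER BY on review_priority, which is where the two indexes come in.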

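💾 Data Dictionary

PICO criteria (picoCriteria JSON)

{
  "population": "Study population, e.g., adults with type 2 diabetes",
  "intervention": "Intervention, e.g., SGLT2 inhibitors",
  "comparison": "Comparator, e.g., placebo or standard therapy",
  "outcome": "Outcome measures, e.g., cardiovascular outcomes",
  "studyDesign": "Study design, e.g., randomized controlled trial (RCT)"
}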

Screening configuration (screeningConfig JSON)

{
  "models": ["deepseek-chat", "qwen-max"],
  "temperature": 0,
  "maxRetries": 3
}

Conflict fields (conflictFields JSON)

["P", "I", "C", "S", "conclusion"]

Raw output (rawOutput JSON; the keys "判断" and "证据" mean "judgment" and "evidence" and are kept verbatim)

{
  "deepseek": { "判断": {...}, "证据": {...} },
  "qwen": { "判断": {...}, "证据": {...} }
}

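🔒 Data Security

Schema isolation

  • asl_schema isolates this module's data from other modules
  • The user table lives in platform_schema and is managed centrally

Cascade deletes

  • Deleting a user → automatically deletes all of that user's screening projects and related data
  • Deleting a project → automatically deletes its literature, results, and tasks
  • Deleting a literature entry → automatically deletes its screening results

Uniqueness constraints

  • PMID is unique within a project (entries without a PMID are allowed)
  • Each literature entry has only one screening result within a project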
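
PostgreSQL treats NULL values as distinct in a UNIQUE constraint by default, which is why several entries without a PMID can coexist in one project. An import step can mirror that rule before inserting, so that a batch does not fail on the constraint. The sketch below is a hypothetical pre-insert filter for a single project's batch; ImportRow and dedupeByPmid are assumed names.

```typescript
// One row of an import batch for a single project.
interface ImportRow {
  pmid: string | null;
  title: string;
}

// Keeps the first row for each PMID and every row without a PMID,
// matching the behaviour of UNIQUE (project_id, pmid) with NULLs.
function dedupeByPmid(rows: ImportRow[]): ImportRow[] {
  const seen = new Set<string>();
  const kept: ImportRow[] = [];
  for (const row of rows) {
    if (row.pmid !== null) {
      if (seen.has(row.pmid)) continue; // duplicate PMID within the batch
      seen.add(row.pmid);
    }
    kept.push(row);
  }
  return kept;
}
```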

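📈 Data Volume Estimates

| Project size | Literature count | Screening results | Storage |
| --- | --- | --- | --- |
| Small | 100-500 | 100-500 | < 10 MB |
| Medium | 500-2000 | 500-2000 | 10-50 MB |
| Large | 2000-5000 | 2000-5000 | 50-200 MB |
| Very large | 5000+ | 5000+ | 200 MB+ |

Estimated size per record:

  • Literature entry: ~2-5 KB
  • Screening result: ~5-10 KB (including both models' judgments and evidence)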
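
The per-record sizes above give a rough range for the row data of a project. The sketch below does that arithmetic, assuming one screening result per literature entry and ignoring index and full-text screening overhead. The constant and function names are hypothetical.

```typescript
// Per-record size ranges in KB, from the estimates above.
const KB_PER_LITERATURE = { min: 2, max: 5 };
const KB_PER_RESULT = { min: 5, max: 10 };

// Row-data estimate in MB for a project, assuming one result per literature entry.
function estimateStorageMb(literatureCount: number): { minMb: number; maxMb: number } {
  const minKb = literatureCount * (KB_PER_LITERATURE.min + KB_PER_RESULT.min);
  const maxKb = literatureCount * (KB_PER_LITERATURE.max + KB_PER_RESULT.max);
  return { minMb: minKb / 1024, maxMb: maxKb / 1024 };
}
```

For example, 1024 entries give between 7 MB and 15 MB of row data.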

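Roadmap

Phase 2 (full-text screening): completed in v3.0

  • Extend the literatures table (lifecycle management)
  • Add fulltext_screening_tasks
  • Add fulltext_screening_results (12 fields)

Phase 3 (data extraction): not yet developed

  • Reuse the fulltext_screening_tasks table (switch mode)
  • Reuse the fulltext_screening_results table (to store extracted data)
  • Or add a separate data_extraction_results table (if independence is needed)

Phase 4 (quality assessment): to be planned

  • Quality assessment results table
  • Risk-of-bias assessment table
  • GRADE evidence quality table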

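📝 v3.0 Design Decision Log

Decision 1: store full-text content by reference rather than inline

Question: should full-text content be stored in the database?

Options compared:

| Option | Pros | Cons |
| --- | --- | --- |
| Store as TEXT | Faster LLM calls | Violates cloud-native conventions; bloats the database |
| Store a reference | Follows the conventions; lightweight | Adds 100-200 ms per LLM call |

Decision: adopt option 2 (store a reference)

  • Follows the cloud-native principle of separating storage from compute
  • Supports very large documents (>1 MB)
  • RDS storage costs 5-10 times as much as OSS

Decision 2: store the 12 fields as JSON

Question: should the 12 fields be split into columns or stored as JSON?

Decision: use PostgreSQL JSONB

  • There is no need to query inside an individual field
  • Each field has a complex structure (6 sub-fields)
  • JSONB performs well and supports GIN indexes

Decision 3: a separate full-text screening results table

Question: should the screening_results table be reused?

Decision: add a separate table, fulltext_screening_results

  • The data structures are entirely different (PICOS vs. 12 fields)
  • Avoids redundant columns and coupled logic
  • Easier to maintain and optimize independently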

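Version history

  • v3.0 (2025-11-22): full-text screening database design, covering 3 tables (2 new, 1 extended) and related fields
  • v2.2 (2025-11-21): Week 4 statistics feature completed
  • v2.0 (2025-11-18): title screening database design
  • v1.0 (2025-10-29): initial version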