# AI Smart Literature Module - Database Design

> **Document version:** v3.0
> **Created:** 2025-10-29
> **Maintainer:** AI Smart Literature development team
> **Last updated:** 2025-11-22 (Day 4: full-text rescreening database design)
> **Update notes:** Added the full-text rescreening tables (`AslLiterature` extension, `AslFulltextScreeningTask`, `AslFulltextScreeningResult`)

---

## 📋 About This Document

This document describes the database design of the AI Smart Literature module, including table structures, relationships, and indexes.

**Tech stack**:
- Database: PostgreSQL 16+
- ORM: Prisma
- Schema isolation: `asl_schema`
- Related user table: `platform_schema.users`

---

## 🏗️ Schema Architecture

The ASL module uses a dedicated `asl_schema` for data isolation, which keeps the module independent and its data secure.

```
platform_schema
  └── users (users)
        ↓
asl_schema
  ├── screening_projects (screening projects)
  ├── literatures (literature entries)
  ├── screening_results (title screening results)
  ├── screening_tasks (title screening tasks)
  ├── fulltext_screening_tasks (full-text rescreening tasks) ⭐ added on Day 4
  └── fulltext_screening_results (full-text rescreening results) ⭐ added on Day 4
```

**v3.0 update notes (2025-11-22)**:
- ✅ Extended the `literatures` table: full-text lifecycle management, PDF storage, full-text content references
- ✅ Added the `fulltext_screening_tasks` table: manages full-text rescreening batch tasks
- ✅ Added the `fulltext_screening_results` table: stores the 12-field assessment results
- ✅ Cloud-native compliant: full-text content is stored by reference rather than inline

---

## 🗄️ Core Tables

### 1. Screening projects table (screening_projects)

**Prisma model name**: `AslScreeningProject`
**Table name**: `asl_schema.screening_projects`

```prisma
model AslScreeningProject {
  id          String @id @default(uuid())
  userId      String @map("user_id")
  user        User   @relation("AslProjects", fields: [userId], references: [id], onDelete: Cascade)

  projectName String @map("project_name")

  // PICO criteria
  picoCriteria Json @map("pico_criteria")
  // ⚠️ Format compatibility:
  //   Frontend sends:  { P, I, C, O, S }
  //   Backend accepts: { P, I, C, O, S } or { population, intervention, comparison, outcome, studyDesign }
  //   screeningService.ts contains the field-mapping logic

  // Screening criteria
  inclusionCriteria String @map("inclusion_criteria") @db.Text
  exclusionCriteria String @map("exclusion_criteria") @db.Text

  // Status
  status String @default("draft")
  // Allowed values: draft, screening, completed

  // Screening configuration
  screeningConfig Json? @map("screening_config")
  // Shape: { models: ["DeepSeek-V3", "Qwen-Max"], style: "standard" }
  // ⚠️ Model name mapping:
  //   Display name DeepSeek-V3 → API name deepseek-chat
  //   Display name Qwen-Max    → API name qwen-max
  //   screeningService.ts contains the model-name mapping logic

  // Relations
  literatures      AslLiterature[]
  screeningTasks   AslScreeningTask[]
  screeningResults AslScreeningResult[]
  // Back-relations for the v3.0 full-text models; Prisma requires both sides
  // of a relation to be declared (these two field names are assumed)
  fulltextScreeningTasks   AslFulltextScreeningTask[]
  fulltextScreeningResults AslFulltextScreeningResult[]

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("screening_projects")
  @@schema("asl_schema")
  @@index([userId])
  @@index([status])
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.screening_projects (
  id                 TEXT PRIMARY KEY,
  user_id            TEXT NOT NULL,
  project_name       TEXT NOT NULL,
  pico_criteria      JSONB NOT NULL,
  inclusion_criteria TEXT NOT NULL,
  exclusion_criteria TEXT NOT NULL,
  status             TEXT NOT NULL DEFAULT 'draft',
  screening_config   JSONB,
  created_at         TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at         TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_user FOREIGN KEY (user_id)
    REFERENCES platform_schema.users(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_projects_user_id ON asl_schema.screening_projects(user_id);
CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(status);
```

---

### 2. Literature entries table (literatures) ⭐ updated in v3.0

**Prisma model name**: `AslLiterature`
**Table name**: `asl_schema.literatures`

**v3.0 update notes**:
- ✅ Added the `stage` field: tracks the literature lifecycle (imported → title_screened → pdf_acquired → fulltext_screened → data_extracted)
- ✅ Added PDF storage fields: support both the Dify and OSS adapters (`pdfStorageType`, `pdfStorageRef`, `pdfStatus`)
- ✅ Added full-text storage fields: **cloud-native compliant, storing a reference rather than the content** (`fullTextStorageRef`, `fullTextUrl`)
- ✅ Added indexes on `stage`, `hasPdf`, and `pdfStatus` to speed up queries

```prisma
model AslLiterature {
  id        String              @id @default(uuid())
  projectId String              @map("project_id")
  project   AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)

  // Basic bibliographic information
  pmid            String?
  title           String  @db.Text
  abstract        String  @db.Text
  authors         String?
  journal         String?
  publicationYear Int?    @map("publication_year")
  doi             String?
  // ⭐ Added in v3.0: literature stage (lifecycle management)
  stage String @default("imported") @map("stage")
  // imported | title_screened | title_included | pdf_acquired | fulltext_screened | data_extracted

  // Cloud-native storage fields (used in the V1.0 phase, reserved during the MVP phase)
  pdfUrl      String? @map("pdf_url")       // PDF access URL
  pdfOssKey   String? @map("pdf_oss_key")   // OSS storage key (used for deletion)
  pdfFileSize Int?    @map("pdf_file_size") // File size in bytes

  // ⭐ Added in v3.0: PDF storage (Dify/OSS dual adapter)
  hasPdf         Boolean   @default(false) @map("has_pdf")
  pdfStorageType String?   @map("pdf_storage_type") // "dify" | "oss"
  pdfStorageRef  String?   @map("pdf_storage_ref")  // Dify: document_id, OSS: object_key
  pdfStatus      String?   @map("pdf_status")       // "uploading" | "ready" | "failed"
  pdfUploadedAt  DateTime? @map("pdf_uploaded_at")

  // ⭐ Added in v3.0: full-text content storage (cloud-native: a reference, not the content)
  fullTextStorageType String?   @map("full_text_storage_type") // "dify" | "oss"
  fullTextStorageRef  String?   @map("full_text_storage_ref")  // document_id or object_key
  fullTextUrl         String?   @map("full_text_url")          // Access URL
  fullTextFormat      String?   @map("full_text_format")       // "markdown" | "plaintext"
  fullTextSource      String?   @map("full_text_source")       // "nougat" | "pymupdf"
  fullTextTokenCount  Int?      @map("full_text_token_count")
  fullTextExtractedAt DateTime? @map("full_text_extracted_at")

  // Relations
  screeningResults         AslScreeningResult[]
  fulltextScreeningResults AslFulltextScreeningResult[] // ⭐ added in v3.0

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("literatures")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([doi])
  @@index([stage])     // ⭐ added in v3.0
  @@index([hasPdf])    // ⭐ added in v3.0
  @@index([pdfStatus]) // ⭐ added in v3.0
  @@unique([projectId, pmid])
}
```

**SQL table definition** (v3.0):

```sql
CREATE TABLE asl_schema.literatures (
  id         TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,

  -- Basic bibliographic information
  pmid             TEXT,
  title            TEXT NOT NULL,
  abstract         TEXT NOT NULL,
  authors          TEXT,
  journal          TEXT,
  publication_year INTEGER,
  doi              TEXT,

  -- Literature stage
  stage TEXT NOT NULL DEFAULT 'imported',

  -- PDF storage (legacy fields, reserved for V1.0)
  pdf_url       TEXT,
  pdf_oss_key   TEXT,
  pdf_file_size INTEGER,

  -- PDF storage (new fields, Dify/OSS dual adapter)
  has_pdf          BOOLEAN NOT NULL DEFAULT false,
  pdf_storage_type TEXT,
  pdf_storage_ref  TEXT,
  pdf_status       TEXT,
  pdf_uploaded_at  TIMESTAMP(3),

  -- Full-text content storage (by reference)
  full_text_storage_type TEXT,
  full_text_storage_ref  TEXT,
  full_text_url          TEXT,
  full_text_format       TEXT,
  full_text_source       TEXT,
  full_text_token_count  INTEGER,
  full_text_extracted_at TIMESTAMP(3),

  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
);

CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
CREATE INDEX idx_literatures_stage ON asl_schema.literatures(stage);
CREATE INDEX idx_literatures_has_pdf ON asl_schema.literatures(has_pdf);
CREATE INDEX idx_literatures_pdf_status ON asl_schema.literatures(pdf_status);
```

**Field notes**:

| Field | Type | Description | Design rationale |
|------|------|------|----------|
| `stage` | String | Literature stage | Tracks where a literature entry is in the overall workflow |
| `pdfStorageType` | String | PDF storage type | "dify"\|"oss", supports both adapters |
| `pdfStorageRef` | String | PDF storage reference | The Dify document_id or the OSS object_key |
| `fullTextStorageType` | String | Full-text storage type | Cloud-native: store a reference, not the full text ✅ |
| `fullTextStorageRef` | String | Full-text storage reference | Points to the full-text document in Dify or OSS ✅ |
| `fullTextUrl` | String | Full-text access URL | URL for direct access to the full text |
| `fullTextTokenCount` | Int | Token count | Used for cost estimation and LLM call optimization |

**Cloud-native design highlights** ⭐:
- ✅ Full-text content lives in OSS/Dify; the database stores only a reference (cloud-native compliant)
- ✅ Supports a seamless Dify → OSS migration (only the storage type needs to change)
- ✅ Keeps the database lightweight by avoiding large TEXT columns

---

### 3. Screening results table (screening_results)

**Prisma model name**: `AslScreeningResult`
**Table name**: `asl_schema.screening_results`

**Design highlights**: supports parallel validation by two models (DeepSeek + Qwen) and records the full judgments, evidence, and conflict detection.

```prisma
model AslScreeningResult {
  id           String              @id @default(uuid())
  projectId    String              @map("project_id")
  project      AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId String              @map("literature_id")
  literature   AslLiterature       @relation(fields: [literatureId], references: [id], onDelete: Cascade)

  // DeepSeek model judgments
  dsModelName  String  @map("ds_model_name") // "deepseek-chat"
  dsPJudgment  String? @map("ds_p_judgment") // "match" | "partial" | "mismatch"
  dsIJudgment  String? @map("ds_i_judgment")
  dsCJudgment  String? @map("ds_c_judgment")
  dsSJudgment  String? @map("ds_s_judgment")
  dsConclusion String? @map("ds_conclusion") // "include" | "exclude" | "uncertain"
  dsConfidence Float?  @map("ds_confidence") // 0-1

  // DeepSeek model evidence
  dsPEvidence String? @map("ds_p_evidence") @db.Text
  dsIEvidence String? @map("ds_i_evidence") @db.Text
  dsCEvidence String? @map("ds_c_evidence") @db.Text
  dsSEvidence String? @map("ds_s_evidence") @db.Text
  dsReason    String? @map("ds_reason") @db.Text

  // Qwen model judgments
  qwenModelName  String  @map("qwen_model_name") // "qwen-max"
  qwenPJudgment  String? @map("qwen_p_judgment")
  qwenIJudgment  String? @map("qwen_i_judgment")
  qwenCJudgment  String? @map("qwen_c_judgment")
  qwenSJudgment  String? @map("qwen_s_judgment")
  qwenConclusion String? @map("qwen_conclusion")
  qwenConfidence Float?  @map("qwen_confidence")

  // Qwen model evidence
  qwenPEvidence String? @map("qwen_p_evidence") @db.Text
  qwenIEvidence String? @map("qwen_i_evidence") @db.Text
  qwenCEvidence String? @map("qwen_c_evidence") @db.Text
  qwenSEvidence String? @map("qwen_s_evidence") @db.Text
  qwenReason    String? @map("qwen_reason") @db.Text

  // Conflict status
  conflictStatus String @default("none") @map("conflict_status")
  // Allowed values: none, conflict, resolved
  conflictFields Json?  @map("conflict_fields")
  // Example: ["P", "I", "conclusion"]

  // Final decision (used by the Week 4 hybrid approach)
  finalDecision String? @map("final_decision") // "include" | "exclude" | null
  // ⭐ Week 4 note: set after human review and treated as the final decision
  //   - include: a reviewer decided to include (may override the AI suggestion)
  //   - exclude: a reviewer decided to exclude (may override the AI suggestion)
  //   - null: not reviewed yet, the AI decision is used
  finalDecisionBy String?   @map("final_decision_by") // userId
  finalDecisionAt DateTime? @map("final_decision_at")
  exclusionReason String?   @map("exclusion_reason") @db.Text
  // ⭐ Week 4 note: exclusion reason entered by a reviewer (takes precedence over the AI-derived reason)
  //   - If finalDecision = exclude, this field stores the reason entered by the reviewer
  //   - If null, the frontend derives the reason from the AI judgments (dsPJudgment, dsIJudgment, etc.)
  //   - The Week 4 title-screening results page displays the exclusion reason from this field

  // AI processing status
  aiProcessingStatus String    @default("pending") @map("ai_processing_status")
  // Allowed values: pending, processing, completed, failed
  aiProcessedAt      DateTime? @map("ai_processed_at")
  aiErrorMessage     String?   @map("ai_error_message") @db.Text

  // Traceability
  promptVersion String @default("v1.0.0") @map("prompt_version")
  rawOutput     Json?  @map("raw_output") // Raw LLM output (backup)

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("screening_results")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([literatureId])
  @@index([conflictStatus])
  @@index([finalDecision])
  @@unique([projectId, literatureId]) // One screening result per literature entry per project
}
```

**SQL table definition** (simplified):

```sql
CREATE TABLE asl_schema.screening_results (
  id            TEXT PRIMARY KEY,
  project_id    TEXT NOT NULL,
  literature_id TEXT NOT NULL,

  -- DeepSeek judgments
  ds_model_name TEXT NOT NULL,
  ds_p_judgment TEXT,
  ds_i_judgment TEXT,
  ds_c_judgment TEXT,
  ds_s_judgment TEXT,
  ds_conclusion TEXT,
  ds_confidence DOUBLE PRECISION,
  ds_p_evidence TEXT,
  ds_i_evidence TEXT,
  ds_c_evidence TEXT,
  ds_s_evidence TEXT,
  ds_reason     TEXT,

  -- Qwen judgments
  qwen_model_name TEXT NOT NULL,
  qwen_p_judgment TEXT,
  qwen_i_judgment TEXT,
  qwen_c_judgment TEXT,
  qwen_s_judgment TEXT,
  qwen_conclusion TEXT,
  qwen_confidence DOUBLE PRECISION,
  qwen_p_evidence TEXT,
  qwen_i_evidence TEXT,
  qwen_c_evidence TEXT,
  qwen_s_evidence TEXT,
  qwen_reason     TEXT,

  -- Conflict status
  conflict_status TEXT NOT NULL DEFAULT 'none',
  conflict_fields JSONB,

  -- Final decision
  final_decision    TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason  TEXT,

  -- AI processing status
  ai_processing_status TEXT NOT NULL DEFAULT 'pending',
  ai_processed_at      TIMESTAMP(3),
  ai_error_message     TEXT,

  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output     JSONB,

  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project_result FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature FOREIGN KEY (literature_id)
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_screening_results_project_id ON asl_schema.screening_results(project_id);
CREATE INDEX idx_screening_results_literature_id ON
  asl_schema.screening_results(literature_id);
CREATE INDEX idx_screening_results_conflict_status ON asl_schema.screening_results(conflict_status);
CREATE INDEX idx_screening_results_final_decision ON asl_schema.screening_results(final_decision);
```

---

### 4. Screening tasks table (screening_tasks)

**Prisma model name**: `AslScreeningTask`
**Table name**: `asl_schema.screening_tasks`

```prisma
model AslScreeningTask {
  id        String              @id @default(uuid())
  projectId String              @map("project_id")
  project   AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)

  taskType String @map("task_type") // "title_abstract" | "full_text"
  status   String @default("pending")
  // Allowed values: pending, running, completed, failed

  // Progress counters
  totalItems     Int @map("total_items")
  processedItems Int @default(0) @map("processed_items")
  successItems   Int @default(0) @map("success_items")
  failedItems    Int @default(0) @map("failed_items")
  conflictItems  Int @default(0) @map("conflict_items")

  // Timing
  startedAt      DateTime? @map("started_at")
  completedAt    DateTime? @map("completed_at")
  estimatedEndAt DateTime? @map("estimated_end_at")

  // Error information
  errorMessage String? @map("error_message") @db.Text

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.screening_tasks (
  id               TEXT PRIMARY KEY,
  project_id       TEXT NOT NULL,
  task_type        TEXT NOT NULL,
  status           TEXT NOT NULL DEFAULT 'pending',
  total_items      INTEGER NOT NULL,
  processed_items  INTEGER NOT NULL DEFAULT 0,
  success_items    INTEGER NOT NULL DEFAULT 0,
  failed_items     INTEGER NOT NULL DEFAULT 0,
  conflict_items   INTEGER NOT NULL DEFAULT 0,
  started_at       TIMESTAMP(3),
  completed_at     TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  error_message    TEXT,
  created_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project_task FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_tasks_project_id ON asl_schema.screening_tasks(project_id);
CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);
```

---

### 5. Full-text rescreening tasks table (fulltext_screening_tasks) ⭐ added in v3.0

**Prisma model name**: `AslFulltextScreeningTask`
**Table name**: `asl_schema.fulltext_screening_tasks`

**Design goal**: manage full-text rescreening batch tasks, with support for parallel dual-model calls, cost tracking, and degraded mode.

```prisma
model AslFulltextScreeningTask {
  id        String              @id @default(uuid())
  projectId String              @map("project_id")
  project   AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)

  // Task configuration
  modelA        String @map("model_a") // "deepseek-v3"
  modelB        String @map("model_b") // "qwen-max"
  promptVersion String @default("v1.0.0") @map("prompt_version")

  // Task status
  status String @default("pending")
  // "pending" | "running" | "completed" | "failed" | "cancelled"

  // Progress counters
  totalCount     Int @map("total_count")
  processedCount Int @default(0) @map("processed_count")
  successCount   Int @default(0) @map("success_count")
  failedCount    Int @default(0) @map("failed_count")
  degradedCount  Int @default(0) @map("degraded_count") // Only one model succeeded

  // Cost statistics
  totalTokens Int   @default(0) @map("total_tokens")
  totalCost   Float @default(0) @map("total_cost")

  // Timing
  startedAt      DateTime? @map("started_at")
  completedAt    DateTime? @map("completed_at")
  estimatedEndAt DateTime? @map("estimated_end_at")

  // Error information
  errorMessage String? @map("error_message") @db.Text
  errorStack   String? @map("error_stack") @db.Text

  // Relations
  results AslFulltextScreeningResult[]

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("fulltext_screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
  @@index([createdAt])
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.fulltext_screening_tasks (
  id         TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,

  -- Task configuration
  model_a        TEXT NOT NULL,
  model_b        TEXT NOT NULL,
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',

  -- Task status
  status TEXT NOT NULL DEFAULT 'pending',

  -- Progress counters
  total_count     INTEGER NOT NULL,
  processed_count INTEGER NOT NULL DEFAULT 0,
  success_count   INTEGER NOT NULL DEFAULT 0,
  failed_count    INTEGER NOT NULL DEFAULT 0,
  degraded_count  INTEGER NOT NULL DEFAULT 0,

  -- Cost statistics
  total_tokens INTEGER NOT NULL DEFAULT 0,
  total_cost   DOUBLE PRECISION NOT NULL DEFAULT 0,

  -- Timing
  started_at       TIMESTAMP(3),
  completed_at     TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),

  -- Error information
  error_message TEXT,
  error_stack   TEXT,

  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project_fulltext_task FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_fulltext_screening_tasks_project_id ON asl_schema.fulltext_screening_tasks(project_id);
CREATE INDEX idx_fulltext_screening_tasks_status ON asl_schema.fulltext_screening_tasks(status);
CREATE INDEX idx_fulltext_screening_tasks_created_at ON asl_schema.fulltext_screening_tasks(created_at);
```

**Field notes**:

| Field | Type | Description |
|------|------|------|
| `modelA / modelB` | String | Names of the two models (deepseek-v3 + qwen-max) |
| `degradedCount` | Int | Number of items for which only one model succeeded (fault tolerance) |
| `totalTokens` | Int | Cumulative token usage |
| `totalCost` | Float | Cumulative cost (CNY) |
| `promptVersion` | String | Prompt version (for traceability) |

---

### 6. Full-text rescreening results table (fulltext_screening_results) ⭐ added in v3.0

**Prisma model name**: `AslFulltextScreeningResult`
**Table name**: `asl_schema.fulltext_screening_results`

**Design goal**: store the detailed 12-field assessment results, with support for dual-model comparison, validation results, and conflict detection.

**Design highlights**:
- ✅ Complete results from both models (fields + overall + logs)
- ✅ Medical-logic validation and evidence-chain validation results
- ✅ Conflict detection and review priority
- ✅ Degraded-mode support (only one model succeeded)
- ✅ The 12-field assessment is stored as JSON (cloud-native compliant)

```prisma
model AslFulltextScreeningResult {
  id           String                   @id @default(uuid())
  taskId       String                   @map("task_id")
  task         AslFulltextScreeningTask @relation(fields: [taskId], references: [id], onDelete: Cascade)
  projectId    String                   @map("project_id")
  project      AslScreeningProject      @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId String                   @map("literature_id")
  literature   AslLiterature            @relation(fields: [literatureId], references: [id], onDelete: Cascade)

  // ====== Model A result (DeepSeek-V3) ======
  modelAName          String  @map("model_a_name")
  modelAStatus        String  @map("model_a_status")  // "success" | "failed"
  modelAFields        Json    @map("model_a_fields")  // 12-field assessment { field1: {...}, field2: {...}, ... }
  modelAOverall       Json    @map("model_a_overall") // Overall assessment { decision, confidence, keyIssues }
  modelAProcessingLog Json?   @map("model_a_processing_log")
  modelAVerification  Json?   @map("model_a_verification")
  modelATokens        Int?    @map("model_a_tokens")
  modelACost          Float?  @map("model_a_cost")
  modelAError         String? @map("model_a_error") @db.Text

  // ====== Model B result (Qwen-Max) ======
  modelBName          String  @map("model_b_name")
  modelBStatus        String  @map("model_b_status")
  modelBFields        Json    @map("model_b_fields")
  modelBOverall       Json    @map("model_b_overall")
  modelBProcessingLog Json?   @map("model_b_processing_log")
  modelBVerification  Json?   @map("model_b_verification")
  modelBTokens        Int?    @map("model_b_tokens")
  modelBCost          Float?  @map("model_b_cost")
  modelBError         String? @map("model_b_error") @db.Text

  // ====== Validation results ======
  medicalLogicIssues  Json? @map("medical_logic_issues")  // Output of MedicalLogicValidator
  evidenceChainIssues Json? @map("evidence_chain_issues") // Output of EvidenceChainValidator

  // ====== Conflict detection ======
  isConflict       Boolean   @default(false) @map("is_conflict")
  conflictSeverity String?   @map("conflict_severity") // "high" | "medium" | "low"
  conflictFields   String[]  @map("conflict_fields")   // ["field1", "field9", "overall"]
  conflictDetails  Json?     @map("conflict_details")
  reviewPriority   Int?      @map("review_priority")   // Review priority, 0-100
  reviewDeadline   DateTime? @map("review_deadline")

  // ====== Final decision ======
  finalDecision   String?   @map("final_decision") // "include" | "exclude" | null
  finalDecisionBy String?   @map("final_decision_by")
  finalDecisionAt DateTime? @map("final_decision_at")
  exclusionReason String?   @map("exclusion_reason") @db.Text
  reviewNotes     String?   @map("review_notes") @db.Text

  // ====== Processing status ======
  processingStatus String    @default("pending") @map("processing_status")
  // "pending" | "processing" | "completed" | "failed" | "degraded"
  isDegraded       Boolean   @default(false) @map("is_degraded")
  degradedModel    String?   @map("degraded_model") // "modelA" | "modelB"
  processedAt      DateTime? @map("processed_at")

  // ====== Traceability ======
  promptVersion String @default("v1.0.0") @map("prompt_version")
  rawOutputA    Json?  @map("raw_output_a")
  rawOutputB    Json?  @map("raw_output_b")

  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")

  @@map("fulltext_screening_results")
  @@schema("asl_schema")
  @@index([taskId])
  @@index([projectId])
  @@index([literatureId])
  @@index([isConflict])
  @@index([finalDecision])
  @@index([reviewPriority])
  @@unique([projectId, literatureId]) // One full-text rescreening result per literature entry
}
```

**SQL table definition** (simplified; the actual table contains all fields):

```sql
CREATE TABLE asl_schema.fulltext_screening_results (
  id            TEXT PRIMARY KEY,
  task_id       TEXT NOT NULL,
  project_id    TEXT NOT NULL,
  literature_id TEXT NOT NULL,

  -- Model A result
  model_a_name           TEXT NOT NULL,
  model_a_status         TEXT NOT NULL,
  model_a_fields         JSONB NOT NULL,
  model_a_overall        JSONB NOT NULL,
  model_a_processing_log JSONB,
  model_a_verification   JSONB,
  model_a_tokens         INTEGER,
  model_a_cost           DOUBLE PRECISION,
  model_a_error          TEXT,

  -- Model B result (same layout as Model A)
  model_b_name           TEXT NOT NULL,
  model_b_status         TEXT NOT NULL,
  model_b_fields         JSONB NOT NULL,
  model_b_overall        JSONB NOT NULL,
  model_b_processing_log JSONB,
  model_b_verification   JSONB,
  model_b_tokens         INTEGER,
  model_b_cost           DOUBLE PRECISION,
  model_b_error          TEXT,

  -- Validation results
  medical_logic_issues  JSONB,
  evidence_chain_issues JSONB,

  -- Conflict detection
  is_conflict       BOOLEAN NOT NULL DEFAULT false,
  conflict_severity TEXT,
  conflict_fields   TEXT[],
  conflict_details  JSONB,
  review_priority   INTEGER,
  review_deadline   TIMESTAMP(3),

  -- Final decision
  final_decision    TEXT,
  final_decision_by TEXT,
  final_decision_at TIMESTAMP(3),
  exclusion_reason  TEXT,
  review_notes      TEXT,

  -- Processing status
  processing_status TEXT NOT NULL DEFAULT 'pending',
  is_degraded       BOOLEAN NOT NULL DEFAULT false,
  degraded_model    TEXT,
  processed_at      TIMESTAMP(3),

  -- Traceability
  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output_a   JSONB,
  raw_output_b   JSONB,

  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_task FOREIGN KEY (task_id)
    REFERENCES asl_schema.fulltext_screening_tasks(id) ON DELETE CASCADE,
  CONSTRAINT fk_project_fulltext_result FOREIGN KEY
    (project_id) REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature_fulltext FOREIGN KEY (literature_id)
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature_fulltext UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_fulltext_screening_results_task_id ON asl_schema.fulltext_screening_results(task_id);
CREATE INDEX idx_fulltext_screening_results_project_id ON asl_schema.fulltext_screening_results(project_id);
CREATE INDEX idx_fulltext_screening_results_literature_id ON asl_schema.fulltext_screening_results(literature_id);
CREATE INDEX idx_fulltext_screening_results_is_conflict ON asl_schema.fulltext_screening_results(is_conflict);
CREATE INDEX idx_fulltext_screening_results_final_decision ON asl_schema.fulltext_screening_results(final_decision);
CREATE INDEX idx_fulltext_screening_results_review_priority ON asl_schema.fulltext_screening_results(review_priority);
```

**JSON field examples**:

**modelAFields (12-field assessment)**; the stored value `完整` means "complete":

```json
{
  "field1": {
    "present": true,
    "completeness": "完整",
    "extractable": true,
    "quote": "First author: Zhang et al., published in JAMA 2023...",
    "location": "Title page, Methods section",
    "note": "Source information is complete"
  },
  "field2": { ... },
  // ... field3-field12
}
```

**modelAOverall (overall assessment)**:

```json
{
  "decision": "include",
  "confidence": 0.92,
  "keyIssues": [
    "Randomization method is fully described",
    "Blinding is clearly implemented",
    "Outcome measures are extractable"
  ]
}
```

**medicalLogicIssues (medical-logic validation)**:

```json
{
  "hasIssues": false,
  "issues": []
}
```

**conflictDetails (conflict details)**; `完整` means "complete" and `不完整` means "incomplete":

```json
{
  "field9": {
    "modelA": "完整",
    "modelB": "不完整",
    "severity": "high"
  }
}
```

---

## 📊 Data Relationship Diagram (updated in v3.0)

```
screening_projects (1) ──< (N) literatures
screening_projects (1) ──< (N) screening_results
literatures (1) ── (0..1) screening_results
screening_projects (1) ──< (N) screening_tasks
screening_projects (1) ──< (N) fulltext_screening_tasks
fulltext_screening_tasks (1) ──< (N) fulltext_screening_results
screening_projects (1) ──< (N) fulltext_screening_results
literatures (1) ── (0..1) fulltext_screening_results
```

---

## 🔍 Index Summary (updated in v3.0)

| Table | Indexed columns | Index type | Purpose |
|------|---------|---------|------|
| screening_projects | user_id | B-tree | Look up a user's projects |
| screening_projects | status | B-tree | Filter by status |
| literatures | project_id | B-tree | Look up a project's literature |
| literatures | doi | B-tree | DOI duplicate check |
| literatures | stage ⭐ | B-tree | Query by literature stage (v3.0) |
| literatures | has_pdf ⭐ | B-tree | PDF acquisition status (v3.0) |
| literatures | pdf_status ⭐ | B-tree | PDF upload status (v3.0) |
| literatures | (project_id, pmid) | Unique | Prevent duplicate imports |
| screening_results | project_id | B-tree | Look up a project's results |
| screening_results | literature_id | B-tree | Look up a literature entry's result |
| screening_results | conflict_status | B-tree | Filter by conflict |
| screening_results | final_decision | B-tree | Filter by decision |
| screening_results | (project_id, literature_id) | Unique | Uniqueness constraint |
| screening_tasks | project_id | B-tree | Look up a project's tasks |
| screening_tasks | status | B-tree | Filter by task status |
| fulltext_screening_tasks ⭐ | project_id | B-tree | Look up full-text tasks (v3.0) |
| fulltext_screening_tasks ⭐ | status | B-tree | Filter by task status (v3.0) |
| fulltext_screening_tasks ⭐ | created_at | B-tree | Sort by time (v3.0) |
| fulltext_screening_results ⭐ | task_id | B-tree | Look up a task's results (v3.0) |
| fulltext_screening_results ⭐ | project_id | B-tree | Look up a project's results (v3.0) |
| fulltext_screening_results ⭐ | literature_id | B-tree | Look up a literature entry's result (v3.0) |
| fulltext_screening_results ⭐ | is_conflict | B-tree | Filter by conflict (v3.0) |
| fulltext_screening_results ⭐ | final_decision | B-tree | Filter by decision (v3.0) |
| fulltext_screening_results ⭐ | review_priority | B-tree | Review priority (v3.0) |
| fulltext_screening_results ⭐ | (project_id, literature_id) | Unique | Uniqueness constraint (v3.0) |

**Total entries**: 25 (13 added in v3.0), of which 22 are B-tree indexes
**Unique constraints**: 3 (1 added in v3.0)

**v3.0 index optimization notes**:
- ✅ `literatures.stage`: quickly finds literature in a given stage (for example, "pdf_acquired" entries awaiting full-text rescreening)
- ✅ `fulltext_screening_results.review_priority`: speeds up sorting of the human review queue
- ✅ `fulltext_screening_tasks.created_at`: speeds up task history queries

---

## 💾 Data Dictionary

### PICO criteria (picoCriteria JSON)

```json
{
  "population": "Study population, e.g. adults with type 2 diabetes",
  "intervention": "Intervention, e.g. SGLT2 inhibitors",
  "comparison": "Comparator, e.g. placebo or standard care",
  "outcome": "Outcome measure, e.g. cardiovascular outcomes",
  "studyDesign": "Study design, e.g. randomized controlled trial (RCT)"
}
```

### Screening configuration (screeningConfig JSON)

```json
{
  "models": ["deepseek-chat", "qwen-max"],
  "temperature": 0,
  "maxRetries": 3
}
```

### Conflict fields (conflictFields JSON)

```json
["P", "I", "C", "S", "conclusion"]
```

### Raw output (rawOutput JSON)

The keys `判断` (judgment) and `证据` (evidence) are stored as emitted by the models:

```json
{
  "deepseek": {
    "判断": {...},
    "证据": {...}
  },
  "qwen": {
    "判断": {...},
    "证据": {...}
  }
}
```

---

## 🔒 Data Security

### Schema isolation
- `asl_schema` isolates this module's data from other modules
- The user table lives in `platform_schema` and is managed centrally

### Cascade deletes
- Deleting a user → automatically deletes all of the user's screening projects and related data
- Deleting a project → automatically deletes its literature, results, and tasks
- Deleting a literature entry → automatically deletes its screening results

### Uniqueness constraints
- PMID is unique within a project (entries without a PMID are allowed)
- A literature entry has at most one screening result within a project

---

## 📈 Data Volume Estimates

| Project size | Literature entries | Screening results | Storage |
|---------|--------|---------|----------|
| Small | 100-500 | 100-500 | < 10 MB |
| Medium | 500-2000 | 500-2000 | 10-50 MB |
| Large | 2000-5000 | 2000-5000 | 50-200 MB |
| Very large | 5000+ | 5000+ | 200 MB+ |

**Estimated size per record**:
- Literature entry: ~2-5 KB
- Screening result: ~5-10 KB (including both models' judgments and evidence)

---

## ⏳ Roadmap

### Phase 2 (full-text rescreening) ✅ completed in v3.0
- [x] Extend the `literatures` table (lifecycle management)
- [x] Add the `fulltext_screening_tasks` table
- [x] Add the `fulltext_screening_results` table (12 fields)

### Phase 3 (data extraction), not yet developed
- [ ] Reuse the `fulltext_screening_tasks` table (switch mode)
- [ ] Reuse the `fulltext_screening_results` table (store the extracted data)
- [ ] Or add a `data_extraction_results` table (if a separate table is needed)

### Phase 4 (quality assessment), not yet planned
- [ ] Quality assessment results table
- [ ] Risk-of-bias assessment table
- [ ] GRADE evidence quality table

---

## 📝 v3.0 Design Decision Records

### Decision 1: Store full-text content by reference rather than inline ✅

**Question**: Should full-text content be stored in the database?
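**Comparison of options**:

| Option | Pros | Cons |
|------|------|------|
| Store as TEXT | Faster LLM calls | Violates cloud-native guidelines and bloats the database |
| Store a reference | Compliant and lightweight | Adds 100-200 ms to each LLM call |

**Decision**: ✅ Option 2 (store a reference)
- Follows the cloud-native principle of separating storage from compute
- Supports very large documents (>1 MB)
- RDS storage costs 5-10 times as much as OSS

### Decision 2: Store the 12 fields as JSON ✅

**Question**: Should the 12 fields be split into columns or stored as JSON?

**Decision**: ✅ Use PostgreSQL JSONB
- There is no need to query the inside of an individual field separately
- Each field has a complex structure (6 sub-fields)
- JSONB performs well and supports GIN indexes

### Decision 3: A separate full-text rescreening results table ✅

**Question**: Should the `screening_results` table be reused?

**Decision**: ✅ Add a separate table, `fulltext_screening_results`
- The data structures are completely different (PICOS vs. 12 fields)
- Avoids redundant columns and coupled logic
- Easier to maintain and optimize independently

---

**Version history**:
- v3.0 (2025-11-22): Full-text rescreening database design; added 2 tables and extended `literatures` with the related fields
- v2.2 (2025-11-21): Week 4 statistics features completed
- v2.0 (2025-11-18): Title screening database design
- v1.0 (2025-10-29): Initial version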
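
To illustrate the `conflictFields` and `finalDecision` semantics described for the `screening_results` table, here is a minimal TypeScript sketch. The helper names `detectConflictFields` and `effectiveDecision` and the `ModelVerdict` type are hypothetical and are not taken from `screeningService.ts`. The fallback rule for unreviewed records (use the shared conclusion when both models agree, otherwise `uncertain`) is also an assumption; the schema comments only state that a `null` `finalDecision` means the AI decision is used.

```typescript
type Judgment = "match" | "partial" | "mismatch";
type Conclusion = "include" | "exclude" | "uncertain";

// Hypothetical in-memory view of one model's columns (ds* or qwen*).
interface ModelVerdict {
  P: Judgment | null;
  I: Judgment | null;
  C: Judgment | null;
  S: Judgment | null;
  conclusion: Conclusion | null;
}

// Lists the fields on which the two models disagree, in the same
// shape as the conflictFields example: ["P", "I", "conclusion"].
function detectConflictFields(ds: ModelVerdict, qwen: ModelVerdict): string[] {
  const keys: (keyof ModelVerdict)[] = ["P", "I", "C", "S", "conclusion"];
  return keys.filter((key) => ds[key] !== qwen[key]);
}

// A human decision always wins. When finalDecision is null, fall back to
// the AI result (assumed rule: agreement -> shared conclusion, else uncertain).
function effectiveDecision(
  finalDecision: "include" | "exclude" | null,
  ds: ModelVerdict,
  qwen: ModelVerdict
): Conclusion {
  if (finalDecision !== null) {
    return finalDecision;
  }
  if (ds.conclusion !== null && ds.conclusion === qwen.conclusion) {
    return ds.conclusion;
  }
  return "uncertain";
}

// Usage: the models disagree on I and on the overall conclusion.
const dsVerdict: ModelVerdict = { P: "match", I: "match", C: "partial", S: "match", conclusion: "include" };
const qwenVerdict: ModelVerdict = { P: "match", I: "mismatch", C: "partial", S: "match", conclusion: "exclude" };

console.log(detectConflictFields(dsVerdict, qwenVerdict)); // conflicts on I and conclusion
console.log(effectiveDecision(null, dsVerdict, qwenVerdict)); // "uncertain"
console.log(effectiveDecision("include", dsVerdict, qwenVerdict)); // "include"
```

A record with a non-empty conflict list would be stored with `conflictStatus = "conflict"`; once a reviewer fills in `finalDecision`, that value takes precedence over both models.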