# AI Smart Literature Module - Database Design

> **Document version:** v2.2
> **Created:** 2025-10-29
> **Maintainer:** AI Smart Literature development team
> **Last updated:** 2025-11-21 (Week 4 complete)
> **Update notes:** Week 4 statistics feature completed; hybrid approach implemented; exclusion-reason field documented

---

## 📋 About This Document

This document describes the database design of the AI Smart Literature (ASL) module: table structures, relationships, and indexes.

**Tech stack**:
- Database: PostgreSQL 16+
- ORM: Prisma
- Schema isolation: `asl_schema`
- Referenced user table: `platform_schema.users`

---

## 🏗️ Schema Architecture

The ASL module keeps its tables in a dedicated `asl_schema`, which isolates its data from other modules and keeps the module independent.

```
platform_schema
  └── users (users)
       ↓
asl_schema
  ├── screening_projects (screening projects)
  ├── literatures (literature records)
  ├── screening_results (screening results)
  └── screening_tasks (screening tasks)
```

---

## 🗄️ Core Tables

### 1. Screening Projects (screening_projects)

**Prisma model**: `AslScreeningProject`
**Table**: `asl_schema.screening_projects`

```prisma
model AslScreeningProject {
  id                String   @id @default(uuid())
  userId            String   @map("user_id")
  user              User     @relation("AslProjects", fields: [userId], references: [id], onDelete: Cascade)

  projectName       String   @map("project_name")

  // PICO criteria
  picoCriteria      Json     @map("pico_criteria")
  // ⚠️ Format compatibility:
  //   Frontend sends:  { P, I, C, O, S }
  //   Backend accepts: { P, I, C, O, S } or { population, intervention, comparison, outcome, studyDesign }
  //   screeningService.ts contains the field-mapping logic

  // Screening criteria
  inclusionCriteria String   @map("inclusion_criteria") @db.Text
  exclusionCriteria String   @map("exclusion_criteria") @db.Text

  // Status
  status            String   @default("draft")
  // Allowed values: draft, screening, completed

  // Screening configuration
  screeningConfig   Json?    @map("screening_config")
  // Structure: { models: ["DeepSeek-V3", "Qwen-Max"], style: "standard" }
  // ⚠️ Model name mapping:
  //   Display name DeepSeek-V3 → API name deepseek-chat
  //   Display name Qwen-Max    → API name qwen-max
  //   screeningService.ts contains the model-name mapping logic

  // Relations
  literatures       AslLiterature[]
  screeningTasks    AslScreeningTask[]
  screeningResults  AslScreeningResult[]

  createdAt         DateTime @default(now()) @map("created_at")
  updatedAt         DateTime @updatedAt @map("updated_at")

  @@map("screening_projects")
  @@schema("asl_schema")
  @@index([userId])
  @@index([status])
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.screening_projects (
  id                 TEXT PRIMARY KEY,
  user_id            TEXT NOT NULL,
  project_name       TEXT NOT NULL,
  pico_criteria      JSONB NOT NULL,
  inclusion_criteria TEXT NOT NULL,
  exclusion_criteria TEXT NOT NULL,
  status             TEXT NOT NULL DEFAULT 'draft',
  screening_config   JSONB,
  created_at         TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at         TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_user FOREIGN KEY (user_id)
    REFERENCES platform_schema.users(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_projects_user_id ON asl_schema.screening_projects(user_id);
CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(status);
```

---

### 2. Literature Records (literatures)

**Prisma model**: `AslLiterature`
**Table**: `asl_schema.literatures`

```prisma
model AslLiterature {
  id              String   @id @default(uuid())
  projectId       String   @map("project_id")
  project         AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)

  // Basic bibliographic information
  pmid            String?
  title           String   @db.Text
  abstract        String   @db.Text
  authors         String?
  journal         String?
  publicationYear Int?     @map("publication_year")
  doi             String?

  // Cloud-native storage fields (used from V1.0; reserved during the MVP phase)
  pdfUrl          String?  @map("pdf_url")       // URL for accessing the PDF
  pdfOssKey       String?  @map("pdf_oss_key")   // OSS storage key (used for deletion)
  pdfFileSize     Int?     @map("pdf_file_size") // File size in bytes

  // Relations
  screeningResults AslScreeningResult[]

  createdAt       DateTime @default(now()) @map("created_at")
  updatedAt       DateTime @updatedAt @map("updated_at")

  @@map("literatures")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([doi])
  @@unique([projectId, pmid]) // PMID is unique within a project
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.literatures (
  id               TEXT PRIMARY KEY,
  project_id       TEXT NOT NULL,
  pmid             TEXT,
  title            TEXT NOT NULL,
  abstract         TEXT NOT NULL,
  authors          TEXT,
  journal          TEXT,
  publication_year INTEGER,
  doi              TEXT,
  pdf_url          TEXT,
  pdf_oss_key      TEXT,
  pdf_file_size    INTEGER,
  created_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
);

CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
```

---

### 3. Screening Results (screening_results)

**Prisma model**: `AslScreeningResult`
**Table**: `asl_schema.screening_results`

**Design highlight**: two models (DeepSeek and Qwen) screen each record in parallel for cross-validation. The table stores each model's judgments and evidence, plus the conflict-detection fields.

```prisma
model AslScreeningResult {
  id                 String   @id @default(uuid())
  projectId          String   @map("project_id")
  project            AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  literatureId       String   @map("literature_id")
  literature         AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)

  // DeepSeek judgments
  dsModelName        String   @map("ds_model_name") // "deepseek-chat"
  dsPJudgment        String?  @map("ds_p_judgment") // "match" | "partial" | "mismatch"
  dsIJudgment        String?  @map("ds_i_judgment")
  dsCJudgment        String?  @map("ds_c_judgment")
  dsSJudgment        String?  @map("ds_s_judgment")
  dsConclusion       String?  @map("ds_conclusion") // "include" | "exclude" | "uncertain"
  dsConfidence       Float?   @map("ds_confidence") // 0-1

  // DeepSeek evidence
  dsPEvidence        String?  @map("ds_p_evidence") @db.Text
  dsIEvidence        String?  @map("ds_i_evidence") @db.Text
  dsCEvidence        String?  @map("ds_c_evidence") @db.Text
  dsSEvidence        String?  @map("ds_s_evidence") @db.Text
  dsReason           String?  @map("ds_reason") @db.Text

  // Qwen judgments
  qwenModelName      String   @map("qwen_model_name") // "qwen-max"
  qwenPJudgment      String?  @map("qwen_p_judgment")
  qwenIJudgment      String?  @map("qwen_i_judgment")
  qwenCJudgment      String?  @map("qwen_c_judgment")
  qwenSJudgment      String?  @map("qwen_s_judgment")
  qwenConclusion     String?  @map("qwen_conclusion")
  qwenConfidence     Float?   @map("qwen_confidence")

  // Qwen evidence
  qwenPEvidence      String?  @map("qwen_p_evidence") @db.Text
  qwenIEvidence      String?  @map("qwen_i_evidence") @db.Text
  qwenCEvidence      String?  @map("qwen_c_evidence") @db.Text
  qwenSEvidence      String?  @map("qwen_s_evidence") @db.Text
  qwenReason         String?  @map("qwen_reason") @db.Text

  // Conflict status
  conflictStatus     String   @default("none") @map("conflict_status")
  // Allowed values: none, conflict, resolved
  conflictFields     Json?    @map("conflict_fields")
  // Example: ["P", "I", "conclusion"]

  // Final decision (used by the Week 4 hybrid approach)
  finalDecision      String?  @map("final_decision") // "include" | "exclude" | null
  // ⭐ Week 4 note: set after manual review and treated as the final decision
  //   - include: a reviewer decided to include (may override the AI suggestion)
  //   - exclude: a reviewer decided to exclude (may override the AI suggestion)
  //   - null:    not yet reviewed; the AI decision is used
  finalDecisionBy    String?  @map("final_decision_by") // userId
  finalDecisionAt    DateTime? @map("final_decision_at")
  exclusionReason    String?  @map("exclusion_reason") @db.Text
  // ⭐ Week 4 note: exclusion reason entered by a reviewer (takes priority over the AI-derived reason)
  //   - If finalDecision = exclude, this field stores the reviewer's reason
  //   - If null, the frontend derives the reason from the AI judgments (dsPJudgment, dsIJudgment, etc.)
  //   - The Week 4 initial screening results page displays the exclusion reason from this field

  // AI processing status
  aiProcessingStatus String   @default("pending") @map("ai_processing_status")
  // Allowed values: pending, processing, completed, failed
  aiProcessedAt      DateTime? @map("ai_processed_at")
  aiErrorMessage     String?  @map("ai_error_message") @db.Text

  // Traceability
  promptVersion      String   @default("v1.0.0") @map("prompt_version")
  rawOutput          Json?    @map("raw_output") // Raw LLM output (backup)

  createdAt          DateTime @default(now()) @map("created_at")
  updatedAt          DateTime @updatedAt @map("updated_at")

  @@map("screening_results")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([literatureId])
  @@index([conflictStatus])
  @@index([finalDecision])
  @@unique([projectId, literatureId]) // One screening result per literature record per project
}
```

**SQL table definition** (simplified):

```sql
CREATE TABLE asl_schema.screening_results (
  id                   TEXT PRIMARY KEY,
  project_id           TEXT NOT NULL,
  literature_id        TEXT NOT NULL,

  -- DeepSeek judgments and evidence
  ds_model_name        TEXT NOT NULL,
  ds_p_judgment        TEXT,
  ds_i_judgment        TEXT,
  ds_c_judgment        TEXT,
  ds_s_judgment        TEXT,
  ds_conclusion        TEXT,
  ds_confidence        DOUBLE PRECISION,
  ds_p_evidence        TEXT,
  ds_i_evidence        TEXT,
  ds_c_evidence        TEXT,
  ds_s_evidence        TEXT,
  ds_reason            TEXT,

  -- Qwen judgments and evidence
  qwen_model_name      TEXT NOT NULL,
  qwen_p_judgment      TEXT,
  qwen_i_judgment      TEXT,
  qwen_c_judgment      TEXT,
  qwen_s_judgment      TEXT,
  qwen_conclusion      TEXT,
  qwen_confidence      DOUBLE PRECISION,
  qwen_p_evidence      TEXT,
  qwen_i_evidence      TEXT,
  qwen_c_evidence      TEXT,
  qwen_s_evidence      TEXT,
  qwen_reason          TEXT,

  -- Conflict status
  conflict_status      TEXT NOT NULL DEFAULT 'none',
  conflict_fields      JSONB,

  -- Final decision
  final_decision       TEXT,
  final_decision_by    TEXT,
  final_decision_at    TIMESTAMP(3),
  exclusion_reason     TEXT,

  -- AI processing status
  ai_processing_status TEXT NOT NULL DEFAULT 'pending',
  ai_processed_at      TIMESTAMP(3),
  ai_error_message     TEXT,

  -- Traceability
  prompt_version       TEXT NOT NULL DEFAULT 'v1.0.0',
  raw_output           JSONB,

  created_at           TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at           TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project_result FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
  CONSTRAINT fk_literature FOREIGN KEY (literature_id)
    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
  CONSTRAINT unique_project_literature UNIQUE (project_id, literature_id)
);

CREATE INDEX idx_screening_results_project_id ON asl_schema.screening_results(project_id);
CREATE INDEX idx_screening_results_literature_id ON asl_schema.screening_results(literature_id);
CREATE INDEX idx_screening_results_conflict_status ON asl_schema.screening_results(conflict_status);
CREATE INDEX idx_screening_results_final_decision ON asl_schema.screening_results(final_decision);
```

---

### 4. Screening Tasks (screening_tasks)

**Prisma model**: `AslScreeningTask`
**Table**: `asl_schema.screening_tasks`

```prisma
model AslScreeningTask {
  id             String   @id @default(uuid())
  projectId      String   @map("project_id")
  project        AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)

  taskType       String   @map("task_type") // "title_abstract" | "full_text"
  status         String   @default("pending")
  // Allowed values: pending, running, completed, failed

  // Progress counters
  totalItems     Int      @map("total_items")
  processedItems Int      @default(0) @map("processed_items")
  successItems   Int      @default(0) @map("success_items")
  failedItems    Int      @default(0) @map("failed_items")
  conflictItems  Int      @default(0) @map("conflict_items")

  // Timing
  startedAt      DateTime? @map("started_at")
  completedAt    DateTime? @map("completed_at")
  estimatedEndAt DateTime? @map("estimated_end_at")

  // Error information
  errorMessage   String?  @map("error_message") @db.Text

  createdAt      DateTime @default(now()) @map("created_at")
  updatedAt      DateTime @updatedAt @map("updated_at")

  @@map("screening_tasks")
  @@schema("asl_schema")
  @@index([projectId])
  @@index([status])
}
```

**SQL table definition**:

```sql
CREATE TABLE asl_schema.screening_tasks (
  id               TEXT PRIMARY KEY,
  project_id       TEXT NOT NULL,
  task_type        TEXT NOT NULL,
  status           TEXT NOT NULL DEFAULT 'pending',
  total_items      INTEGER NOT NULL,
  processed_items  INTEGER NOT NULL DEFAULT 0,
  success_items    INTEGER NOT NULL DEFAULT 0,
  failed_items     INTEGER NOT NULL DEFAULT 0,
  conflict_items   INTEGER NOT NULL DEFAULT 0,
  started_at       TIMESTAMP(3),
  completed_at     TIMESTAMP(3),
  estimated_end_at TIMESTAMP(3),
  error_message    TEXT,
  created_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,

  CONSTRAINT fk_project_task FOREIGN KEY (project_id)
    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
);

CREATE INDEX idx_screening_tasks_project_id ON asl_schema.screening_tasks(project_id);
CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);
```

---

## 📊 Relationship Diagram

```
platform_schema.users (1)
    ↓
asl_schema.screening_projects (N)
    ├─→ literatures (N)
    │       └─→ screening_results (1)
    ├─→ screening_results (N)
    └─→ screening_tasks (N)
```

**Relationships**:
- A user can own multiple screening projects (1:N)
- A project can contain multiple literature records (1:N)
- Each literature record has one screening result (1:1, enforced by the unique constraint on `(project_id, literature_id)`)
- A project can have multiple screening tasks (1:N)
- Cascading deletes keep the data consistent

---

## 🔍 Index Summary

| Table | Indexed columns | Index type | Purpose |
|------|---------|---------|------|
| screening_projects | user_id | B-tree | List a user's projects |
| screening_projects | status | B-tree | Filter by status |
| literatures | project_id | B-tree | List a project's literature |
| literatures | doi | B-tree | Duplicate check by DOI |
| literatures | (project_id, pmid) | Unique | Prevent duplicate imports |
| screening_results | project_id | B-tree | List a project's results |
| screening_results | literature_id | B-tree | Look up results by literature |
| screening_results | conflict_status | B-tree | Filter by conflict status |
| screening_results | final_decision | B-tree | Filter by final decision |
| screening_results | (project_id, literature_id) | Unique | Uniqueness constraint |
| screening_tasks | project_id | B-tree | List a project's tasks |
| screening_tasks | status | B-tree | Filter by task status |

**Total indexes**: 12 (10 plain B-tree indexes plus the 2 indexes backing the unique constraints)
**Unique constraints**: 2

---

## 💾 Data Dictionary

### PICO criteria (picoCriteria JSON)

```json
{
  "population": "Study population, e.g. adults with type 2 diabetes",
  "intervention": "Intervention, e.g. SGLT2 inhibitors",
  "comparison": "Comparator, e.g. placebo or standard care",
  "outcome": "Outcome measures, e.g. cardiovascular outcomes",
  "studyDesign": "Study design, e.g. randomized controlled trial (RCT)"
}
```

### Screening configuration (screeningConfig JSON)

```json
{
  "models": ["deepseek-chat", "qwen-max"],
  "temperature": 0,
  "maxRetries": 3
}
```

### Conflict fields (conflictFields JSON)

```json
["P", "I", "C", "S", "conclusion"]
```

### Raw output (rawOutput JSON)

The keys `判断` (judgment) and `证据` (evidence) are shown as they appear in the stored output.

```json
{
  "deepseek": {
    "判断": {...},
    "证据": {...}
  },
  "qwen": {
    "判断": {...},
    "证据": {...}
  }
}
```

---

## 🔒 Data Security

### Schema isolation
- `asl_schema` isolates this module's data from other modules
- The user table lives in `platform_schema` and is managed centrally

### Cascading deletes
- Delete a user → all of that user's screening projects and related data are deleted
- Delete a project → its literature records, results, and tasks are deleted
- Delete a literature record → its screening result is deleted

### Uniqueness constraints
- PMID is unique within a project. Records without a PMID are allowed, because PostgreSQL treats NULL values as distinct in a unique constraint by default
- Each literature record has only one screening result within a project

---

## 📈 Estimated Data Volume

| Project size | Literature records | Screening results | Storage |
|---------|--------|---------|----------|
| Small | 100-500 | 100-500 | < 10 MB |
| Medium | 500-2000 | 500-2000 | 10-50 MB |
| Large | 2000-5000 | 2000-5000 | 50-200 MB |
| Very large | 5000+ | 5000+ | 200 MB+ |

**Estimated size per record**:
- Literature record: ~2-5 KB
- Screening result: ~5-10 KB (including both models' judgments and evidence)

---

## ⏳ Roadmap

### Phase 2 (full-text rescreening)
- [ ] Add a full-text rescreening results table
- [ ] PDF file metadata table
- [ ] Full-text parsing results table

### Phase 3 (data extraction)
- [ ] Data extraction template table
- [ ] Extraction results table
- [ ] Quality assessment table
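
---

The hybrid rule documented for `finalDecision` and `exclusionReason` in the `screening_results` model (a reviewer's decision overrides the AI result, and a manually entered reason overrides the AI-derived one) can be sketched in TypeScript as below. This is a minimal illustration, not the module's actual code. The `ScreeningResultLike` type, both function names, the "uncertain" fallback when the two models disagree, and the wording of the derived reason are all assumptions made for the example.

```typescript
// Hypothetical subset of the AslScreeningResult fields used by this sketch.
type Judgment = "match" | "partial" | "mismatch";
type Conclusion = "include" | "exclude" | "uncertain";

interface ScreeningResultLike {
  finalDecision: "include" | "exclude" | null;
  exclusionReason: string | null;
  dsConclusion: Conclusion | null;
  qwenConclusion: Conclusion | null;
  dsPJudgment: Judgment | null;
  dsIJudgment: Judgment | null;
  dsCJudgment: Judgment | null;
  dsSJudgment: Judgment | null;
}

// A reviewer's decision always wins; otherwise fall back to the AI conclusions.
// Assumption: if the two models disagree, or a conclusion is missing,
// the record is reported as "uncertain".
function effectiveDecision(r: ScreeningResultLike): Conclusion {
  if (r.finalDecision !== null) return r.finalDecision;
  if (r.dsConclusion !== null && r.dsConclusion === r.qwenConclusion) {
    return r.dsConclusion;
  }
  return "uncertain";
}

// A manually entered reason takes priority; otherwise derive one from the
// DeepSeek judgments. Assumption: the derived text simply lists the PICO
// dimensions that were judged "mismatch".
function effectiveExclusionReason(r: ScreeningResultLike): string | null {
  if (r.exclusionReason !== null) return r.exclusionReason;
  const dims: Array<[string, Judgment | null]> = [
    ["P", r.dsPJudgment],
    ["I", r.dsIJudgment],
    ["C", r.dsCJudgment],
    ["S", r.dsSJudgment],
  ];
  const mismatched = dims
    .filter(([, judgment]) => judgment === "mismatch")
    .map(([name]) => name);
  return mismatched.length > 0 ? `Mismatch on: ${mismatched.join(", ")}` : null;
}
```

Keeping the fallback in one place means the results page and the statistics view would resolve the same decision for a record, whether or not it has been manually reviewed.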