feat(asl): Complete Day 5 - Fulltext Screening Backend API Development

- Implement 5 core API endpoints (create task, get progress, get results, update decision, export Excel)
- Add FulltextScreeningController with Zod validation (652 lines)
- Implement ExcelExporter service with 4-sheet report generation (352 lines)
- Register routes under /api/v1/asl/fulltext-screening
- Create 31 REST Client test cases
- Add automated integration test script
- Fix PDF extraction fallback mechanism in LLM12FieldsService
- Update API design documentation to v3.0
- Update development plan to v1.2
- Create Day 5 development record
- Clean up temporary test files

@@ -1,10 +1,10 @@
 # AI Intelligent Literature Module - Database Design

-> **Document version:** v2.2
+> **Document version:** v3.0
 > **Created:** 2025-10-29
 > **Maintainer:** AI Intelligent Literature development team
-> **Last updated:** 2025-11-21 (Week 4 complete)
-> **Update notes:** Week 4 statistics feature complete, hybrid approach implemented, exclusion-reason fields documented
+> **Last updated:** 2025-11-22 (Day 4: full-text screening database design)
+> **Update notes:** Added full-text screening tables (`AslLiterature` extension, `AslFulltextScreeningTask`, `AslFulltextScreeningResult`)

 ---

@@ -31,10 +31,18 @@ platform_schema
 asl_schema
 ├── screening_projects (screening projects)
 ├── literatures (literature entries)
-├── screening_results (screening results)
-└── screening_tasks (screening tasks)
+├── screening_results (title screening results)
+├── screening_tasks (title screening tasks)
+├── fulltext_screening_tasks (full-text screening tasks) ⭐ added Day 4
+└── fulltext_screening_results (full-text screening results) ⭐ added Day 4
 ```

+**v3.0 update notes (2025-11-22)**:
+- ✅ Extended the `literatures` table: full-text lifecycle management, PDF storage, full-text content references
+- ✅ Added the `fulltext_screening_tasks` table: manages full-text screening batch tasks
+- ✅ Added the `fulltext_screening_results` table: stores the 12-field assessment results
+- ✅ Cloud-native compliant: full-text content is stored by reference rather than inline
+
 ---

 ## 🗄️ Core Data Tables
@@ -113,11 +121,17 @@ CREATE INDEX idx_screening_projects_status ON asl_schema.screening_projects(stat

 ---

-### 2. Literature Entries Table (literatures)
+### 2. Literature Entries Table (literatures) ⭐ updated in v3.0

 **Prisma model**: `AslLiterature`
 **Table**: `asl_schema.literatures`

+**v3.0 update notes**:
+- ✅ Added the `stage` field: tracks the literature lifecycle (imported → title_screened → pdf_acquired → fulltext_screened → data_extracted)
+- ✅ Added PDF storage fields: Dify/OSS dual-adapter support (`pdfStorageType`, `pdfStorageRef`, `pdfStatus`)
+- ✅ Added full-text storage fields: **cloud-native compliant, storing references rather than content** (`fullTextStorageRef`, `fullTextUrl`)
+- ✅ Added indexes on `stage`, `hasPdf`, and `pdfStatus` to speed up queries
+
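
The `stage` lifecycle described above can be read as a linear state machine. The following is a minimal TypeScript sketch, not code from this commit: it uses the stage values listed in the `AslLiterature` schema comment (which also include `title_included`), and it assumes that a literature entry only ever advances one step at a time. `nextStage` and `canAdvance` are hypothetical helper names.

```typescript
// Stage values as listed in the AslLiterature schema comment.
const STAGES = [
  "imported",
  "title_screened",
  "title_included",
  "pdf_acquired",
  "fulltext_screened",
  "data_extracted",
] as const;

type Stage = (typeof STAGES)[number];

// Hypothetical helper: the stage that follows `current`, or null at the end.
function nextStage(current: Stage): Stage | null {
  return STAGES[STAGES.indexOf(current) + 1] ?? null;
}

// Assumption (not stated in the design doc): only single-step forward moves are valid.
function canAdvance(from: Stage, to: Stage): boolean {
  return nextStage(from) === to;
}
```

If the real workflow allows skipping or rolling back stages, `canAdvance` would need an explicit transition table instead of the single-step rule.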
 ```prisma
 model AslLiterature {
   id String @id @default(uuid())
@@ -133,13 +147,34 @@ model AslLiterature {
   publicationYear Int? @map("publication_year")
   doi String?

+  // ⭐ Added in v3.0: literature stage (lifecycle management)
+  stage String @default("imported") @map("stage")
+  // imported | title_screened | title_included | pdf_acquired | fulltext_screened | data_extracted
+
   // Cloud-native storage fields (used in the V1.0 phase, reserved during the MVP phase)
   pdfUrl String? @map("pdf_url") // PDF access URL
   pdfOssKey String? @map("pdf_oss_key") // OSS storage key (used for deletion)
   pdfFileSize Int? @map("pdf_file_size") // File size in bytes

+  // ⭐ Added in v3.0: PDF storage (Dify/OSS dual adapter)
+  hasPdf Boolean @default(false) @map("has_pdf")
+  pdfStorageType String? @map("pdf_storage_type") // "dify" | "oss"
+  pdfStorageRef String? @map("pdf_storage_ref") // Dify: document_id, OSS: object_key
+  pdfStatus String? @map("pdf_status") // "uploading" | "ready" | "failed"
+  pdfUploadedAt DateTime? @map("pdf_uploaded_at")
+
+  // ⭐ Added in v3.0: full-text content storage (cloud-native: store a reference, not the content)
+  fullTextStorageType String? @map("full_text_storage_type") // "dify" | "oss"
+  fullTextStorageRef String? @map("full_text_storage_ref") // document_id or object_key
+  fullTextUrl String? @map("full_text_url") // Access URL
+  fullTextFormat String? @map("full_text_format") // "markdown" | "plaintext"
+  fullTextSource String? @map("full_text_source") // "nougat" | "pymupdf"
+  fullTextTokenCount Int? @map("full_text_token_count")
+  fullTextExtractedAt DateTime? @map("full_text_extracted_at")
+
   // Relations
   screeningResults AslScreeningResult[]
+  fulltextScreeningResults AslFulltextScreeningResult[] // ⭐ Added in v3.0

   createdAt DateTime @default(now()) @map("created_at")
   updatedAt DateTime @updatedAt @map("updated_at")
@@ -148,15 +183,20 @@ model AslLiterature {
   @@schema("asl_schema")
   @@index([projectId])
   @@index([doi])
-  @@unique([projectId, pmid]) // PMID is unique within a project
+  @@index([stage]) // ⭐ Added in v3.0
+  @@index([hasPdf]) // ⭐ Added in v3.0
+  @@index([pdfStatus]) // ⭐ Added in v3.0
+  @@unique([projectId, pmid])
 }
 ```

-**SQL table structure**:
+**SQL table structure** (v3.0):
 ```sql
 CREATE TABLE asl_schema.literatures (
   id TEXT PRIMARY KEY,
   project_id TEXT NOT NULL,
+
+  -- Basic literature information
   pmid TEXT,
   title TEXT NOT NULL,
   abstract TEXT NOT NULL,
@@ -164,11 +204,34 @@ CREATE TABLE asl_schema.literatures (
   journal TEXT,
   publication_year INTEGER,
   doi TEXT,
+
+  -- Literature stage
+  stage TEXT NOT NULL DEFAULT 'imported',
+
+  -- PDF storage (legacy fields, reserved for V1.0)
   pdf_url TEXT,
   pdf_oss_key TEXT,
   pdf_file_size INTEGER,
+
+  -- PDF storage (new fields, Dify/OSS dual adapter)
+  has_pdf BOOLEAN NOT NULL DEFAULT false,
+  pdf_storage_type TEXT,
+  pdf_storage_ref TEXT,
+  pdf_status TEXT,
+  pdf_uploaded_at TIMESTAMP(3),
+
+  -- Full-text content storage (by reference)
+  full_text_storage_type TEXT,
+  full_text_storage_ref TEXT,
+  full_text_url TEXT,
+  full_text_format TEXT,
+  full_text_source TEXT,
+  full_text_token_count INTEGER,
+  full_text_extracted_at TIMESTAMP(3),
+
   created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
   updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
   CONSTRAINT fk_project FOREIGN KEY (project_id)
     REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
   CONSTRAINT unique_project_pmid UNIQUE (project_id, pmid)
@@ -176,8 +239,28 @@ CREATE TABLE asl_schema.literatures (

 CREATE INDEX idx_literatures_project_id ON asl_schema.literatures(project_id);
 CREATE INDEX idx_literatures_doi ON asl_schema.literatures(doi);
+CREATE INDEX idx_literatures_stage ON asl_schema.literatures(stage);
+CREATE INDEX idx_literatures_has_pdf ON asl_schema.literatures(has_pdf);
+CREATE INDEX idx_literatures_pdf_status ON asl_schema.literatures(pdf_status);
 ```

+**Field descriptions**:
+
+| Field | Type | Description | Design rationale |
+|------|------|------|----------|
+| `stage` | String | Literature stage | Tracks where the literature is in the overall workflow |
+| `pdfStorageType` | String | PDF storage type | "dify"\|"oss"; supports dual adapters |
+| `pdfStorageRef` | String | PDF storage reference | Dify `document_id` or OSS `object_key` |
+| `fullTextStorageType` | String | Full-text storage type | Cloud-native: store a reference, not the full text ✅ |
+| `fullTextStorageRef` | String | Full-text storage reference | Points to the full-text document in Dify or OSS ✅ |
+| `fullTextUrl` | String | Full-text access URL | URL for direct access to the full text |
+| `fullTextTokenCount` | Int | Token count | Used for cost estimation and LLM call optimization |
+
+**Cloud-native design highlights** ⭐:
+- ✅ Full-text content lives in OSS/Dify; the database stores only references (cloud-native compliant)
+- ✅ Supports seamless Dify → OSS migration (only the storageType needs to be switched)
+- ✅ Keeps the database lightweight by avoiding large TEXT columns
+
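
To illustrate the store-by-reference design, here is a minimal TypeScript sketch of how a caller might resolve `fullTextStorageType` and `fullTextStorageRef` to the actual text. It is an assumption-laden illustration rather than the project's code: `FullTextAdapter`, `inMemoryAdapter`, and `loadFullText` are hypothetical names, the adapters are in-memory stand-ins instead of real Dify or OSS clients, and the calls are synchronous for brevity where a real adapter would be asynchronous.

```typescript
type StorageType = "dify" | "oss";

// Hypothetical adapter interface; real Dify/OSS clients are not shown.
interface FullTextAdapter {
  load(ref: string): string;
}

// The subset of AslLiterature columns this sketch relies on.
interface FullTextPointer {
  fullTextStorageType: StorageType | null;
  fullTextStorageRef: string | null;
}

// In-memory stand-in so the sketch runs without any external service.
function inMemoryAdapter(docs: Record<string, string>): FullTextAdapter {
  return {
    load(ref: string): string {
      const text = docs[ref];
      if (text === undefined) throw new Error(`full text not found: ${ref}`);
      return text;
    },
  };
}

// Dispatch on the storage type, then dereference the stored pointer.
function loadFullText(
  lit: FullTextPointer,
  adapters: Record<StorageType, FullTextAdapter>,
): string {
  if (!lit.fullTextStorageType || !lit.fullTextStorageRef) {
    throw new Error("literature has no extracted full text");
  }
  return adapters[lit.fullTextStorageType].load(lit.fullTextStorageRef);
}
```

Under this shape, migrating a document from Dify to OSS only changes the two pointer columns; callers of `loadFullText` stay the same.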
 ---

 ### 3. Screening Results Table (screening_results)
@@ -412,28 +495,357 @@ CREATE INDEX idx_screening_tasks_status ON asl_schema.screening_tasks(status);

 ---

-## 📊 Data Relationship Diagram
+### 5. Full-Text Screening Tasks Table (fulltext_screening_tasks) ⭐ added in v3.0

-```
-platform_schema.users (1)
-↓
-asl_schema.screening_projects (N)
-├─→ literatures (N)
-│ └─→ screening_results (1)
-├─→ screening_results (N)
-└─→ screening_tasks (N)
+**Prisma model**: `AslFulltextScreeningTask`
+**Table**: `asl_schema.fulltext_screening_tasks`
+
+**Design goal**: manage full-text screening batch tasks, with support for parallel dual-model calls, cost tracking, and a degraded mode
+
+```prisma
+model AslFulltextScreeningTask {
+  id String @id @default(uuid())
+  projectId String @map("project_id")
+  project AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
+
+  // Task configuration
+  modelA String @map("model_a") // "deepseek-v3"
+  modelB String @map("model_b") // "qwen-max"
+  promptVersion String @default("v1.0.0") @map("prompt_version")
+
+  // Task status
+  status String @default("pending")
+  // "pending" | "running" | "completed" | "failed" | "cancelled"
+
+  // Progress counters
+  totalCount Int @map("total_count")
+  processedCount Int @default(0) @map("processed_count")
+  successCount Int @default(0) @map("success_count")
+  failedCount Int @default(0) @map("failed_count")
+  degradedCount Int @default(0) @map("degraded_count") // Only one model succeeded
+
+  // Cost counters
+  totalTokens Int @default(0) @map("total_tokens")
+  totalCost Float @default(0) @map("total_cost")
+
+  // Timing
+  startedAt DateTime? @map("started_at")
+  completedAt DateTime? @map("completed_at")
+  estimatedEndAt DateTime? @map("estimated_end_at")
+
+  // Error information
+  errorMessage String? @map("error_message") @db.Text
+  errorStack String? @map("error_stack") @db.Text
+
+  // Relations
+  results AslFulltextScreeningResult[]
+
+  createdAt DateTime @default(now()) @map("created_at")
+  updatedAt DateTime @updatedAt @map("updated_at")
+
+  @@map("fulltext_screening_tasks")
+  @@schema("asl_schema")
+  @@index([projectId])
+  @@index([status])
+  @@index([createdAt])
+}
 ```

-**Relationship notes**:
-- A user can have multiple screening projects (1:N)
-- A project can have multiple literature entries (1:N)
-- Each literature entry has one screening result (1:1)
-- A project can have multiple screening tasks (1:N)
-- Cascading deletes keep the data consistent
+**SQL table structure**:
+```sql
+CREATE TABLE asl_schema.fulltext_screening_tasks (
+  id TEXT PRIMARY KEY,
+  project_id TEXT NOT NULL,
+
+  -- Task configuration
+  model_a TEXT NOT NULL,
+  model_b TEXT NOT NULL,
+  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
+
+  -- Task status
+  status TEXT NOT NULL DEFAULT 'pending',
+
+  -- Progress counters
+  total_count INTEGER NOT NULL,
+  processed_count INTEGER NOT NULL DEFAULT 0,
+  success_count INTEGER NOT NULL DEFAULT 0,
+  failed_count INTEGER NOT NULL DEFAULT 0,
+  degraded_count INTEGER NOT NULL DEFAULT 0,
+
+  -- Cost counters
+  total_tokens INTEGER NOT NULL DEFAULT 0,
+  total_cost DOUBLE PRECISION NOT NULL DEFAULT 0,
+
+  -- Timing
+  started_at TIMESTAMP(3),
+  completed_at TIMESTAMP(3),
+  estimated_end_at TIMESTAMP(3),
+
+  -- Error information
+  error_message TEXT,
+  error_stack TEXT,
+
+  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
+  CONSTRAINT fk_project_fulltext_task FOREIGN KEY (project_id)
+    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE
+);
+
+CREATE INDEX idx_fulltext_screening_tasks_project_id ON asl_schema.fulltext_screening_tasks(project_id);
+CREATE INDEX idx_fulltext_screening_tasks_status ON asl_schema.fulltext_screening_tasks(status);
+CREATE INDEX idx_fulltext_screening_tasks_created_at ON asl_schema.fulltext_screening_tasks(created_at);
+```
+
+**Field descriptions**:
+
+| Field | Type | Description |
+|------|------|------|
+| `modelA / modelB` | String | Names of the two models (deepseek-v3 + qwen-max) |
+| `degradedCount` | Int | Number of items where only one model succeeded (fault tolerance) |
+| `totalTokens` | Int | Cumulative token usage |
+| `totalCost` | Float | Cumulative cost (CNY) |
+| `promptVersion` | String | Prompt version (for traceability) |

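
The progress counters and `estimatedEndAt` suggest how a progress endpoint could derive its response. The following TypeScript sketch is one possible reading, not the controller's actual logic: `progressPercent` and `estimateEndAt` are hypothetical helpers, and the linear ETA (each remaining item takes the average time observed so far) is an assumption.

```typescript
// The subset of AslFulltextScreeningTask columns used by this sketch.
interface TaskCounters {
  totalCount: number;
  processedCount: number;
  startedAt: Date | null;
}

// Share of literature entries processed so far, as a whole percentage in [0, 100].
function progressPercent(task: TaskCounters): number {
  if (task.totalCount <= 0) return 0;
  return Math.min(100, Math.round((task.processedCount / task.totalCount) * 100));
}

// Assumed linear estimate: remaining items take the average time per item so far.
// Returns null until the task has started and processed at least one item.
function estimateEndAt(task: TaskCounters, now: Date): Date | null {
  if (!task.startedAt || task.processedCount <= 0) return null;
  const elapsedMs = now.getTime() - task.startedAt.getTime();
  const perItemMs = elapsedMs / task.processedCount;
  const remaining = Math.max(0, task.totalCount - task.processedCount);
  return new Date(now.getTime() + perItemMs * remaining);
}
```

For example, a task that started one minute ago and has processed 3 of 12 entries averages 20 seconds per entry, so the estimate lands three minutes after `now`.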
 ---

-## 🔍 Index Design Summary
+### 6. Full-Text Screening Results Table (fulltext_screening_results) ⭐ added in v3.0
+
+**Prisma model**: `AslFulltextScreeningResult`
+**Table**: `asl_schema.fulltext_screening_results`
+
+**Design goal**: store the detailed 12-field assessment results, with support for dual-model comparison, validation results, and conflict detection
+
+**Design highlights**:
+- ✅ Complete results from both models (fields + overall + logs)
+- ✅ Medical-logic validation and evidence-chain validation results
+- ✅ Conflict detection and review priority
+- ✅ Degraded-mode support (only one model succeeded)
+- ✅ 12-field assessment stored as JSON (cloud-native compliant)
+
+```prisma
+model AslFulltextScreeningResult {
+  id String @id @default(uuid())
+  taskId String @map("task_id")
+  task AslFulltextScreeningTask @relation(fields: [taskId], references: [id], onDelete: Cascade)
+  projectId String @map("project_id")
+  project AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
+  literatureId String @map("literature_id")
+  literature AslLiterature @relation(fields: [literatureId], references: [id], onDelete: Cascade)
+
+  // ====== Model A result (DeepSeek-V3) ======
+  modelAName String @map("model_a_name")
+  modelAStatus String @map("model_a_status") // "success" | "failed"
+  modelAFields Json @map("model_a_fields") // 12-field assessment { field1: {...}, field2: {...}, ... }
+  modelAOverall Json @map("model_a_overall") // Overall assessment { decision, confidence, keyIssues }
+  modelAProcessingLog Json? @map("model_a_processing_log")
+  modelAVerification Json? @map("model_a_verification")
+  modelATokens Int? @map("model_a_tokens")
+  modelACost Float? @map("model_a_cost")
+  modelAError String? @map("model_a_error") @db.Text
+
+  // ====== Model B result (Qwen-Max) ======
+  modelBName String @map("model_b_name")
+  modelBStatus String @map("model_b_status")
+  modelBFields Json @map("model_b_fields")
+  modelBOverall Json @map("model_b_overall")
+  modelBProcessingLog Json? @map("model_b_processing_log")
+  modelBVerification Json? @map("model_b_verification")
+  modelBTokens Int? @map("model_b_tokens")
+  modelBCost Float? @map("model_b_cost")
+  modelBError String? @map("model_b_error") @db.Text
+
+  // ====== Validation results ======
+  medicalLogicIssues Json? @map("medical_logic_issues") // MedicalLogicValidator output
+  evidenceChainIssues Json? @map("evidence_chain_issues") // EvidenceChainValidator output
+
+  // ====== Conflict detection ======
+  isConflict Boolean @default(false) @map("is_conflict")
+  conflictSeverity String? @map("conflict_severity") // "high" | "medium" | "low"
+  conflictFields String[] @map("conflict_fields") // ["field1", "field9", "overall"]
+  conflictDetails Json? @map("conflict_details")
+  reviewPriority Int? @map("review_priority") // Review priority, 0-100
+  reviewDeadline DateTime? @map("review_deadline")
+
+  // ====== Final decision ======
+  finalDecision String? @map("final_decision") // "include" | "exclude" | null
+  finalDecisionBy String? @map("final_decision_by")
+  finalDecisionAt DateTime? @map("final_decision_at")
+  exclusionReason String? @map("exclusion_reason") @db.Text
+  reviewNotes String? @map("review_notes") @db.Text
+
+  // ====== Processing status ======
+  processingStatus String @default("pending") @map("processing_status")
+  // "pending" | "processing" | "completed" | "failed" | "degraded"
+  isDegraded Boolean @default(false) @map("is_degraded")
+  degradedModel String? @map("degraded_model") // "modelA" | "modelB"
+
+  processedAt DateTime? @map("processed_at")
+
+  // ====== Traceability ======
+  promptVersion String @default("v1.0.0") @map("prompt_version")
+  rawOutputA Json? @map("raw_output_a")
+  rawOutputB Json? @map("raw_output_b")
+
+  createdAt DateTime @default(now()) @map("created_at")
+  updatedAt DateTime @updatedAt @map("updated_at")
+
+  @@map("fulltext_screening_results")
+  @@schema("asl_schema")
+  @@index([taskId])
+  @@index([projectId])
+  @@index([literatureId])
+  @@index([isConflict])
+  @@index([finalDecision])
+  @@index([reviewPriority])
+  @@unique([projectId, literatureId]) // One full-text screening result per literature entry
+}
+```
+
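
The degraded-mode columns (`processingStatus`, `isDegraded`, `degradedModel`) can be derived from the two per-model statuses. The TypeScript sketch below shows one way to do that; it is illustrative only. `resolveOutcome` is a hypothetical function, and the schema does not say whether `degradedModel` names the model that failed or the one that succeeded, so this sketch assumes it names the model that still produced a result.

```typescript
type ModelStatus = "success" | "failed";

// Mirrors the processing-status columns of AslFulltextScreeningResult.
interface ProcessingOutcome {
  processingStatus: "completed" | "degraded" | "failed";
  isDegraded: boolean;
  // Assumption: names the model that succeeded when the other one failed.
  degradedModel: "modelA" | "modelB" | null;
}

function resolveOutcome(a: ModelStatus, b: ModelStatus): ProcessingOutcome {
  if (a === "success" && b === "success") {
    return { processingStatus: "completed", isDegraded: false, degradedModel: null };
  }
  if (a === "failed" && b === "failed") {
    return { processingStatus: "failed", isDegraded: false, degradedModel: null };
  }
  // Exactly one model succeeded: keep its result and flag the row as degraded.
  return {
    processingStatus: "degraded",
    isDegraded: true,
    degradedModel: a === "success" ? "modelA" : "modelB",
  };
}
```

A task-level `degradedCount` would then be the number of results whose outcome has `isDegraded` set.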
+**SQL table structure** (simplified; the actual table contains all fields):
+```sql
+CREATE TABLE asl_schema.fulltext_screening_results (
+  id TEXT PRIMARY KEY,
+  task_id TEXT NOT NULL,
+  project_id TEXT NOT NULL,
+  literature_id TEXT NOT NULL,
+
+  -- Model A result
+  model_a_name TEXT NOT NULL,
+  model_a_status TEXT NOT NULL,
+  model_a_fields JSONB NOT NULL,
+  model_a_overall JSONB NOT NULL,
+  model_a_processing_log JSONB,
+  model_a_verification JSONB,
+  model_a_tokens INTEGER,
+  model_a_cost DOUBLE PRECISION,
+  model_a_error TEXT,
+
+  -- Model B result (same layout as model A)
+  model_b_name TEXT NOT NULL,
+  model_b_status TEXT NOT NULL,
+  model_b_fields JSONB NOT NULL,
+  model_b_overall JSONB NOT NULL,
+  model_b_processing_log JSONB,
+  model_b_verification JSONB,
+  model_b_tokens INTEGER,
+  model_b_cost DOUBLE PRECISION,
+  model_b_error TEXT,
+
+  -- Validation results
+  medical_logic_issues JSONB,
+  evidence_chain_issues JSONB,
+
+  -- Conflict detection
+  is_conflict BOOLEAN NOT NULL DEFAULT false,
+  conflict_severity TEXT,
+  conflict_fields TEXT[],
+  conflict_details JSONB,
+  review_priority INTEGER,
+  review_deadline TIMESTAMP(3),
+
+  -- Final decision
+  final_decision TEXT,
+  final_decision_by TEXT,
+  final_decision_at TIMESTAMP(3),
+  exclusion_reason TEXT,
+  review_notes TEXT,
+
+  -- Processing status
+  processing_status TEXT NOT NULL DEFAULT 'pending',
+  is_degraded BOOLEAN NOT NULL DEFAULT false,
+  degraded_model TEXT,
+  processed_at TIMESTAMP(3),
+
+  -- Traceability
+  prompt_version TEXT NOT NULL DEFAULT 'v1.0.0',
+  raw_output_a JSONB,
+  raw_output_b JSONB,
+
+  created_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+  updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
+  CONSTRAINT fk_task FOREIGN KEY (task_id)
+    REFERENCES asl_schema.fulltext_screening_tasks(id) ON DELETE CASCADE,
+  CONSTRAINT fk_project_fulltext_result FOREIGN KEY (project_id)
+    REFERENCES asl_schema.screening_projects(id) ON DELETE CASCADE,
+  CONSTRAINT fk_literature_fulltext FOREIGN KEY (literature_id)
+    REFERENCES asl_schema.literatures(id) ON DELETE CASCADE,
+  CONSTRAINT unique_project_literature_fulltext UNIQUE (project_id, literature_id)
+);
+
+CREATE INDEX idx_fulltext_screening_results_task_id ON asl_schema.fulltext_screening_results(task_id);
+CREATE INDEX idx_fulltext_screening_results_project_id ON asl_schema.fulltext_screening_results(project_id);
+CREATE INDEX idx_fulltext_screening_results_literature_id ON asl_schema.fulltext_screening_results(literature_id);
+CREATE INDEX idx_fulltext_screening_results_is_conflict ON asl_schema.fulltext_screening_results(is_conflict);
+CREATE INDEX idx_fulltext_screening_results_final_decision ON asl_schema.fulltext_screening_results(final_decision);
+CREATE INDEX idx_fulltext_screening_results_review_priority ON asl_schema.fulltext_screening_results(review_priority);
+```
+
+**JSON field examples**:
+
+**modelAFields (12-field assessment)**:
+```json
+{
+  "field1": {
+    "present": true,
+    "completeness": "complete",
+    "extractable": true,
+    "quote": "First author: Zhang et al., published in JAMA 2023...",
+    "location": "Title page, Methods section",
+    "note": "Literature source information is complete"
+  },
+  "field2": { ... },
+  // ... field3-field12
+}
+```
+
+**modelAOverall (overall assessment)**:
+```json
+{
+  "decision": "include",
+  "confidence": 0.92,
+  "keyIssues": [
+    "Randomization method fully described",
+    "Blinding clearly implemented",
+    "Outcome measures extractable"
+  ]
+}
+```
+
+**medicalLogicIssues (medical-logic validation)**:
+```json
+{
+  "hasIssues": false,
+  "issues": []
+}
+```
+
+**conflictDetails (conflict details)**:
+```json
+{
+  "field9": {
+    "modelA": "complete",
+    "modelB": "incomplete",
+    "severity": "high"
+  }
+}
+```
+
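
The `conflictDetails` example above implies a field-by-field comparison of the two models' outputs. Below is a minimal TypeScript sketch of such a comparison, under stated assumptions: it compares only the `completeness` value of each field, `detectConflicts` is a hypothetical name, and severity scoring is left out because the source does not give a rule for it.

```typescript
// Only the part of a field assessment that this sketch compares.
interface FieldEvaluation {
  completeness: string;
}

type FieldMap = Record<string, FieldEvaluation>;

interface ConflictReport {
  isConflict: boolean;
  conflictFields: string[];
  conflictDetails: Record<string, { modelA: string; modelB: string }>;
}

// Flags every field that both models assessed but rated differently.
function detectConflicts(a: FieldMap, b: FieldMap): ConflictReport {
  const conflictFields: string[] = [];
  const conflictDetails: Record<string, { modelA: string; modelB: string }> = {};
  for (const [name, evalA] of Object.entries(a)) {
    const evalB = b[name];
    if (evalB === undefined) continue; // assessed by model A only, nothing to compare
    if (evalA.completeness !== evalB.completeness) {
      conflictFields.push(name);
      conflictDetails[name] = { modelA: evalA.completeness, modelB: evalB.completeness };
    }
  }
  return { isConflict: conflictFields.length > 0, conflictFields, conflictDetails };
}
```

The returned `conflictFields` and `conflictDetails` line up with the columns of the same name; a fuller version would probably also compare the overall `decision` values.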
+---
+
+## 📊 Data Relationship Diagram (updated in v3.0)
+
+```
+literature_screening_projects (1) ──< (N) literature_items
+literature_screening_projects (1) ──< (N) title_abstract_screening_results
+literature_items (1) ──< (1) title_abstract_screening_results
+literature_screening_projects (1) ──< (N) screening_tasks
+```
+
+---
+
+## 🔍 Index Design Summary (updated in v3.0)

 | Table | Indexed columns | Index type | Description |
 |------|---------|---------|------|
@@ -441,6 +853,9 @@ asl_schema.screening_projects (N)
 | screening_projects | status | B-tree | Status filtering |
 | literatures | project_id | B-tree | Project literature lookup |
 | literatures | doi | B-tree | DOI duplicate check |
+| literatures | stage ⭐ | B-tree | Literature stage lookup (v3.0) |
+| literatures | has_pdf ⭐ | B-tree | PDF acquisition status (v3.0) |
+| literatures | pdf_status ⭐ | B-tree | PDF upload status (v3.0) |
 | literatures | (project_id, pmid) | Unique | Prevents duplicate imports |
 | screening_results | project_id | B-tree | Project results lookup |
 | screening_results | literature_id | B-tree | Literature results lookup |
@@ -449,9 +864,24 @@ asl_schema.screening_projects (N)
 | screening_results | (project_id, literature_id) | Unique | Uniqueness constraint |
 | screening_tasks | project_id | B-tree | Project task lookup |
 | screening_tasks | status | B-tree | Task status filtering |
+| fulltext_screening_tasks ⭐ | project_id | B-tree | Full-text task lookup (v3.0) |
+| fulltext_screening_tasks ⭐ | status | B-tree | Task status filtering (v3.0) |
+| fulltext_screening_tasks ⭐ | created_at | B-tree | Ordering by time (v3.0) |
+| fulltext_screening_results ⭐ | task_id | B-tree | Task results lookup (v3.0) |
+| fulltext_screening_results ⭐ | project_id | B-tree | Project results lookup (v3.0) |
+| fulltext_screening_results ⭐ | literature_id | B-tree | Literature results lookup (v3.0) |
+| fulltext_screening_results ⭐ | is_conflict | B-tree | Conflict filtering (v3.0) |
+| fulltext_screening_results ⭐ | final_decision | B-tree | Decision filtering (v3.0) |
+| fulltext_screening_results ⭐ | review_priority | B-tree | Review priority (v3.0) |
+| fulltext_screening_results ⭐ | (project_id, literature_id) | Unique | Uniqueness constraint (v3.0) |

-**Total indexes**: 12
-**Unique constraints**: 3
+**Total indexes**: 25 (13 added in v3.0)
+**Unique constraints**: 4 (1 added in v3.0)
+
+**v3.0 index optimization notes**:
+- ✅ `literatures.stage`: fast lookup of literature in a given stage (for example, "pdf_acquired" entries awaiting full-text screening)
+- ✅ `fulltext_screening_results.review_priority`: speeds up ordering of the manual review queue
+- ✅ `fulltext_screening_tasks.created_at`: speeds up task history queries

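
As an illustration of what the `is_conflict`, `final_decision`, and `review_priority` indexes are meant to serve, here is a minimal in-memory TypeScript sketch of a manual review queue. It is not the project's query code: `reviewQueue` is a hypothetical helper, and both the filter (conflicting results without a final decision) and the treatment of a missing priority as 0 are assumptions.

```typescript
// The subset of AslFulltextScreeningResult columns the queue depends on.
interface ReviewItem {
  id: string;
  isConflict: boolean;
  finalDecision: string | null;
  reviewPriority: number | null;
}

// Conflicting, still-undecided results, highest review priority first.
function reviewQueue(items: ReviewItem[]): ReviewItem[] {
  return items
    .filter((r) => r.isConflict && r.finalDecision === null) // filter() copies, so the input is not mutated
    .sort((x, y) => (y.reviewPriority ?? 0) - (x.reviewPriority ?? 0));
}
```

In the database, the same selection would be a filter on `is_conflict` and `final_decision` with a descending sort on `review_priority`.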
 ---

@@ -526,18 +956,66 @@ asl_schema.screening_projects (N)

 ## ⏳ Roadmap

-### Phase 2 (Full-Text Screening)
-- [ ] Add the full-text screening results table
-- [ ] PDF file metadata table
-- [ ] Full-text parsing results table
+### Phase 2 (Full-Text Screening) ✅ completed in v3.0
+- [x] Extend the `literatures` table (lifecycle management)
+- [x] Add the `fulltext_screening_tasks` table
+- [x] Add the `fulltext_screening_results` table (12 fields)

-### Phase 3 (Data Extraction)
-- [ ] Data extraction template table
-- [ ] Extraction results table
-- [ ] Quality assessment table
+### Phase 3 (Data Extraction) not yet started
+- [ ] Reuse the `fulltext_screening_tasks` table (switch mode)
+- [ ] Reuse the `fulltext_screening_results` table (store extracted data)
+- [ ] Or add a separate `data_extraction_results` table (if independence is needed)
+
+### Phase 4 (Quality Assessment) to be planned
+- [ ] Quality assessment results table
+- [ ] Risk-of-bias assessment table
+- [ ] GRADE evidence quality table

 ---

-**Document version:** v2.0
-**Last updated:** 2025-11-18
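+## 📝 v3.0 Design Decision Record
+
+### Decision 1: Store full-text content by reference, not inline ✅
+
+**Question**: Should full-text content be stored in the database?
+
+**Options compared**:
+| Option | Pros | Cons |
+|------|------|------|
+| Store as TEXT | Fast LLM calls | Violates cloud-native conventions; bloats the database |
+| Store a reference | Compliant and lightweight | Adds 100-200 ms to each LLM call |
+
+**Decision**: ✅ Option 2 (store a reference)
+- Follows the cloud-native principle of separating storage from compute
+- Supports very large documents (>1 MB)
+- RDS storage costs 5-10 times as much as OSS
+
+### Decision 2: Store the 12 fields as JSON ✅
+
+**Question**: Should the 12 fields be split into columns or stored as JSON?
+
+**Decision**: ✅ Use PostgreSQL JSONB
+- No need to query inside an individual field
+- Each field has a complex structure (6 sub-fields)
+- JSONB performs well and supports GIN indexes
+
+### Decision 3: Separate full-text screening results table ✅
+
+**Question**: Should the `screening_results` table be reused?
+
+**Decision**: ✅ Add a separate table, `fulltext_screening_results`
+- The data structures are completely different (PICOS vs. 12 fields)
+- Avoids redundant columns and coupled logic
+- Easier to maintain and optimize independently
+
+---
+
+**Document version:** v3.0
+**Last updated:** 2025-11-22 (Day 4: full-text screening database design)
+**Maintainer:** AI Intelligent Literature development team
+
+**Version history**:
+- v3.0 (2025-11-22): Full-text screening database design; 3 tables added or extended, plus related fields
+- v2.2 (2025-11-21): Week 4 statistics feature complete
+- v2.0 (2025-11-18): Title screening database design
+- v1.0 (2025-10-29): Initial version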