feat(asl): Complete Day 5 - Fulltext Screening Backend API Development

- Implement 5 core API endpoints (create task, get progress, get results, update decision, export Excel)
- Add FulltextScreeningController with Zod validation (652 lines)
- Implement ExcelExporter service with 4-sheet report generation (352 lines)
- Register routes under /api/v1/asl/fulltext-screening
- Create 31 REST Client test cases
- Add automated integration test script
- Fix PDF extraction fallback mechanism in LLM12FieldsService
- Update API design documentation to v3.0
- Update development plan to v1.2
- Create Day 5 development record
- Clean up temporary test files
@@ -1,9 +1,9 @@
# AI Intelligent Literature Module - Current Status and Development Guide

> **Document version:** v1.1
> **Document version:** v1.2
> **Created:** 2025-11-21
> **Maintainer:** AI Intelligent Literature development team
> **Last updated:** 2025-11-22
> **Last updated:** 2025-11-23
> **Purpose:** Reflect the module's actual state and help new developers get up to speed quickly

---
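@@ -57,6 +57,17 @@ The AI Intelligent Literature Module is a literature screening system based on large language models (LLMs)
  - Evidence chain validator (citation integrity)
  - Conflict detection service (dual-model comparison)
  - Integration testing and fault-tolerance improvements
- ✅ 2025-11-23: **Day 4 morning completed (database design and migration)**
  - Database schema design (cloud-native architecture)
  - Modified the literatures table (+13 full-text fields)
  - Created the fulltext_screening_tasks table
  - Created the fulltext_screening_results table
  - Manual SQL migration script (executed safely, no impact on other modules)
  - Database migration status document (records schema isolation in detail)
- 🚧 2025-11-23: **Day 4 afternoon in progress (batch processing service)**
  - AsyncTaskService (asynchronous task management)
  - FulltextScreeningService (batch processing logic)
  - API controller (RESTful endpoints)

---
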
@@ -271,19 +282,36 @@ Query parameters:
Indexes: user_id, status
```

#### 2. literatures (literature)
#### 2. literatures (literature) ✨ Extended
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Title/abstract fields:
- title: TEXT (required)
- abstract: TEXT (required)
- authors, journal, publication_year, pmid, doi
Full-text screening fields (added 2025-11-23):
- stage: TEXT (lifecycle: imported/title_screened/fulltext_pending/fulltext_screened)
- has_pdf: BOOLEAN (whether a PDF is available)
- pdf_storage_type, pdf_storage_ref, pdf_status, pdf_uploaded_at (PDF management)
- full_text_storage_type, full_text_storage_ref, full_text_url (cloud-native storage)
- full_text_format, full_text_source, full_text_token_count (full-text metadata)
Indexes: project_id, pmid, doi, stage, has_pdf, pdf_status
Unique constraints: (project_id, pmid), (project_id, doi)
```
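
The `stage` lifecycle listed above can be modelled as a small union type. The following is a minimal sketch, not the module's actual code: the four stage values come from the schema, while the strict ordering, the `nextStage` helper, and the `isReadyForFulltext` guard are illustrative assumptions.

```typescript
// Stage values are taken from the `stage` column described above.
type LiteratureStage =
  | "imported"
  | "title_screened"
  | "fulltext_pending"
  | "fulltext_screened";

// Assumption: stages advance strictly in this order; the schema only lists the values.
const STAGE_ORDER: readonly LiteratureStage[] = [
  "imported",
  "title_screened",
  "fulltext_pending",
  "fulltext_screened",
];

// Hypothetical helper: returns the stage that follows `stage`, or null at the end.
function nextStage(stage: LiteratureStage): LiteratureStage | null {
  return STAGE_ORDER[STAGE_ORDER.indexOf(stage) + 1] ?? null;
}

// Only the two columns this sketch needs; the real table has many more.
interface LiteratureRow {
  stage: LiteratureStage;
  has_pdf: boolean;
}

// Hypothetical guard: a record can enter full-text screening once it is
// pending and a PDF has been attached.
function isReadyForFulltext(row: LiteratureRow): boolean {
  return row.stage === "fulltext_pending" && row.has_pdf;
}
```

Keeping the order in a single array means a new stage would only need to be added in one place, at the cost of assuming the lifecycle is linear.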

#### 3. screening_tasks (title/abstract screening tasks)
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Key fields:
- title: TEXT (required)
- abstract: TEXT (required)
- authors, journal, publication_year, pmid, doi
Indexes: project_id, pmid, doi
Unique constraints: (project_id, pmid), (project_id, doi)
- status: 'pending' | 'running' | 'completed' | 'failed'
- total_items, processed_items, success_items, conflict_items
- started_at, completed_at
Indexes: project_id, status
```

#### 3. screening_results (screening results)
#### 4. screening_results (title/abstract screening results)
```sql
Primary key: id (UUID)
Foreign keys:
@@ -309,13 +337,110 @@ Query parameters:
Unique constraint: (project_id, literature_id)
```

#### 4. screening_tasks (screening tasks)
#### 5. fulltext_screening_tasks (full-text screening tasks) ✨ New
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Key fields:
- task_type: 'title_abstract' | 'full_text'
- model_a, model_b: TEXT (names of the two models)
- prompt_version: TEXT (prompt version)
- status: 'pending' | 'running' | 'completed' | 'failed'
- total_count, processed_count, success_count, failed_count, degraded_count
- total_tokens, total_cost: cost statistics
- started_at, completed_at, estimated_end_at
- error_message, error_stack
Indexes: project_id, status, created_at
```
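
The counters above are enough to derive a progress percentage and a rough value for `estimated_end_at`. The sketch below assumes a simple linear extrapolation from the average time per processed item; the `computeProgress` function and the `TaskProgress` shape are hypothetical and are not taken from the module.

```typescript
// Subset of fulltext_screening_tasks columns used by this sketch.
interface FulltextTaskCounters {
  status: "pending" | "running" | "completed" | "failed";
  total_count: number;
  processed_count: number;
  started_at: Date | null;
}

// Hypothetical result shape for a "get progress" style response.
interface TaskProgress {
  percent: number;
  estimated_end_at: Date | null;
}

// Assumption: remaining time is extrapolated linearly from the average
// time spent per item so far. The real service may estimate differently.
function computeProgress(task: FulltextTaskCounters, now: Date): TaskProgress {
  const percent =
    task.total_count === 0
      ? 0
      : Math.round((task.processed_count / task.total_count) * 100);

  // No estimate unless the task is running and at least one item is done.
  if (task.status !== "running" || task.started_at === null || task.processed_count === 0) {
    return { percent, estimated_end_at: null };
  }

  const elapsedMs = now.getTime() - task.started_at.getTime();
  const msPerItem = elapsedMs / task.processed_count;
  const remainingItems = task.total_count - task.processed_count;
  return {
    percent,
    estimated_end_at: new Date(now.getTime() + msPerItem * remainingItems),
  };
}
```

For example, a task that has processed 25 of 100 items in one minute would report 25% and an estimated end three minutes after `now`.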

#### 6. fulltext_screening_results (full-text screening results) ✨ New
```sql
Primary key: id (UUID)
Foreign keys:
- task_id → fulltext_screening_tasks(id) CASCADE
- project_id → screening_projects(id) CASCADE
- literature_id → literatures(id) CASCADE
Key fields:
Model A (DeepSeek-V3) results:
- model_a_name, model_a_status, model_a_fields (JSONB)
- model_a_overall, model_a_processing_log, model_a_verification (JSONB)
- model_a_tokens, model_a_cost, model_a_error
Model B (Qwen-Max) results: same as above (model_b_*)
Validation results:
- medical_logic_issues (JSONB): medical logic validation
- evidence_chain_issues (JSONB): evidence chain validation
Conflict detection:
- is_conflict, conflict_severity, conflict_fields, conflict_details (JSONB)
- review_priority (0-100), review_deadline
Manual review:
- final_decision: 'include' | 'exclude' | NULL
- final_decision_by, final_decision_at
- exclusion_reason, review_notes
Processing status:
- processing_status, is_degraded, degraded_model
Traceability:
- raw_output_a (JSONB), raw_output_b (JSONB), prompt_version
Indexes: task_id, project_id, literature_id, is_conflict, final_decision, review_priority
Unique constraint: (project_id, literature_id)
```
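
One way to populate `is_conflict`, `conflict_fields`, and `conflict_details` is a field-by-field comparison of the two models' extracted values. The sketch below is illustrative only: it flattens each model's fields to plain strings, which is a simplification of the JSONB columns, and the field names in the usage note (`population`, `design`, `outcome`) are hypothetical placeholders for the 12 extracted fields, which this document does not list.

```typescript
// Assumption: each model's extracted fields are reduced to string values.
type ModelFields = Record<string, string>;

interface ConflictResult {
  is_conflict: boolean;
  conflict_fields: string[];
  conflict_details: Record<string, { model_a: string | null; model_b: string | null }>;
}

// Hypothetical helper: compares the union of field names from both models
// and records every field whose values differ or that only one model returned.
function detectConflicts(a: ModelFields, b: ModelFields): ConflictResult {
  const keys = Object.keys(a).concat(Object.keys(b).filter((k) => !(k in a)));
  const conflict_details: ConflictResult["conflict_details"] = {};

  for (const key of keys) {
    const valueA = a[key] ?? null;
    const valueB = b[key] ?? null;
    if (valueA !== valueB) {
      conflict_details[key] = { model_a: valueA, model_b: valueB };
    }
  }

  const conflict_fields = Object.keys(conflict_details).sort();
  return { is_conflict: conflict_fields.length > 0, conflict_fields, conflict_details };
}
```

Calling `detectConflicts({ population: "adults", design: "RCT" }, { population: "adults", design: "cohort", outcome: "mortality" })` reports `design` and `outcome` as conflicting. Exact string equality is the strictest possible rule; a real comparison would likely need normalisation before two LLM outputs can be treated as matching.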

### Database Schema Isolation Status

**✅ Fully correct**:
- All ASL tables live in `asl_schema`
- No data leaks into the `public` schema
- The schema isolation policy is strictly enforced
- See: [Database migration status notes](./05-开发记录/2025-11-23_数据库迁移状态说明.md)

---

## 📊 Data Flow (Actual)

### Title/Abstract Initial Screening Flow

```
User uploads Excel
↓
Parse and import into the literatures table
↓
Create screening_task
↓
Background asynchronous processing:
- Dual-model parallel calls (DeepSeek + Qwen)
- Save to screening_results
- Conflict detection
- Update task progress
↓
Frontend polls task status
↓
User reviews results and submits manual review
↓
Export Excel (generated on the frontend or via backend OSS)
```
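
The dual-model step in the flow above can be sketched as two calls issued in parallel, with a failure of one model tolerated rather than aborting the item. Everything here is an assumption for illustration: the `ModelCall` signature, the `combineDecisions` and `screenWithBothModels` helpers, and the choice to flag a single-model result as degraded (borrowed from the `is_degraded` column of the full-text results table).

```typescript
type Decision = "include" | "exclude";

// Hypothetical signature for one LLM screening call (e.g. DeepSeek or Qwen).
type ModelCall = (title: string, abstract: string) => Promise<Decision>;

interface DualModelOutcome {
  model_a: Decision | null; // null means the call failed
  model_b: Decision | null;
  is_conflict: boolean; // both answered, but differently
  is_degraded: boolean; // only one model answered (assumed policy)
}

// Pure combination step, kept separate so it can be tested without network calls.
function combineDecisions(a: Decision | null, b: Decision | null): DualModelOutcome {
  return {
    model_a: a,
    model_b: b,
    is_conflict: a !== null && b !== null && a !== b,
    is_degraded: a === null || b === null,
  };
}

// Turns a rejection into null so one failing model does not reject the pair.
function settle(p: Promise<Decision>): Promise<Decision | null> {
  return p.then(
    (value): Decision | null => value,
    (): Decision | null => null
  );
}

// Runs both models concurrently and combines their decisions.
async function screenWithBothModels(
  callA: ModelCall,
  callB: ModelCall,
  title: string,
  abstract: string
): Promise<DualModelOutcome> {
  const [a, b] = await Promise.all([
    settle(callA(title, abstract)),
    settle(callB(title, abstract)),
  ]);
  return combineDecisions(a, b);
}
```

Separating `combineDecisions` from the network calls keeps the conflict rule easy to unit test and lets the background worker save the outcome and update task progress in one place.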

### Full-Text Screening Flow (In Design)

```
User uploads PDFs (batch)
↓
PDF extraction service (Nougat preferred, PyMuPDF as fallback)
↓
Update the literatures table (full-text reference fields)
↓
Create fulltext_screening_task
↓
Background asynchronous batch processing:
- Dual-model parallel calls (DeepSeek + Qwen)
- 12-field structured extraction
- Medical logic validation + evidence chain validation
- Conflict detection (field-level comparison)
- Save to fulltext_screening_results
- Update task progress
↓
Frontend displays results (dual-view review)
↓
User reviews conflicting items and submits final decisions
↓
Export Excel (detailed 12-field report)
- total_items: INT
- processed_items: INT
- success_items: INT