feat(asl): Complete Day 5 - Fulltext Screening Backend API Development

- Implement 5 core API endpoints (create task, get progress, get results, update decision, export Excel)
- Add FulltextScreeningController with Zod validation (652 lines)
- Implement ExcelExporter service with 4-sheet report generation (352 lines)
- Register routes under /api/v1/asl/fulltext-screening
- Create 31 REST Client test cases
- Add automated integration test script
- Fix PDF extraction fallback mechanism in LLM12FieldsService
- Update API design documentation to v3.0
- Update development plan to v1.2
- Create Day 5 development record
- Clean up temporary test files
2025-11-23 10:52:07 +08:00
parent 08aa3f6c28
commit 88cc049fb3
232 changed files with 7780 additions and 441 deletions

@@ -1,9 +1,9 @@
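# AI Smart Literature Module - Current Status and Development Guide
-> **Document version:** v1.1
+> **Document version:** v1.2
> **Created:** 2025-11-21
> **Maintainer:** AI Smart Literature development team
-> **Last updated:** 2025-11-22
+> **Last updated:** 2025-11-23
> **Purpose:** Reflect the module's actual state and help new developers get up to speed quickly
---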
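@@ -57,6 +57,17 @@ The AI Smart Literature module is a literature screening system based on large language models (LLMs)
- Evidence chain validator (citation completeness)
- Conflict detection service (dual-model comparison)
- Integration testing and fault-tolerance improvements
- ✅ 2025-11-23 **Day 4 morning, completed: database design and migration**
- Database schema design (cloud-native architecture)
- Modified the literatures table (+13 full-text fields)
- Created the fulltext_screening_tasks table
- Created the fulltext_screening_results table
- Manual SQL migration script (executed safely, with no impact on other modules)
- Database migration status document (detailed record of schema isolation)
- 🚧 2025-11-23 **Day 4 afternoon, in progress: batch processing service**
- AsyncTaskService (asynchronous task management)
- FulltextScreeningService (batch processing logic)
- API controller (RESTful endpoints)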
---
@@ -271,19 +282,36 @@ Query parameters:
Indexes: user_id, status
```
-#### 2. literatures (literature records)
+#### 2. literatures (literature records) ✨ Expanded
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Fields:
- title: TEXT
- abstract: TEXT
- authors, journal, publication_year, pmid, doi
Fields added 2025-11-23:
- stage: TEXT (imported/title_screened/fulltext_pending/fulltext_screened)
- has_pdf: BOOLEAN (whether a PDF is attached)
- pdf_storage_type, pdf_storage_ref, pdf_status, pdf_uploaded_at (PDF management)
- full_text_storage_type, full_text_storage_ref, full_text_url
- full_text_format, full_text_source, full_text_token_count
Indexes: project_id, pmid, doi, stage, has_pdf, pdf_status
Unique constraints: (project_id, pmid), (project_id, doi)
```
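The `stage` column above suggests a simple lifecycle for each literature record. The following is a minimal TypeScript sketch of how a transition guard might look. It assumes the four stages advance strictly in the order listed, which the schema itself does not state; `STAGE_ORDER`, `canAdvance`, and `isReadyForFulltext` are hypothetical names, not part of the codebase.

```typescript
// Stage values are taken from the `stage` column described above.
type LiteratureStage =
  | "imported"
  | "title_screened"
  | "fulltext_pending"
  | "fulltext_screened";

// Assumption: stages advance strictly in this order, one step at a time.
const STAGE_ORDER: readonly LiteratureStage[] = [
  "imported",
  "title_screened",
  "fulltext_pending",
  "fulltext_screened",
];

// Returns true only when `to` is the stage immediately after `from`.
function canAdvance(from: LiteratureStage, to: LiteratureStage): boolean {
  return STAGE_ORDER.indexOf(to) === STAGE_ORDER.indexOf(from) + 1;
}

// Hypothetical helper: a record is ready for full-text screening only when
// it is waiting in `fulltext_pending` and a PDF has been attached.
function isReadyForFulltext(row: { stage: LiteratureStage; has_pdf: boolean }): boolean {
  return row.stage === "fulltext_pending" && row.has_pdf;
}
```

If the real workflow allows skipping or rolling back stages, the guard would need an explicit transition table instead of a single ordered list.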
#### 3. screening_tasks (title/abstract screening tasks)
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Fields:
- title: TEXT
- abstract: TEXT
- authors, journal, publication_year, pmid, doi
Indexes: project_id, pmid, doi
Unique constraints: (project_id, pmid), (project_id, doi)
- status: 'pending' | 'running' | 'completed' | 'failed'
- total_items, processed_items, success_items, conflict_items
- started_at, completed_at
Indexes: project_id, status
```
-#### 3. screening_results (screening results)
+#### 4. screening_results (title/abstract screening results)
```sql
Primary key: id (UUID)
:
@@ -309,13 +337,110 @@ Query parameters:
Unique constraint: (project_id, literature_id)
```
-#### 4. screening_tasks (screening tasks)
+#### 5. fulltext_screening_tasks (full-text rescreening tasks) ✨ New
```sql
Primary key: id (UUID)
Foreign key: project_id → screening_projects(id) CASCADE
Fields:
- task_type: 'title_abstract' | 'full_text'
- model_a, model_b: TEXT
- prompt_version: TEXT (prompt version)
- status: 'pending' | 'running' | 'completed' | 'failed'
- total_count, processed_count, success_count, failed_count, degraded_count
- total_tokens, total_cost:
- started_at, completed_at, estimated_end_at
- error_message, error_stack
Indexes: project_id, status, created_at
```
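The counter columns above are what a progress endpoint would most likely report. As a minimal sketch, assuming `processed_count` covers every finished item regardless of outcome (the schema does not say so explicitly), progress could be derived as follows. `FulltextTaskCounters`, `TaskProgress`, and `summarizeProgress` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape mirroring the status and counter columns listed above.
interface FulltextTaskCounters {
  status: "pending" | "running" | "completed" | "failed";
  total_count: number;
  processed_count: number;
  success_count: number;
  failed_count: number;
  degraded_count: number;
}

interface TaskProgress {
  percent: number;   // 0-100, rounded down
  remaining: number; // items not yet processed
  done: boolean;     // true once the task reached a terminal status
}

// Assumption: processed_count counts every finished item, whether it
// succeeded, failed, or completed in degraded (single-model) mode.
function summarizeProgress(task: FulltextTaskCounters): TaskProgress {
  const total = Math.max(task.total_count, 0);
  // Clamp so that inconsistent counters never yield more than 100%.
  const processed = Math.min(Math.max(task.processed_count, 0), total);
  const percent = total === 0 ? 0 : Math.floor((processed / total) * 100);
  return {
    percent,
    remaining: total - processed,
    done: task.status === "completed" || task.status === "failed",
  };
}
```

Clamping keeps the reported percentage within 0-100 even if a counter is briefly out of sync while the background job is writing.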
#### 6. fulltext_screening_results (full-text rescreening results) ✨ New
```sql
Primary key: id (UUID)
Foreign keys:
- task_id → fulltext_screening_tasks(id) CASCADE
- project_id → screening_projects(id) CASCADE
- literature_id → literatures(id) CASCADE
Fields:
Model A (DeepSeek-V3):
- model_a_name, model_a_status, model_a_fields (JSONB)
- model_a_overall, model_a_processing_log, model_a_verification (JSONB)
- model_a_tokens, model_a_cost, model_a_error
Model B (Qwen-Max): model_b_*
Validation:
- medical_logic_issues (JSONB):
- evidence_chain_issues (JSONB):
Conflict detection:
- is_conflict, conflict_severity, conflict_fields, conflict_details (JSONB)
- review_priority (0-100), review_deadline
Final decision:
- final_decision: 'include' | 'exclude' | NULL
- final_decision_by, final_decision_at
- exclusion_reason, review_notes
Processing status:
- processing_status, is_degraded, degraded_model
Raw outputs:
- raw_output_a (JSONB), raw_output_b (JSONB), prompt_version
Indexes: task_id, project_id, literature_id, is_conflict, final_decision, review_priority
Unique constraint: (project_id, literature_id)
```
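The `is_conflict` and `conflict_fields` columns above imply a field-by-field comparison of the two models' extracted values. The sketch below shows one way such a comparison might work, assuming `model_a_fields` and `model_b_fields` are flat JSON objects keyed by field name. `detectConflicts` and `canonical` are hypothetical helpers, and exact-equality matching is an illustrative choice rather than the project's confirmed rule.

```typescript
type FieldValues = Record<string, unknown>;

interface ConflictSummary {
  is_conflict: boolean;
  conflict_fields: string[];
}

// Serializes a JSON-like value with object keys sorted, so that
// {a: 1, b: 2} and {b: 2, a: 1} compare as equal.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  // JSON.stringify returns undefined for undefined input at runtime.
  return JSON.stringify(value) ?? "undefined";
}

// Compares the two models' extracted fields and lists every field name
// whose values differ, including fields present in only one output.
function detectConflicts(a: FieldValues, b: FieldValues): ConflictSummary {
  const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)])).sort();
  const conflict_fields = keys.filter((k) => canonical(a[k]) !== canonical(b[k]));
  return { is_conflict: conflict_fields.length > 0, conflict_fields };
}
```

A production comparison would probably normalize text (case, whitespace, units) before comparing and assign a `conflict_severity`; both are left out here because the source does not describe those rules.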
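### Database schema isolation status
**✅ Fully correct**
- All ASL tables live in `asl_schema`
- No data has leaked into the `public` schema
- The schema isolation policy is strictly enforced
- See: [Database migration status notes](./05-开发记录/2025-11-23_数据库迁移状态说明.md)
---
## 📊 Data flow (actual)
### Title/abstract initial screening flow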
```
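User uploads an Excel file
Parse and import into the literatures table
Create a screening_task
Background asynchronous processing:
- Call both models in parallel (DeepSeek + Qwen)
- Save to screening_results
- Conflict detection
- Update task progress
Frontend polls the task status
User reviews the results and submits manual review decisions
Export Excel (generated on the frontend, or on the backend via OSS)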
```
### Full-text rescreening flow (in design)
```
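User uploads PDFs (in bulk)
PDF extraction service (Nougat preferred, PyMuPDF as fallback)
Update the literatures table (full-text reference fields)
Create a fulltext_screening_task
Background asynchronous batch processing:
- Call both models in parallel (DeepSeek + Qwen)
- Structured extraction of 12 fields
- Medical logic validation + evidence chain validation
- Conflict detection (field-level comparison)
- Save to fulltext_screening_results
- Update task progress
Frontend displays the results (dual-view review)
User reviews conflicting items and submits final decisions
Export Excel (detailed 12-field report)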
- total_items: INT
- processed_items: INT
- success_items: INT