feat(asl): Complete Day 5 - Fulltext Screening Backend API Development

- Implement 5 core API endpoints (create task, get progress, get results, update decision, export Excel)
- Add FulltextScreeningController with Zod validation (652 lines)
- Implement ExcelExporter service with 4-sheet report generation (352 lines)
- Register routes under /api/v1/asl/fulltext-screening
- Create 31 REST Client test cases
- Add automated integration test script
- Fix PDF extraction fallback mechanism in LLM12FieldsService
- Update API design documentation to v3.0
- Update development plan to v1.2
- Create Day 5 development record
- Clean up temporary test files
2025-11-23 10:52:07 +08:00
parent 08aa3f6c28
commit 88cc049fb3
232 changed files with 7780 additions and 441 deletions
--- a/docs/03-业务模块/ASL-AI智能文献/02-技术设计/02-API设计规范.md
+++ b/docs/03-业务模块/ASL-AI智能文献/02-技术设计/02-API设计规范.md
@@ -1,10 +1,10 @@
 # AI Smart Literature Module - API Design Specification

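-> **Document version:** v2.1  
+> **Document version:** v3.0  
 > **Created:** 2025-10-29  
 > **Maintainer:** AI Smart Literature development team  
-> **Last updated:** 2025-11-21  
-> **Update notes:** Updated actual API formats, field-mapping notes, and test data examples
+> **Last updated:** 2025-11-23  
+> **Update notes:** Added the fulltext screening API (5 core endpoints)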

 ---

@@ -591,6 +591,490 @@ curl -X DELETE http://localhost:3001/api/v1/asl/literatures/{literatureId}

 ---

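+### 4. Fulltext Screening Management
+
+> **Status**: ✅ Day 5 implementation in progress (2025-11-23)
+
+#### 4.1 Create a fulltext screening task
+
+**Endpoint**: `POST /api/v1/asl/fulltext-screening/tasks`  
+**Authentication**: Required  
+**Description**: Creates a fulltext screening task that evaluates 12 fields for each literature item that passed title screening
+
+**Request body**: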
+```json
+{
+  "projectId": "proj-123",
+  "literatureIds": ["lit-001", "lit-002", "lit-003"],
+  "modelA": "deepseek-v3",
+  "modelB": "qwen-max",
+  "promptVersion": "v1.0.0"
+}
+```
+
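+**Field descriptions**:
+- `projectId`: Project ID (required)
+- `literatureIds`: IDs of the literature items to screen (required; each must have passed title screening)
+- `modelA`: Name of model A (optional, default: deepseek-v3)
+- `modelB`: Name of model B (optional, default: qwen-max)
+- `promptVersion`: Prompt version (optional, default: v1.0.0)
+
+**Response example**: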
+```json
+{
+  "success": true,
+  "data": {
+    "taskId": "fst-20251123-001",
+    "projectId": "proj-123",
+    "status": "pending",
+    "totalCount": 3,
+    "modelA": "deepseek-v3",
+    "modelB": "qwen-max",
+    "createdAt": "2025-11-23T10:00:00.000Z",
+    "message": "Task created; processing in the background"
+  }
+}
+```
+
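+**Business rules**:
+1. Verify that every literature item belongs to the project
+2. Check that each literature item has a usable PDF (`pdfStatus === 'ready'`)
+3. Return immediately after the task is created; processing continues asynchronously in the background
+4. If some PDFs are not ready, process only the literature items whose PDFs are ready
+
+**Error response**: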
+```json
+{
+  "success": false,
+  "error": "Some literature PDFs are not ready; fulltext screening cannot start"
+}
+```
+
+**Test command**:
+```bash
+curl -X POST http://localhost:3001/api/v1/asl/fulltext-screening/tasks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "projectId": "proj-123",
+    "literatureIds": ["lit-001", "lit-002"],
+    "modelA": "deepseek-v3",
+    "modelB": "qwen-max"
+  }'
+```
+
+---
+
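+#### 4.2 Get task progress
+
+**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId`  
+**Authentication**: Required  
+**Description**: Returns detailed progress information for a fulltext screening task
+
+**Path parameters**:
+- `taskId`: Task ID
+
+**Response example**: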
+```json
+{
+  "success": true,
+  "data": {
+    "taskId": "fst-20251123-001",
+    "projectId": "proj-123",
+    "status": "processing",
+    
+    "progress": {
+      "totalCount": 30,
+      "processedCount": 15,
+      "successCount": 13,
+      "failedCount": 1,
+      "degradedCount": 1,
+      "pendingCount": 15,
+      "progressPercent": 50
+    },
+    
+    "statistics": {
+      "totalTokens": 450000,
+      "totalCost": 2.25,
+      "avgTimePerLit": 18500
+    },
+    
+    "time": {
+      "startedAt": "2025-11-23T10:00:00.000Z",
+      "estimatedEndAt": "2025-11-23T10:12:30.000Z",
+      "elapsedSeconds": 270
+    },
+    
+    "models": {
+      "modelA": "deepseek-v3",
+      "modelB": "qwen-max"
+    },
+    
+    "updatedAt": "2025-11-23T10:04:30.000Z"
+  }
+}
+```
+
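+**Field descriptions**:
+- `status`: Task status
+  - `pending`: Waiting to be processed
+  - `processing`: In progress
+  - `completed`: Completed
+  - `failed`: Failed
+  - `cancelled`: Cancelled
+- `successCount`: Number of literature items for which both models succeeded
+- `degradedCount`: Number of literature items for which only one model succeeded (degraded mode)
+- `failedCount`: Number of literature items for which both models failed
+- `totalCost`: Cumulative cost (unit: CNY)
+
+**Test command**: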
+```bash
+curl http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001
+```
+
+---
+
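+#### 4.3 Get task results
+
+**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId/results`  
+**Authentication**: Required  
+**Description**: Returns the detailed results of a fulltext screening task, with filtering and pagination
+
+**Path parameters**:
+- `taskId`: Task ID
+
+**Query parameters**:
+- `filter`: Result filter (optional)
+  - `all`: All results (default)
+  - `conflict`: Conflicting items only
+  - `pending`: Pending review
+  - `reviewed`: Reviewed
+- `page`: Page number (default: 1)
+- `pageSize`: Items per page (default: 20, max: 100)
+- `sortBy`: Sort field (optional: `priority`, `createdAt`)
+- `sortOrder`: Sort direction (`asc` | `desc`, default: `desc`)
+
+**Response example**: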
+```json
+{
+  "success": true,
+  "data": {
+    "taskId": "fst-20251123-001",
+    "total": 30,
+    "filtered": 3,
+    
+    "results": [
+      {
+        "resultId": "fsr-001",
+        "literatureId": "lit-001",
+        "literature": {
+          "pmid": "12345678",
+          "title": "Effect of SGLT2 inhibitors on cardiovascular outcomes",
+          "authors": "Smith JA, et al.",
+          "journal": "Lancet",
+          "year": 2023,
+          "doi": "10.1016/..."
+        },
+        
+        "modelAResult": {
+          "modelName": "deepseek-v3",
+          "status": "success",
+          "fields": {
+            "field1_source": {
+              "assessment": "complete",
+              "evidence": "First author Smith JA, Lancet 2023",
+              "location": "Page 1",
+              "confidence": 0.98
+            },
+            "field2_studyType": {
+              "assessment": "complete",
+              "evidence": "Multicenter randomized controlled trial",
+              "location": "Methods, page 2",
+              "confidence": 0.95
+            },
+            "field5_population": {
+              "assessment": "complete",
+              "evidence": "Enrolled 500 patients with type 2 diabetes, aged 58±12 years",
+              "location": "Methods, page 3",
+              "confidence": 0.92
+            },
+            "field9_outcomes": {
+              "assessment": "complete",
+              "evidence": "Primary outcome, change in eGFR: -15.2±3.5 ml/min vs -8.1±2.9 ml/min",
+              "location": "Results, page 5, Table 2",
+              "confidence": 0.96
+            }
+          },
+          "overall": {
+            "decision": "include",
+            "reason": "All 12 fields complete; key data extractable",
+            "dataQuality": "high",
+            "confidence": 0.94
+          },
+          "tokens": 15000,
+          "cost": 0.015
+        },
+        
+        "modelBResult": {
+          "modelName": "qwen-max",
+          "status": "success",
+          "fields": { /* same structure as above */ },
+          "overall": {
+            "decision": "include",
+            "confidence": 0.92
+          },
+          "tokens": 15200,
+          "cost": 0.061
+        },
+        
+        "validation": {
+          "medicalLogicIssues": [],
+          "evidenceChainIssues": []
+        },
+        
+        "conflict": {
+          "isConflict": false,
+          "severity": "none",
+          "conflictFields": [],
+          "overallConflict": false
+        },
+        
+        "review": {
+          "finalDecision": null,
+          "reviewedBy": null,
+          "reviewedAt": null,
+          "reviewNotes": null,
+          "priority": 50
+        },
+        
+        "processing": {
+          "isDegraded": false,
+          "degradedModel": null,
+          "processedAt": "2025-11-23T10:02:15.000Z"
+        }
+      },
+      
+      {
+        "resultId": "fsr-002",
+        "literatureId": "lit-005",
+        "literature": { /* ... */ },
+        
+        "modelAResult": {
+          "modelName": "deepseek-v3",
+          "status": "success",
+          "fields": {
+            "field9_outcomes": {
+              "assessment": "missing",
+              "evidence": "No specific values reported, only P values",
+              "location": "Results, page 4",
+              "confidence": 0.88
+            }
+          },
+          "overall": {
+            "decision": "exclude",
+            "reason": "Key field field9 has incomplete data; meta-analysis not possible",
+            "confidence": 0.85
+          }
+        },
+        
+        "modelBResult": {
+          "overall": {
+            "decision": "include",
+            "reason": "Primary outcome is reported in the Discussion, but the data are complete"
+          }
+        },
+        
+        "conflict": {
+          "isConflict": true,
+          "severity": "high",
+          "conflictFields": ["field9"],
+          "overallConflict": true,
+          "details": {
+            "field9": {
+              "modelA": "missing",
+              "modelB": "complete",
+              "importance": "critical"
+            }
+          }
+        },
+        
+        "review": {
+          "finalDecision": null,
+          "priority": 95
+        }
+      }
+    ],
+    
+    "pagination": {
+      "page": 1,
+      "pageSize": 20,
+      "totalPages": 2
+    },
+    
+    "summary": {
+      "totalResults": 30,
+      "conflictCount": 3,
+      "pendingReview": 3,
+      "reviewed": 27,
+      "avgPriority": 62
+    }
+  }
+}
+```
+
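+**The 12 fields**:
+- `field1_source`: Literature source (authors, journal, year, etc.)
+- `field2_studyType`: Study type (RCT, cohort study, etc.)
+- `field3_studyDesign`: Study design details
+- `field4_diagnosis`: Disease diagnostic criteria
+- `field5_population`: Population characteristics (sample size, baseline, etc.) ⭐
+- `field6_baseline`: Baseline data ⭐
+- `field7_intervention`: Intervention ⭐
+- `field8_control`: Control
+- `field9_outcomes`: Outcome measures ⭐⭐⭐ (most critical)
+- `field10_statistics`: Statistical methods
+- `field11_quality`: Quality assessment (randomization, blinding, etc.) ⭐⭐
+- `field12_other`: Other information
+
+**Test commands**: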
+```bash
+# Get all results
+curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results"
+
+# Get conflicting items only
+curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results?filter=conflict"
+
+# Paginated query
+curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results?page=2&pageSize=10"
+```
+
+---
+
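+#### 4.4 Manual review decision
+
+**Endpoint**: `PUT /api/v1/asl/fulltext-screening/results/:resultId/decision`  
+**Authentication**: Required  
+**Description**: Records a manual review decision for a single fulltext screening result
+
+**Path parameters**:
+- `resultId`: Result ID
+
+**Request body**: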
+```json
+{
+  "finalDecision": "exclude",
+  "exclusionReason": "Key field field9 (outcome measures) has incomplete data",
+  "reviewNotes": "P<0.05 is reported, but mean±SD is missing, so the study cannot be used for meta-analysis"
+}
+```
+
+**Field descriptions**:
+- `finalDecision`: Final decision (required)
+  - `include`: Include
+  - `exclude`: Exclude
+- `exclusionReason`: Reason for exclusion (required when `finalDecision === 'exclude'`)
+- `reviewNotes`: Review notes (optional)
+
+**Response example**:
+```json
+{
+  "success": true,
+  "data": {
+    "resultId": "fsr-002",
+    "finalDecision": "exclude",
+    "exclusionReason": "Key field field9 (outcome measures) has incomplete data",
+    "reviewedBy": "user-001",
+    "reviewedAt": "2025-11-23T10:30:00.000Z"
+  }
+}
+```
+
+**Test command**:
+```bash
+curl -X PUT http://localhost:3001/api/v1/asl/fulltext-screening/results/fsr-002/decision \
+  -H "Content-Type: application/json" \
+  -d '{
+    "finalDecision": "exclude",
+    "exclusionReason": "Incomplete outcome data",
+    "reviewNotes": "Mean and standard deviation are missing"
+  }'
+```
+
+---
+
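+#### 4.5 Export to Excel
+
+**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId/export`  
+**Authentication**: Required  
+**Description**: Exports the fulltext screening results as an Excel file (3 sheets, plus the additional cost statistics sheet described below)
+
+**Path parameters**:
+- `taskId`: Task ID
+
+**Response**: 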
+- Content-Type: `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
+- Content-Disposition: `attachment; filename="fulltext_screening_results_{taskId}.xlsx"`
+
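+**Excel structure**:
+
+**Sheet 1: Included literature**
+| Column | Description |
+|------|------|
+| No. | 1, 2, 3... |
+| PMID | PubMed ID |
+| Source | First author + year |
+| Title | Literature title |
+| Journal | Journal name |
+| Year | Publication year |
+| DOI | DOI |
+| Final decision | Include |
+| Data quality | High/Medium/Low |
+| Extractability | Extractable/Partially extractable/Not extractable |
+| Model agreement | Agree/Disagree |
+| Manually reviewed | Yes/No |
+
+**Sheet 2: Excluded literature**
+| Column | Description |
+|------|------|
+| No. | 1, 2, 3... |
+| PMID | PubMed ID |
+| Source | First author + year |
+| Title | Literature title |
+| Exclusion reason | Detailed reason for exclusion |
+| Excluded fields | field5, field9, etc. |
+| Conflict | Yes/No |
+| Reviewer | User ID |
+| Reviewed at | 2025-11-23 10:30 |
+
+**Sheet 3: PRISMA statistics**
+| Item | Count | Percentage |
+|--------|------|--------|
+| Total fulltext screened | 30 | 100% |
+| Finally included | 18 | 60% |
+| Finally excluded | 12 | 40% |
+| - Outcome measures missing/incomplete | 5 | 16.7% |
+| - Population characteristics do not match | 3 | 10% |
+| - Intervention unclear | 2 | 6.7% |
+| - Study quality issues | 1 | 3.3% |
+| - Other reasons | 1 | 3.3% |
+| Model conflicts | 3 | 10% |
+| Manual reviews | 3 | 10% |
+
+**Cost statistics (additional sheet)**:
+| Item | Value |
+|------|-----|
+| Total tokens | 450,000 |
+| Total cost (CNY) | ¥2.25 |
+| Average cost per item | ¥0.075 |
+| Model combination | DeepSeek-V3 + Qwen-Max |
+| Processing time | 8 min 30 s |
+
+**Test command**: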
+```bash
+curl -O -J http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/export
+```
+
+---
+
 ## 📋 Response Format Specification

 ### 1. Success response
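
A minimal sketch, outside the diff itself, of the envelope that the JSON examples in the hunk above share: `success: true` with a `data` payload, or `success: false` with an `error` string. One way a TypeScript client might model it is as a discriminated union. The names `ApiResponse`, `ok`, `fail`, and `unwrap` are assumptions for illustration and do not appear in the source.

```typescript
// Hypothetical type names; only the `success`, `data`, and `error` keys
// come from the JSON examples in the diff.
type ApiSuccess<T> = { success: true; data: T };
type ApiFailure = { success: false; error: string };
type ApiResponse<T> = ApiSuccess<T> | ApiFailure;

// Wrap a payload in a success envelope.
function ok<T>(data: T): ApiSuccess<T> {
  return { success: true, data };
}

// Wrap a human-readable message in a failure envelope.
function fail(error: string): ApiFailure {
  return { success: false, error };
}

// Narrow on `success` so callers cannot read `data` from a failed response.
function unwrap<T>(res: ApiResponse<T>): T {
  if (res.success) {
    return res.data;
  }
  throw new Error(res.error);
}
```

Checking `res.success` before touching `res.data` lets the compiler rule out reading a payload from a failed response, which is the main reason to prefer a union over a single interface with optional fields.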
@@ -789,6 +1273,19 @@ Body (raw JSON):

 ## 🔄 Version history

+### v3.0 (2025-11-23)
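+- ✅ Added the fulltext screening management API (5 endpoints)
+  - Create task, get progress, get results, manual review, export Excel
+- ✅ Supports detailed 12-field assessment
+- ✅ Supports dual-model comparison and conflict detection
+- ✅ Full Excel export (3 sheets)
+- ✅ Restructured the document (5 major modules)
+
+### v2.1 (2025-11-21)
+- ✅ Added the statistics API endpoint
+- ✅ Updated the PICOS format description
+- ✅ Added cloud-native architecture annotations
+
 ### v2.0 (2025-11-18)
 - ✅ Implemented 10 core API endpoints
 - ✅ Completed project management features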
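
The v3.0 entry above lists dual-model comparison and conflict detection. As a rough illustration only (the commit's actual logic is not shown in this diff), the sketch below flags a conflict when the two models disagree on the overall decision or on any field that both assessed. The type and function names are hypothetical, severity scoring is left out because the source does not define its rules, and the sketch reports full field keys such as `field9_outcomes`.

```typescript
// Hypothetical, trimmed-down view of one model's result from section 4.3.
interface ModelResult {
  fields: Record<string, { assessment: string }>;
  overall: { decision: "include" | "exclude" };
}

interface Conflict {
  isConflict: boolean;
  conflictFields: string[];
  overallConflict: boolean;
}

// Compare two model results field by field and on the overall decision.
function detectConflict(a: ModelResult, b: ModelResult): Conflict {
  const conflictFields: string[] = [];
  for (const name of Object.keys(a.fields)) {
    const fieldA = a.fields[name];
    const fieldB = b.fields[name];
    // Only compare fields that both models reported.
    if (
      fieldA !== undefined &&
      fieldB !== undefined &&
      fieldA.assessment !== fieldB.assessment
    ) {
      conflictFields.push(name);
    }
  }
  const overallConflict = a.overall.decision !== b.overall.decision;
  return {
    isConflict: overallConflict || conflictFields.length > 0,
    conflictFields,
    overallConflict,
  };
}
```

Fields reported by only one model are skipped here rather than counted as conflicts; whether the real service treats them that way is an open assumption.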
@@ -820,9 +1317,9 @@ Body (raw JSON):

 ---

-## 🆕 APIs added in Week 4
+### 5. Statistics API

-### 4.1 Get project statistics (cloud-native: backend aggregation)
+#### 5.1 Get project statistics (cloud-native: backend aggregation)

 **Endpoint**: `GET /api/v1/asl/projects/:projectId/statistics`  
 **Authentication**: Required  
@@ -867,14 +1364,16 @@ curl http://localhost:3001/api/v1/asl/projects/55941145-bba0-4b15-bda4-f0a398d78

 ---

-**Document version:** v2.2  
-**Last updated:** 2025-11-21 (Week 4 complete)  
+**Document version:** v3.0  
+**Last updated:** 2025-11-23 (Day 5: fulltext screening API)  
 **Maintainer:** AI Smart Literature development team

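 **Changes in this update**:
- ✅ Added the statistics API endpoint
- ✅ Updated the PICOS format description (P/I/C/O/S)
- ✅ Added cloud-native architecture annotations
+- ✅ Added the fulltext screening management API (5 core endpoints)
+- ✅ Detailed documentation of the 12-field assessment
+- ✅ Description of dual-model comparison and conflict detection
+- ✅ Excel export format specification
+- ✅ Complete request/response examples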

 ---
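
Section 4.4 above makes `exclusionReason` mandatory only when `finalDecision` is `exclude`. The commit message says the controller validates requests with Zod; the sketch below deliberately uses plain TypeScript instead so that it stays self-contained, and the function name `validateDecision` and its error messages are assumptions, not the project's code.

```typescript
// Hypothetical validator for the 4.4 request body.
// Returns a list of error messages; an empty list means the body is acceptable.
function validateDecision(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  const decision = fields["finalDecision"];
  if (decision !== "include" && decision !== "exclude") {
    errors.push("finalDecision must be 'include' or 'exclude'");
  }

  // exclusionReason is required only when the decision is 'exclude'.
  const reason = fields["exclusionReason"];
  if (decision === "exclude") {
    if (typeof reason !== "string" || reason.trim() === "") {
      errors.push("exclusionReason is required when finalDecision is 'exclude'");
    }
  }

  // reviewNotes is optional, but must be a string when present.
  const notes = fields["reviewNotes"];
  if (notes !== undefined && typeof notes !== "string") {
    errors.push("reviewNotes must be a string when provided");
  }
  return errors;
}
```

Returning every error at once, rather than throwing on the first, lets a handler answer with one `success: false` response that lists everything the client needs to fix.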