feat(asl): Complete Day 5 - Fulltext Screening Backend API Development

- Implement 5 core API endpoints (create task, get progress, get results, update decision, export Excel)
- Add FulltextScreeningController with Zod validation (652 lines)
- Implement ExcelExporter service with 4-sheet report generation (352 lines)
- Register routes under /api/v1/asl/fulltext-screening
- Create 31 REST Client test cases
- Add automated integration test script
- Fix PDF extraction fallback mechanism in LLM12FieldsService
- Update API design documentation to v3.0
- Update development plan to v1.2
- Create Day 5 development record
- Clean up temporary test files
2025-11-23 10:52:07 +08:00
parent 08aa3f6c28
commit 88cc049fb3
232 changed files with 7780 additions and 441 deletions


@@ -1,10 +1,10 @@
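# AI Smart Literature Module - API Design Specification
> **Document version:** v2.1
> **Document version:** v3.0
> **Created:** 2025-10-29
> **Maintainer:** AI Smart Literature development team
> **Last updated:** 2025-11-21
> **Update notes:** Updated actual API formats, field mapping notes, and test data examples
> **Last updated:** 2025-11-23
> **Update notes:** Added the fulltext screening API (5 core endpoints)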
---
@@ -591,6 +591,490 @@ curl -X DELETE http://localhost:3001/api/v1/asl/literatures/{literatureId}
---
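### 4. Fulltext Screening Management (Fulltext Screening)
> **Status**: ✅ Day 5, implementation in progress (2025-11-23)
#### 4.1 Create a Fulltext Screening Task
**Endpoint**: `POST /api/v1/asl/fulltext-screening/tasks`
**Authentication**: Required
**Description**: Creates a fulltext screening task that runs a 12-field assessment on literature that passed title screening
**Request body**: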
```json
{
"projectId": "proj-123",
"literatureIds": ["lit-001", "lit-002", "lit-003"],
"modelA": "deepseek-v3",
"modelB": "qwen-max",
"promptVersion": "v1.0.0"
}
```
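**Field descriptions**:
- `projectId`: Project ID (required)
- `literatureIds`: List of literature IDs to screen (required; each must have passed title screening)
- `modelA`: Name of model A (optional, default: deepseek-v3)
- `modelB`: Name of model B (optional, default: qwen-max)
- `promptVersion`: Prompt version (optional, default: v1.0.0)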
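The defaults listed above can be sketched in plain TypeScript as follows. This is a minimal illustration, not the controller's actual Zod schema: the `CreateTaskRequest` type and the `applyCreateTaskDefaults` helper are hypothetical names, and rejecting an empty `literatureIds` list is an assumption (the spec only says the field is required).

```typescript
// Hypothetical request shape mirroring the documented request body.
interface CreateTaskRequest {
  projectId: string;
  literatureIds: string[];
  modelA?: string;
  modelB?: string;
  promptVersion?: string;
}

type ResolvedCreateTaskRequest = Required<CreateTaskRequest>;

// Checks the two required fields and fills in the documented defaults
// for the three optional ones.
function applyCreateTaskDefaults(
  req: CreateTaskRequest
): ResolvedCreateTaskRequest {
  if (!req.projectId) {
    throw new Error("projectId is required");
  }
  // Assumption: an empty list is treated the same as a missing one.
  if (!Array.isArray(req.literatureIds) || req.literatureIds.length === 0) {
    throw new Error("literatureIds is required");
  }
  return {
    projectId: req.projectId,
    literatureIds: req.literatureIds,
    modelA: req.modelA ?? "deepseek-v3",
    modelB: req.modelB ?? "qwen-max",
    promptVersion: req.promptVersion ?? "v1.0.0",
  };
}
```

Resolving defaults in one place keeps the stored task record explicit about which models and prompt version were actually used.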
**Response example**:
```json
{
"success": true,
"data": {
"taskId": "fst-20251123-001",
"projectId": "proj-123",
"status": "pending",
"totalCount": 3,
"modelA": "deepseek-v3",
"modelB": "qwen-max",
"createdAt": "2025-11-23T10:00:00.000Z",
"message": "Task created; processing in the background"
}
}
```
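**Business rules**:
1. Verify that every literature item belongs to the project
2. Check that each item has a usable PDF (`pdfStatus === 'ready'`)
3. Return immediately after the task is created; processing runs asynchronously in the background
4. If some PDFs are not ready, process only the items whose PDF is ready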
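Rules 1, 2, and 4 can be sketched as a small pre-check that runs before the task is queued. This is an illustration under assumptions: `LiteratureRecord` and `selectScreenable` are hypothetical names, and only the `pdfStatus === 'ready'` comparison comes from the rules above.

```typescript
// Hypothetical minimal view of a literature row; only the fields
// needed for the pre-check are modelled.
interface LiteratureRecord {
  id: string;
  projectId: string;
  pdfStatus: string;
}

interface ScreenableSelection {
  readyIds: string[];
  skippedIds: string[];
}

function selectScreenable(
  projectId: string,
  literatures: LiteratureRecord[]
): ScreenableSelection {
  // Rule 1: every item must belong to the project.
  const foreign = literatures.filter((lit) => lit.projectId !== projectId);
  if (foreign.length > 0) {
    const ids = foreign.map((lit) => lit.id).join(", ");
    throw new Error(`Literature not in project ${projectId}: ${ids}`);
  }
  // Rules 2 and 4: keep only items whose PDF is ready, skip the rest.
  const readyIds: string[] = [];
  const skippedIds: string[] = [];
  for (const lit of literatures) {
    if (lit.pdfStatus === "ready") {
      readyIds.push(lit.id);
    } else {
      skippedIds.push(lit.id);
    }
  }
  return { readyIds, skippedIds };
}
```

Returning the skipped IDs alongside the ready ones lets the caller report which items were left out instead of dropping them silently.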
**Error response**:
```json
{
"success": false,
"error": "PDFs for some literature items are not ready; fulltext screening cannot start"
}
```
**Test command**:
```bash
curl -X POST http://localhost:3001/api/v1/asl/fulltext-screening/tasks \
-H "Content-Type: application/json" \
-d '{
"projectId": "proj-123",
"literatureIds": ["lit-001", "lit-002"],
"modelA": "deepseek-v3",
"modelB": "qwen-max"
}'
```
---
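#### 4.2 Get Task Progress
**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId`
**Authentication**: Required
**Description**: Returns detailed progress information for a fulltext screening task
**Path parameters**:
- `taskId`: Task ID
**Response example**: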
```json
{
"success": true,
"data": {
"taskId": "fst-20251123-001",
"projectId": "proj-123",
"status": "processing",
"progress": {
"totalCount": 30,
"processedCount": 15,
"successCount": 13,
"failedCount": 1,
"degradedCount": 1,
"pendingCount": 15,
"progressPercent": 50
},
"statistics": {
"totalTokens": 450000,
"totalCost": 2.25,
"avgTimePerLit": 18500
},
"time": {
"startedAt": "2025-11-23T10:00:00.000Z",
"estimatedEndAt": "2025-11-23T10:12:30.000Z",
"elapsedSeconds": 270
},
"models": {
"modelA": "deepseek-v3",
"modelB": "qwen-max"
},
"updatedAt": "2025-11-23T10:04:30.000Z"
}
}
```
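**Field descriptions**:
- `status`: Task status
  - `pending`: Waiting to be processed
  - `processing`: In progress
  - `completed`: Completed
  - `failed`: Failed
  - `cancelled`: Cancelled
- `successCount`: Number of literature items for which both models succeeded
- `degradedCount`: Number of items for which only one model succeeded (degraded mode)
- `failedCount`: Number of items for which both models failed
- `totalCost`: Cumulative cost (unit: CNY)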
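The counters described above can be derived from per-item model outcomes roughly as follows. This is a sketch, not the service's code: `LiteratureOutcome` and `summarizeProgress` are hypothetical names, and the assumptions that `processedCount` is the sum of the three outcome counters and that `progressPercent` is rounded to an integer are inferred from the response example.

```typescript
type ModelStatus = "success" | "failed";

// Hypothetical per-literature outcome: one status per model.
interface LiteratureOutcome {
  modelA: ModelStatus;
  modelB: ModelStatus;
}

interface ProgressSummary {
  totalCount: number;
  processedCount: number;
  successCount: number;
  failedCount: number;
  degradedCount: number;
  pendingCount: number;
  progressPercent: number;
}

function summarizeProgress(
  totalCount: number,
  outcomes: LiteratureOutcome[]
): ProgressSummary {
  let successCount = 0;
  let degradedCount = 0;
  let failedCount = 0;
  for (const outcome of outcomes) {
    const succeeded =
      (outcome.modelA === "success" ? 1 : 0) +
      (outcome.modelB === "success" ? 1 : 0);
    if (succeeded === 2) successCount++; // both models succeeded
    else if (succeeded === 1) degradedCount++; // degraded mode
    else failedCount++; // both models failed
  }
  const processedCount = outcomes.length;
  const progressPercent =
    totalCount === 0 ? 0 : Math.round((processedCount / totalCount) * 100);
  return {
    totalCount,
    processedCount,
    successCount,
    failedCount,
    degradedCount,
    pendingCount: totalCount - processedCount,
    progressPercent,
  };
}
```

With 13 fully successful, 1 degraded, and 1 failed item out of 30, this reproduces the counters in the response example (15 processed, 15 pending, 50 percent).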
**Test command**:
```bash
curl http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001
```
---
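#### 4.3 Get Task Results
**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId/results`
**Authentication**: Required
**Description**: Returns the detailed results of a fulltext screening task, with filtering and pagination
**Path parameters**:
- `taskId`: Task ID
**Query parameters**:
- `filter`: Result filter (optional)
  - `all`: All results (default)
  - `conflict`: Conflicting results only
  - `pending`: Pending review
  - `reviewed`: Reviewed
- `page`: Page number (default: 1)
- `pageSize`: Items per page (default: 20, maximum: 100)
- `sortBy`: Sort field (optional: `priority`, `createdAt`)
- `sortOrder`: Sort direction (`asc` | `desc`, default: `desc`)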
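The defaults and limits for these query parameters can be sketched as a small normalizer. This is illustrative only: `parseResultsQuery` is a hypothetical helper, and silently falling back to the default on an invalid value is an assumption (the real controller may reject such requests with a validation error instead).

```typescript
type ResultFilter = "all" | "conflict" | "pending" | "reviewed";
type SortBy = "priority" | "createdAt";
type SortOrder = "asc" | "desc";

interface ResultsQuery {
  filter: ResultFilter;
  page: number;
  pageSize: number;
  sortBy?: SortBy;
  sortOrder: SortOrder;
}

const FILTERS: readonly string[] = ["all", "conflict", "pending", "reviewed"];
const MAX_PAGE_SIZE = 100;

// Parses a positive integer, returning the fallback for anything else.
function toPositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed >= 1 ? parsed : fallback;
}

function parseResultsQuery(
  raw: Record<string, string | undefined>
): ResultsQuery {
  const filterRaw = raw.filter;
  const sortByRaw = raw.sortBy;
  const filter: ResultFilter =
    filterRaw !== undefined && FILTERS.includes(filterRaw)
      ? (filterRaw as ResultFilter)
      : "all";
  const sortBy: SortBy | undefined =
    sortByRaw === "priority" || sortByRaw === "createdAt"
      ? sortByRaw
      : undefined;
  return {
    filter,
    page: toPositiveInt(raw.page, 1),
    // Clamp to the documented maximum of 100 items per page.
    pageSize: Math.min(toPositiveInt(raw.pageSize, 20), MAX_PAGE_SIZE),
    sortBy,
    sortOrder: raw.sortOrder === "asc" ? "asc" : "desc",
  };
}
```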
**Response example**:
```json
{
"success": true,
"data": {
"taskId": "fst-20251123-001",
"total": 30,
"filtered": 3,
"results": [
{
"resultId": "fsr-001",
"literatureId": "lit-001",
"literature": {
"pmid": "12345678",
"title": "Effect of SGLT2 inhibitors on cardiovascular outcomes",
"authors": "Smith JA, et al.",
"journal": "Lancet",
"year": 2023,
"doi": "10.1016/..."
},
"modelAResult": {
"modelName": "deepseek-v3",
"status": "success",
"fields": {
"field1_source": {
"assessment": "complete",
"evidence": "First author Smith JA, Lancet 2023",
"location": "Page 1",
"confidence": 0.98
},
"field2_studyType": {
"assessment": "complete",
"evidence": "Multicenter randomized controlled trial",
"location": "Methods, page 2",
"confidence": 0.95
},
"field5_population": {
"assessment": "complete",
"evidence": "Enrolled 500 patients with type 2 diabetes, age 58±12 years",
"location": "Methods, page 3",
"confidence": 0.92
},
"field9_outcomes": {
"assessment": "complete",
"evidence": "Primary outcome: eGFR change -15.2±3.5 ml/min vs -8.1±2.9 ml/min",
"location": "Results, page 5, Table 2",
"confidence": 0.96
}
},
"overall": {
"decision": "include",
"reason": "All 12 fields complete; key data extractable",
"dataQuality": "high",
"confidence": 0.94
},
"tokens": 15000,
"cost": 0.015
},
"modelBResult": {
"modelName": "qwen-max",
"status": "success",
"fields": { /* */ },
"overall": {
"decision": "include",
"confidence": 0.92
},
"tokens": 15200,
"cost": 0.061
},
"validation": {
"medicalLogicIssues": [],
"evidenceChainIssues": []
},
"conflict": {
"isConflict": false,
"severity": "none",
"conflictFields": [],
"overallConflict": false
},
"review": {
"finalDecision": null,
"reviewedBy": null,
"reviewedAt": null,
"reviewNotes": null,
"priority": 50
},
"processing": {
"isDegraded": false,
"degradedModel": null,
"processedAt": "2025-11-23T10:02:15.000Z"
}
},
{
"resultId": "fsr-002",
"literatureId": "lit-005",
"literature": { /* ... */ },
"modelAResult": {
"modelName": "deepseek-v3",
"status": "success",
"fields": {
"field9_outcomes": {
"assessment": "missing",
"evidence": "No specific values reported, only P values",
"location": "Results, page 4",
"confidence": 0.88
}
},
"overall": {
"decision": "exclude",
"reason": "Key field field9 is incomplete; cannot be used for meta-analysis",
"confidence": 0.85
}
},
"modelBResult": {
"overall": {
"decision": "include",
"reason": "The primary outcome is reported in the Discussion, but the data are complete"
}
},
"conflict": {
"isConflict": true,
"severity": "high",
"conflictFields": ["field9"],
"overallConflict": true,
"details": {
"field9": {
"modelA": "missing",
"modelB": "complete",
"importance": "critical"
}
}
},
},
"review": {
"finalDecision": null,
"priority": 95
}
}
],
"pagination": {
"page": 1,
"pageSize": 20,
"totalPages": 2
},
"summary": {
"totalResults": 30,
"conflictCount": 3,
"pendingReview": 3,
"reviewed": 27,
"avgPriority": 62
}
}
}
```
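**The 12 fields**:
- `field1_source`: Literature source (authors, journal, year, etc.)
- `field2_studyType`: Study type (RCT, cohort study, etc.)
- `field3_studyDesign`: Study design details
- `field4_diagnosis`: Diagnostic criteria for the disease
- `field5_population`: Population characteristics (sample size, baseline, etc.) ⭐
- `field6_baseline`: Baseline data ⭐
- `field7_intervention`: Intervention ⭐
- `field8_control`: Control
- `field9_outcomes`: Outcome measures ⭐⭐⭐ (most critical)
- `field10_statistics`: Statistical methods
- `field11_quality`: Quality assessment (randomization, blinding, etc.) ⭐⭐
- `field12_other`: Other information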
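The `conflict` block in the response example suggests a per-field comparison of the two models' assessments plus a comparison of their overall decisions. The sketch below illustrates that idea only. The comparison rule, the `detectConflict` name, and the shortening of keys such as `field9_outcomes` to `field9` are assumptions drawn from the example, and the `severity` and `importance` values are left out because this section does not define how they are derived.

```typescript
interface FieldAssessment {
  assessment: string;
}

// Hypothetical, trimmed-down view of one model's result.
interface ModelResult {
  fields: Record<string, FieldAssessment>;
  overall: { decision: "include" | "exclude" };
}

interface ConflictSummary {
  isConflict: boolean;
  conflictFields: string[];
  overallConflict: boolean;
}

function detectConflict(a: ModelResult, b: ModelResult): ConflictSummary {
  const conflictFields: string[] = [];
  for (const [key, fieldA] of Object.entries(a.fields)) {
    const fieldB = b.fields[key];
    // Only compare fields that both models reported.
    if (fieldB !== undefined && fieldB.assessment !== fieldA.assessment) {
      // "field9_outcomes" -> "field9", matching the example's conflictFields.
      conflictFields.push(key.replace(/_.*$/, ""));
    }
  }
  const overallConflict = a.overall.decision !== b.overall.decision;
  return {
    isConflict: overallConflict || conflictFields.length > 0,
    conflictFields,
    overallConflict,
  };
}
```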
**Test commands**:
```bash
# Get all results
curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results"
# Get conflicting results only
curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results?filter=conflict"
# Paginated query
curl "http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/results?page=2&pageSize=10"
```
---
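#### 4.4 Manual Review Decision
**Endpoint**: `PUT /api/v1/asl/fulltext-screening/results/:resultId/decision`
**Authentication**: Required
**Description**: Records a manual review decision for a single fulltext screening result
**Path parameters**:
- `resultId`: Result ID
**Request body**:
```json
{
"finalDecision": "exclude",
"exclusionReason": "Key field field9 (outcome measures) is incomplete",
"reviewNotes": "P<0.05 is reported, but mean±SD is missing, so the study cannot be used for meta-analysis"
}
```
**Field descriptions**:
- `finalDecision`: Final decision (required)
  - `include`: Include
  - `exclude`: Exclude
- `exclusionReason`: Reason for exclusion (required when `finalDecision === 'exclude'`)
- `reviewNotes`: Review notes (optional)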
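The conditional requirement on `exclusionReason` can be sketched as a plain validation function. This is a hedged illustration rather than the controller's Zod schema: `validateDecision` is a hypothetical name, and treating a whitespace-only reason as missing is an assumption.

```typescript
// Loosely typed input, as it would arrive from a parsed JSON body.
interface DecisionBody {
  finalDecision?: unknown;
  exclusionReason?: unknown;
  reviewNotes?: unknown;
}

// Returns a list of validation errors; an empty list means the body is valid.
function validateDecision(body: DecisionBody): string[] {
  const errors: string[] = [];
  if (body.finalDecision !== "include" && body.finalDecision !== "exclude") {
    errors.push("finalDecision must be 'include' or 'exclude'");
  }
  if (body.finalDecision === "exclude") {
    const reason = body.exclusionReason;
    // Assumption: a blank string does not count as a reason.
    if (typeof reason !== "string" || reason.trim() === "") {
      errors.push("exclusionReason is required when finalDecision is 'exclude'");
    }
  }
  if (body.reviewNotes !== undefined && typeof body.reviewNotes !== "string") {
    errors.push("reviewNotes must be a string");
  }
  return errors;
}
```

Collecting all errors instead of stopping at the first one lets a client fix every problem in a single round trip.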
**Response example**:
```json
{
"success": true,
"data": {
"resultId": "fsr-002",
"finalDecision": "exclude",
"exclusionReason": "Key field field9 (outcome measures) is incomplete",
"reviewedBy": "user-001",
"reviewedAt": "2025-11-23T10:30:00.000Z"
}
}
```
**Test command**:
```bash
curl -X PUT http://localhost:3001/api/v1/asl/fulltext-screening/results/fsr-002/decision \
-H "Content-Type: application/json" \
-d '{
"finalDecision": "exclude",
"exclusionReason": "Outcome measure data incomplete",
"reviewNotes": "Mean and standard deviation are missing"
}'
```
---
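#### 4.5 Export Excel
**Endpoint**: `GET /api/v1/asl/fulltext-screening/tasks/:taskId/export`
**Authentication**: Required
**Description**: Exports the fulltext screening results as an Excel file (4 sheets: the 3 sheets below plus an additional cost statistics sheet)
**Path parameters**:
- `taskId`: Task ID
**Response**: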
- Content-Type: `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
- Content-Disposition: `attachment; filename="fulltext_screening_results_{taskId}.xlsx"`
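**Excel structure**:
**Sheet 1: Included literature**
| Column | Description |
|------|------|
| No. | 1, 2, 3... |
| PMID | PubMed ID |
| Source | First author + year |
| Title | Literature title |
| Journal | Journal name |
| Year | Publication year |
| DOI | DOI number |
| Final decision | Include |
| Data quality | High/Medium/Low |
| Extractability | Extractable/Partially extractable/Not extractable |
| Model agreement | Agree/Disagree |
| Manually reviewed | Yes/No |
**Sheet 2: Excluded literature**
| Column | Description |
|------|------|
| No. | 1, 2, 3... |
| PMID | PubMed ID |
| Source | First author + year |
| Title | Literature title |
| Exclusion reason | Detailed reason for exclusion |
| Exclusion fields | field5, field9, etc. |
| Conflict | Yes/No |
| Reviewer | User ID |
| Review time | 2025-11-23 10:30 |
**Sheet 3: PRISMA statistics**
| Item | Count | Percentage |
|--------|------|--------|
| Total fulltext screened | 30 | 100% |
| Final included | 18 | 60% |
| Final excluded | 12 | 40% |
| - Outcome measures missing/incomplete | 5 | 16.7% |
| - Population mismatch | 3 | 10% |
| - Intervention unclear | 2 | 6.7% |
| - Study quality issues | 1 | 3.3% |
| - Other reasons | 1 | 3.3% |
| Model conflicts | 3 | 10% |
| Manual reviews | 3 | 10% |
**Cost statistics (additional sheet)**:
| Item | Value |
|------|-----|
| Total tokens | 450,000 |
| Total cost (CNY) | ¥2.25 |
| Average cost per paper | ¥0.075 |
| Model combination | DeepSeek-V3 + Qwen-Max |
| Processing time | 8 min 30 s |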
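The percentages in the PRISMA sheet and the average cost in the cost sheet follow from simple arithmetic over the task totals. The sketch below shows one way to compute them; `percentOfTotal` and `averageCostPerPaper` are hypothetical helpers, and the rounding (one decimal place for percentages, three for cost) is inferred from the sample values rather than taken from the ExcelExporter service.

```typescript
// Percentage of the screened total, rounded to one decimal place.
function percentOfTotal(count: number, total: number): string {
  if (total === 0) return "0%";
  const rounded = Math.round((count / total) * 1000) / 10;
  return `${rounded}%`;
}

// Average cost per paper in CNY, formatted with three decimals.
function averageCostPerPaper(totalCost: number, paperCount: number): string {
  if (paperCount === 0) return "¥0.000";
  return `¥${(totalCost / paperCount).toFixed(3)}`;
}

// Sample values from the tables above: 30 screened, 5 excluded for
// missing outcome measures, total cost of 2.25 CNY.
const outcomeShare = percentOfTotal(5, 30);
const perPaper = averageCostPerPaper(2.25, 30);
console.log(outcomeShare, perPaper);
```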
**Test command**:
```bash
curl -O -J http://localhost:3001/api/v1/asl/fulltext-screening/tasks/fst-20251123-001/export
```
---
## 📋 Response Format Specification
### 1. Success Response
@@ -789,6 +1273,19 @@ Body (raw JSON):
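## 🔄 Version History
### v3.0 (2025-11-23)
- ✅ Added the fulltext screening management API (5 endpoints)
  - Create task, get progress, get results, manual review, export Excel
- ✅ Supports detailed 12-field assessment
- ✅ Supports dual-model comparison and conflict detection
- ✅ Full Excel export (3 result sheets plus a cost statistics sheet)
- ✅ Restructured the document (5 major modules)
### v2.1 (2025-11-21)
- ✅ Added the statistics API endpoint
- ✅ Updated the PICOS format notes
- ✅ Added cloud-native architecture annotations
### v2.0 (2025-11-18)
- ✅ Implemented 10 core API endpoints
- ✅ Completed project management features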
@@ -820,9 +1317,9 @@ Body (raw JSON):
---
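## 🆕 APIs Added in Week 4
### 5. Statistics API (Statistics)
### 4.1 Get Project Statistics (cloud-native: backend aggregation)
#### 5.1 Get Project Statistics (cloud-native: backend aggregation)
**Endpoint**: `GET /api/v1/asl/projects/:projectId/statistics`
**Authentication**: Required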
@@ -867,14 +1364,16 @@ curl http://localhost:3001/api/v1/asl/projects/55941145-bba0-4b15-bda4-f0a398d78
---
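**Document version:** v2.2
**Last updated:** 2025-11-21 (Week 4 completed)
**Document version:** v3.0
**Last updated:** 2025-11-23 (Day 5: fulltext screening API)
**Maintainer:** AI Smart Literature development team
**This update**
- ✅ Added the statistics API endpoint
- ✅ Updated the PICOS format notes (P/I/C/O/S)
- ✅ Added cloud-native architecture annotations
- ✅ Added the fulltext screening management API (5 core endpoints)
- ✅ Detailed documentation of the 12-field assessment
- ✅ Notes on dual-model comparison and conflict detection
- ✅ Excel export format specification
- ✅ Complete request/response examples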
---