feat(asl): Implement full-text screening core LLM service and validation system (Day 1-3)

Core Components:
- PDFStorageService with Dify/OSS adapters
- LLM12FieldsService with Nougat-first + dual-model + 3-layer JSON parsing
- PromptBuilder for dynamic prompt assembly
- MedicalLogicValidator with 5 rules + fault tolerance
- EvidenceChainValidator for citation integrity
- ConflictDetectionService for dual-model comparison

Prompt Engineering:
- System Prompt (6601 chars, Section-Aware strategy)
- User Prompt template (PICOS context injection)
- JSON Schema (12 fields constraints)
- Cochrane standards (not loaded in MVP)

Key Innovations:
- 3-layer JSON parsing (JSON.parse + json-repair + code block extraction)
- Promise.allSettled for dual-model fault tolerance
- safeGetFieldValue for robust field extraction
- Mixed CN/EN token calculation

Integration Tests:
- integration-test.ts (full test)
- quick-test.ts (quick test)
- cached-result-test.ts (fault tolerance test)

Documentation Updates:
- Development record (Day 2-3 summary)
- Quality assurance strategy (full-text screening)
- Development plan (progress update)
- Module status (v1.1 update)
- Technical debt (10 new items)

Test Results:
- JSON parsing success rate: 100%
- Medical logic validation: 5/5 passed
- Dual-model parallel processing: OK
- Cost per PDF: CNY 0.10

Files: 238 changed, 14383 insertions(+), 32 deletions(-)

Docs: docs/03-业务模块/ASL-AI智能文献/05-开发记录/2025-11-22_Day2-Day3_LLM服务与验证系统开发.md
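
The commit message names a 3-layer JSON parsing strategy (JSON.parse, then json-repair, then code block extraction) without showing it. The following is a minimal sketch of how such a fallback chain might look; it is not the code from this commit. `parseLlmJson`, `tryParse`, `extractCodeBlock` and `stripTrailingCommas` are hypothetical names, and `stripTrailingCommas` is only a stand-in for a real repair library so that the example stays self-contained.

```typescript
// Hypothetical sketch of a 3-layer JSON parsing fallback; all names are illustrative.
type Repair = (text: string) => string;

// Stand-in for a repair library (assumption): only removes trailing commas
// before a closing brace or bracket. It does not inspect string contents.
const stripTrailingCommas: Repair = (text) => text.replace(/,\s*([}\]])/g, "$1");

// Built programmatically so the pattern can match a Markdown code fence.
const FENCE = "`".repeat(3);
const CODE_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Returns the parsed value, or undefined when the text is not valid JSON.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Returns the body of the first fenced code block, if the text contains one.
function extractCodeBlock(text: string): string | undefined {
  const match = text.match(CODE_BLOCK);
  return match ? match[1].trim() : undefined;
}

function parseLlmJson(raw: string, repair: Repair = stripTrailingCommas): unknown {
  const text = raw.trim();

  // Layer 1: the output is already valid JSON.
  const direct = tryParse(text);
  if (direct !== undefined) return direct;

  // Layer 2: repair the whole output, then parse again.
  const repaired = tryParse(repair(text));
  if (repaired !== undefined) return repaired;

  // Layer 3: the JSON is wrapped in prose and a fenced code block.
  const block = extractCodeBlock(text);
  if (block !== undefined) {
    const fromBlock = tryParse(block) ?? tryParse(repair(block));
    if (fromBlock !== undefined) return fromBlock;
  }

  throw new Error("Unable to parse LLM output as JSON");
}
```

Passing the repair step in as a parameter keeps the sketch independent of any particular library; a real repair function could be supplied in place of the stand-in.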
2025-11-22 22:18:17 +08:00
parent 8eef9e0544
commit beb7f7f559
238 changed files with 20718 additions and 31 deletions
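
The commit message also mentions Promise.allSettled for dual-model fault tolerance. As a rough illustration of that idea, assuming each model call is an async function that takes a prompt and returns raw text, a dual-model run that survives a single model failure might look like the sketch below. `runDualModel`, `ModelCall` and the result shapes are hypothetical and are not taken from the commit.

```typescript
// Hypothetical sketch: run two (or more) model calls in parallel and keep
// whatever succeeds. All names and shapes here are illustrative assumptions.
type ModelCall = (prompt: string) => Promise<string>;

interface ModelResult {
  model: string;
  output: string;
}

interface ModelFailure {
  model: string;
  reason: string;
}

interface DualModelOutcome {
  results: ModelResult[];
  failures: ModelFailure[];
}

async function runDualModel(
  prompt: string,
  models: Record<string, ModelCall>
): Promise<DualModelOutcome> {
  const names = Object.keys(models);

  // allSettled never rejects, so one failing model cannot discard the
  // result of the other one.
  const settled = await Promise.allSettled(names.map((name) => models[name](prompt)));

  const outcome: DualModelOutcome = { results: [], failures: [] };
  settled.forEach((entry, i) => {
    if (entry.status === "fulfilled") {
      outcome.results.push({ model: names[i], output: entry.value });
    } else {
      outcome.failures.push({ model: names[i], reason: String(entry.reason) });
    }
  });

  // Only give up when every model failed.
  if (outcome.results.length === 0) {
    const detail = outcome.failures.map((f) => f.model + ": " + f.reason).join("; ");
    throw new Error("All models failed: " + detail);
  }
  return outcome;
}
```

With `Promise.all` the first rejection would abort the whole call; `Promise.allSettled` lets the caller continue with a single model's output and record the failure for later review.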
--- a/docs/03-业务模块/ASL-AI智能文献/00-模块当前状态与开发指南.md
+++ b/docs/03-业务模块/ASL-AI智能文献/00-模块当前状态与开发指南.md
@@ -1,9 +1,9 @@
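 # AI Intelligent Literature Module - Current Status and Development Guide

-> **Document version:** v1.0  
+> **Document version:** v1.1  
 > **Created:** 2025-11-21  
 > **Maintainer:** AI Intelligent Literature development team  
-> **Last updated:** 2025-11-21  
+> **Last updated:** 2025-11-22  
 > **Purpose:** Reflect the module's actual state and help new developers get up to speed quickly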

 ---
@@ -26,12 +26,18 @@
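 The AI Intelligent Literature module is a large language model (LLM) based literature screening system that helps researchers automatically screen literature against PICOS criteria.

 ### Current Status
-- **Development stage**: ✅ MVP complete
-- **Main feature**: Title & Abstract Screening
+- **Development stage**: 🚧 Title/abstract screening MVP complete; full-text screening in development
+- **Completed features**:
+  - ✅ Title & Abstract Screening
+  - ✅ Full-text screening core LLM service (Day 2-3, backend)
+- **Features in development**:
+  - 🚧 Full-text screening batch processing and frontend UI (Day 4-6)
 - **Model support**: DeepSeek-V3 + Qwen-Max dual-model screening
 - **Deployment status**: ✅ Local development environment running normally

 ### Key Milestones
+
+**Title/abstract screening (complete)**:
 - ✅ 2025-11-18: Prompt v1.0.0-MVP complete, 60% accuracy
 - ✅ 2025-11-18: LLM integration and test framework complete
 - ✅ 2025-11-19: Frontend MVP (setup and launch, review workbench) complete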
@@ -42,6 +48,16 @@ The AI Intelligent Literature module is a large language model (LLM) based literature screening system
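  - Initial screening results page (hybrid approach)
  - Excel batch export (cloud-native)

+**Full-text screening (in development)**:
+- ✅ 2025-11-22: **Day 2-3 complete (LLM service and validation system)**
+  - Prompt engineering set (System/User Prompt + JSON Schema)
+  - PromptBuilder service (dynamic prompt assembly)
+  - LLM12FieldsService (Nougat-first + dual-model + 3-layer JSON parsing)
+  - Medical logic validator (5 rules)
+  - Evidence chain validator (citation integrity)
+  - Conflict detection service (dual-model comparison)
+  - Integration tests and fault-tolerance improvements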
+
 ---

 ## 🏗️ Technical Architecture
@@ -111,27 +127,66 @@ backend/src/modules/asl/
 ├── controllers/
 │   ├── projectController.ts        # Project management API
 │   ├── literatureController.ts     # Literature management API
-│   └── screeningController.ts      # Screening API
+│   └── screeningController.ts      # Screening API (title/abstract screening)
 ├── services/
 │   ├── screeningService.ts         # Screening task service (core)
-│   └── llmScreeningService.ts      # LLM call service
+│   └── llmScreeningService.ts      # LLM call service (title/abstract screening)
 ├── schemas/
-│   └── screening.schema.ts         # Prompt generation and JSON Schema
+│   └── screening.schema.ts         # Prompt generation and JSON Schema (title/abstract screening)
 ├── types/
 │   └── index.ts                    # TypeScript type definitions
-└── routes/
-    └── index.ts                    # Route registration
+├── routes/
+│   └── index.ts                    # Route registration
+│
+├── common/                          # ✅ Shared capability layer for full-text screening (NEW)
+│   ├── pdf/                         # PDF storage and extraction
+│   │   ├── types.ts
+│   │   ├── PDFStorageService.ts
+│   │   ├── PDFStorageFactory.ts
+│   │   ├── adapters/
+│   │   │   ├── DifyPDFStorageAdapter.ts
+│   │   │   └── OSSPDFStorageAdapter.ts
+│   │   └── __tests__/
+│   ├── llm/                         # LLM 12-field service (core)
+│   │   ├── types.ts
+│   │   ├── PromptBuilder.ts         # Dynamic prompt assembly
+│   │   ├── LLM12FieldsService.ts    # Nougat + dual-model + 3-layer JSON parsing
+│   │   ├── index.ts
+│   │   └── __tests__/
+│   │       ├── integration-test.ts  # Full integration test
+│   │       ├── quick-test.ts        # Quick test (1 PDF)
+│   │       └── cached-result-test.ts # Fault-tolerance verification test
+│   ├── validation/                  # Validation services
+│   │   ├── MedicalLogicValidator.ts # Medical logic validation (5 rules)
+│   │   ├── EvidenceChainValidator.ts # Evidence chain validation
+│   │   ├── ConflictDetectionService.ts # Conflict detection
+│   │   ├── index.ts
+│   │   └── __tests__/
+│   │       └── validation-test.ts
+│   ├── utils/
+│   │   └── tokenCalculator.ts      # Token counting and cost estimation
+│   └── index.ts
+│
+└── fulltext-screening/              # ✅ Full-text screening module (NEW)
+    └── prompts/                     # Prompt set
+        ├── system_prompt.md         # System Prompt (6601 characters)
+        ├── user_prompt_template.md  # User Prompt template (199 lines)
+        ├── json_schema.json         # JSON Schema (12-field constraints)
+        └── cochrane_standards/      # Cochrane standards (not loaded in MVP)
+            ├── 随机化方法.md
+            ├── 盲法.md
+            └── 结果完整性.md

 backend/prisma/
-└── schema.prisma                   # Database schema definition
+└── schema.prisma                    # Database schema definition

 backend/prompts/asl/screening/
-├── v1.0.0-mvp.txt                  # Standard prompt (currently in use)
-├── v1.1.0-lenient.txt              # Lenient mode
-└── v1.1.0-strict.txt               # Strict mode
+├── v1.0.0-mvp.txt                   # Standard prompt (title/abstract screening)
+├── v1.1.0-lenient.txt               # Lenient mode
+└── v1.1.0-strict.txt                # Strict mode

 backend/scripts/
-└── test-llm-screening.ts           # LLM test script
+└── test-llm-screening.ts            # LLM test script (title/abstract screening)
 ```

 ---
@@ -916,6 +971,11 @@ chore: update dependency versions
 4. [Development plan](./04-开发计划/03-任务分解.md): feature list and plan

 ### Development Records
+
+**Full-text screening**:
+- [2025-11-22 Day2-Day3 LLM service and validation system development](./05-开发记录/2025-11-22_Day2-Day3_LLM服务与验证系统开发.md) ⭐ **Latest**
+
+**Title/abstract screening**:
 - [2025-11-21 Real LLM integration](./05-开发记录/2025-11-21-真实LLM集成完成报告.md)
 - [2025-11-21 Field mapping fix](./05-开发记录/2025-11-21-字段映射问题修复.md)
 - [2025-11-21 User experience improvements](./05-开发记录/2025-11-21-用户体验优化.md)
@@ -993,17 +1053,22 @@ Drawer open: <50ms

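 ## 🎯 Next Development Plan

-### Short term (Week 3-4)
+### Current Sprint (full-text screening MVP)
+1. 🚧 **Full-text screening Day 4**: batch task service (in progress)
+2. ⏳ **Full-text screening Day 5**: frontend UI development (not started)
+3. ⏳ **Full-text screening Day 6**: API integration and joint debugging (not started)
+
+### Short-term optimization (title/abstract screening)
 1. ⏳ Prompt optimization (raise accuracy to 85%+)
 2. ⏳ Add task pause/cancel
 3. ⏳ Implement concurrent processing (3-5 concurrent tasks)
 4. ⏳ Show estimated time remaining

 ### Medium term (Month 2)
-1. ⏳ Full-text screening
-2. ⏳ User-defined edge cases
-3. ⏳ WebSocket real-time push
-4. ⏳ Data export (Excel/PDF)
+1. 🚧 Full-text screening (in development)
+2. ⏳ Full-text data extraction
+3. ⏳ User-defined edge cases
+4. ⏳ WebSocket real-time push

 ### Long term (Month 3+)
 1. ⏳ Multi-user support (real authentication)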
@@ -1019,14 +1084,15 @@ Drawer open: <50ms

 ---

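-**Last updated**: 2025-11-21 (Week 4 complete)  
+**Last updated**: 2025-11-22 (full-text screening Day 2-3 complete)  
 **Document status**: ✅ Reflects actual state  
-**Next update trigger**: prompt optimization complete or concurrent processing implemented
+**Next update trigger**: full-text screening MVP complete or title/abstract prompt optimization complete

-**Changes in this update**:
-- ✅ Added "initial screening results page" to the feature list
-- ✅ Added "statistics API" to the feature list
-- ✅ Updated key milestones (Week 4 complete)
-- ✅ Updated technical debt notes
+**Changes in this update** (v1.1):
+- ✅ Updated current status (added full-text screening progress)
+- ✅ Updated key milestones (Day 2-3 complete)
+- ✅ Added backend code structure (common layer + fulltext-screening layer)
+- ✅ Added development record link (Day 2-3 work summary)
+- ✅ Updated next development plan (current Sprint)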