# AI Smart Literature - Full-Text Screening Development Plan

> **Document version:** V1.2
> **Created:** 2025-11-22
> **Last updated:** 2025-11-23
> **Applicable phase:** MVP
> **Estimated duration:** 2 weeks
> **Maintainer:** ASL development team

---

## 📊 Development Progress Overview

**Current status**: 🚧 Days 1-5 complete (all backend work done); frontend development pending

| Phase | Dates | Status | Progress |
|------|------|------|---------|
| **Week 1** | 2025-11-22 ~ 2025-11-23 | ✅ Done | 100% |
| - Day 1: PDF storage service | 2025-11-22 | ✅ Done | 100% |
| - Day 2: LLM 12-field service | 2025-11-22 | ✅ Done | 100% |
| - Day 3: Validation services | 2025-11-22 | ✅ Done | 100% |
| - Day 4 (morning): Database design and migration | 2025-11-23 | ✅ Done | 100% |
| - Day 4 (afternoon): Batch processing service | 2025-11-23 | ✅ Done | 100% |
| - Day 5: API development | 2025-11-23 | ✅ Done | 100% |
| **Week 2** | 2025-11-24 ~ 2025-11-27 | ⏳ Not started | 0% |
| - Day 6-7: Frontend development | Not started | ⏳ Not started | 0% |
| - Day 8: Frontend-backend integration testing | Not started | ⏳ Not started | 0% |

**Core features completed**:
- ✅ PDF storage and extraction service (wrapper layer)
- ✅ Prompt engineering system (System/User/Schema)
- ✅ LLM 12-field service (Nougat first + dual models + 3-layer JSON parsing)
- ✅ Medical logic validator (5 rules)
- ✅ Evidence chain validator
- ✅ Conflict detection service
- ✅ Integration test framework
- ✅ Database schema design (3 tables)
- ✅ Manual database migration completed
- ✅ FulltextScreeningService (batch processing service)
- ✅ 5 core API endpoints
- ✅ Excel export service (4 sheets)
- ✅ Zod parameter validation
- ✅ REST Client test cases (31)

**Next step**: Day 6 frontend UI development

---

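## 📋 Table of Contents

- [1. Project Overview](#1-project-overview)
- [2. Architecture Design](#2-architecture-design)
- [3. Shared Capability Layer Design (Reusable)](#3-shared-capability-layer-design-reusable)
- [4. Database Design](#4-database-design)
- [5. API Design](#5-api-design)
- [6. Full-Text Screening Business Layer Design](#6-full-text-screening-business-layer-design)
- [7. Frontend Design](#7-frontend-design)
- [8. Development Schedule](#8-development-schedule)
- [9. Technical Notes](#9-technical-notes)
- [10. Risks and Caveats](#10-risks-and-caveats)

---
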
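## 1. Project Overview

### 1.1 Role

**Full-text screening** is the second screening stage of a systematic review / meta-analysis. Papers marked "provisionally included" by title-and-abstract screening are rigorously screened a second time against their **full text**.

### 1.2 Core Value

| Dimension | Description |
|------|------|
| **Purpose** | Judge the **completeness and usability** of each paper's data and decide final inclusion |
| **Basis** | **12-field template** (based on the Cochrane RoB 2.0 standard) |
| **Input** | Papers "provisionally included" by title-and-abstract screening (full-text PDF already obtained) |
| **Output** | Final inclusion list + exclusion list + PRISMA statistics + complete evidence chain |
| **Next step** | Finally included papers move on to the "full-text data extraction" stage |
| **Quality target** | Accuracy ≥85% (MVP), ≥92% (V1.0), ≥96% (V2.0) |
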
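### 1.3 12-Field Template (Based on the Cochrane RoB 2.0 Standard)

```
1. Source (first author, year, journal, DOI)
2. Study type (RCT, cohort study, etc.)
3. Study design details (follow-up duration, data source)
4. Disease diagnostic criteria
5. Population characteristics (sample size, demographics) ⭐
6. Baseline data (functional indicators, comorbidities) ⭐
7. Intervention (drug, dose, course of treatment) ⭐
8. Comparator
9. Outcomes (primary/secondary) ⭐⭐⭐ most critical
10. Statistical methods
11. Quality assessment (randomization, blinding, ITT analysis, etc.) ⭐⭐ key methodology
12. Other information (registration number, conflicts of interest)
```

**The task of full-text screening**: assess the **presence, completeness, and extractability** of these 12 fields, and decide inclusion based on the **Cochrane risk-of-bias assessment criteria**.

**Key fields assessed in depth** (following Cochrane RoB 2.0):
- Randomization method (sequence generation, allocation concealment)
- Blinding (who is blinded, how blinding is done)
- Outcome completeness (loss-to-follow-up rate, ITT analysis)
- Selective reporting (comparison against the registered protocol)
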
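### 1.4 Relationship to Full-Text Extraction

| Dimension | Full-text screening | Full-text extraction |
|------|---------|---------|
| **Purpose** | Decide usability (include/exclude) | Extract concrete data (for analysis) |
| **Use of the 12 fields** | Assess **whether they are complete** | Extract **concrete values** |
| **Output** | Binary decision (include/exclude) + reasons | Detailed field values (data table) |
| **Underlying technology** | **Identical** (PDF, LLM, conflict detection, etc.) | **Identical** |
| **Database/API** | **Designed independently** (each evolves separately) | **Designed independently** |

**Core design ideas**:
- ✅ **High reuse of underlying technology**: PDF storage, full-text extraction, LLM calls, conflict detection, and similar services are abstracted into a shared capability layer
- ✅ **Independent application-layer design**: full-text screening and full-text extraction each have their own database tables, API, and business logic, and evolve independently
- ✅ **No future split needed**: the two are independent modules from the start

---
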
## 2. Architecture Design

### 2.1 Three-Layer Architecture

```
┌────────────────────────────────────────────────────────┐
│  Application layer (focus of this iteration)           │
│                                                        │
│  Full-text screening module                            │
│  ├─ AslFulltextScreeningTask (table)                   │
│  ├─ AslFulltextScreeningResult (table)                 │
│  ├─ FulltextScreeningService (business logic)          │
│  ├─ FulltextScreeningController (controller)           │
│  ├─ fulltext-screening routes (API)                    │
│  └─ Frontend review workbench                          │
└────────────┬───────────────────────────────────────────┘
             │ calls
             ↓
┌────────────────────────────────────────────────────────┐
│  Shared capability layer (built now, reused later) ⭐  │
│                                                        │
│  ├─ PDFStorageService (PDF upload/storage/download)    │
│  ├─ FulltextExtractionClient (calls Python service)    │
│  ├─ LLM12FieldsService (LLM 12-field processing)       │
│  ├─ ConflictDetectionService (conflict detection)      │
│  └─ AsyncTaskService (async tasks + progress)          │
└────────────────────────────────────────────────────────┘
             ↓ calls
┌────────────────────────────────────────────────────────┐
│  Platform foundation layer (already implemented)       │
│  storage | logger | cache | jobQueue | prisma          │
└────────────────────────────────────────────────────────┘
```

### 2.2 Directory Structure

```bash
backend/src/modules/asl/
├── common/                              # Shared capability layer ⭐ reusable by extraction
│   ├── pdf/                             # ← PDF storage (done on Day 1) ✅
│   │   ├── PDFStorageService.ts
│   │   ├── PDFStorageFactory.ts
│   │   ├── adapters/
│   │   │   ├── DifyPDFStorageAdapter.ts
│   │   │   └── OSSPDFStorageAdapter.ts
│   │   └── types.ts
│   ├── llm/                             # ← LLM services
│   │   ├── LLM12FieldsService.ts        # LLM 12-field processing
│   │   ├── PromptBuilder.ts             # Prompt construction (Section-Aware) ⭐
│   │   └── types.ts
│   ├── validation/                      # ← Validation services (new) ⭐
│   │   ├── MedicalLogicValidator.ts     # Medical logic validation
│   │   ├── EvidenceChainValidator.ts    # Evidence chain validation
│   │   └── ConflictDetectionService.ts  # Conflict detection
│   ├── tasks/                           # ← Async tasks
│   │   └── AsyncTaskService.ts
│   └── utils/
│       ├── tokenCalculator.ts
│       └── jsonParser.ts
│
├── fulltext-screening/                  # Full-text screening module ⭐ focus of this iteration
│   ├── routes/
│   │   └── fulltext-screening.ts        # 5 API endpoints
│   ├── controllers/
│   │   └── FulltextScreeningController.ts
│   ├── services/
│   │   └── FulltextScreeningService.ts  # Business logic
│   ├── types/
│   │   └── screening.types.ts           # TypeScript types
│   └── prompts/
│       ├── system_prompt.md             # System prompt (Section-Aware) ⭐
│       ├── user_prompt_template.md      # User prompt template
│       ├── cochrane_standards/          # Cochrane standard descriptions (one file per field) ⭐
│       │   ├── 随机化方法.md             # randomization method
│       │   ├── 盲法.md                   # blinding
│       │   ├── 结果完整性.md             # outcome completeness
│       │   └── ...
│       └── few_shot_examples/           # Few-shot medical case library ⭐
│           ├── 高质量RCT.md              # high-quality RCT
│           ├── 质量不足案例.md           # insufficient-quality case
│           └── 信息在中间位置案例.md     # information-in-the-middle case ← especially important
│
└── title-screening/                     # Title-and-abstract screening (done)
    └── ...

frontend-v2/src/modules/asl/
├── pages/
│   ├── FulltextScreeningSettings.tsx    # Settings and launch
│   ├── FulltextScreeningWorkbench.tsx   # Review workbench ⭐ core
│   └── FulltextScreeningResults.tsx     # Screening results
│
├── components/
│   ├── screening/
│   │   ├── PICOSPanel.tsx               # PICOS criteria display (reusable)
│   │   ├── ScreeningTable.tsx           # Table-based review (reusable)
│   │   ├── ReviewModal.tsx              # Dual-view source review
│   │   └── PDFViewer.tsx                # PDF viewer
│   └── shared/
│       ├── LiteratureAcquisitionTable.tsx  # Full-text acquisition table
│       └── ExcelExporter.tsx               # Excel export
```

---

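## 3. Shared Capability Layer Design (Reusable)

> **⭐ Important**: The shared capability layer is a major part of this iteration. These services **will be reused by the full-text extraction module later**, so they need a high-quality implementation.

### 3.1 PDFStorageService - PDF Storage Service

#### 3.1.1 Role

**⚠️ Note: this wraps existing capabilities; it is not a rewrite**

A unified PDF file management service responsible for:
- PDF upload (Dify or OSS) - **reuses the existing storage service** ✅
- Full-text extraction (calls the Python microservice) - **reuses ExtractionClient** ✅
- PDF download/delete
- Adapter pattern support (switch backends with zero code changes)

**Already available in the existing system**:
- ✅ Python microservice (PyMuPDF + Nougat)
- ✅ ExtractionClient.ts (backend integration)
- ✅ TokenService.ts (token counting)
- ✅ storage service (platform foundation layer)

**This iteration**: only needs to **wrap these into a unified interface** (about 200 lines of code)
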
#### 3.1.2 Core Interface

```typescript
// backend/src/modules/asl/common/pdf/PDFStorageService.ts

export interface PDFUploadResult {
  storageRef: string          // Storage reference (Dify: documentId, OSS: key)
  storageType: string         // 'dify' or 'oss'
  url: string | null          // Dify: null, OSS: public URL

  fullText: string            // Extracted full text
  tokenCount: number          // Token count
  charCount: number           // Character count

  extractionMethod: string    // 'pymupdf' or 'nougat'
  extractionQuality: number   // 0.0-1.0
  detectedLanguage: string    // 'chinese' or 'english'
}

// Signatures only (declaration form); method bodies are omitted here.
export declare class PDFStorageService {
  /**
   * Upload a PDF and extract its full text
   */
  uploadAndExtract(
    literatureId: string,
    pdfBuffer: Buffer,
    filename: string
  ): Promise<PDFUploadResult>

  /**
   * Download a PDF
   */
  download(storageRef: string): Promise<Buffer>

  /**
   * Delete a PDF
   */
  delete(storageRef: string): Promise<void>

  /**
   * Check whether a PDF exists
   */
  exists(storageRef: string): Promise<boolean>
}
```

#### 3.1.3 Storage Approach for the MVP Phase

**Use Dify storage (reusing the PKB module)**:

```typescript
// backend/src/modules/asl/common/pdf/adapters/DifyPDFStorageAdapter.ts

export class DifyPDFStorageAdapter implements PDFStorageAdapter {

  private difyClient: DifyClient  // ← reuses the existing DifyClient ✅

  constructor() {
    // Reuses common/rag/DifyClient.ts (already implemented) ✅
    this.difyClient = new DifyClient(
      process.env.DIFY_API_KEY!,
      process.env.DIFY_BASE_URL!
    )
  }

  async upload(literatureId: string, pdfBuffer: Buffer, filename: string) {
    // 1. Upload to Dify (reuses the existing PKB functionality) ✅
    const datasetId = process.env.DIFY_ASL_DATASET_ID!
    const difyDocId = await this.difyClient.uploadDocument(
      datasetId,
      pdfBuffer,
      filename
    )

    logger.info('PDF uploaded to Dify', { literatureId, difyDocId })

    return {
      ref: difyDocId,
      type: 'dify',
      url: null  // Dify has no public URL
    }
  }

  async download(difyDocId: string): Promise<Buffer> {
    return await this.difyClient.downloadDocument(difyDocId)
  }
}
```

**Environment configuration**:

```bash
# .env
PDF_STORAGE_TYPE=dify                     # Use Dify in the MVP phase
DIFY_API_KEY=app-xxx
DIFY_BASE_URL=http://localhost:5001/v1
DIFY_ASL_DATASET_ID=dataset-xxx           # Knowledge base dedicated to ASL

# Future migration to OSS (configuration change only)
# PDF_STORAGE_TYPE=oss
# OSS_REGION=cn-hangzhou
# OSS_BUCKET=asl-literatures
```

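The configuration-only switch described above relies on a factory that maps `PDF_STORAGE_TYPE` to an adapter. The following is a minimal sketch of that idea under stated assumptions: the `PDFStorageAdapter` shape, the `InMemoryAdapter` stand-in, and `createPDFStorageAdapter` are hypothetical names for illustration, not the actual contents of `PDFStorageFactory.ts`.

```typescript
// Hypothetical minimal adapter contract; the real PDFStorageAdapter may differ.
interface PDFStorageAdapter {
  upload(
    literatureId: string,
    pdfBuffer: Uint8Array,
    filename: string
  ): Promise<{ ref: string; type: string; url: string | null }>;
  download(ref: string): Promise<Uint8Array>;
}

// In-memory stand-in used only in this sketch (not a real Dify or OSS client).
class InMemoryAdapter implements PDFStorageAdapter {
  private readonly files = new Map<string, Uint8Array>();
  private readonly type: string;

  constructor(type: string) {
    this.type = type;
  }

  async upload(literatureId: string, pdfBuffer: Uint8Array, filename: string) {
    const ref = `${this.type}:${literatureId}:${filename}`;
    this.files.set(ref, pdfBuffer);
    return { ref, type: this.type, url: null };
  }

  async download(ref: string): Promise<Uint8Array> {
    const file = this.files.get(ref);
    if (!file) throw new Error(`PDF not found: ${ref}`);
    return file;
  }
}

type StorageType = 'dify' | 'oss';

function isStorageType(value: string): value is StorageType {
  return value === 'dify' || value === 'oss';
}

// Registry: supporting a new backend means adding one entry here.
const adapterRegistry: Record<StorageType, () => PDFStorageAdapter> = {
  dify: () => new InMemoryAdapter('dify'),
  oss: () => new InMemoryAdapter('oss'),
};

// Resolve the adapter from a configuration value such as PDF_STORAGE_TYPE.
function createPDFStorageAdapter(storageType: string = 'dify'): PDFStorageAdapter {
  if (!isStorageType(storageType)) {
    throw new Error(`Unsupported PDF_STORAGE_TYPE: ${storageType}`);
  }
  return adapterRegistry[storageType]();
}
```

Callers depend only on the adapter contract, so moving from Dify to OSS would touch the configuration value and the registry entry rather than the calling code.
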
#### 3.1.4 Full-Text Extraction Integration

**⚠️ Note: reuse the existing ExtractionClient directly; no reimplementation is needed**

```typescript
// Reuses the existing ExtractionClient (already implemented) ✅
import { ExtractionClient } from '@/common/document/ExtractionClient'

private async extractFulltext(pdfBuffer: Buffer): Promise<ExtractionResult> {
  // ExtractionClient is already implemented; use it directly ✅
  const extractionClient = new ExtractionClient(
    process.env.EXTRACTION_SERVICE_URL || 'http://localhost:8000'
  )

  // The Python microservice (PyMuPDF + Nougat) is already deployed and running ✅
  const result = await extractionClient.extractPDF(pdfBuffer)

  return {
    text: result.text,
    method: result.extraction_method,      // 'pymupdf' or 'nougat'
    quality: result.extraction_quality,    // 0.0-1.0
    language: result.detected_language     // 'chinese' or 'english'
  }
}
```

**Existing system (ready to use)**:
- ✅ `backend/src/common/document/ExtractionClient.ts` (implemented)
- ✅ `extraction_service/` (Python microservice, deployed)
- ✅ PyMuPDF, Nougat, language detection (done)

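### 3.2 LLM12FieldsService - LLM 12-Field Processing Service

#### 3.2.1 Role

A unified LLM calling service that supports:
- screening mode: assess the completeness of the 12 fields (based on the Cochrane standard)
- extraction mode: extract detailed data for the 12 fields (future)
- Dual models in parallel (DeepSeek-V3 + Qwen3-Max)
- Result caching (to reduce cost)
- **Nougat structured output first** (English papers)
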
#### 3.2.2 Core Interface

```typescript
// backend/src/modules/asl/common/llm/LLM12FieldsService.ts

export enum LLM12FieldsMode {
  SCREENING = '12fields-screening',    // Assessment mode (Cochrane standard)
  EXTRACTION = '12fields-extraction'   // Extraction mode (future)
}

export interface LLMResult {
  result: any                  // Parsed JSON result
  processingTime: number       // Processing time (milliseconds)
  tokenUsage: number           // Token usage
  cost: number                 // Cost (RMB)
  extractionMethod: string     // 'nougat' | 'pymupdf'
  structuredFormat: boolean    // Whether the input was structured (Markdown)
}

// Signatures only (declaration form); method bodies are omitted here.
export declare class LLM12FieldsService {
  /**
   * Process the 12 fields (screening or extraction)
   *
   * ⚠️ Strategy: feed the full text in a single call, optimized through prompt engineering
   */
  process12Fields(
    mode: LLM12FieldsMode,
    model: string,              // 'deepseek-v3' | 'qwen-max'
    fullTextMarkdown: string,   // ⭐ Full text in Markdown as extracted by Nougat
    context: any                // Study protocol context + PICOS criteria
  ): Promise<LLMResult>

  /**
   * Call both models in parallel
   */
  processDualModels(
    mode: LLM12FieldsMode,
    modelA: string,             // Default 'deepseek-v3'
    modelB: string,             // Default 'qwen-max'
    fullTextMarkdown: string,
    context: any
  ): Promise<{ resultA: LLMResult, resultB: LLMResult }>
}
```

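One way to realise the parallel dual-model call is to start both requests at once and capture each outcome separately, so that a failure in one model does not discard the other model's result. This is a sketch under assumptions: `runDualModels`, `ModelOutcome`, and the `callModel` callback (a stand-in for `process12Fields`) are hypothetical, and the per-model error capture is a design choice that the interface above does not specify.

```typescript
// Outcome of one model call: either a value or a captured error message.
interface ModelOutcome<T> {
  model: string;
  ok: boolean;
  value?: T;
  error?: string;
}

// callModel is a hypothetical stand-in for LLM12FieldsService.process12Fields.
async function runDualModels<T>(
  modelA: string,
  modelB: string,
  callModel: (model: string) => Promise<T>
): Promise<{ resultA: ModelOutcome<T>; resultB: ModelOutcome<T> }> {
  // Wrap each call so a rejection becomes data instead of aborting Promise.all.
  const settle = async (model: string): Promise<ModelOutcome<T>> => {
    try {
      return { model, ok: true, value: await callModel(model) };
    } catch (err) {
      return { model, ok: false, error: String(err) };
    }
  };

  // Both requests are started before either is awaited, so they run concurrently.
  const [resultA, resultB] = await Promise.all([settle(modelA), settle(modelB)]);
  return { resultA, resultB };
}
```

With this shape, total latency is bounded by the slower model rather than the sum of both, and a record with only one successful model can still be routed to manual review.
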
#### 3.2.3 Nougat-First Strategy

```typescript
/**
 * PDF full-text extraction strategy
 *
 * Prefer Nougat (structured Markdown); fall back to PyMuPDF
 */
async extractFullTextStructured(
  pdfBuffer: Buffer,
  filename: string
): Promise<{ text: string; method: string; structured: boolean }> {

  // Step 1: detect the language
  const language = await detectLanguage(pdfBuffer);

  // Step 2: prefer Nougat for English papers
  if (language === 'english') {
    try {
      const nougatResult = await extractionClient.extractPdf(
        pdfBuffer, filename, 'nougat'
      );

      if (nougatResult.quality > 0.8) {
        logger.info('✅ Extracted with Nougat (structured Markdown)');
        return {
          text: nougatResult.text,
          method: 'nougat',
          structured: true  // ⭐ Markdown format
        };
      }
      // Quality of 0.8 or lower: fall through to PyMuPDF below
    } catch (error) {
      logger.warn('⚠️ Nougat failed, falling back to PyMuPDF', { error });
    }
  }

  // Step 3: fall back to PyMuPDF
  const pymupdfResult = await extractionClient.extractPdf(
    pdfBuffer, filename, 'pymupdf'
  );

  return {
    text: pymupdfResult.text,
    method: 'pymupdf',
    structured: false  // Plain text
  };
}
```

**Advantages of Nougat**:
- ✅ Outputs Markdown and preserves section structure (# Methods, ## Randomization)
- ✅ Tables are converted to Markdown tables that the LLM can read directly
- ✅ Formulas are recognized as LaTeX
- ✅ Multi-column layouts are handled intelligently

#### 3.2.4 Caching Strategy

```typescript
// LLM response cache (reduces cost)
async process12Fields(mode, model, fullText, context): Promise<LLMResult> {
  // 1. Build the cache key (hash is a digest helper not shown here)
  const cacheKey = `llm:${mode}:${model}:${hash(fullText + JSON.stringify(context))}`

  // 2. Check the cache
  const cached = await cache.get(cacheKey)
  if (cached) {
    logger.info('LLM cache hit', { mode, model })
    return cached
  }

  // 3. Call the LLM (full text in a single request)
  const result = await this.callLLM(mode, model, fullText, context)

  // 4. Cache for 1 hour
  await cache.set(cacheKey, result, 3600)

  return result
}
```

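The cache key above depends on a `hash` helper that the plan does not define. A possible sketch follows, assuming SHA-256 from Node's built-in `node:crypto` module. `buildCacheKey` and `stableStringify` are hypothetical names; the key-sorted serialization is an added assumption so that two context objects with the same content but different key order map to the same cache entry.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted, so key order does not change the digest.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null';
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys.map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`;
}

// Build a key of the form llm:<mode>:<model>:<sha256 hex digest>.
function buildCacheKey(
  mode: string,
  model: string,
  fullText: string,
  context: unknown
): string {
  const digest = createHash('sha256')
    .update(fullText)
    .update('\u0000')                 // separator between text and context
    .update(stableStringify(context))
    .digest('hex');
  return `llm:${mode}:${model}:${digest}`;
}
```

Hashing keeps the key short and of fixed length even when the full text runs to tens of thousands of tokens.
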
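### 3.3 MedicalLogicValidator - Medical Logic Validation Service (New) ⭐

#### 3.3.1 Role

Automated logic validation based on evidence-based medicine standards. It checks that:
- An RCT describes its randomization
- A double-blind study explains its blinding
- Sample size is consistent with the baseline data
- Baseline imbalance is handled by adjusted analysis
- ITT analysis is complete
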
#### 3.3.2 Core Interface

```typescript
// backend/src/modules/asl/common/validation/MedicalLogicValidator.ts

export interface LogicViolation {
  ruleId: string;
  ruleName: string;
  severity: 'error' | 'warning';
  message: string;
  field: string;
  suggestedAction: string;
}

export interface LogicReport {
  totalRules: number;
  passedRules: number;
  violations: LogicViolation[];
  overallValidity: boolean;
}

export class MedicalLogicValidator {
  /**
   * Validate medical logic consistency
   */
  validate(extractedData: ExtractionResult): LogicReport {
    const violations: LogicViolation[] = [];

    // Rule 1: an RCT must describe randomization
    if (this.isRCT(extractedData) && !this.hasRandomization(extractedData)) {
      violations.push({
        ruleId: 'rule_001',
        ruleName: 'RCT must describe randomization',
        severity: 'error',
        message: 'Study claims to be an RCT but no randomization method was found',
        field: '随机化方法',  // field key: "randomization method"
        suggestedAction: 'flag_for_urgent_review'
      });
    }

    // Rules 2-5: other validation rules...

    return {
      totalRules: MEDICAL_LOGIC_RULES.length,
      passedRules: MEDICAL_LOGIC_RULES.length - violations.length,
      violations,
      // Valid only when no violation has severity 'error'
      overallValidity: violations.filter(v => v.severity === 'error').length === 0
    };
  }
}
```

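Rules 2-5 are elided above. One way to keep all five rules uniform is a table of predicates evaluated in a loop, sketched below. The `StudyFacts` shape, the `rule_002` definition, and its severity are assumptions made for illustration; only `rule_001` mirrors the source.

```typescript
interface Violation {
  ruleId: string;
  severity: 'error' | 'warning';
  message: string;
}

// Hypothetical flattened view of the extracted fields, for illustration only.
interface StudyFacts {
  studyType: string;
  randomizationDescribed: boolean;
  claimsDoubleBlind: boolean;
  blindingDescribed: boolean;
}

interface Rule {
  ruleId: string;
  severity: 'error' | 'warning';
  message: string;
  violated: (facts: StudyFacts) => boolean;
}

const RULES: Rule[] = [
  {
    ruleId: 'rule_001',
    severity: 'error',
    message: 'Study claims to be an RCT but no randomization method was found',
    violated: f => f.studyType === 'RCT' && !f.randomizationDescribed,
  },
  {
    // Assumed rule: the id and severity are illustrative, not from the plan.
    ruleId: 'rule_002',
    severity: 'error',
    message: 'Study claims double blinding but does not describe it',
    violated: f => f.claimsDoubleBlind && !f.blindingDescribed,
  },
];

function validateFacts(facts: StudyFacts) {
  // Each rule yields at most one violation, so passedRules is a simple difference.
  const violations: Violation[] = RULES
    .filter(rule => rule.violated(facts))
    .map(({ ruleId, severity, message }) => ({ ruleId, severity, message }));

  return {
    totalRules: RULES.length,
    passedRules: RULES.length - violations.length,
    violations,
    overallValidity: violations.every(v => v.severity !== 'error'),
  };
}
```

Adding a rule then means appending one table entry, and `totalRules` stays in sync automatically.
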
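### 3.4 ConflictDetectionService - Conflict Detection Service

#### 3.4.1 Role

Detects conflicts between the two models' results. It supports:
- screening conflicts: disagreement in the 12-field completeness assessment
- extraction conflicts: disagreement in numeric values (future)
- Conflict severity grading (based on field importance)
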
#### 3.4.2 Core Interface

```typescript
// backend/src/modules/asl/common/validation/ConflictDetectionService.ts

export interface ConflictAnalysis {
  hasConflict: boolean              // Whether any conflict exists
  conflictFields: string[]          // List of conflicting fields
  overallConflict: boolean          // Whether the overall decisions conflict
  severity: string                  // 'low' | 'medium' | 'high'
  criticalFieldConflicts: string[]  // Key-field conflicts (randomization, blinding, outcome completeness)
  needUrgentReview: boolean         // Whether urgent manual review is needed
}

// Signatures only (declaration form); method bodies are omitted here.
export declare class ConflictDetectionService {
  /**
   * Detect screening conflicts (12-field completeness assessment)
   */
  detectScreeningConflict(
    modelAResult: ScreeningResult,
    modelBResult: ScreeningResult
  ): ConflictAnalysis

  /**
   * Detect extraction conflicts (numeric differences, future)
   */
  detectExtractionConflict(
    modelAResult: ExtractionResult,
    modelBResult: ExtractionResult
  ): ConflictAnalysis

  /**
   * Smart triage (based on conflict severity)
   */
  prioritizeReview(conflict: ConflictAnalysis): {
    priority: number;      // 0-100
    reviewDeadline: Date;
    reasons: string[];
  }
}
```

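A field-by-field comparison is one plausible core for `detectScreeningConflict`. The sketch below assumes that two assessments conflict when they differ in `present` or `completeness`, or when a field is missing from one side. `ModelVerdict` and `detectConflict` are hypothetical names, and the real comparison rule may be stricter or looser.

```typescript
interface FieldAssessment {
  present: boolean;
  completeness: string;
}

interface ModelVerdict {
  fields: Record<string, FieldAssessment>;
  decision: string;
}

function detectConflict(a: ModelVerdict, b: ModelVerdict) {
  // Union of field names reported by either model.
  const names = Object.keys({ ...a.fields, ...b.fields }).sort();

  const conflictFields = names.filter(name => {
    const fa = a.fields[name];
    const fb = b.fields[name];
    // A field assessed by only one model counts as a conflict in this sketch.
    if (!fa || !fb) return true;
    return fa.present !== fb.present || fa.completeness !== fb.completeness;
  });

  const overallConflict = a.decision !== b.decision;

  return {
    hasConflict: conflictFields.length > 0 || overallConflict,
    conflictFields,
    overallConflict,
  };
}
```

Keeping field-level and decision-level disagreement separate lets the severity rules weigh them independently.
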
#### 3.4.3 Conflict Severity Rules (Based on the Cochrane Standard)

```typescript
/**
 * Screening conflict severity (following the key Cochrane RoB 2.0 fields)
 */
private calculateSeverity(
  conflictFields: string[],
  overallConflict: boolean
): string {

  // Key methodological fields (core Cochrane RoB 2.0 domains):
  // randomization method, blinding, outcome completeness
  const criticalFields = ['随机化方法', '盲法', '结果完整性'];
  const hasCriticalConflict = conflictFields.some(f => criticalFields.includes(f));

  if (hasCriticalConflict) {
    return 'high';  // Key-field conflict → high priority
  }

  // 1. field9 (outcomes) conflict → high
  if (conflictFields.includes('field9')) {
    return 'high'
  }

  // 2. Overall decision conflict → high
  if (overallConflict) {
    return 'high'
  }

  // 3. Important-field (field5/6/7) conflict → medium
  // Renamed from criticalFields/hasCriticalConflict to avoid redeclaring the constants above
  const importantFields = ['field5', 'field6', 'field7']
  const hasImportantConflict = conflictFields.some(f => importantFields.includes(f))
  if (hasImportantConflict) {
    return 'medium'
  }

  // 4. More than 3 conflicting fields → medium
  if (conflictFields.length > 3) {
    return 'medium'
  }

  // 5. Otherwise → low, or 'none' when there is no conflict at all
  return conflictFields.length > 0 ? 'low' : 'none'
}
```

### 3.5 AsyncTaskService - Async Task Service

#### 3.5.1 Role

A batch processing service that supports:
- Fixed concurrency (3 concurrent workers)
- Progress tracking
- Retry on failure
- Task cancellation

#### 3.5.2 Core Interface

```typescript
// backend/src/modules/asl/common/tasks/AsyncTaskService.ts

export interface BatchResult<T> {
  totalCount: number
  successCount: number
  failedCount: number
  results: T[]
}

// Signatures only (declaration form); method bodies are omitted here.
export declare class AsyncTaskService {
  /**
   * Process a batch of papers
   */
  processBatch<T>(
    taskId: string,
    literatureIds: string[],
    processor: (litId: string) => Promise<T>,
    onProgress?: (completed: number, total: number) => void
  ): Promise<BatchResult<T>>
}
```

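The fixed-concurrency behaviour can be approximated with a small worker pool that pulls from a shared index. This is a sketch under assumptions rather than the service's actual implementation: it omits `taskId`, retries, and cancellation, takes the concurrency as a parameter defaulting to 3, and returns results in completion order rather than input order.

```typescript
interface BatchResult<T> {
  totalCount: number;
  successCount: number;
  failedCount: number;
  results: T[];
}

async function processBatch<T>(
  ids: string[],
  processor: (id: string) => Promise<T>,
  concurrency: number = 3,
  onProgress?: (completed: number, total: number) => void
): Promise<BatchResult<T>> {
  const results: T[] = [];
  let next = 0;        // index of the next unclaimed item
  let completed = 0;
  let failedCount = 0;

  // Each worker claims items until none remain. Claiming is synchronous,
  // so no item is processed twice even though workers interleave.
  async function worker(): Promise<void> {
    while (next < ids.length) {
      const id = ids[next++] as string;
      try {
        results.push(await processor(id));
      } catch (err) {
        failedCount++;  // a failed item does not stop the batch
      }
      completed++;
      if (onProgress) onProgress(completed, ids.length);
    }
  }

  const workerCount = Math.min(concurrency, ids.length);
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);

  return { totalCount: ids.length, successCount: results.length, failedCount, results };
}
```

Capping in-flight items at three keeps the number of simultaneous LLM requests bounded, which is the usual reason for fixing concurrency against rate-limited APIs.
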
---

## 4. Database Design

### 4.1 Core Table Structure

#### 4.1.1 AslFulltextScreeningTask - Full-Text Screening Task Table

```prisma
model AslFulltextScreeningTask {
  @@schema("asl_schema")

  id                String   @id @default(uuid())
  projectId         String
  userId            String
  name              String   @default("全文复筛")  // default name: "Full-text screening"

  // Data source
  sourceTitleTaskId String?  // ID of the source title screening task
  sourceType        String   @default("title_screening") // title_screening/manual_import

  // Task configuration (MVP phase: cost-friendly models) ⭐
  modelA            String   @default("deepseek-v3")  // ¥0.001/1K tokens
  modelB            String   @default("qwen-max")     // ¥0.004/1K tokens

  // Statistics
  totalCount        Int      @default(0)
  pdfReadyCount     Int      @default(0)
  processedCount    Int      @default(0)
  includedCount     Int      @default(0)
  excludedCount     Int      @default(0)
  conflictCount     Int      @default(0)
  pendingCount      Int      @default(0)

  // Task status
  status            String   @default("pending")
  // pending → acquiring_pdfs → processing → completed/failed
  errorMessage      String?  @db.Text

  // Timing
  startedAt         DateTime?
  completedAt       DateTime?
  processingTime    Int?     // seconds

  // Cost
  totalCost         Float?   // USD
  totalTokens       Int?

  createdAt         DateTime @default(now())
  updatedAt         DateTime @updatedAt

  // Relations
  project           AslScreeningProject @relation(fields: [projectId], references: [id], onDelete: Cascade)
  results           AslFulltextScreeningResult[]

  @@index([userId, status])
  @@index([projectId])
  @@index([status, createdAt])
}
```

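The `status` column is a plain string, so the lifecycle `pending → acquiring_pdfs → processing → completed/failed` is enforced only in application code. A small transition table is one way to guard it, sketched below. That `failed` is reachable from every non-terminal state is an assumption, and cancel or retry transitions are not modeled.

```typescript
type TaskStatus = 'pending' | 'acquiring_pdfs' | 'processing' | 'completed' | 'failed';

// Allowed next states per current state (assumed; terminal states have none).
const ALLOWED_TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ['acquiring_pdfs', 'failed'],
  acquiring_pdfs: ['processing', 'failed'],
  processing: ['completed', 'failed'],
  completed: [],
  failed: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Throws instead of silently writing an inconsistent status.
function assertTransition(from: TaskStatus, to: TaskStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal task status transition: ${from} -> ${to}`);
  }
}
```

Calling `assertTransition` before each status update would catch bugs such as marking a task `completed` before it has started processing.
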
#### 4.1.2 AslFulltextScreeningResult - Full-Text Screening Result Table

```prisma
model AslFulltextScreeningResult {
  @@schema("asl_schema")

  id                String   @id @default(uuid())
  taskId            String
  literatureId      String   @unique  // one screening result per paper

  // Model A verdict (12-field assessment)
  // Value glossary: "完整" = complete, "高" = high, "可提取" = extractable, "纳入" = include
  modelAResult      Json
  // Structure: {
  //   field1_source: { present: true, completeness: "完整", note: "..." },
  //   field2_studyType: { present: true, completeness: "完整", note: "RCT" },
  //   ... field3-12 ...
  //   field9_outcomes: { present: true, completeness: "完整", extractable: true, note: "..." },
  //   overallAssessment: {
  //     fieldsComplete: "10/12",
  //     criticalFieldsMissing: [],
  //     dataQuality: "高",
  //     extractability: "可提取",
  //     decision: "纳入",
  //     reason: "All key fields complete, data extractable",
  //     confidence: 0.95
  //   }
  // }

  modelAProcessTime Int?     // milliseconds
  modelATokens      Int?
  modelACost        Float?   // USD

  // Model B verdict (same structure as above)
  modelBResult      Json
  modelBProcessTime Int?
  modelBTokens      Int?
  modelBCost        Float?

  // Conflict analysis
  isConflict        Boolean  @default(false)
  conflictFields    Json?    // ["field5", "field9"]
  conflictSeverity  String?  // low/medium/high
  overallConflict   Boolean  @default(false)  // whether the overall decisions conflict

  // Final decision
  finalDecision     String   @default("pending")
  // pending → included/excluded
  decisionMethod    String?  // ai_consensus/manual
  exclusionReason   String?  @db.Text
  exclusionCategory String?
  // missing_outcomes/incomplete_data/poor_quality/
  // protocol_violation/duplicate/other

  // Quality flags
  dataQuality       String?  // high/medium/low
  extractability    String?  // extractable/partial/non_extractable

  // Manual review
  reviewedBy        String?
  reviewedAt        DateTime?
  reviewNote        String?  @db.Text

  createdAt         DateTime @default(now())
  updatedAt         DateTime @updatedAt

  // Relations
  task              AslFulltextScreeningTask @relation(fields: [taskId], references: [id], onDelete: Cascade)
  literature        AslLiterature            @relation(fields: [literatureId], references: [id], onDelete: Cascade)

  @@index([taskId, finalDecision])
  @@index([taskId, isConflict])
  @@index([exclusionCategory])
  @@index([dataQuality])
}
```

#### 4.1.3 AslLiterature - Literature Table (Updated Fields)

```prisma
model AslLiterature {
  @@schema("asl_schema")

  id                 String   @id @default(uuid())
  projectId          String

  // Basic information
  pmid               String?
  doi                String?
  title              String   @db.Text
  authors            String?  @db.Text
  journal            String?
  year               Int?
  abstract           String?  @db.Text

  // PDF storage (managed by PDFStorageService)
  hasPDF             Boolean  @default(false)
  pdfStorageType     String?  // 'dify' or 'oss'
  pdfStorageRef      String?  // Dify: documentId, OSS: key
  pdfUrl             String?  // Dify: null, OSS: public URL
  pdfStatus          String?  // acquiring/ready/failed
  pdfAcquireMethod   String?  // auto/manual/knowledge_base

  // Full text (extracted and stored by PDFStorageService)
  fullText           String?  @db.Text
  fullTextTokenCount Int?
  fullTextCharCount  Int?
  extractionMethod   String?  // pymupdf/nougat
  extractionQuality  Float?   // 0.0-1.0
  detectedLanguage   String?  // chinese/english

  // Stage marker
  stage              String   @default("imported")
  // imported → title_screened → pdf_acquired → fulltext_screened → extracted

  // Timestamps
  importedAt         DateTime @default(now())
  updatedAt          DateTime @updatedAt

  // Relations (independent)
  project            AslScreeningProject @relation(fields: [projectId], references: [id])
  titleScreening     AslTitleScreeningResult?
  fulltextScreening  AslFulltextScreeningResult?  // full-text screening (independent)
  dataExtraction     AslDataExtractionResult?     // full-text extraction (future, independent)

  @@index([projectId, stage])
  @@index([pmid])
  @@index([pdfStatus])
}
```

### 4.2 Database Migration

**⚠️ Clarification: "migration" here simply means "creating new tables"**

Because the AI Smart Literature module is **brand new** and these tables did **not exist** before, Prisma's "migrate" operation in practice just **creates new tables**.

```bash
# Create the new tables (the first "migration" run)
cd backend
npx prisma migrate dev --name create_asl_fulltext_screening_tables

# This command will:
# 1. Create 3 new tables in asl_schema:
#    - AslFulltextScreeningTask (newly created)
#    - AslFulltextScreeningResult (newly created)
#    - AslLiterature (skipped if it already exists, created otherwise)
# 2. Generate the SQL script (prisma/migrations/xxx/migration.sql)
# 3. Execute the SQL to create the tables
# 4. Record the migration history

# SQL example (what actually runs)
CREATE TABLE "asl_schema"."AslFulltextScreeningTask" (
  "id" TEXT NOT NULL,
  "projectId" TEXT NOT NULL,
  "userId" TEXT NOT NULL,
  -- ... other columns
  PRIMARY KEY ("id")
);

CREATE TABLE "asl_schema"."AslFulltextScreeningResult" (
  -- ...
);
```

**Notes**:
- ✅ Tables of other modules such as PKB/AIA are not modified
- ✅ The new tables are fully independent (asl_schema)
- ✅ Schema isolation, data isolation

---

## 5. API Design

### 5.1 API Endpoint List

```typescript
// Prefix: /api/v1/asl/fulltext-screening

// 1. Create a full-text screening task
POST   /tasks

// 2. List tasks
GET    /tasks

// 3. Get task details (including progress)
GET    /tasks/:taskId

// 4. Get task results
GET    /tasks/:taskId/results

// 5. Submit a manual review decision
PUT    /results/:resultId/decision

// 6. Export to Excel
GET    /tasks/:taskId/export-excel

// 7. Cancel a task
POST   /tasks/:taskId/cancel

// 8. Retry failed items
POST   /tasks/:taskId/retry-failed
```

### 5.2 Detailed API Design

#### 5.2.1 Create a Full-Text Screening Task

```typescript
POST /api/v1/asl/fulltext-screening/tasks

Request:
{
  "projectId": "proj-123",
  "name": "Full-text screening - batch 1",
  "sourceTitleTaskId": "title-task-456",  // source title screening task
  "literatureIds": ["lit-001", "lit-002", ...],  // provisionally included papers
  "modelA": "deepseek-v3",
  "modelB": "qwen-max"
}

Response:
{
  "success": true,
  "data": {
    "taskId": "fs-task-789",
    "status": "pending",
    "totalCount": 30,
    "pdfReadyCount": 28,  // PDFs ready
    "message": "Task created; processing is starting"
  }
}

// Backend logic
async createTask(req, reply) {
  // 1. Check which papers have a PDF
  const literatures = await prisma.aslLiterature.findMany({
    where: { id: { in: req.body.literatureIds } }
  })

  const pdfReady = literatures.filter(lit => lit.pdfStatus === 'ready')

  // 2. Create the task
  const task = await prisma.aslFulltextScreeningTask.create({
    data: {
      projectId: req.body.projectId,
      userId: req.userId,
      name: req.body.name,
      sourceTitleTaskId: req.body.sourceTitleTaskId,
      modelA: req.body.modelA,
      modelB: req.body.modelB,
      totalCount: literatures.length,
      pdfReadyCount: pdfReady.length,
      status: 'pending'
    }
  })

  // 3. Process asynchronously (does not block the response)
  this.screeningService.processTask(task.id).catch(err => {
    logger.error('Screening task failed', { taskId: task.id, error: err })
  })

  return { taskId: task.id, status: 'pending', totalCount: literatures.length }
}
```

#### 5.2.2 Get Task Progress

```typescript
GET /api/v1/asl/fulltext-screening/tasks/:taskId

Response:
{
  "success": true,
  "data": {
    "taskId": "fs-task-789",
    "name": "Full-text screening - batch 1",
    "status": "processing",  // pending/acquiring_pdfs/processing/completed/failed

    "totalCount": 30,
    "pdfReadyCount": 28,
    "processedCount": 15,
    "includedCount": 10,
    "excludedCount": 5,
    "conflictCount": 3,
    "pendingCount": 12,

    "progress": 50,  // percent
    "estimatedTimeRemaining": 300,  // seconds

    "totalCost": 0.15,  // USD
    "totalTokens": 150000,

    "startedAt": "2025-11-22T10:00:00Z",
    "updatedAt": "2025-11-22T10:05:00Z"
  }
}
```

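The `progress` and `estimatedTimeRemaining` values in the response can be derived from the counters and the elapsed time. The sketch below assumes that progress is `processedCount / totalCount` and that the estimate extrapolates linearly from the average time per processed paper. `computeProgress` is a hypothetical helper, and the real service may use a different denominator (for example `pdfReadyCount`).

```typescript
interface ProgressInfo {
  progress: number;                       // percent, 0-100
  estimatedTimeRemaining: number | null;  // seconds; null until one item is done
}

function computeProgress(
  processedCount: number,
  totalCount: number,
  elapsedSeconds: number
): ProgressInfo {
  const progress =
    totalCount === 0 ? 0 : Math.round((processedCount / totalCount) * 100);

  const remaining = totalCount - processedCount;
  // Linear extrapolation: average seconds per processed item times items left.
  const estimatedTimeRemaining =
    processedCount === 0
      ? null
      : Math.round((elapsedSeconds / processedCount) * remaining);

  return { progress, estimatedTimeRemaining };
}
```

With the sample response above (15 of 30 processed, 300 seconds between `startedAt` and `updatedAt`), these formulas give 50 percent and 300 seconds remaining, matching the example values.
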
#### 5.2.3 获取任务结果
|
||
|
||
```typescript
|
||
GET /api/v1/asl/fulltext-screening/tasks/:taskId/results
|
||
Query:
|
||
- filter: 'all' | 'conflict' | 'included' | 'excluded' | 'pending'
|
||
- page: 1
|
||
- pageSize: 20
|
||
|
||
Response:
|
||
{
|
||
"success": true,
|
||
"data": {
|
||
"taskId": "fs-task-789",
|
||
"totalCount": 30,
|
||
"filteredCount": 3, // 符合filter的数量
|
||
|
||
"results": [
|
||
{
|
||
"resultId": "result-001",
|
||
"literatureId": "lit-001",
|
||
"literature": {
|
||
"pmid": "PMID12345",
|
||
"title": "SGLT2抑制剂治疗糖尿病肾病的RCT研究",
|
||
"authors": "Smith JA, et al.",
|
||
"journal": "Lancet",
|
||
"year": 2023
|
||
},
|
||
|
||
// 模型A判断
|
||
"modelAResult": {
|
||
"field1_source": { "present": true, "completeness": "完整", "note": "第一作者Smith, Lancet 2023" },
|
||
"field2_studyType": { "present": true, "completeness": "完整", "note": "RCT" },
|
||
"field5_population": { "present": true, "completeness": "完整", "note": "样本量500例,基线特征详细" },
|
||
"field9_outcomes": {
|
||
"present": true,
|
||
"completeness": "完整",
|
||
"extractable": true,
|
||
"note": "主要结局eGFR有完整数值数据(均值±标准差)"
|
||
},
|
||
// ... field3-4, 6-8, 10-12
|
||
|
||
"overallAssessment": {
|
||
"fieldsComplete": "12/12",
|
||
"criticalFieldsMissing": [],
|
||
"dataQuality": "高",
|
||
"extractability": "可提取",
|
||
"decision": "纳入",
|
||
"reason": "所有12个字段完整,关键数据(样本量、基线、干预、结局)均可提取,数据质量高",
|
||
"confidence": 0.95
|
||
}
|
||
},
|
||
|
||
// 模型B判断
|
||
"modelBResult": {
|
||
// 同上结构
|
||
"overallAssessment": {
|
||
"decision": "纳入",
|
||
"confidence": 0.92
|
||
}
|
||
},
|
||
|
||
// 冲突分析
|
||
"isConflict": false,
|
||
"conflictFields": [],
|
||
"overallConflict": false,
|
||
"conflictSeverity": "none",
|
||
|
||
// 最终决策
|
||
"finalDecision": "pending", // 待人工审核确认
|
||
"dataQuality": "high",
|
||
"extractability": "extractable",
|
||
|
||
"processingTime": 25000, // 25秒
|
||
"totalCost": 0.008 // 美元
|
||
},
|
||
|
||
// 冲突案例
|
||
{
|
||
"resultId": "result-002",
|
||
"literatureId": "lit-005",
|
||
"literature": { ... },
|
||
|
||
"modelAResult": {
|
||
"field9_outcomes": {
|
||
"present": false, // ← A认为缺失
|
||
"completeness": "缺失",
|
||
"extractable": false,
|
||
"note": "Results部分未报告主要结局数据"
|
||
},
|
||
"overallAssessment": {
|
||
"decision": "排除", // ← A建议排除
|
||
"reason": "关键字段field9缺失,无法用于Meta分析"
|
||
}
|
||
},
|
||
|
||
"modelBResult": {
|
||
"field9_outcomes": {
|
||
"present": true, // ← B认为存在
|
||
"completeness": "部分完整",
|
||
"extractable": true,
|
||
"note": "主要结局数据在Discussion部分报告"
|
||
},
|
||
"overallAssessment": {
|
||
"decision": "纳入", // ← B建议纳入
|
||
"reason": "虽然结局指标在Discussion报告,但数据完整可提取"
|
||
}
|
||
},
|
||
|
||
"isConflict": true, // ← 标记冲突
|
||
"conflictFields": ["field9"],
|
||
"overallConflict": true,
|
||
"conflictSeverity": "high", // field9冲突 → 高严重程度
|
||
|
||
"finalDecision": "pending" // 需要人工仲裁
|
||
}
|
||
],
|
||
|
||
"pagination": {
|
||
"page": 1,
|
||
"pageSize": 20,
|
||
"totalPages": 2
|
||
}
|
||
}
|
||
}
|
||
```
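
The two example results above differ in how `isConflict`, `conflictFields` and `conflictSeverity` are filled in. The sketch below shows one way such a comparison could be derived from the two model outputs. Only the function name `detectScreeningConflict` and the property names come from this document; the types, the list of critical fields, and every severity rule other than "a field9 conflict is high" are assumptions made for illustration.

```typescript
type Severity = 'none' | 'low' | 'medium' | 'high'

// Hypothetical shape of one field assessment (property names follow the example response).
interface FieldAssessment {
  present: boolean
  completeness: string
  extractable?: boolean
}

// Field keys look like "field9_outcomes"; "overallAssessment" carries the verdict.
type ModelResult = { overallAssessment: { decision: string } } & Record<string, unknown>

// Assumption: fields 5, 6, 7 and 9 are treated as critical.
const CRITICAL_FIELDS = ['field5', 'field6', 'field7', 'field9']

// "field9_outcomes" -> "field9"
function shortFieldName(key: string): string {
  return key.split('_')[0]
}

function detectScreeningConflict(a: ModelResult, b: ModelResult) {
  const conflictFields: string[] = []

  for (const key of Object.keys(a)) {
    if (!key.startsWith('field') || !(key in b)) continue
    const fa = a[key] as FieldAssessment
    const fb = b[key] as FieldAssessment
    const differs =
      fa.present !== fb.present ||
      fa.completeness !== fb.completeness ||
      fa.extractable !== fb.extractable
    if (differs) conflictFields.push(shortFieldName(key))
  }

  const overallConflict = a.overallAssessment.decision !== b.overallAssessment.decision

  // Severity ladder (assumed apart from the field9 rule).
  let severity: Severity = 'none'
  if (overallConflict || conflictFields.includes('field9')) severity = 'high'
  else if (conflictFields.some(f => CRITICAL_FIELDS.includes(f))) severity = 'medium'
  else if (conflictFields.length > 0) severity = 'low'

  return {
    hasConflict: overallConflict || conflictFields.length > 0,
    conflictFields,
    overallConflict,
    severity
  }
}
```

Applied to the `result-002` example, the two `field9_outcomes` entries disagree on every property and the decisions differ, so this sketch would report `conflictFields: ["field9"]` with severity `high`.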
|
||
|
||
#### 5.2.4 Manual Review Decision

```typescript
PUT /api/v1/asl/fulltext-screening/results/:resultId/decision

Request:
{
  "finalDecision": "excluded",  // included/excluded
  "exclusionReason": "Critical field field9 (outcomes) is incomplete: only P values are reported, no concrete numbers",
  "exclusionCategory": "missing_outcomes",
  "reviewNote": "Significance is reported (P<0.05), but mean ± SD is missing, so the study cannot be used for meta-analysis"
}

Response:
{
  "success": true,
  "message": "Decision saved"
}
```
|
||
|
||
#### 5.2.5 Export to Excel

```typescript
GET /api/v1/asl/fulltext-screening/tasks/:taskId/export-excel

Response:
// Downloads the Excel file directly

// Excel structure (three sheets: included list, excluded list, PRISMA statistics)
Sheet 1: Final included literature
  - Literature ID
  - PMID
  - Study ID (author + year)
  - Source
  - Title
  - Final decision
  - Decision method
  - Data quality
  - Extractability

Sheet 2: Excluded literature
  - Literature ID
  - PMID
  - Study ID
  - Title
  - Exclusion reason
  - Exclusion category

Sheet 3: PRISMA statistics
  - Total screened at full text: n=30
  - Included: n=18
  - Excluded: n=12
    - Missing outcomes: n=5
    - Incomplete data: n=3
    - Quality issues: n=2
    - Protocol deviation: n=1
    - Other: n=1
```
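
The PRISMA sheet is an aggregation over the reviewed results: totals per final decision, plus a breakdown of exclusions by category. A minimal sketch of that aggregation follows. It assumes each result carries `finalDecision` and an optional `exclusionCategory`, as in the decision API in 5.2.4; the `ReviewedResult` and `PrismaStats` types, the function name, and the `'other'` fallback bucket are hypothetical.

```typescript
interface ReviewedResult {
  finalDecision: 'included' | 'excluded' | 'pending'
  exclusionCategory?: string // e.g. "missing_outcomes"
}

interface PrismaStats {
  total: number
  included: number
  excluded: number
  pending: number
  byCategory: Record<string, number>
}

function buildPrismaStats(results: ReviewedResult[]): PrismaStats {
  const stats: PrismaStats = {
    total: results.length,
    included: 0,
    excluded: 0,
    pending: 0,
    byCategory: {}
  }

  for (const r of results) {
    if (r.finalDecision === 'included') {
      stats.included++
    } else if (r.finalDecision === 'excluded') {
      stats.excluded++
      // Exclusions without a category fall into an assumed "other" bucket.
      const category = r.exclusionCategory ?? 'other'
      stats.byCategory[category] = (stats.byCategory[category] ?? 0) + 1
    } else {
      stats.pending++
    }
  }

  return stats
}
```

A useful sanity check before export is that the per-category counts sum to `excluded`, as they do in the sample sheet (5 + 3 + 2 + 1 + 1 = 12).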
|
||
|
||
---
|
||
|
||
## 6. 全文复筛业务层设计
|
||
|
||
### 6.1 FulltextScreeningService
|
||
|
||
```typescript
// backend/src/modules/asl/fulltext-screening/services/FulltextScreeningService.ts
// Imports for prisma, logger, the Prisma model types and the common-layer services
// are omitted here because their paths are project-specific.

// Models answer with '纳入' / '排除'; the API and the UI use 'included' / 'excluded'.
const DECISION_MAP: Record<string, string> = {
  '纳入': 'included',
  '排除': 'excluded'
}

export class FulltextScreeningService {

  // Dependencies are injected so every field is initialized
  constructor(
    private pdfStorage: PDFStorageService,
    private llmService: LLM12FieldsService,
    private conflictDetection: ConflictDetectionService,
    private asyncTask: AsyncTaskService
  ) {}

  /**
   * Process a full-text screening task
   */
  async processTask(taskId: string): Promise<void> {
    const task = await prisma.aslFulltextScreeningTask.findUnique({
      where: { id: taskId },
      include: { project: true }
    })

    if (!task) {
      throw new Error(`Full-text screening task not found: ${taskId}`)
    }

    // 1. Fetch the literature to screen (only entries whose PDF is ready)
    const literatures = await this.getLiteratures(task)

    // 2. Mark the task as processing
    await prisma.aslFulltextScreeningTask.update({
      where: { id: taskId },
      data: {
        status: 'processing',
        startedAt: new Date()
      }
    })

    // 3. Process in batches (concurrency of 3)
    try {
      await this.asyncTask.processBatch(
        taskId,
        literatures.map(lit => lit.id),
        async (litId) => await this.screenLiterature(taskId, litId, task),
        async (completed, total) => {
          // Progress callback: update task statistics
          await this.updateTaskProgress(taskId, completed, total)
        }
      )

      // 4. Task completed
      await this.completeTask(taskId)

    } catch (error) {
      // 5. Task failed (`error` is `unknown` in TypeScript, so narrow it first)
      await prisma.aslFulltextScreeningTask.update({
        where: { id: taskId },
        data: {
          status: 'failed',
          errorMessage: error instanceof Error ? error.message : String(error)
        }
      })
      throw error
    }
  }

  /**
   * Screen a single paper
   */
  private async screenLiterature(
    taskId: string,
    literatureId: string,
    task: AslFulltextScreeningTask
  ): Promise<AslFulltextScreeningResult> {

    // 1. Load the full text (the record itself may be missing)
    const literature = await prisma.aslLiterature.findUnique({
      where: { id: literatureId }
    })

    if (!literature?.fullText) {
      throw new Error(`Full text is not ready for literature ${literatureId}`)
    }

    // 2. Call both models in parallel
    const { resultA, resultB } = await this.llmService.processDualModels(
      LLM12FieldsMode.SCREENING,
      task.modelA,
      task.modelB,
      literature.fullText,
      task.project  // PICOS context
    )

    // 3. Conflict detection
    const conflict = this.conflictDetection.detectScreeningConflict(
      resultA.result,
      resultB.result
    )

    // 4. Automatic decision (no conflict, both models agree)
    let finalDecision = 'pending'
    let decisionMethod: string | null = null

    const consensus = DECISION_MAP[resultA.result.overallAssessment.decision]
    if (!conflict.hasConflict && consensus) {
      finalDecision = consensus
      decisionMethod = 'ai_consensus'
    }

    // 5. Save the result
    const result = await prisma.aslFulltextScreeningResult.create({
      data: {
        taskId,
        literatureId,

        modelAResult: resultA.result,
        modelAProcessTime: resultA.processingTime,
        modelATokens: resultA.tokenUsage,
        modelACost: resultA.cost,

        modelBResult: resultB.result,
        modelBProcessTime: resultB.processingTime,
        modelBTokens: resultB.tokenUsage,
        modelBCost: resultB.cost,

        isConflict: conflict.hasConflict,
        conflictFields: conflict.conflictFields,
        overallConflict: conflict.overallConflict,
        conflictSeverity: conflict.severity,

        finalDecision,
        decisionMethod,

        dataQuality: resultA.result.overallAssessment.dataQuality,
        extractability: resultA.result.overallAssessment.extractability
      }
    })

    // 6. Advance the literature stage
    await prisma.aslLiterature.update({
      where: { id: literatureId },
      data: { stage: 'fulltext_screened' }
    })

    logger.info('Literature screened', {
      literatureId,
      decision: finalDecision,
      conflict: conflict.hasConflict
    })

    return result
  }
}
```
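
`processTask` hands the actual fan-out to `asyncTask.processBatch`, which this document does not show. The sketch below illustrates one way a batch runner with a concurrency limit of 3 and a progress callback could work. It is a simplified stand-in, not the real `AsyncTaskService`: the signature drops the `taskId` argument, and the names `worker`, `onProgress` and `runLane` are hypothetical.

```typescript
// Run `worker` over all ids with at most `concurrency` calls in flight.
// Results are returned in the same order as `ids`.
async function processBatch<T>(
  ids: string[],
  worker: (id: string) => Promise<T>,
  onProgress: (completed: number, total: number) => Promise<void> | void,
  concurrency = 3
): Promise<T[]> {
  const results: T[] = new Array(ids.length)
  let next = 0       // index of the next id to claim
  let completed = 0  // number of ids finished so far

  // Each lane repeatedly claims the next unprocessed id until none are left.
  async function runLane(): Promise<void> {
    while (next < ids.length) {
      const index = next++
      results[index] = await worker(ids[index])
      completed++
      await onProgress(completed, ids.length)
    }
  }

  const laneCount = Math.min(concurrency, ids.length)
  await Promise.all(Array.from({ length: laneCount }, () => runLane()))
  return results
}
```

In this sketch the first failing `worker` call rejects the whole batch, which matches the `try/catch` in `processTask` that marks the task as failed. A production version might instead record per-item failures and continue.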
|
||
|
||
### 6.2 Prompt模板
|
||
|
||
#### 6.2.1 System Prompt
|
||
|
||
```text
|
||
// backend/src/modules/asl/fulltext-screening/prompts/12fields_screening_system.txt
|
||
|
||
您是循证医学专家,负责评估文献的数据完整性和可用性。
|
||
|
||
## 任务
|
||
基于12字段评估框架,判断文献是否可用于Meta分析。
|
||
|
||
## 研究方案
|
||
- **人群(P):** {{population}}
|
||
- **干预(I):** {{intervention}}
|
||
- **对照(C):** {{comparison}}
|
||
- **结局(O):** {{outcome}}
|
||
- **研究设计(S):** {{studyDesign}}
|
||
|
||
## 12字段评估标准
|
||
|
||
### 1. 文献来源
|
||
- 检查:第一作者、年份、期刊、DOI是否齐全
|
||
- 标准:4项齐全为"完整"
|
||
|
||
### 2. 研究类型
|
||
- 检查:是否明确说明研究设计(RCT、队列研究等)
|
||
- 标准:类型清晰明确
|
||
|
||
### 3. 研究设计细节
|
||
- 检查:(1) 随访时间 (2) 数据来源(单/多中心)
|
||
- 标准:两项都有为"完整"
|
||
|
||
### 4. 疾病诊断标准
|
||
- 检查:是否有明确诊断标准引用
|
||
- 标准:有标准引用
|
||
|
||
### 5. 人群特征 ⭐ 关键
|
||
- 检查:(1) 样本量(总数+分组) (2) 人口统计学(年龄、性别)
|
||
- 标准:样本量完整,基线特征有均值±标准差
|
||
|
||
### 6. 基线数据 ⭐ 关键
|
||
- 检查:(1) 主要功能指标 (2) 合并症
|
||
- 标准:功能指标有基线值(均值±标准差)
|
||
|
||
### 7. 干预措施 ⭐ 关键
|
||
- 检查:(1) 药物类别 (2) 剂量与疗程
|
||
- 标准:药物、剂量、频次、疗程都明确
|
||
|
||
### 8. 对照措施
|
||
- 检查:对照组干预是否明确
|
||
- 标准:对照内容清晰
|
||
|
||
### 9. 结局指标 ⭐⭐⭐ 最关键
|
||
- 检查:(1) 主要结局是否报告数据 (2) 是否有均值±标准差
|
||
- 标准:**必须有完整数值数据可供提取**
|
||
- **排除标准**:只有P值无具体数据、只有定性描述
|
||
|
||
### 10. 统计方法
|
||
- 检查:统计软件、检验方法
|
||
- 标准:方法恰当
|
||
|
||
### 11. 质量评价
|
||
- 检查:偏倚风险评估
|
||
- 标准:低风险优先
|
||
|
||
### 12. 其他信息
|
||
- 检查:注册号、利益冲突
|
||
- 标准:信息透明
|
||
|
||
## 输出格式(严格JSON)
|
||
|
||
{
|
||
"field1_source": {
|
||
"present": true,
|
||
"completeness": "完整/不完整/缺失",
|
||
"note": "第一作者Smith, Lancet 2023, DOI: 10.1016/..."
|
||
},
|
||
"field2_studyType": {
|
||
"present": true,
|
||
"completeness": "完整",
|
||
"note": "RCT,明确说明双盲随机对照试验"
|
||
},
|
||
// ... field3-8 同样结构
|
||
|
||
"field9_outcomes": {
|
||
"present": true/false,
|
||
"completeness": "完整/不完整/缺失",
|
||
"extractable": true/false, // ⭐ 关键:数据是否可提取
|
||
"note": "主要结局eGFR下降速率,有完整数值数据(干预组-2.4±5.2, 对照组-5.5±6.8)"
|
||
},
|
||
|
||
// ... field10-12
|
||
|
||
"overallAssessment": {
|
||
"fieldsComplete": "10/12",
|
||
"criticalFieldsMissing": ["field9"], // 缺失的关键字段
|
||
"dataQuality": "高/中/低",
|
||
"extractability": "可提取/部分可提取/无法提取",
|
||
|
||
"decision": "纳入/排除", // ⭐ 最终判断
|
||
"reason": "详细说明理由(100字内)",
|
||
"confidence": 0.0-1.0
|
||
}
|
||
}
|
||
|
||
## 特别注意
|
||
1. **field9最关键**:无数值数据必须排除
|
||
2. **field5/6/7也重要**:样本量、基线、干预必须完整
|
||
3. **排除优先**:有关键数据缺失果断排除
|
||
4. **数据可提取性>发表质量**:数据完整比期刊影响因子重要
|
||
```
|
||
|
||
#### 6.2.2 User Prompt
|
||
|
||
```text
|
||
// backend/src/modules/asl/fulltext-screening/prompts/12fields_screening_user.txt
|
||
|
||
请仔细阅读以下文献的**全文内容**,逐项评估12个字段。
|
||
|
||
## 全文内容
|
||
{{fullText}}
|
||
```
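
Both prompt files use `{{name}}` placeholders (`{{population}}`, `{{fullText}}`, and so on) that must be filled in before the prompt is sent. The Day 2 notes mention a PromptBuilder service for this, but its code is not shown; the following is only a minimal sketch of placeholder substitution, with a hypothetical function name and the assumption that a missing variable should fail loudly rather than leave a raw placeholder in the prompt.

```typescript
// Replace every {{name}} placeholder with the matching value.
// Throws if the template references a variable that was not supplied.
function renderPrompt(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value = variables[name]
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${name}`)
    }
    return value
  })
}

// Example with two of the PICOS placeholders from the system prompt.
const rendered = renderPrompt(
  '- **人群(P):** {{population}}\n- **干预(I):** {{intervention}}',
  { population: 'Adults with diabetic kidney disease', intervention: 'SGLT2 inhibitors' }
)
console.log(rendered)
```

Failing on a missing variable is a deliberate choice here: a prompt that still contains `{{outcome}}` would silently degrade the model's judgement on the most critical field.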
|
||
|
||
---
|
||
|
||
## 7. 前端设计
|
||
|
||
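### 7.1 Page Structure (Three Sub-views)

```
Full-text screening module
├── Sub-view 1: Setup and launch
│   ├── PICOS criteria display (inherited from the study protocol)
│   ├── Full-text acquisition table
│   └── Start screening button
│
├── Sub-view 2: Review workbench ⭐ core
│   ├── PICOS criteria reference (collapsible)
│   ├── Tabular review interface
│   │   ├── Two-row header (model + 12 fields)
│   │   ├── Main row: literature info, verdicts, conflict status, final decision
│   │   └── Expanded row: detailed 12-field assessments from both models
│   └── Click a verdict → open the dual-view source review modal
│
└── Sub-view 3: Screening results
    ├── Summary cards (total / included / excluded)
    ├── PRISMA exclusion-reason statistics
    ├── Tabs (included / excluded lists)
    └── Excel export
```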
|
||
|
||
### 7.2 核心组件
|
||
|
||
#### 7.2.1 FulltextScreeningWorkbench - 审核工作台
|
||
|
||
```tsx
// frontend-v2/src/modules/asl/pages/FulltextScreeningWorkbench.tsx
import React, { useEffect, useState } from 'react'
import { useParams } from 'react-router-dom'
import { Progress } from 'antd' // assumed: the UI components used here look like Ant Design
// fetchTaskProgress, fetchTaskResults and submitDecision are API helpers defined elsewhere;
// submitDecision is a hypothetical wrapper around PUT .../results/:resultId/decision.

export const FulltextScreeningWorkbench: React.FC = () => {
  const { taskId } = useParams<{ taskId: string }>()
  const [task, setTask] = useState<ScreeningTask | null>(null)
  const [results, setResults] = useState<ScreeningResult[]>([])
  const [expandedRows, setExpandedRows] = useState<Set<string>>(new Set())
  const [reviewModalVisible, setReviewModalVisible] = useState(false)
  const [currentReview, setCurrentReview] =
    useState<{ result: ScreeningResult; field: string } | null>(null)

  // Poll task progress every 2 seconds until the task finishes
  useEffect(() => {
    if (!taskId) return

    const interval = setInterval(async () => {
      const taskData = await fetchTaskProgress(taskId)
      setTask(taskData)

      if (taskData.status === 'completed' || taskData.status === 'failed') {
        clearInterval(interval)
      }
    }, 2000)

    return () => clearInterval(interval)
  }, [taskId])

  // Load results once the task is running or finished
  useEffect(() => {
    if (taskId && (task?.status === 'processing' || task?.status === 'completed')) {
      fetchTaskResults(taskId).then(setResults)
    }
  }, [taskId, task?.status])

  // Expand or collapse a row (state is updated immutably)
  const toggleExpanded = (id: string) => {
    setExpandedRows(prev => {
      const next = new Set(prev)
      if (next.has(id)) next.delete(id)
      else next.add(id)
      return next
    })
  }

  // Persist a manual decision, then mirror it in local state
  const updateDecision = async (resultId: string, decision: string) => {
    await submitDecision(resultId, decision)
    setResults(prev =>
      prev.map(r => (r.id === resultId ? { ...r, finalDecision: decision } : r))
    )
  }

  return (
    <div className="fulltext-screening-workbench">
      {/* Progress bar */}
      {task?.status === 'processing' && (
        <Progress
          percent={task.progress}
          status="active"
          format={() => `${task.processedCount} / ${task.totalCount}`}
        />
      )}

      {/* PICOS criteria panel (collapsible) */}
      <PICOSPanel project={task?.project} collapsible />

      {/* Tabular review interface */}
      <ScreeningTable
        taskType="fulltext"
        results={results}
        expandedRows={expandedRows}
        onToggleExpand={(id) => toggleExpanded(id)}
        onClickJudge={(result, field) => {
          setCurrentReview({ result, field })
          setReviewModalVisible(true)
        }}
        onDecisionChange={(resultId, decision) =>
          updateDecision(resultId, decision)
        }
      />

      {/* Dual-view source review modal (rendered only once a cell was clicked) */}
      {currentReview && (
        <ReviewModal
          visible={reviewModalVisible}
          result={currentReview.result}
          field={currentReview.field}
          onClose={() => setReviewModalVisible(false)}
        />
      )}
    </div>
  )
}
```
|
||
|
||
#### 7.2.2 ScreeningTable - 表格化审核
|
||
|
||
```tsx
|
||
// frontend-v2/src/modules/asl/components/screening/ScreeningTable.tsx
|
||
|
||
interface ScreeningTableProps {
|
||
taskType: 'title' | 'fulltext'
|
||
results: ScreeningResult[]
|
||
expandedRows: Set<string>
|
||
onToggleExpand: (id: string) => void
|
||
onClickJudge: (result: ScreeningResult, field: string) => void
|
||
onDecisionChange: (resultId: string, decision: string) => void
|
||
}
|
||
|
||
export const ScreeningTable: React.FC<ScreeningTableProps> = ({
|
||
taskType,
|
||
results,
|
||
expandedRows,
|
||
onToggleExpand,
|
||
onClickJudge,
|
||
onDecisionChange
|
||
}) => {
|
||
|
||
return (
|
||
<div className="screening-table">
|
||
<table>
|
||
{/* 双行表头 */}
|
||
<thead>
|
||
<tr>
|
||
<th rowSpan={2}>展开</th>
|
||
<th rowSpan={2}>文献ID</th>
|
||
<th rowSpan={2}>研究ID</th>
|
||
<th rowSpan={2}>文献来源</th>
|
||
|
||
{/* DeepSeek判断 */}
|
||
<th colSpan={taskType === 'title' ? 5 : 13}>
|
||
DeepSeek 判断
|
||
</th>
|
||
|
||
{/* Qwen判断 */}
|
||
<th colSpan={taskType === 'title' ? 5 : 13}>
|
||
Qwen 判断
|
||
</th>
|
||
|
||
<th rowSpan={2}>冲突状态</th>
|
||
<th rowSpan={2}>最终决策</th>
|
||
</tr>
|
||
<tr>
|
||
{/* DeepSeek细分(全文复筛:12字段) */}
|
||
{taskType === 'fulltext' ? (
|
||
<>
|
||
<th>F1</th><th>F2</th><th>F3</th><th>F4</th>
|
||
<th>F5</th><th>F6</th><th>F7</th><th>F8</th>
|
||
<th>F9</th><th>F10</th><th>F11</th><th>F12</th>
|
||
<th>结论</th>
|
||
</>
|
||
) : (
|
||
// 标题初筛:PICOS
|
||
<>
|
||
<th>P</th><th>I</th><th>C</th><th>S</th><th>结论</th>
|
||
</>
|
||
)}
|
||
|
||
{/* Qwen细分(同上) */}
|
||
{/* ... */}
|
||
</tr>
|
||
</thead>
|
||
|
||
<tbody>
|
||
{results.map(result => (
|
||
<React.Fragment key={result.id}>
|
||
{/* 主行 */}
|
||
<tr className={result.isConflict ? 'conflict-row' : ''}>
|
||
<td onClick={() => onToggleExpand(result.id)}>
|
||
{expandedRows.has(result.id) ? '-' : '+'}
|
||
</td>
|
||
<td>{result.literature.pmid}</td>
|
||
<td>{result.literature.studyId}</td>
|
||
<td>{result.literature.source}</td>
|
||
|
||
{/* DeepSeek判断(12字段) */}
|
||
{taskType === 'fulltext' && (
|
||
<>
|
||
{[1,2,3,4,5,6,7,8,9,10,11,12].map(i => (
|
||
<td
|
||
key={`a-f${i}`}
|
||
className="judge-cell"
|
||
onClick={() => onClickJudge(result, `field${i}`)}
|
||
>
|
||
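{/* Result keys carry a suffix (e.g. field9_outcomes), so match on the fieldN prefix */}
{getFieldIcon(Object.entries(result.modelAResult).find(([key]) => key === `field${i}` || key.startsWith(`field${i}_`))?.[1])}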
|
||
</td>
|
||
))}
|
||
<td className={result.modelAResult.overallAssessment.decision === '纳入' ? 'text-green' : 'text-red'}>
|
||
{result.modelAResult.overallAssessment.decision}
|
||
</td>
|
||
</>
|
||
)}
|
||
|
||
{/* Qwen判断(同上) */}
|
||
{/* ... */}
|
||
|
||
{/* 冲突状态 */}
|
||
<td>
|
||
{result.isConflict ? (
|
||
<Badge status="error" text="冲突" />
|
||
) : (
|
||
<Badge status="success" text="一致" />
|
||
)}
|
||
</td>
|
||
|
||
{/* 最终决策 */}
|
||
<td>
|
||
<Select
|
||
value={result.finalDecision}
|
||
onChange={(value) => onDecisionChange(result.id, value)}
|
||
>
|
||
<Option value="pending">待定</Option>
|
||
<Option value="included">纳入</Option>
|
||
<Option value="excluded">排除</Option>
|
||
</Select>
|
||
</td>
|
||
</tr>
|
||
|
||
{/* 展开行(12字段详细评估) */}
|
||
{expandedRows.has(result.id) && (
|
||
<tr className="expanded-row">
|
||
<td colSpan={30}>
|
||
<div className="grid grid-cols-2 gap-4">
|
||
{/* DeepSeek详细评估 */}
|
||
<div>
|
||
<h4>DeepSeek 评估</h4>
|
||
{Object.entries(result.modelAResult).map(([field, assessment]) => {
|
||
if (field.startsWith('field')) {
|
||
return (
|
||
<div key={field} className="field-assessment">
|
||
<strong>{field}:</strong>
|
||
<span>{assessment.completeness}</span>
|
||
<span className="note">{assessment.note}</span>
|
||
</div>
|
||
)
|
||
}
|
||
})}
|
||
</div>
|
||
|
||
{/* Qwen详细评估 */}
|
||
<div>
|
||
{/* 同上 */}
|
||
</div>
|
||
</div>
|
||
</td>
|
||
</tr>
|
||
)}
|
||
</React.Fragment>
|
||
))}
|
||
</tbody>
|
||
</table>
|
||
</div>
|
||
)
|
||
}
|
||
|
||
// Helper: map a field's completeness to a status icon.
// Tolerates a missing assessment, which falls through to the "unknown" icon.
function getFieldIcon(assessment?: { completeness?: string }): ReactNode {
  switch (assessment?.completeness) {
    case '完整':
      return <CheckCircle className="text-green-600" />
    case '不完整':
    case '部分完整': // also appears in the example response for modelB
      return <ExclamationCircle className="text-yellow-600" />
    case '缺失':
      return <CloseCircle className="text-red-600" />
    default:
      return <QuestionCircle className="text-gray-400" />
  }
}
|
||
```
|
||
|
||
#### 7.2.3 ReviewModal - 双视图原文审查
|
||
|
||
```tsx
|
||
// frontend-v2/src/modules/asl/components/screening/ReviewModal.tsx
|
||
|
||
interface ReviewModalProps {
|
||
visible: boolean
|
||
result: ScreeningResult
|
||
field: string // 'field1'-'field12'
|
||
onClose: () => void
|
||
}
|
||
|
||
export const ReviewModal: React.FC<ReviewModalProps> = ({
|
||
visible,
|
||
result,
|
||
field,
|
||
onClose
|
||
}) => {
|
||
|
||
return (
|
||
<Modal
|
||
open={visible}
|
||
onCancel={onClose}
|
||
width="90vw"
|
||
footer={null}
|
||
title={`原文审查 - ${result?.literature.title}`}
|
||
>
|
||
<div className="flex h-[80vh]">
|
||
{/* 左侧:PDF全文 */}
|
||
<div className="w-1/2 border-r overflow-y-auto p-4">
|
||
<h3 className="font-bold mb-2">全文</h3>
|
||
<PDFViewer
|
||
pdfUrl={result?.literature.pdfUrl}
|
||
highlightEvidence={getFieldEvidence(result, field)}
|
||
/>
|
||
</div>
|
||
|
||
{/* 右侧:双模型判断 */}
|
||
<div className="w-1/2 overflow-y-auto p-4 bg-gray-50">
|
||
<h3 className="font-bold mb-4">
|
||
12字段评估详情 - {field}
|
||
</h3>
|
||
|
||
<div className="space-y-6">
|
||
{/* DeepSeek判断 */}
|
||
<div className="model-panel border rounded-lg p-4 bg-white">
|
||
<h4 className="font-bold mb-3">DeepSeek</h4>
|
||
<FieldAssessmentDetail
|
||
assessment={result?.modelAResult[field]}
|
||
field={field}
|
||
/>
|
||
</div>
|
||
|
||
{/* Qwen判断 */}
|
||
<div className="model-panel border rounded-lg p-4 bg-white">
|
||
<h4 className="font-bold mb-3">Qwen</h4>
|
||
<FieldAssessmentDetail
|
||
assessment={result?.modelBResult[field]}
|
||
field={field}
|
||
/>
|
||
</div>
|
||
|
||
{/* 冲突标记 */}
|
||
{result?.conflictFields?.includes(field) && (
|
||
<Alert
|
||
type="warning"
|
||
message="双模型判断不一致"
|
||
description="请仔细核对全文,做出最终决策"
|
||
/>
|
||
)}
|
||
</div>
|
||
</div>
|
||
</div>
|
||
</Modal>
|
||
)
|
||
}
|
||
|
||
// 字段评估详情组件
|
||
const FieldAssessmentDetail: React.FC<{
|
||
assessment: any
|
||
field: string
|
||
}> = ({ assessment, field }) => {
|
||
return (
|
||
<div className="field-detail">
|
||
<div className="mb-2">
|
||
<strong>存在性:</strong>
|
||
<Tag color={assessment.present ? 'green' : 'red'}>
|
||
{assessment.present ? '存在' : '不存在'}
|
||
</Tag>
|
||
</div>
|
||
|
||
<div className="mb-2">
|
||
<strong>完整性:</strong>
|
||
<Tag color={
|
||
assessment.completeness === '完整' ? 'green' :
|
||
assessment.completeness === '不完整' ? 'orange' : 'red'
|
||
}>
|
||
{assessment.completeness}
|
||
</Tag>
|
||
</div>
|
||
|
||
{field === 'field9' && (
|
||
<div className="mb-2">
|
||
<strong>可提取性:</strong>
|
||
<Tag color={assessment.extractable ? 'green' : 'red'}>
|
||
{assessment.extractable ? '可提取' : '不可提取'}
|
||
</Tag>
|
||
</div>
|
||
)}
|
||
|
||
<div className="mt-3 p-3 bg-gray-50 rounded">
|
||
<strong>说明:</strong>
|
||
<p className="mt-1 text-sm">{assessment.note}</p>
|
||
</div>
|
||
</div>
|
||
)
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 8. 开发排期
|
||
|
||
### 8.1 Week 1:通用能力层 + 数据库 + 后端核心
|
||
|
||
#### Day 1(2025-11-22 周五):通用能力层 - PDF存储 ✅ 已完成
|
||
|
||
**目标**:包装现有PDF能力为统一服务接口
|
||
|
||
**⚠️ 重要**:不是重新开发,是**包装现有能力** ✅
|
||
|
||
**任务**:
|
||
- [x] 创建 `PDFStorageService.ts`(统一接口,包装现有服务)✅
|
||
- [x] 实现 `DifyPDFStorageAdapter.ts`(复用DifyClient)✅
|
||
- [x] 实现 `OSSPDFStorageAdapter.ts`(预留)✅
|
||
- [x] 实现 `PDFStorageFactory.ts`(工厂类)✅
|
||
- [x] 集成现有服务:✅
|
||
- [x] 复用 `ExtractionClient.ts`(已实现)✅
|
||
- [x] 复用 `DifyClient.ts`(已实现)✅
|
||
- [x] 复用 `storage`(平台基础层)✅
|
||
- [x] 编写单元测试 ✅
|
||
- [x] 环境配置(.env)✅
|
||
|
||
**产出**:
|
||
- 6个文件(~500行代码,包含types和index)
|
||
- 测试覆盖率>80% ✅
|
||
|
||
**复用清单**:
|
||
- ✅ `backend/src/common/document/ExtractionClient.ts`
|
||
- ✅ `backend/src/common/rag/DifyClient.ts`
|
||
- ✅ `backend/src/common/storage/`
|
||
- ✅ `extraction_service/`(Python微服务)
|
||
|
||
---
|
||
|
||
#### Day 2(2025-11-22 周五):通用能力层 - LLM服务 ✅ 已完成
|
||
|
||
**目标**:实现LLM处理12字段服务
|
||
|
||
**任务**:
|
||
- [x] 创建 `LLM12FieldsService.ts` ✅
|
||
- [x] 实现screening模式Prompt加载 ✅
|
||
- [x] 实现extraction模式Prompt加载(预留)✅
|
||
- [x] 集成LLM适配器(复用现有)✅
|
||
- [x] DeepSeek-V3(已实现)✅
|
||
- [x] Qwen3-Max(已实现)✅
|
||
- [x] LLMFactory(已实现)✅
|
||
- [x] 实现缓存策略(复用cache服务)✅
|
||
- [x] 实现成本计算 ✅
|
||
- [x] 编写Prompt模板(12字段screening)⭐ 核心工作 ✅
|
||
- [x] 编写单元测试 ✅
|
||
- [x] **额外完成**:PromptBuilder服务(动态Prompt组装)✅
|
||
- [x] **额外完成**:3层JSON解析策略(json-repair集成)✅
|
||
- [x] **额外完成**:Promise.allSettled双模型容错 ✅
|
||
|
||
**复用清单**:
|
||
- ✅ `backend/src/common/llm/adapters/DeepSeekAdapter.ts`
|
||
- ✅ `backend/src/common/llm/adapters/QwenAdapter.ts`
|
||
- ✅ `backend/src/common/llm/LLMFactory.ts`
|
||
- ✅ `backend/src/common/cache/`
|
||
|
||
**产出**:
|
||
- `LLM12FieldsService.ts`(~400行)
|
||
- 2个Prompt文件(~500行)
|
||
- 测试覆盖率>80%
|
||
|
||
---
|
||
|
||
#### Day 3(2025-11-22 周五):通用能力层 - 验证服务 ✅ 已完成
|
||
|
||
**目标**:完成验证服务和冲突检测
|
||
|
||
**任务**:
|
||
- [x] 创建 `MedicalLogicValidator.ts` ✅
|
||
- [x] 5条医学逻辑验证规则 ✅
|
||
- [x] safeGetFieldValue容错机制 ✅
|
||
- [x] 创建 `EvidenceChainValidator.ts` ✅
|
||
- [x] 证据链完整性验证 ✅
|
||
- [x] 引用长度和位置验证 ✅
|
||
- [x] null/undefined安全处理 ✅
|
||
- [x] 创建 `ConflictDetectionService.ts` ✅
|
||
- [x] 实现screening冲突检测 ✅
|
||
- [x] 实现extraction冲突检测(预留)✅
|
||
- [x] 冲突严重程度分级 ✅
|
||
- [x] 复核优先级计算 ✅
|
||
- [x] logger容错修复 ✅
|
||
- [x] 编写单元测试 ✅
|
||
- [x] 集成测试(通用能力层整体)✅
|
||
- [x] integration-test.ts(完整测试)✅
|
||
- [x] quick-test.ts(快速测试)✅
|
||
- [x] cached-result-test.ts(容错验证)✅
|
||
|
||
**产出**:
|
||
- 3个验证服务(~1,300行代码)
|
||
- 3个测试文件(~600行代码)
|
||
- 通用能力层核心完成✅
|
||
|
||
**注**:AsyncTaskService延后到Day 4实现(批处理服务中)
|
||
|
||
---
|
||
|
||
#### Day 4(2025-11-28 周四):数据库设计 + 全文复筛业务逻辑
|
||
|
||
**目标**:完成数据库迁移和核心业务逻辑
|
||
|
||
**上午任务(数据库)**:
|
||
- [ ] 设计Prisma Schema
|
||
- [ ] `AslFulltextScreeningTask` 表
|
||
- [ ] `AslFulltextScreeningResult` 表
|
||
- [ ] `AslLiterature` 表更新(PDF存储字段)
|
||
- [ ] 执行迁移:`npx prisma migrate dev --name add_fulltext_screening`
|
||
- [ ] 验证表结构
|
||
|
||
**下午任务(业务逻辑)**:
|
||
- [ ] 创建 `FulltextScreeningService.ts`
|
||
- [ ] `processTask()` - 任务处理入口
|
||
- [ ] `screenLiterature()` - 单篇文献筛选
|
||
- [ ] `updateTaskProgress()` - 进度更新
|
||
- [ ] `completeTask()` - 任务完成
|
||
- [ ] 集成通用能力层服务
|
||
|
||
**产出**:
|
||
- 3个数据表
|
||
- `FulltextScreeningService.ts`(~600行)
|
||
|
||
---
|
||
|
||
#### Day 5(2025-11-29 周五):后端API实现
|
||
|
||
**目标**:完成全文复筛API(5个核心接口)
|
||
|
||
**任务**:
|
||
- [ ] 创建 `FulltextScreeningController.ts`
|
||
- [ ] `createTask()` - 创建任务
|
||
- [ ] `getTaskProgress()` - 获取进度
|
||
- [ ] `getTaskResults()` - 获取结果
|
||
- [ ] `updateDecision()` - 人工审核决策
|
||
- [ ] `exportExcel()` - 导出Excel
|
||
- [ ] 创建 `fulltext-screening.ts` 路由
|
||
- [ ] API测试(Postman)
|
||
- [ ] 错误处理完善
|
||
|
||
**产出**:
|
||
- 5个API接口
|
||
- API测试通过
|
||
- 后端完成✅
|
||
|
||
---
|
||
|
||
### 8.2 Week 2:前端开发 + 联调测试
|
||
|
||
#### Day 6(2025-12-02 周一):前端核心组件(上)
|
||
|
||
**目标**:实现设置页面和审核工作台基础
|
||
|
||
**任务**:
|
||
- [ ] 创建 `FulltextScreeningSettings.tsx`
|
||
- [ ] PICOS标准展示
|
||
- [ ] 全文获取管理表格
|
||
- [ ] 开始复筛按钮
|
||
- [ ] 创建 `ScreeningTable.tsx`(复用title-screening)
|
||
- [ ] 双行表头
|
||
- [ ] 主行渲染
|
||
- [ ] 展开行渲染
|
||
- [ ] 判断图标点击
|
||
- [ ] API集成(创建任务)
|
||
|
||
**产出**:
|
||
- 2个页面组件
|
||
- 表格组件基础完成
|
||
|
||
---
|
||
|
||
#### Day 7(2025-12-03 周二):前端核心组件(下)
|
||
|
||
**目标**:完成审核工作台和PDF查看器
|
||
|
||
**任务**:
|
||
- [ ] 创建 `FulltextScreeningWorkbench.tsx`
|
||
- [ ] 进度条
|
||
- [ ] 任务状态轮询
|
||
- [ ] 结果加载
|
||
- [ ] 创建 `PDFViewer.tsx`(react-pdf)
|
||
- [ ] PDF渲染
|
||
- [ ] 页面翻页
|
||
- [ ] 证据高亮
|
||
- [ ] 创建 `ReviewModal.tsx`
|
||
- [ ] 双视图布局
|
||
- [ ] 12字段详情展示
|
||
- [ ] 冲突标记
|
||
- [ ] API集成(获取进度、获取结果)
|
||
|
||
**产出**:
|
||
- 审核工作台完成
|
||
- PDF查看器完成
|
||
- 双视图模态框完成
|
||
|
||
---
|
||
|
||
#### Day 8(2025-12-04 周三):前端结果页面 + Excel导出
|
||
|
||
**目标**:完成复筛结果页面
|
||
|
||
**任务**:
|
||
- [ ] 创建 `FulltextScreeningResults.tsx`
|
||
- [ ] 统计卡片
|
||
- [ ] PRISMA排除统计图表
|
||
- [ ] Tab切换(纳入/排除)
|
||
- [ ] 结果列表
|
||
- [ ] 创建 `ExcelExporter.ts`
|
||
- [ ] 双Sheet导出(纳入+排除)
|
||
- [ ] PRISMA统计Sheet
|
||
- [ ] API集成(导出Excel)
|
||
- [ ] 路由配置
|
||
|
||
**产出**:
|
||
- 结果页面完成
|
||
- Excel导出完成
|
||
- 前端完成✅
|
||
|
||
---
|
||
|
||
#### Day 9(2025-12-05 周四):前后端联调
|
||
|
||
**目标**:完整流程测试
|
||
|
||
**任务**:
|
||
- [ ] 准备测试数据(30篇文献)
|
||
- [ ] 完整流程测试
|
||
- [ ] 创建任务
|
||
- [ ] PDF上传(测试Dify集成)
|
||
- [ ] 任务执行(测试双模型调用)
|
||
- [ ] 进度更新
|
||
- [ ] 审核工作台(测试冲突检测)
|
||
- [ ] 人工决策
|
||
- [ ] 结果导出
|
||
- [ ] Bug修复
|
||
- [ ] 性能优化(缓存、并发)
|
||
|
||
**产出**:
|
||
- 测试报告
|
||
- Bug列表
|
||
|
||
---
|
||
|
||
#### Day 10(2025-12-06 周五):优化 + 文档
|
||
|
||
**目标**:代码优化和文档完善
|
||
|
||
**任务**:
|
||
- [ ] 代码优化
|
||
- [ ] 错误处理完善
|
||
- [ ] 日志输出优化
|
||
- [ ] 类型定义完善
|
||
- [ ] UI/UX优化
|
||
- [ ] 加载状态
|
||
- [ ] 空状态
|
||
- [ ] 错误提示
|
||
- [ ] 文档编写
|
||
- [ ] API文档
|
||
- [ ] 用户使用手册
|
||
- [ ] 开发者文档
|
||
- [ ] 代码Review
|
||
|
||
**产出**:
|
||
- 代码质量提升
|
||
- 完整文档
|
||
- **全文复筛MVP完成**✅
|
||
|
||
---
|
||
|
||
## 9. 技术要点
|
||
|
||
### 9.1 核心技术挑战
|
||
|
||
#### 9.1.1 LLM Cost Control

**Problem**: Dual-model full-text screening (DeepSeek-V3 + Qwen3-Max) sends every paper's full text to two LLMs, so cost has to be kept under control.

**Solution**:
1. **Three-tier caching**
   - L1: in-memory cache (development)
   - L2: Redis cache (production)
   - L3: database cache (permanent storage)

2. **Unified model configuration** (MVP stage)
   ```typescript
   // MVP stage: use one cost-friendly model pair everywhere
   const DEFAULT_MODELS = {
     modelA: 'deepseek-v3',  // ¥1 per million tokens, very cost-effective
     modelB: 'qwen-max'      // ¥4 per million tokens, strong on Chinese text
   }

   // Cost comparison:
   // - DeepSeek-V3 + Qwen-Max: ≈ ¥0.05 per paper (10K tokens)
   // - GPT-4o + Claude-4.5: ≈ ¥3.2 per paper (10K tokens)
   // Cost reduction: 64x (3.2 / 0.05) ⭐
   ```

3. **Batch prefetching**
   - Fetch the full text of several papers in one pass
   - Reduce the number of API calls
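
The three-tier cache in item 1 can be read as a lookup chain: check the fastest tier first, fall back to slower ones, and only call the LLM when every tier misses. The sketch below illustrates that reading under stated assumptions. The `CacheTier` interface, the `MapTier` stand-in (a real setup would back the tiers with memory, Redis and the database) and `getOrCompute` are all hypothetical names, and the document itself does not say whether the tiers are stacked or used as per-environment alternatives.

```typescript
interface CacheTier {
  name: string
  get(key: string): Promise<string | undefined>
  set(key: string, value: string): Promise<void>
}

// In-memory stand-in used for every tier in this sketch.
class MapTier implements CacheTier {
  private store = new Map<string, string>()
  constructor(public name: string) {}
  async get(key: string): Promise<string | undefined> {
    return this.store.get(key)
  }
  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value)
  }
}

// Look the key up tier by tier; on a hit, backfill the faster tiers.
// On a full miss, compute the value once and store it in every tier.
async function getOrCompute(
  tiers: CacheTier[],
  key: string,
  compute: () => Promise<string>
): Promise<string> {
  for (let i = 0; i < tiers.length; i++) {
    const hit = await tiers[i].get(key)
    if (hit !== undefined) {
      await Promise.all(tiers.slice(0, i).map(t => t.set(key, hit)))
      return hit
    }
  }
  const value = await compute()
  await Promise.all(tiers.map(t => t.set(key, value)))
  return value
}
```

For LLM results, a natural cache key would combine the model name, the prompt version and a hash of the full text, so that a prompt change invalidates old entries; that key scheme is a suggestion, not something this document specifies.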
|
||
|
||
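#### 9.1.2 Serverless Timeouts

**Problem**: Screening 30 papers can take 15-20 minutes, far beyond the serverless execution limit (30 seconds).

**Solution**:
- ✅ **Asynchronous task mode** (already implemented)
  - Creating a task returns immediately (<1 second)
  - Processing runs asynchronously in the background
  - The front end polls for progress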
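
The asynchronous task pattern described above separates "accept the task" from "do the work". The sketch below shows the control flow with an in-memory registry: `createTask` returns an id straight away while processing continues in the background, and `getTaskProgress` is what a polling client would call. All names and the in-memory `Map` are assumptions for illustration; in a real serverless deployment the background work would be handed to a job queue or worker rather than run in the request process.

```typescript
type TaskStatus = 'pending' | 'processing' | 'completed' | 'failed'

interface TaskState {
  status: TaskStatus
  processed: number
  total: number
  error?: string
}

// Hypothetical in-memory registry; a real system would persist this in the database.
const tasks = new Map<string, TaskState>()
let nextTaskNumber = 1

// Register the task, start processing without awaiting it, and return the id.
function createTask(items: string[], handle: (item: string) => Promise<void>): string {
  const taskId = `task-${nextTaskNumber++}`
  const state: TaskState = { status: 'pending', processed: 0, total: items.length }
  tasks.set(taskId, state)

  // Fire and forget: the caller is not blocked by the loop below.
  void (async () => {
    state.status = 'processing'
    try {
      for (const item of items) {
        await handle(item)
        state.processed++
      }
      state.status = 'completed'
    } catch (err) {
      state.status = 'failed'
      state.error = err instanceof Error ? err.message : String(err)
    }
  })()

  return taskId
}

// What the polling endpoint would return.
function getTaskProgress(taskId: string): TaskState | undefined {
  return tasks.get(taskId)
}
```

The front end then polls `getTaskProgress` on an interval and stops once the status is `completed` or `failed`, which is the loop the workbench component in section 7.2.1 implements.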
|
||
|
||
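#### 9.1.3 PDF Processing Performance

**Problem**: PDF extraction can be slow (Nougat needs about 40 seconds per paper).

**Solution**:
1. **Smart extraction strategy**
   ```typescript
   // Detect the language, then pick an extraction method.
   // `extractWith` is a hypothetical wrapper around the extraction microservice.
   async function extractFullText(pdf: Buffer, language: string): Promise<string> {
     if (language === 'chinese') {
       return extractWith('pymupdf', pdf)       // fast (about 2 s per paper)
     }
     try {
       return await extractWith('nougat', pdf)  // high quality (about 40 s per paper)
     } catch {
       return extractWith('pymupdf', pdf)       // fall back if Nougat fails
     }
   }
   ```

2. **Extract ahead of time, in batches**
   - Once title screening finishes, extract full text automatically in the background
   - Full-text screening then uses the extracted text directly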
|
||
|
||
### 9.2 数据一致性保证
|
||
|
||
#### 9.2.1 事务处理
|
||
|
||
```typescript
|
||
// 使用Prisma事务保证一致性
|
||
await prisma.$transaction(async (tx) => {
|
||
// 1. 创建筛选结果
|
||
const result = await tx.aslFulltextScreeningResult.create({ ... })
|
||
|
||
// 2. 更新文献阶段
|
||
await tx.aslLiterature.update({
|
||
where: { id: literatureId },
|
||
data: { stage: 'fulltext_screened' }
|
||
})
|
||
|
||
// 3. 更新任务统计
|
||
await tx.aslFulltextScreeningTask.update({
|
||
where: { id: taskId },
|
||
data: { processedCount: { increment: 1 } }
|
||
})
|
||
})
|
||
```
|
||
|
||
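#### 9.2.2 Retry on Failure

```typescript
// Resolve after `ms` milliseconds
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Retry with exponential backoff
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      if (i === maxRetries - 1) break  // no wait after the final attempt

      const delay = Math.pow(2, i) * 1000  // 1s, 2s, 4s, ... (1s and 2s with the default of 3 attempts)
      await sleep(delay)
    }
  }
  // Reached only when every attempt failed; also satisfies the compiler's return check
  throw lastError
}
```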
|
||
|
||
### 9.3 前端性能优化
|
||
|
||
#### 9.3.1 虚拟滚动
|
||
|
||
```tsx
|
||
// 使用react-window处理大量文献
|
||
import { FixedSizeList } from 'react-window'
|
||
|
||
<FixedSizeList
|
||
height={600}
|
||
itemCount={results.length}
|
||
itemSize={100}
|
||
width="100%"
|
||
>
|
||
{({ index, style }) => (
|
||
<div style={style}>
|
||
<ResultRow result={results[index]} />
|
||
</div>
|
||
)}
|
||
</FixedSizeList>
|
||
```
|
||
|
||
#### 9.3.2 懒加载
|
||
|
||
```tsx
|
||
// PDF查看器懒加载
|
||
const PDFViewer = lazy(() => import('./PDFViewer'))
|
||
|
||
<Suspense fallback={<Spin />}>
|
||
<PDFViewer pdfUrl={url} />
|
||
</Suspense>
|
||
```
|
||
|
||
---
|
||
|
||
## 10. 风险与注意事项
|
||
|
||
### 10.1 技术风险
|
||
|
||
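| Risk | Level | Mitigation |
|------|------|---------|
| **Dify storage limits** | 🟡 Medium | Adapter pattern keeps an OSS migration path open |
| **LLM cost over budget** | 🟡 Medium | Three-tier caching + smart model selection |
| **Unstable Python microservice** | 🟡 Medium | Retry mechanism + fallback strategy |
| **Front-end PDF rendering performance** | 🟢 Low | react-pdf + virtual scrolling |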
|
||
|
||
### 10.2 开发注意事项
|
||
|
||
#### 10.2.1 云原生规范
|
||
|
||
```typescript
|
||
// ✅ 正确:使用平台服务
|
||
import { storage, logger, cache, jobQueue } from '@/common'
|
||
|
||
// ❌ 错误:本地文件存储
|
||
fs.writeFileSync('./uploads/file.pdf', buffer) // 禁止!
|
||
```
|
||
|
||
#### 10.2.2 数据库设计
|
||
|
||
```prisma
|
||
// ✅ 正确:指定Schema
|
||
model AslFulltextScreeningTask {
|
||
@@schema("asl_schema") // 必须指定
|
||
// ...
|
||
}
|
||
|
||
// ❌ 错误:不指定Schema
|
||
model FulltextScreeningTask {
|
||
// 会放到public,违反隔离原则
|
||
}
|
||
```
|
||
|
||
#### 10.2.3 Git提交规范
|
||
|
||
```bash
|
||
# ✅ 正确:完成功能后统一提交
|
||
git commit -m "feat(asl): 完成全文复筛功能
|
||
|
||
- 实现通用能力层(PDF存储、LLM服务、冲突检测)
|
||
- 实现全文复筛业务逻辑
|
||
- 完成前端审核工作台
|
||
- 添加Excel导出功能
|
||
|
||
Tested: 完整流程测试通过"
|
||
|
||
# ❌ 错误:频繁碎片化提交
|
||
git commit -m "fix bug" # 每改一次就提交
|
||
```
|
||
|
||
### 10.3 未来迁移注意事项
|
||
|
||
#### 10.3.1 Dify → OSS迁移
|
||
|
||
```bash
|
||
# 只需修改环境变量
|
||
PDF_STORAGE_TYPE=oss # 从dify改为oss
|
||
|
||
# 业务代码零改动✅
|
||
# 数据库字段兼容✅(pdfStorageType: 'dify' or 'oss')
|
||
```
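
Switching storage through a single environment variable works because business code depends on an adapter interface and a factory picks the implementation. The following sketch shows that idea. The class names follow the files listed for Day 1 (`DifyPDFStorageAdapter`, `OSSPDFStorageAdapter`, `PDFStorageFactory`), but the interface, the `upload` method and the placeholder return values are assumptions, not the project's actual API.

```typescript
// Hypothetical adapter contract shared by all storage backends.
interface PDFStorageAdapter {
  readonly type: 'dify' | 'oss'
  // Returns an opaque storage reference for the uploaded file.
  upload(fileName: string, content: Uint8Array): Promise<string>
}

class DifyPDFStorageAdapter implements PDFStorageAdapter {
  readonly type = 'dify' as const
  async upload(fileName: string, _content: Uint8Array): Promise<string> {
    return `dify:${fileName}` // placeholder; a real adapter would call the Dify client
  }
}

class OSSPDFStorageAdapter implements PDFStorageAdapter {
  readonly type = 'oss' as const
  async upload(fileName: string, _content: Uint8Array): Promise<string> {
    return `oss:${fileName}` // placeholder; a real adapter would call the OSS SDK
  }
}

// Pick the adapter from the configured storage type (defaults to Dify, as in the MVP).
function createPDFStorage(storageType: string | undefined): PDFStorageAdapter {
  switch (storageType ?? 'dify') {
    case 'dify':
      return new DifyPDFStorageAdapter()
    case 'oss':
      return new OSSPDFStorageAdapter()
    default:
      throw new Error(`Unsupported PDF_STORAGE_TYPE: ${storageType}`)
  }
}
```

Business code would call something like `createPDFStorage(process.env.PDF_STORAGE_TYPE)` once at startup and use only the interface afterwards, so changing `PDF_STORAGE_TYPE` from `dify` to `oss` needs no code change. Rejecting unknown values at startup surfaces a misconfigured environment early.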
|
||
|
||
#### 10.3.2 全文提取模块复用
|
||
|
||
```typescript
|
||
// 全文提取模块可以直接复用通用能力层
|
||
import {
|
||
PDFStorageService, // ✅ 复用
|
||
LLM12FieldsService, // ✅ 复用(切换到extraction模式)
|
||
ConflictDetectionService, // ✅ 复用
|
||
AsyncTaskService // ✅ 复用
|
||
} from '@/modules/asl/common/services'
|
||
|
||
// 只需开发extraction特定的业务逻辑
|
||
export class DataExtractionService {
|
||
// 调用通用服务
|
||
private llmService = new LLM12FieldsService()
|
||
|
||
async extractLiterature(literatureId: string) {
|
||
// 使用extraction模式
|
||
const result = await this.llmService.process12Fields(
|
||
LLM12FieldsMode.EXTRACTION, // ← 切换模式
|
||
'gpt-4o',
|
||
fullText,
|
||
context
|
||
)
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## 附录
|
||
|
||
### A. 相关文档
|
||
|
||
| Document | Path | Notes |
|------|------|------|
| Full-text screening requirements | `01-需求分析/03-全文复筛需求详述.md` | Functional requirements |
| 12-field template | `01-需求分析/全文复筛及全文提取模版.txt` | Evidence-based medicine template |
| UI prototype | `03-UI设计/AI智能文献-全文复筛.html` | Front-end prototype |
| Cloud-native guidelines | `docs/04-开发规范/08-云原生开发规范.md` | Required reading |
| System architecture | `docs/00-系统总体设计/00-系统当前状态与开发指南.md` | Required reading |
|
||
|
||
### B. 环境配置示例
|
||
|
||
```bash
|
||
# .env.development
|
||
|
||
# PDF存储(MVP阶段使用Dify)
|
||
PDF_STORAGE_TYPE=dify
|
||
DIFY_API_KEY=app-xxx
|
||
DIFY_BASE_URL=http://localhost:5001/v1
|
||
DIFY_ASL_DATASET_ID=dataset-xxx
|
||
|
||
# Python微服务
|
||
EXTRACTION_SERVICE_URL=http://localhost:8000
|
||
|
||
# LLM配置
|
||
CLOSEAI_API_KEY=sk-xxx
|
||
CLOSEAI_OPENAI_BASE_URL=https://api.openai-proxy.org/v1
|
||
CLOSEAI_CLAUDE_BASE_URL=https://api.openai-proxy.org/anthropic
|
||
|
||
# 缓存配置
|
||
CACHE_TYPE=memory # memory/redis
|
||
REDIS_URL=redis://localhost:6379
|
||
|
||
# 数据库
|
||
DATABASE_URL=postgresql://user:pass@localhost:5432/asl_db
|
||
```
|
||
|
||
### C. 测试数据准备
|
||
|
||
```sql
|
||
-- 准备30篇测试文献
|
||
INSERT INTO asl_schema."AslLiterature" (
|
||
id, project_id, pmid, title, abstract,
|
||
has_pdf, pdf_storage_type, pdf_storage_ref, pdf_status,
|
||
full_text, full_text_token_count, stage
|
||
) VALUES
|
||
('lit-001', 'proj-123', 'PMID12345', '...', '...',
|
||
true, 'dify', 'doc-001', 'ready',
|
||
'全文内容...', 8500, 'pdf_acquired'),
|
||
-- ... 29条记录
|
||
```
|
||
|
||
---
|
||
|
||
**Maintainer:** ASL development team

**Last updated:** 2025-11-22

**Status:** ✅ Plan complete, development pending

**📝 Version history:**

- V1.0 (2025-11-22): Initial version, complete development plan
|
||
|