
RAG Engine

Positioning: Shared capability layer
Reuse rate: 43% (3 modules depend on it)
Priority: P1
Status: ✅ Implemented (built on Dify)

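📋 Capability Overview

The RAG engine is responsible for:

Vectorized storage (Embedding)
Semantic retrieval (Semantic Search)
Retrieval-augmented generation (RAG)
Reranking of retrieved results (Rerank)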

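📊 Dependent Modules

Three modules depend on this engine (43% reuse rate):

AIA - AI Q&A (@knowledge-base Q&A)
ASL - AI Literature (literature content retrieval)
PKB - Personal Knowledge Base (RAG Q&A)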

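💡 Core Features

1. Vector Storage

Built on the Dify platform
Qdrant vector database (bundled with Dify)

2. Semantic Retrieval

Top-K retrieval
Relevance scoring
Joint retrieval across multiple knowledge bases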
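
Joint retrieval across several knowledge bases can be sketched as "query each dataset, then keep the global Top-K by relevance score". The snippet below is a minimal illustration, not the project's implementation: the `SearchResult` fields, the `Searcher` slice of the engine interface, and the helper names `mergeTopK` and `searchAcross` are all assumptions introduced for this example. It also assumes scores from different datasets are directly comparable, which may not hold without reranking.

```typescript
// Hypothetical result shape: the document names SearchResult but does not define it.
interface SearchResult {
  datasetId: string;
  content: string;
  score: number; // relevance score, higher means more relevant
}

// The one method of the engine that this sketch needs.
interface Searcher {
  search(datasetId: string, query: string, topK?: number): Promise<SearchResult[]>;
}

// Merge per-dataset result lists and keep the topK highest-scoring entries.
function mergeTopK(lists: SearchResult[][], topK: number): SearchResult[] {
  const all = ([] as SearchResult[]).concat(...lists);
  return all.sort((a, b) => b.score - a.score).slice(0, topK);
}

// Query every dataset in parallel, then merge into one ranked list.
async function searchAcross(
  engine: Searcher,
  datasetIds: string[],
  query: string,
  topK: number = 15,
): Promise<SearchResult[]> {
  const lists = await Promise.all(
    datasetIds.map((id) => engine.search(id, query, topK)),
  );
  return mergeTopK(lists, topK);
}
```

Keeping the merge step as a pure function makes the ranking logic easy to test without a running Dify instance.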

3. RAG Q&A

Retrieval + generation
Smart citation system (100% accurate source tracing)
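
One way to make citations traceable is to number the retrieved chunks in the prompt and then check that every `[n]` marker in the generated answer points at a chunk that was actually supplied. The sketch below illustrates that idea only; the `RetrievedChunk` shape and the function names `buildCitedPrompt` and `invalidCitations` are hypothetical and are not taken from the Dify API or from this project.

```typescript
// Hypothetical shape of a retrieved chunk; field names are assumptions.
interface RetrievedChunk {
  documentName: string;
  content: string;
}

// Build a prompt whose context lines are numbered [1], [2], ... for citation.
function buildCitedPrompt(query: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.documentName}) ${c.content}`)
    .join("\n");
  return (
    "Answer using only the sources below and cite them as [n].\n\n" +
    `${context}\n\nQuestion: ${query}`
  );
}

// Return citation numbers in the answer that do not map to a supplied chunk.
function invalidCitations(answer: string, chunkCount: number): number[] {
  const pattern = /\[(\d+)\]/g;
  const invalid: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    const outOfRange = n < 1 || n > chunkCount;
    if (outOfRange && invalid.indexOf(n) === -1) {
      invalid.push(n);
    }
  }
  return invalid;
}
```

An empty array from `invalidCitations` means every marker in the answer resolves to a retrieved chunk; it does not by itself prove that the cited chunk supports the claim.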

🏗️ Technical Architecture

Built on the Dify platform:

// Wrapper around DifyClient
interface RAGEngine {
  // Create a knowledge base (dataset)
  createDataset(name: string): Promise<string>;

  // Upload a document to a dataset
  uploadDocument(datasetId: string, file: File): Promise<string>;

  // Semantic retrieval
  search(datasetId: string, query: string, topK?: number): Promise<SearchResult[]>;

  // RAG question answering
  chatWithRAG(datasetId: string, query: string): Promise<string>;
}

📈 Optimization Results

Retrieval parameter tuning:

Metric	Before	After	Improvement
Chunks retrieved	3 chunks	15 chunks	5x
Chunk size	500 tokens	1,500 tokens	3x
Total coverage	1,500 tokens	22,500 tokens	15x
Coverage rate	~5%	~40-50%	8-10x
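
The "Total coverage" row follows directly from chunk count multiplied by chunk size. The short sketch below reproduces that arithmetic; `RetrievalParams` and `coveredTokens` are illustrative names, not part of the project. The coverage rate is not derived here, since it also depends on document length and chunk overlap, which the table does not state.

```typescript
// Illustrative parameter pair taken from the table above.
interface RetrievalParams {
  chunkCount: number; // number of chunks retrieved per query
  chunkSize: number; // tokens per chunk
}

// Upper bound on tokens placed in context: chunk count times chunk size.
function coveredTokens(p: RetrievalParams): number {
  return p.chunkCount * p.chunkSize;
}

const before: RetrievalParams = { chunkCount: 3, chunkSize: 500 };
const after: RetrievalParams = { chunkCount: 15, chunkSize: 1500 };

const tokensBefore = coveredTokens(before); // 1500
const tokensAfter = coveredTokens(after); // 22500
const improvement = tokensAfter / tokensBefore; // 15

console.log(tokensBefore, tokensAfter, improvement);
```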

🔗 Related Documents

Last updated: 2025-11-06
Maintainer: Technical Architect
