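# Common Capability Layer
> **Layer definition:** Core technical capabilities shared across business modules
> **Core principles:** Reusable, highly cohesive, independently deployable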
---
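## 📋 Capability List
| Capability | Description | Reuse rate (modules using it / 7) | Priority | Status |
|------|------|-------|--------|------|
| **01-LLM Gateway** | Unified management of LLM calls, cost control, and model switching | 71% (5/7) | P0 | ⏳ Pending |
| **02-Document Processing Engine** | PDF/Docx/Txt extraction, OCR, table extraction | 86% (6/7) | P0 | ✅ Implemented |
| **03-RAG Engine** | Vector retrieval, semantic search, RAG question answering | 43% (3/7) | P1 | ✅ Implemented |
| **04-Data ETL Engine** | Excel JOIN, data cleaning, data transformation | 29% (2/7) | P2 | ⏳ Pending |
| **05-Medical NLP Engine** | Medical entity recognition, terminology standardization | 14% (1/7) | P2 | ⏳ Pending |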
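
The LLM gateway row lists three responsibilities: unified call management, cost control, and model switching. Since the gateway is still pending, the following is only a minimal sketch of how those three concerns could sit behind one interface. Every name here (`LLMGateway`, `ModelEntry`, `register`, `switch`, `complete`) and the flat per-call price are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ModelEntry:
    """One registered backend (hypothetical shape)."""
    call: Callable[[str], str]  # assumed backend signature: prompt -> completion
    cost_per_call: float        # illustrative flat price, not a real tariff


@dataclass
class LLMGateway:
    """Single entry point for LLM calls, with cost tracking and model switching."""
    models: Dict[str, ModelEntry] = field(default_factory=dict)
    active: str = ""
    total_cost: float = 0.0

    def register(self, name: str, entry: ModelEntry) -> None:
        self.models[name] = entry
        if not self.active:  # the first registered model becomes the default
            self.active = name

    def switch(self, name: str) -> None:
        if name not in self.models:
            raise KeyError(f"unknown model: {name}")
        self.active = name

    def complete(self, prompt: str) -> str:
        entry = self.models[self.active]
        self.total_cost += entry.cost_per_call  # cost control hook
        return entry.call(prompt)


# Usage with stub backends standing in for real model clients.
gateway = LLMGateway()
gateway.register("model-a", ModelEntry(lambda p: f"A:{p}", cost_per_call=0.5))
gateway.register("model-b", ModelEntry(lambda p: f"B:{p}", cost_per_call=0.25))

first = gateway.complete("hello")
gateway.switch("model-b")
second = gateway.complete("hello")
```

Business modules would depend only on `complete`, so swapping the backing model or adding a budget check would not touch their code.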
---
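## 🎯 Design Principles
### 1. Reusability
- Shared by multiple business modules
- Avoids duplicated development
### 2. Independent Deployment
- Can be split out as a standalone microservice
- Supports independent scaling
### 3. High Cohesion
- Each capability has a single responsibility
- Interfaces are clearly defined
### 4. Domain Knowledge
- Embeds business domain knowledge
- Not a purely technical component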
---
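## 📊 Reuse Rate Analysis
**LLM Gateway** - 71% reuse rate (highest priority)
- AIA: AI Intelligent Q&A
- ASL: AI Intelligent Literature
- PKB: Personal Knowledge Base
- DC: Data Cleaning
- RVW: Manuscript Review
**Document Processing Engine** - 86% reuse rate (implemented)
- ASL, PKB, DC, SSA, ST, RVW
**RAG Engine** - 43% reuse rate (implemented)
- AIA, ASL, PKB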
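
Each percentage above is the number of consuming modules divided by 7, rounded to the nearest whole percent. The sketch below reproduces the three figures from the module lists. It assumes the denominator 7 is the total number of business modules, which is inferred from the fractions in the capability table and from the seven distinct module codes listed here. The English dictionary keys are labels chosen for this example.

```python
# Assumed total number of business modules (the denominator in "5/7", "6/7", ...).
TOTAL_MODULES = 7

# Consuming modules per capability, copied from the lists above.
consumers = {
    "LLM gateway": ["AIA", "ASL", "PKB", "DC", "RVW"],
    "Document processing engine": ["ASL", "PKB", "DC", "SSA", "ST", "RVW"],
    "RAG engine": ["AIA", "ASL", "PKB"],
}


def reuse_rate(modules, total=TOTAL_MODULES):
    """Share of business modules using a capability, as a whole percent."""
    return round(100 * len(set(modules)) / total)


rates = {name: reuse_rate(mods) for name, mods in consumers.items()}

# Sanity check: the lists together mention exactly TOTAL_MODULES distinct modules.
all_modules = set().union(*consumers.values())

for name, rate in rates.items():
    print(f"{name}: {rate}% ({len(consumers[name])}/{TOTAL_MODULES})")
```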
---
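## 📚 Quick Navigation
### Quick Context
- **[AI对接] 通用能力快速上下文.md** - Get up to speed on the common capability layer in 2-3 minutes
### Core Capabilities
1. [LLM Gateway](./01-LLM大模型网关/README.md) - P0 priority ⭐
2. [Document Processing Engine](./02-文档处理引擎/README.md) - Implemented
3. [RAG Engine](./03-RAG引擎/README.md) - Implemented
4. [Data ETL Engine](./04-数据ETL引擎/README.md)
5. [Medical NLP Engine](./05-医学NLP引擎/README.md)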
---
## 🔗 Related Documents
- [Layered System Architecture Design](../00-系统总体设计/01-系统架构分层设计.md)
- [Platform Foundation Layer](../01-平台基础层/README.md)
- [Business Module Layer](../03-业务模块/README.md)
---
**Last updated:** 2025-11-06
**Maintainer:** Technical Architect