
Medical NLP Engine

Capability tier: Common capability layer
Reuse rate: 14% (1 dependent module)
Priority: P2
Status: Not yet implemented


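📋 Capability Overview

The medical NLP engine is responsible for:

  • Medical named entity recognition (NER)
  • Medical terminology standardization
  • Disease/drug recognition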

📊 Dependent Modules

1 dependent module (14% reuse rate):

  1. DC - Data cleaning and organization (NER extraction from case record data)

💡 Core Features

1. Medical entity recognition

  • Disease recognition
  • Drug recognition
  • Surgical procedure recognition
  • TNM staging extraction
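
Of the items above, TNM staging extraction is the easiest to illustrate in isolation. The following is a minimal sketch, assuming compact notation such as "pT2N1M0" and a plain regular expression rather than the LLM or spaCy pipelines this document plans. The names `TNM_PATTERN` and `extract_tnm` are hypothetical and do not come from the project.

```python
import re
from typing import Optional

# Matches compact TNM notation such as "T2N1M0", "pT1bN0M0" or "cT3 N2 M1".
# Assumption: only the T, N and M categories with an optional c/p/y prefix are handled.
# ASCII lookarounds are used instead of \b because CJK characters count as
# word characters in Python, so \b would not fire in text like "分期为T2N1M0".
TNM_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])(?P<prefix>[cpy]{0,2})"
    r"T(?P<t>is|[0-4X][a-d]?)\s*"
    r"N(?P<n>[0-3X][a-c]?)\s*"
    r"M(?P<m>[01X])(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_tnm(text: str) -> Optional[dict]:
    """Return the first TNM stage found in the text, or None if there is none."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix").lower(),
        "T": match.group("t"),
        "N": match.group("n").upper(),
        "M": match.group("m").upper(),
    }
```

A pattern like this only covers the compact form; free-text descriptions of staging would still need one of the model-based approaches described later.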

2. Terminology standardization

  • ICD coding
  • ATC coding
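
Terminology standardization can be pictured as mapping a free-text mention to a code. The sketch below assumes a simple in-memory lookup with whitespace and case normalization; the tables hold only a handful of illustrative ICD-10 and ATC entries, and `ICD10_BY_TERM`, `ATC_BY_TERM`, and `normalize_term` are hypothetical names, not part of the project.

```python
from typing import Optional

# Tiny illustrative tables (assumption: a real deployment would load the
# full ICD-10 and ATC releases instead of hard-coding entries).
ICD10_BY_TERM = {
    "type 2 diabetes mellitus": "E11",
    "2型糖尿病": "E11",
    "essential hypertension": "I10",
}
ATC_BY_TERM = {
    "metformin": "A10BA02",
    "二甲双胍": "A10BA02",
    "paracetamol": "N02BE01",
    "acetaminophen": "N02BE01",
}


def normalize_term(term: str, table: dict) -> Optional[str]:
    """Look up a term after lowercasing and collapsing whitespace."""
    key = " ".join(term.lower().split())
    return table.get(key)
```

Exact-match lookup misses misspellings and abbreviations, so fuzzy matching or a model-based linker would likely be needed on real case records.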

3. Relation extraction

  • Disease-drug relations
  • Symptom-disease relations
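
As a rough baseline for disease-drug relations, one could pair entities that appear in the same sentence. This is a naive co-occurrence sketch, not the method this document plans, and it assumes the disease and drug mentions have already been recognized. The function name and the `co_occurs_with` label are hypothetical.

```python
import re
from itertools import product


def cooccurrence_relations(text: str, diseases: list, drugs: list) -> list:
    """Pair every disease and drug mentioned within the same sentence."""
    relations = []
    # Split on Western and Chinese sentence-ending punctuation.
    for sentence in re.split(r"[.!?。！？]\s*", text):
        lowered = sentence.lower()
        found_diseases = [d for d in diseases if d.lower() in lowered]
        found_drugs = [d for d in drugs if d.lower() in lowered]
        for disease, drug in product(found_diseases, found_drugs):
            # Co-occurrence alone does not establish a treatment relation.
            relations.append((disease, "co_occurs_with", drug))
    return relations
```

Co-occurrence over-generates (for example, "hypertension, not treated with enalapril"), which is one reason a model-based extractor is the more realistic choice.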

🏗️ Technical Approach

Cloud version (high accuracy)

# Based on an LLM API (Claude/GPT)
# Structured output via JSON Mode
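
The cloud approach can be sketched as a prompt that requests JSON plus a defensive parser for the reply. The example below makes no real API call: `call_llm` is a hypothetical callable standing in for whichever Claude or GPT client is eventually chosen, and the prompt wording, `ALLOWED_TYPES`, and the output shape are all assumptions made for illustration.

```python
import json
from typing import Callable

# Assumed entity types, taken from the entity list in this document.
ALLOWED_TYPES = {"disease", "drug", "procedure"}

PROMPT_TEMPLATE = (
    "Extract medical entities from the text below. Reply with JSON only, shaped as "
    '{{"entities": [{{"text": "...", "type": "disease|drug|procedure"}}]}}.\n\n'
    "Text: {text}"
)


def extract_entities(text: str, call_llm: Callable[[str], str]) -> list:
    """Ask the model for entities and keep only well-formed items."""
    raw = call_llm(PROMPT_TEMPLATE.format(text=text))
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # The reply was not valid JSON.
    if not isinstance(payload, dict):
        return []
    entities = payload.get("entities", [])
    if not isinstance(entities, list):
        return []
    return [
        item
        for item in entities
        if isinstance(item, dict)
        and isinstance(item.get("text"), str)
        and isinstance(item.get("type"), str)
        and item["type"] in ALLOWED_TYPES
    ]
```

Passing the client in as a callable keeps the parsing logic testable without network access, and lets the same validation be reused if the provider changes.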

Standalone version (privacy first)

# Based on spaCy + a medical model
# Runs 100% locally

🔗 Related Documents


Last updated: 2025-11-06
Maintainer: Technical Architect