feat(aia): Complete AIA V2.0 with universal streaming capabilities

Major Changes:
- Add StreamingService with OpenAI Compatible format
- Upgrade Chat component V2 with Ant Design X integration
- Implement AIA module with 12 intelligent agents
- Update API routes to unified /api/v1 prefix
- Update system documentation

Backend (~1300 lines):
- common/streaming: OpenAI Compatible adapter
- modules/aia: 12 agents, conversation service, streaming integration
- Update route versions (RVW, PKB to v1)

Frontend (~3500 lines):
- modules/aia: AgentHub + ChatWorkspace (100% prototype restoration)
- shared/Chat: AIStreamChat, ThinkingBlock, useAIStream Hook
- Update API endpoints to v1

Documentation:
- AIA module status guide
- Universal capabilities catalog
- System overview updates
- All module documentation sync

Tested: Stream response verified, authentication working
Status: AIA V2.0 core completed (85%)
@@ -1,86 +1,86 @@
# ASL Literature Processing: Technology Selection

> **Document version:** V1.0
> **Created:** 2025-11-15
> **Applicable module:** AI Intelligent Literature (ASL)
> **Goal:** Define the technology stack and implementation path for initial screening, full-text re-screening, and full-text extraction

---

## Document Overview

The ASL module covers three literature-processing scenarios, each with its own technical characteristics and implementation approach:

| Scenario | Input format | Core technology | Main challenge |
|------|---------|---------|---------|
| **Title/abstract initial screening** | Excel file | Excel parsing + LLM screening | Batch-processing efficiency |
| **Full-text re-screening** | Full-text PDF | PDF extraction + LLM screening | PDF parsing accuracy |
| **Full-text data extraction** | Full-text PDF | PDF extraction + LLM structured extraction | Accurate extraction of tables and formulas |

---

## Technical Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│             ASL Literature Processing Flow              │
└─────────────────────────────────────────────────────────┘
     │
     ├─ Scenario 1: Title/Abstract Initial Screening
     │   └─ User uploads Excel → Parse → LLM batch screening → Export results
     │
     ├─ Scenario 2: Full-Text Re-screening
     │   └─ User uploads PDF → PDF extraction → LLM screening → Review
     │
     └─ Scenario 3: Full-Text Data Extraction
         └─ PDF → Extraction + structuring → LLM data extraction → Manual review

┌─────────────────────────────────────────────────────────┐
│            Layered Technology Stack (Shared)            │
├─────────────────────────────────────────────────────────┤
│ Frontend: React 19 + Ant Design 5 + xlsx/exceljs        │
├─────────────────────────────────────────────────────────┤
│ Backend: Node.js (Fastify) + TypeScript                 │
├─────────────────────────────────────────────────────────┤
│ Doc processing: Python microservice (extraction_service)│
│   ├─ PyMuPDF: fast PDF extraction                       │
│   ├─ Nougat: English scientific papers, high quality ⭐ │
│   └─ Language Detector: automatic language detection    │
├─────────────────────────────────────────────────────────┤
│ LLM: DeepSeek-V3 + Qwen3 / GPT-5 + Claude-4.5           │
├─────────────────────────────────────────────────────────┤
│ Database: PostgreSQL 15 (asl_schema)                    │
└─────────────────────────────────────────────────────────┘
```

---

## 📌 Scenario 1: Title/Abstract Initial Screening

### 1.1 Technical Characteristics

- **Input format**: Excel files (`.xlsx` / `.xls`)
- **Data volume**: 50-500 papers per batch
- **Main fields**: title, abstract, DOI, authors, publication year, journal
- **Processing focus**: efficient batch processing; no PDF parsing required

### 1.2 Technology Selection

#### Frontend: Excel Upload and Parsing

| Technology | Library | Purpose | Advantage |
|------|-----|------|------|
| **Excel upload** | `antd Upload` | File upload component | Drag-and-drop upload with progress display |
| **Excel parsing** | `xlsx` / `exceljs` | Parse Excel in the frontend | Pure client-side processing, fast preview |
| **Template validation** | Custom logic | Validate column names and data formats | Catch format errors early |

**Recommended: the `xlsx` library (SheetJS)**
- ✅ Supports both `.xlsx` and `.xls`
- ✅ Pure JavaScript; parses directly in the frontend
- ✅ Small footprint (~600KB) with good performance
- ✅ Handles large files (1000+ rows)

**Code example:**
```typescript
import * as XLSX from 'xlsx';

@@ -97,15 +97,15 @@ function parseExcel(file: File): Promise<Literature[]> {
      const sheetName = workbook.SheetNames[0];
      const worksheet = workbook.Sheets[sheetName];

      // Convert the sheet to JSON
      const jsonData = XLSX.utils.sheet_to_json(worksheet);

      // Map rows to the standard format (English or Chinese column headers)
      const literatures = jsonData.map((row: any) => ({
        title: row['Title'] || row['标题'],
        abstract: row['Abstract'] || row['摘要'],
        doi: row['DOI'],
        authors: row['Authors'] || row['作者'],
        year: row['Year'] || row['年份'],
        journal: row['Journal'] || row['期刊'],
      }));
@@ -122,20 +122,20 @@ function parseExcel(file: File): Promise<Literature[]> {
}
```

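The table above describes template validation only as custom logic. A minimal sketch of what such a check might look like, assuming rows arrive as plain objects keyed by column header, as `XLSX.utils.sheet_to_json` produces. `validateRows`, `REQUIRED_COLUMNS`, and the error format are hypothetical names chosen for illustration, not part of the project:

```typescript
// Hypothetical helper: validates parsed Excel rows before they are submitted.
type Row = Record<string, unknown>;

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Assumption: only title and abstract are mandatory. Each field lists the
// header names (English or Chinese) that are accepted for it.
const REQUIRED_COLUMNS: Record<string, string[]> = {
  title: ['Title', '标题'],
  abstract: ['Abstract', '摘要'],
};

function validateRows(rows: Row[]): ValidationResult {
  if (rows.length === 0) {
    return { valid: false, errors: ['The sheet contains no data rows'] };
  }
  const errors: string[] = [];
  rows.forEach((row, index) => {
    for (const field of Object.keys(REQUIRED_COLUMNS)) {
      const headers = REQUIRED_COLUMNS[field];
      // Take the first accepted header that actually has a value.
      const value = headers
        .map((header) => row[header])
        .find((candidate) => candidate !== undefined && candidate !== null);
      if (typeof value !== 'string' || value.trim() === '') {
        // +2: one for the header row, one for 1-based spreadsheet numbering.
        errors.push(`Row ${index + 2}: missing or empty "${field}"`);
      }
    }
  });
  return { valid: errors.length === 0, errors };
}
```

Running a check like this before submitting the screening task lets the preview step report the exact spreadsheet row that needs fixing.
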
#### Backend: Batch Screening

**Processing flow:**
```
Excel data → Batch grouping (10-20 papers per group) → Parallel LLM calls → Aggregate results
```

**Key technical points:**
1. **Batch grouping**: keeps individual requests from growing too large; 10-20 papers per group works best
2. **Parallel processing**: use `Promise.all` to call the LLM in parallel
3. **Progress updates**: push processing progress in real time over WebSocket
4. **Resumable runs**: a task can continue after an interruption

**Code example:**
```typescript
async function batchScreening(
  literatures: Literature[],
@@ -156,7 +156,7 @@ async function batchScreening(

    results.push(...batchResults);

    // Report progress
    const progress = Math.round(((i + 1) / batches.length) * 100);
    progressCallback(progress);
  }
@@ -165,55 +165,55 @@ async function batchScreening(
}
```

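Because the diff shows only fragments of `batchScreening`, here is a self-contained sketch of the batching pattern it describes: split the input into groups, screen the items of one group in parallel with `Promise.all`, and report percentage progress after each group. `chunk`, `screenInBatches`, and `screenOne` are hypothetical names, and the real LLM call is replaced by an injected function:

```typescript
// Split an array into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// Hypothetical driver: groups run one after another, items inside a group run
// in parallel, and progress is reported as a rounded percentage per group.
async function screenInBatches<T, R>(
  items: T[],
  batchSize: number,
  screenOne: (item: T) => Promise<R>,
  onProgress: (percent: number) => void,
): Promise<R[]> {
  const batches = chunk(items, batchSize);
  const results: R[] = [];
  let completed = 0;
  for (const batch of batches) {
    const batchResults = await Promise.all(batch.map((item) => screenOne(item)));
    results.push(...batchResults);
    completed += 1;
    onProgress(Math.round((completed / batches.length) * 100));
  }
  return results;
}
```

Processing groups sequentially bounds the number of in-flight LLM requests to the group size, which is one simple way to respect provider rate limits while still parallelising within a group.
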
### 1.3 Data Flow

```
User action         Frontend                  Backend                 LLM
│                   │                         │                       │
├─ Upload Excel     │                         │                       │
│  └───────────────→│                         │                       │
│                   ├─ Parse Excel            │                       │
│                   ├─ Validate format        │                       │
│                   ├─ Show preview           │                       │
│                   │                         │                       │
│                   ├─ Submit screening task  │                       │
│                   │  └─────────────────────→│                       │
│                   │                         ├─ Save task            │
│                   │                         ├─ Group (15 per batch) │
│                   │                         │                       │
│                   │                         ├─ Batch 1              │
│                   │                         │  └───────────────────→│
│                   │                         │                       ├─ DeepSeek screening
│                   │                         │                       ├─ Qwen3 screening
│                   │                         │                       ├─ Compare results
│                   │                         │  ←────────────────────┘
│                   │                         ├─ Save results         │
│                   │                         │                       │
│                   │                         ├─ Batch 2...           │
│                   │                         │                       │
│                   │  ←──────────────────────┤ Return full results   │
│  ←────────────────┤ Show results            │                       │
└─ Manual review    │                         │                       │
```

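The flow above has DeepSeek and Qwen3 screen the same batch and then compares their results, but it does not say how disagreements are handled. One plausible sketch, assuming each model returns an include/exclude/uncertain decision per paper. The `Decision` values, the field names, and the rule that any disagreement is routed to manual review are illustrative assumptions, not the project's confirmed logic:

```typescript
type Decision = 'include' | 'exclude' | 'uncertain';

interface ModelVerdict {
  literatureId: string;
  decision: Decision;
}

interface ComparedResult {
  literatureId: string;
  modelA: Decision;
  modelB: Decision;
  finalDecision: Decision | null; // null until a human reviewer decides
  needsReview: boolean;
}

// Hypothetical comparison step: a paper is auto-decided only when both models
// give the same definite answer; everything else is flagged for review.
function compareVerdicts(a: ModelVerdict[], b: ModelVerdict[]): ComparedResult[] {
  const decisionsFromB = new Map<string, Decision>();
  for (const verdict of b) {
    decisionsFromB.set(verdict.literatureId, verdict.decision);
  }
  return a.map((verdict) => {
    // A paper missing from the second model's output is treated as uncertain.
    const other = decisionsFromB.get(verdict.literatureId) ?? 'uncertain';
    const agree = verdict.decision === other && verdict.decision !== 'uncertain';
    return {
      literatureId: verdict.literatureId,
      modelA: verdict.decision,
      modelB: other,
      finalDecision: agree ? verdict.decision : null,
      needsReview: !agree,
    };
  });
}
```

Under this rule the manual-review step at the end of the flow only has to look at the flagged subset rather than at every paper.
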
---

## 📌 Scenario 2 & 3: Full-Text Re-screening and Data Extraction

### 2.1 Technical Characteristics

- **Input format**: PDF files (medical literature)
- **File characteristics**:
  - Scientific paper format (title, abstract, methods, results, discussion, and so on)
  - Contain complex tables, formulas, and figures
  - Typically 10-30 pages
- **Processing focus**: high-accuracy extraction that preserves structure and formatting

### 2.2 Technology Selection: PDF Extraction

#### Core approach: Nougat + PyMuPDF sequential fallback strategy ⭐

**Existing architecture** (already implemented, located in `extraction_service/`):

```python
# Sequential fallback strategy
@@ -221,75 +221,75 @@ def extract_pdf(file_path: str):
    # Step 1: Detect the language
    language = detect_language(file_path)

    # Step 2: Chinese PDF → PyMuPDF (fast)
    if language == 'chinese':
        return extract_pdf_pymupdf(file_path)

    # Step 3: English PDF → try Nougat
    if check_nougat_available():
        result = extract_pdf_nougat(file_path)

        # Quality check (threshold 0.7)
        if result['quality_score'] >= 0.7:
            return result  # ✅ Nougat succeeded

    # Step 4: Fall back to PyMuPDF
    return extract_pdf_pymupdf(file_path)
```

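The same routing decision can be mirrored on the Node.js side, for example to unit-test the fallback rules without loading any model. This sketch follows the steps of `extract_pdf` with the extractors injected as plain functions. `chooseExtraction`, `ExtractionDeps`, and the result shape are hypothetical; only the 0.7 threshold and the Chinese-language rule come from the code above:

```typescript
interface ExtractionResult {
  method: 'nougat' | 'pymupdf';
  text: string;
  qualityScore: number;
}

// Injected dependencies (assumed shapes) so the routing can be tested alone.
interface ExtractionDeps {
  detectLanguage: (path: string) => string;
  nougatAvailable: () => boolean;
  runNougat: (path: string) => ExtractionResult;
  runPyMuPDF: (path: string) => ExtractionResult;
}

const QUALITY_THRESHOLD = 0.7;

function chooseExtraction(path: string, deps: ExtractionDeps): ExtractionResult {
  // Chinese PDFs skip Nougat entirely and go straight to PyMuPDF.
  if (deps.detectLanguage(path) === 'chinese') {
    return deps.runPyMuPDF(path);
  }
  // Otherwise try Nougat when it is available and keep its output only if
  // the quality score reaches the threshold.
  if (deps.nougatAvailable()) {
    const result = deps.runNougat(path);
    if (result.qualityScore >= QUALITY_THRESHOLD) {
      return result;
    }
  }
  // Nougat unavailable or below threshold: fall back to PyMuPDF.
  return deps.runPyMuPDF(path);
}
```

Keeping the threshold in one named constant makes it easy to tune once real quality scores are available.
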
#### Technology Comparison

| Option | Advantages | Disadvantages | Best suited for |
|------|------|------|---------|
| **Nougat** ⭐ | • Designed for scientific literature<br>• High accuracy on formulas and tables<br>• Outputs Markdown<br>• Preserves document structure | • Slow (1-2 minutes per 20 pages)<br>• Needs GPU acceleration<br>• High memory use (~4GB) | Full-text extraction of English medical literature |
| **PyMuPDF** | • Fast (seconds)<br>• Low memory use<br>• Simple to deploy | • Formulas and tables are easily lost<br>• Plain-text output<br>• Layout is easily scrambled | Chinese literature, quick previews |
| **Adobe API** | • Commercial-grade accuracy<br>• Cloud processing | • Paid<br>• Depends on network access<br>• Privacy risk | Not recommended (high cost) |
| **Tesseract OCR** | • Free and open source<br>• Multi-language support | • Requires image preprocessing<br>• Unstable accuracy | Scanned PDFs (fallback option) |

**Recommended: Nougat (primary) + PyMuPDF (fallback) ⭐**

#### Nougat Core Advantages (Medical Literature Scenario)

```
✅ Designed for scientific literature
   ├─ Training data: arXiv papers + scientific journals
   ├─ Formula recognition: LaTeX output
   ├─ Table preservation: Markdown table format
   └─ Structured output: clear sections and paragraphs

✅ Output format: Markdown
   ├─ Heading levels: # ## ###
   ├─ Tables: | Header | Data |
   ├─ Formulas: $$ formula $$
   └─ Citations: [1] [2] [3]

✅ Quality assessment
   ├─ Automatic quality score (0-1)
   ├─ Low-quality output falls back to PyMuPDF automatically
   └─ Keeps the extraction success rate high
```

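Given the output conventions listed above, downstream code can pull structure out of the Markdown with simple pattern matching. A minimal sketch, assuming display formulas are delimited by `$$ … $$` and headings use leading `#` characters as shown. `extractDisplayFormulas` and `extractHeadings` are illustrative names, and real Nougat output may use other math delimiters that this sketch does not handle:

```typescript
// Returns the trimmed contents of every $$ ... $$ block, in document order.
function extractDisplayFormulas(markdown: string): string[] {
  const formulas: string[] = [];
  // [\s\S] lets a formula span several lines; +? keeps each match minimal.
  const pattern = /\$\$([\s\S]+?)\$\$/g;
  let match: RegExpExecArray | null = pattern.exec(markdown);
  while (match !== null) {
    formulas.push(match[1].trim());
    match = pattern.exec(markdown);
  }
  return formulas;
}

interface Heading {
  level: number; // 1 for "#", 2 for "##", and so on
  text: string;
}

// Collects ATX-style headings (lines that start with one to six "#").
function extractHeadings(markdown: string): Heading[] {
  const headings: Heading[] = [];
  for (const line of markdown.split('\n')) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line.trim());
    if (match !== null) {
      headings.push({ level: match[1].length, text: match[2].trim() });
    }
  }
  return headings;
}
```

Heading levels recovered this way give a cheap section outline that can be used to send only the relevant sections of a paper to the LLM.
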
#### Implementation Details

**Service architecture:**
```
Node.js Backend (Port 3001)
    │
    ├─ Calls ExtractionClient.ts
    │   └─ HTTP request → Python microservice
    │
Python Extraction Service (Port 8000)
    │
    ├─ /api/extract/pdf
    │   ├─ detect_language()
    │   ├─ extract_pdf_nougat() → Nougat Model
    │   └─ extract_pdf_pymupdf() → PyMuPDF
    │
    └─ /api/health
        └─ Checks Nougat availability
```

**Node.js calling code:**
```typescript
import { extractionClient } from '@common/document/ExtractionClient';

@@ -320,7 +320,7 @@ async function extractLiteraturePDF(file: Buffer, filename: string) {
}
```

**Python extraction code:**
```python
# extraction_service/services/nougat_extractor.py

@@ -336,14 +336,14 @@ def extract_pdf_nougat(file_path: str) -> Dict[str, Any]:
        file_path,
        '-o', output_dir,
        '--markdown',      # Output in Markdown format
        '--no-skipping'    # Do not skip any pages
    ]

    # Run Nougat (5-minute timeout)
    process = subprocess.Popen(cmd, ...)
    stdout, stderr = process.communicate(timeout=300)

    # Read the output file (.mmd)
    markdown_text = read_output_file()

    # Quality assessment
@@ -362,9 +362,9 @@ def extract_pdf_nougat(file_path: str) -> Dict[str, Any]:
    }
```

### 2.3 Text Post-processing

**Nougat output cleanup:**
```typescript
function postProcessNougatOutput(markdown: string): ProcessedText {
  return {
@@ -380,10 +380,10 @@ function postProcessNougatOutput(markdown: string): ProcessedText {
    // Formula extraction
    formulas: extractFormulas(markdown),

    // Plain text (formatting removed)
    plainText: markdownToPlainText(markdown),

    // Structured data (for the LLM)
    structured: {
      title: extractTitle(markdown),
      abstract: extractAbstract(markdown),
@@ -398,13 +398,13 @@ function postProcessNougatOutput(markdown: string): ProcessedText {
```

## 📌 Scenario 4: Literature Download (Unpaywall API)

### 3.1 Technical Background

**Unpaywall** is a free Open Access literature API that can:
- ✅ Look up by DOI whether a paper has a free full text
- ✅ Return legal PDF download links
- ✅ Be used entirely free of charge
- ✅ Draw on a database covering more than 30 million papers

**Official site**: https://unpaywall.org/products/api

@@ -412,18 +412,18 @@ function postProcessNougatOutput(markdown: string): ProcessedText {

#### API Usage

|
||||
**基础信息:**
|
||||
**<2A>箇<EFBFBD>靽⊥<E99DBD>嚗?*
|
||||
- **API 蝡舐<E89DA1>**: `https://api.unpaywall.org/v2/{doi}?email={your_email}`
|
||||
- **霂瑟<E99C82><E7919F>寞<EFBFBD>**: GET
|
||||
- **霈方<E99C88><E696B9>孵<EFBFBD>**: <20>𣳇<EFBFBD> API Key嚗䔶<E59A97><E494B6><EFBFBD><EFBFBD>𣂷<EFBFBD><F0A382B7>桃拳
|
||||
- **速率限制**: 100,000 次/天(免费)
|
||||
- **<2A>毺<EFBFBD><E6AFBA>𣂼<EFBFBD>**: 100,000 甈?憭抬<E686AD><E68AAC>滩晶嚗?
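
The daily quota can be turned into a pacing budget for batch jobs. The sketch below is illustrative only: `minIntervalMs` and `requestsPerDayAt` are hypothetical helper names (not part of Unpaywall or of this project), and it assumes requests are sent back to back at a fixed delay, ignoring network latency.

```typescript
// Hypothetical helpers (not part of any Unpaywall SDK): convert between a
// daily request quota and the delay between consecutive requests.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Smallest delay (ms) between requests that stays within `dailyLimit`
// when requests are sent continuously for a full day.
function minIntervalMs(dailyLimit: number): number {
  if (dailyLimit <= 0) throw new Error("dailyLimit must be positive");
  return Math.ceil(MS_PER_DAY / dailyLimit);
}

// Number of requests sent in one day at a fixed delay between requests.
function requestsPerDayAt(intervalMs: number): number {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  return Math.floor(MS_PER_DAY / intervalMs);
}

console.log(minIntervalMs(100_000)); // 864
console.log(requestsPerDayAt(100));  // 864000
```

Under these assumptions, a fixed 100 ms delay would allow up to 864,000 requests in a day, so a short fixed delay alone does not enforce the 100,000/day quota. This only matters for batches approaching that size; a per-day counter is one possible safeguard.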

**Example request:**
```bash
curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"
```

**Example response:**
```json
{
  "doi": "10.1038/nature12373",
@@ -443,7 +443,7 @@ curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"

#### Node.js Implementation

**Service wrapper:**
```typescript
// backend/src/common/literature/UnpaywallClient.ts

@@ -453,7 +453,7 @@ import { config } from '../../config/env';
export interface UnpaywallResult {
  doi: string;
  title: string;
  isOA: boolean;              // Whether the article is open access
  oaStatus: string;           // "gold" | "green" | "hybrid" | "bronze" | "closed"
  pdfUrl: string | null;      // PDF download link
  landingPageUrl: string;     // Article landing-page link
@@ -476,12 +476,12 @@ class UnpaywallClient {
    try {
      // URL-encode the email so characters such as "+" survive the query string
      const url = `${this.baseUrl}/${doi}?email=${encodeURIComponent(this.email)}`;
      const response = await axios.get(url, {
        timeout: 10000,  // 10-second timeout
      });

      const data = response.data;

      // Get the best download location
      const bestOA = data.best_oa_location;

      return {
@@ -505,7 +505,7 @@ class UnpaywallClient {
  }

  /**
   * Batch lookup (rate-limited)
   */
  async getBatch(dois: string[]): Promise<UnpaywallResult[]> {
    const results: UnpaywallResult[] = [];
@@ -515,7 +515,7 @@ class UnpaywallClient {
        const result = await this.getByDoi(doi);
        results.push(result);

        // Rate limit: wait 100 ms between requests
        await new Promise(resolve => setTimeout(resolve, 100));
      } catch (error) {
        console.error(`Failed to fetch ${doi}:`, (error as Error).message);
@@ -547,7 +547,7 @@ class UnpaywallClient {
export const unpaywallClient = new UnpaywallClient();
```

**Environment variable configuration:**
```env
# .env
UNPAYWALL_EMAIL=your-email@example.com
@@ -560,7 +560,7 @@ UNPAYWALL_EMAIL=your-email@example.com
async function checkLiteratureAvailability(literatures: Literature[]) {
  const dois = literatures
    .map(lit => lit.doi)
    .filter(doi => doi);  // Filter out empty DOIs

  const results = await unpaywallClient.getBatch(dois);

@@ -572,7 +572,7 @@ async function checkLiteratureAvailability(literatures: Literature[]) {
}
```

**Scenario 2: the user clicks to download the full text**
```typescript
async function downloadLiteratureFullText(doi: string) {
  // Step 1: Query Unpaywall
@@ -588,7 +588,7 @@ async function downloadLiteratureFullText(doi: string) {

  await unpaywallClient.downloadPdf(unpaywallResult.pdfUrl, outputPath);

  // Step 3: Extract text (call extraction_service)
  const extractionResult = await extractionClient.extractPdf(
    fs.readFileSync(outputPath),
    filename,
@@ -605,9 +605,9 @@ async function downloadLiteratureFullText(doi: string) {

### 3.3 Frontend Integration

**Batch download button:**
```typescript
// Batch-check downloadability
async function checkDownloadable(selectedRows: Literature[]) {
  setLoading(true);

@@ -631,7 +631,7 @@ async function downloadFullText(literature: Literature) {
    const result = await api.downloadLiteratureFullText(literature.doi);
    message.success('Download succeeded');

    // Open the PDF viewer
    openPdfViewer(result.pdfPath);
  } catch (error) {
    message.error(`Download failed: ${(error as Error).message}`);
@@ -645,23 +645,23 @@ async function downloadFullText(literature: Literature) {

### 4.1 Summary of Technical Points Already Covered

| Technical point | Status | Notes |
|--------|------|------|
| ✅ Nougat model | Implemented | `extraction_service/services/nougat_extractor.py` |
| ✅ PyMuPDF | Implemented | `extraction_service/services/pdf_extractor.py` |
| ✅ Sequential fallback strategy | Implemented | English → Nougat, Chinese → PyMuPDF |
| 🆕 Unpaywall API | To be added | This document provides the implementation plan |
| ✅ Excel parsing | To be added | Uses the `xlsx` library (frontend) |

### 4.2 Technical Points That May Have Been Overlooked ⭐

#### (1) Enhanced Table Extraction

**Problem**: Nougat preserves table structure, but an LLM that processes Markdown tables directly may be inaccurate.

**Solution: Table Transformer**
```python
# Use Microsoft's Table Transformer model
# https://github.com/microsoft/table-transformer

from transformers import TableTransformerForObjectDetection
@@ -675,7 +675,7 @@ def extract_tables_enhanced(pdf_path: str):
        "microsoft/table-transformer-detection"
    )

    # Detect table locations
    # NOTE: pseudocode. `detect_tables` is illustrative and is not a method of
    # TableTransformerForObjectDetection; real usage runs page images through the model.
    tables = model.detect_tables(pdf_path)

    # Extract each table
@@ -686,22 +686,22 @@ def extract_tables_enhanced(pdf_path: str):
    return structured_tables
```

**Priority: V2.0** (Nougat is sufficient for the MVP stage)

#### (2) Citation Parsing and Linking

**Problem**: Scientific articles contain many citations such as `[1] [2] [3]`, which need to be parsed and linked to the reference list.

**Solution: GROBID**
```python
# GROBID: open-source tool for parsing scientific literature
# https://github.com/kermitt2/grobid

import requests

def parse_references(pdf_path: str):
    """
    Parse the reference list with GROBID.
    """
    with open(pdf_path, 'rb') as f:
        files = {'input': f}
@@ -714,11 +714,11 @@ def parse_references(pdf_path: str):
    return response.json()['references']
```

**Priority: V2.0** (non-core feature)

#### (3) Formula Recognition and Rendering

**Problem**: Nougat outputs LaTeX formulas, which the frontend must render.

**Solution: KaTeX / MathJax**
```typescript
@@ -736,9 +736,9 @@ function renderFormula(latex: string) {

**Priority: MVP** (user experience)

#### (4) PDF Preview and Annotation

**Problem**: During manual review, reviewers need to view the original text and add highlight annotations.

**Solution: PDF.js + Annotator.js**
```typescript
@@ -762,11 +762,11 @@ function PdfViewer({ pdfUrl, annotations }) {

**Priority: MVP** (core feature)

#### (5) Literature Deduplication

**Problem**: An uploaded Excel file may contain duplicate articles (different versions of the same article).

**Solution: deduplicate by DOI and title**
```typescript
function deduplicateLiteratures(literatures: Literature[]) {
  const seen = new Set();
@@ -791,16 +791,16 @@ function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')  // Remove punctuation (Unicode-aware, so non-Latin titles are preserved)
    .replace(/\s+/g, ' ')              // Normalize whitespace
    .trim();
}
```
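
Because the diff shows only fragments of the deduplication code, here is a self-contained sketch of the same idea, offered as one possible reading rather than the project's actual implementation. `LiteratureRecord`, `normalizeTitleKey`, and `dedupeByDoiOrTitle` are hypothetical names, and the rule that a record is a duplicate when either its DOI or its normalized title has been seen is an assumption.

```typescript
// Minimal record shape assumed for this sketch; the project's real
// `Literature` type is not shown in this section.
interface LiteratureRecord {
  title: string;
  doi?: string | null;
}

// Lowercase, drop punctuation (Unicode-aware), and collapse whitespace.
function normalizeTitleKey(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Keep the first record seen for each DOI or normalized title.
function dedupeByDoiOrTitle<T extends LiteratureRecord>(records: T[]): T[] {
  const seenDois = new Set<string>();
  const seenTitles = new Set<string>();
  const unique: T[] = [];
  for (const record of records) {
    const doi = record.doi?.trim().toLowerCase() ?? "";
    const title = normalizeTitleKey(record.title);
    const isDuplicate =
      (doi !== "" && seenDois.has(doi)) || (title !== "" && seenTitles.has(title));
    if (isDuplicate) continue;
    if (doi !== "") seenDois.add(doi);
    if (title !== "") seenTitles.add(title);
    unique.push(record);
  }
  return unique;
}

const sample: LiteratureRecord[] = [
  { title: "Deep Learning for RCT Screening", doi: "10.1000/ABC" },
  { title: "Deep learning for RCT screening.", doi: null }, // same title, no DOI
  { title: "A different study", doi: "10.1000/abc" },       // same DOI, different case
  { title: "队列研究:方法学综述" },                          // non-Latin title is kept
];
console.log(dedupeByDoiOrTitle(sample).map(r => r.title));
```

Matching on title alone can merge distinct articles that share a generic title, so a stricter variant might also compare year or first author.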

**Priority: MVP** (required feature)

#### (6) Metadata Enrichment

**Problem**: Data uploaded via Excel may be incomplete (missing DOI, year, and so on).

**Solution: Crossref API**
```typescript
@@ -826,9 +826,9 @@ async function enrichMetadata(literature: Literature) {

**Priority: V1.0** (enhancement)

#### (7) Persisting Batch-Processing Progress

**Problem**: Batch screening is slow (1,000 articles take more than 10 minutes), so resuming from a checkpoint must be supported.

**Solution: Redis + task queue**
```typescript
@@ -862,11 +862,11 @@ screeningQueue.process(async (job) => {

**Priority: V1.0** (UX optimization)

#### (8) Error Handling and Retries

**Problem**: LLM calls can fail (network errors, timeouts, rate limiting).

**Solution: retry with exponential backoff**
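
As a self-contained illustration of the technique, the sketch below retries a failing async call with delays that double on each attempt, up to a cap. It is an assumption-laden sketch, not the project's implementation: `BackoffOptions`, `backoffDelayMs`, and `callWithBackoff` are hypothetical names, and the default values are placeholders.

```typescript
// Hypothetical options; the names and defaults are illustrative assumptions.
interface BackoffOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** attempt, opts.maxDelayMs);
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Call `fn` until it resolves or the retry budget is exhausted,
// then rethrow the last error.
async function callWithBackoff<T>(
  fn: () => Promise<T>,
  opts: BackoffOptions = { maxRetries: 3, baseDelayMs: 1000, maxDelayMs: 30000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < opts.maxRetries) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With the placeholder defaults, the waits before the three retries are 1 s, 2 s, and 4 s. Adding random jitter to each delay is a common refinement when many workers retry at once.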
```typescript
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
@@ -892,30 +892,30 @@ async function retryWithBackoff<T>(

## Technology Selection Summary

### Required Technologies for the MVP Stage

| Layer | Technology | Purpose |
|------|------|------|
| **Frontend** | `xlsx` | Excel parsing |
| **Frontend** | `PDF.js` | PDF preview |
| **Frontend** | `KaTeX` | Formula rendering |
| **Backend** | `ExtractionClient` | Calls the Python microservice |
| **Backend** | `UnpaywallClient` | Literature download |
| **Python** | `Nougat` | English PDF extraction |
| **Python** | `PyMuPDF` | Fast PDF extraction |
| **Database** | `asl_schema` | Data storage |

### V1.0 Enhancement Technologies

| Technology | Purpose |
|------|------|
| Crossref API | Metadata enrichment |
| Bull Queue | Task queue |
| Redis | Progress persistence |

### V2.0 Advanced Technologies

| Technology | Purpose |
|------|------|
| Table Transformer | Precise table extraction |
| GROBID | Citation parsing |
@@ -932,63 +932,63 @@ AIclinicalresearch/docs/03-业务模块/ASL-AI智能文献/
└── 05-测试文档/
    ├── 01-测试计划.md
    ├── 02-标题摘要初筛测试用例.md
    └── 03-测试数据/                          ← new folder
        ├── README.md                         ← documentation
        ├── screening-test-data/
        │   ├── literature-list-199.xlsx      ← list of 199 articles
        │   ├── picos-criteria.txt            ← PICOS criteria
        │   └── expected-results.json         ← expected results (gold standard)
        ├── pdf-samples/
        │   ├── sample-rct-01.pdf
        │   ├── sample-cohort-01.pdf
        │   └── README.md
        └── extraction-test-data/
            └── README.md
```

**Recommended structure:**
```
05-测试文档/
├── 01-测试计划.md
├── 02-标题摘要初筛测试用例.md
└── 03-测试数据/
    ├── README.md                       ← Important: documents the test data's source, copyright, and usage
    ├── screening/
    │   ├── literature-list-199.xlsx
    │   ├── picos-criteria.txt
    │   ├── inclusion-criteria.txt
    │   ├── exclusion-criteria.txt
    │   └── gold-standard.json          ← correct answers from manual annotation
    └── pdf-extraction/
        ├── sample-01-high-quality.pdf
        ├── sample-02-with-tables.pdf
        └── sample-03-chinese.pdf
```

**Example README.md:**
```markdown
# ASL Test Dataset

## Data Description

### 1. Title/Abstract Screening Test Data
- **File**: `literature-list-199.xlsx`
- **Count**: 199 English-language medical articles
- **Fields**: title, abstract, DOI, authors, year, journal
- **Source**: [describe the data source]
- **Copyright**: [state the copyright information]

### 2. PICOS Criteria
- **File**: `picos-criteria.txt`
- **Contents**: Population, Intervention, Comparison, Outcome, Study Design
- **Inclusion criteria**: 5 items
- **Exclusion criteria**: 8 items

### 3. Gold Standard (Manual Annotations)
- **File**: `gold-standard.json`
- **Annotators**: [annotating experts' details]
- **Annotation date**: [date]
- **Expected accuracy**: ≥ 90%

## Usage

@@ -997,15 +997,15 @@ AIclinicalresearch/docs/03-业务模块/ASL-AI智能文献/
npm run test:asl:screening
```

### Evaluate Accuracy
```bash
npm run test:asl:evaluate -- --gold-standard gold-standard.json
```

## Expected Results
- Included: 45 articles
- Excluded: 132 articles
- Uncertain: 22 articles
```
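
The ≥ 90% accuracy target in the README example can be checked by comparing screening output against the gold standard. The sketch below shows one way to do that. The schema of `gold-standard.json` is not specified in this document, so the `DecisionMap` shape (article ID mapped to a decision), the three label strings, and the `evaluateScreening` function are all assumptions made for illustration.

```typescript
// Assumed decision labels, mirroring the three expected-result categories.
type Decision = "include" | "exclude" | "uncertain";

// Assumed shape: article ID -> decision. The real gold-standard.json schema
// is not specified in this document.
type DecisionMap = Record<string, Decision>;

interface EvaluationReport {
  total: number;                    // gold-standard articles evaluated
  correct: number;                  // predictions that match the gold standard
  accuracy: number;                 // correct / total, in [0, 1]
  counts: Record<Decision, number>; // predicted decisions by category
}

function evaluateScreening(gold: DecisionMap, predicted: DecisionMap): EvaluationReport {
  const counts: Record<Decision, number> = { include: 0, exclude: 0, uncertain: 0 };
  let total = 0;
  let correct = 0;
  for (const [id, expected] of Object.entries(gold)) {
    total++;
    const actual = predicted[id]; // a missing prediction counts as wrong
    if (actual !== undefined) counts[actual]++;
    if (actual === expected) correct++;
  }
  return { total, correct, accuracy: total === 0 ? 0 : correct / total, counts };
}

const gold: DecisionMap = { a1: "include", a2: "exclude", a3: "exclude", a4: "uncertain" };
const predicted: DecisionMap = { a1: "include", a2: "exclude", a3: "include", a4: "uncertain" };
const report = evaluateScreening(gold, predicted);
console.log(report.accuracy >= 0.9 ? "PASS" : "FAIL", report);
```

In this toy run three of four predictions match, so accuracy is 0.75 and the check reports FAIL. On the real data set, the per-category counts can be compared with the 45 / 132 / 22 split listed under the expected results.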

---

@@ -1013,13 +1013,13 @@ npm run test:asl:evaluate -- --gold-standard gold-standard.json
## Related Documents

- [Quality Assurance and Traceability Strategy](./06-质量保障与可追溯策略.md)
- [Database Design](./01-数据库设计.md)
- [API Design Specification](./02-API设计规范.md)
- [Document Extraction Microservice](../../../../extraction_service/README.md)

---

**Changelog**:
- 2025-11-15: Created the document; defined the technology selection for initial screening, full-text rescreening, full-text extraction, and literature download