# ASL Literature Processing: Technology Selection

> **Document version:** V1.0
> **Created:** 2025-11-15
> **Applicable module:** AI Smart Literature (ASL)
> **Goal:** Define the technology stack and implementation path for initial screening, full-text re-screening, and full-text extraction

---

## 📋 Document Overview

The ASL module covers three literature-processing scenarios, each with its own technical characteristics and implementation approach:

| Scenario | Input format | Core technology | Main challenge |
|------|---------|---------|---------|
| **Title/abstract initial screening** | Excel file | Excel parsing + LLM screening | Batch-processing efficiency |
| **Full-text re-screening** | Full-text PDF | PDF extraction + LLM screening | PDF parsing accuracy |
| **Full-text data extraction** | Full-text PDF | PDF extraction + LLM structured extraction | Accurate extraction of tables and formulas |

---

## 🎯 Architecture Overview

```
┌──────────────────────────────────────────────────────────────────┐
│                  ASL literature processing flow                  │
└──────────────────────────────────────────────────────────────────┘
│
├─ Scenario 1: Title/abstract initial screening
│   └─ User uploads Excel → parse → LLM batch screening → export results
│
├─ Scenario 2: Full-text re-screening
│   └─ User uploads PDF → PDF extraction → LLM screening → review
│
└─ Scenario 3: Full-text data extraction
    └─ PDF → extraction + structuring → LLM data extraction → manual review

┌──────────────────────────────────────────────────────────────────┐
│                Layered technology stack (shared)                 │
├──────────────────────────────────────────────────────────────────┤
│  Frontend: React 19 + Ant Design 5 + xlsx/exceljs                │
├──────────────────────────────────────────────────────────────────┤
│  Backend: Node.js (Fastify) + TypeScript                         │
├──────────────────────────────────────────────────────────────────┤
│  Document processing: Python microservice (extraction_service)   │
│    ├─ PyMuPDF: fast PDF extraction                               │
│    ├─ Nougat: high-quality English scientific-paper extraction ⭐│
│    └─ Language Detector: automatic language detection            │
├──────────────────────────────────────────────────────────────────┤
│  LLM: DeepSeek-V3 + Qwen3 / GPT-5 + Claude-4.5                   │
├──────────────────────────────────────────────────────────────────┤
│  Database: PostgreSQL 15 (asl_schema)                            │
└──────────────────────────────────────────────────────────────────┘
```

---

## 📌 Scenario 1: Title/Abstract Initial Screening

### 1.1 Technical Characteristics

- **Input format**: Excel file (`.xlsx` / `.xls`)
- **Data volume**: 50-500 articles per batch
- **Main fields**: title, abstract, DOI, authors, publication year, journal
- **Processing focus**: efficient batch processing; no PDF parsing required

### 1.2 Technology Selection

#### Frontend: Excel Upload and Parsing

| Technology | Library | Purpose | Advantages |
|------|-----|------|------|
| **Excel upload** | `antd Upload` | File upload component | Drag-and-drop upload, progress bar |
| **Excel parsing** | `xlsx` / `exceljs` | Parse Excel in the browser | Pure frontend processing, fast preview |
| **Template validation** | Custom logic | Check column names and data format | Catches format errors early |

**Recommended: the `xlsx` library (SheetJS)**
- ✅ Supports both `.xlsx` and `.xls`
- ✅ Pure JavaScript, parses directly in the browser
- ✅ Small footprint (600 KB) and good performance
- ✅ Handles large files (1000+ rows)

**Code example:**
```typescript
import * as XLSX from 'xlsx';

function parseExcel(file: File): Promise<Literature[]> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    reader.onload = (e) => {
      try {
        const data = new Uint8Array(e.target?.result as ArrayBuffer);
        const workbook = XLSX.read(data, { type: 'array' });

        // Read the first worksheet
        const sheetName = workbook.SheetNames[0];
        const worksheet = workbook.Sheets[sheetName];

        // Convert to JSON (one object per row, keyed by column header)
        const jsonData = XLSX.utils.sheet_to_json(worksheet);

        // Map to the standard format; accepts English or Chinese column headers
        const literatures = jsonData.map((row: any) => ({
          title: row['Title'] || row['标题'],
          abstract: row['Abstract'] || row['摘要'],
          doi: row['DOI'],
          authors: row['Authors'] || row['作者'],
          year: row['Year'] || row['年份'],
          journal: row['Journal'] || row['期刊'],
        }));

        resolve(literatures);
      } catch (error) {
        reject(new Error('Failed to parse the Excel file'));
      }
    };

    reader.onerror = () => reject(new Error('Failed to read the file'));
    reader.readAsArrayBuffer(file);
  });
}
```

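The template-validation step listed in the table above (checking column names before any LLM call) could look roughly like the sketch below. The required-column list and the `validateColumns` name are assumptions for illustration; the actual upload template may require different columns.

```typescript
// Hypothetical list of required columns; each entry holds the accepted
// English and Chinese header aliases. Adjust to the real upload template.
const REQUIRED_COLUMNS: string[][] = [
  ['Title', '标题'],
  ['Abstract', '摘要'],
];

interface ValidationResult {
  valid: boolean;
  missing: string[]; // first alias of every required column that was not found
}

// Checks the header row of the parsed sheet against the required columns.
function validateColumns(headers: string[]): ValidationResult {
  const present = new Set(headers.map((h) => h.trim()));
  const missing = REQUIRED_COLUMNS
    .filter((aliases) => !aliases.some((name) => present.has(name)))
    .map((aliases) => aliases[0]);
  return { valid: missing.length === 0, missing };
}
```

Running this on the header row right after parsing lets the UI reject a malformed file before a screening task is submitted.
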
#### Backend: Batch Screening

**Processing flow:**
```
Excel data → split into groups (10-20 articles/group) → call LLMs in parallel → aggregate results
```

**Key technical points:**
1. **Batch grouping**: keeps each request from growing too large; 10-20 articles per group works best
2. **Parallel processing**: use `Promise.all` to call the LLMs in parallel
3. **Progress push**: push processing progress in real time over WebSocket
4. **Resumable tasks**: an interrupted task can continue where it stopped

**Code example:**
```typescript
import { chunk } from 'lodash'; // assumed source of `chunk`; any array-chunking helper works

// `dualModelScreening`, `Literature`, and `Protocol` are defined elsewhere in the module.
async function batchScreening(
  literatures: Literature[],
  protocol: Protocol,
  progressCallback: (progress: number) => void
) {
  const batchSize = 15;
  const batches = chunk(literatures, batchSize);
  const results = [];

  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];

    // Process the current batch in parallel
    const batchResults = await Promise.all(
      batch.map(lit => dualModelScreening(lit, protocol))
    );

    results.push(...batchResults);

    // Report progress as a percentage of completed batches
    const progress = Math.round(((i + 1) / batches.length) * 100);
    progressCallback(progress);
  }

  return results;
}
```

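For the resumable-task point in the list above, one possible approach is to persist the IDs of articles that have already been screened and rebuild only the remaining batches on restart. The sketch below assumes every article carries a stable `id`; the `remainingBatches` helper and the `ScreeningItem` type are hypothetical names, not part of the existing code.

```typescript
interface ScreeningItem {
  id: string; // assumed stable identifier, e.g. a database primary key
}

// Drops items that were already screened and regroups the rest into batches.
function remainingBatches<T extends ScreeningItem>(
  items: T[],
  completedIds: Set<string>,
  batchSize: number
): T[][] {
  if (batchSize <= 0) {
    throw new Error('batchSize must be positive');
  }
  const pending = items.filter((item) => !completedIds.has(item.id));
  const batches: T[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }
  return batches;
}
```

After an interruption, the task runner would load the completed IDs from storage and pass the result of `remainingBatches` to the same batch loop.
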
### 1.3 Data Flow

```
User                Frontend              Backend               LLM
 │                   │                     │                     │
 ├─ Upload Excel     │                     │                     │
 │  └───────────────→│                     │                     │
 │                   ├─ Parse Excel        │                     │
 │                   ├─ Validate format    │                     │
 │                   ├─ Show preview       │                     │
 │                   │                     │                     │
 │                   ├─ Submit screening   │                     │
 │                   │  └─────────────────→│                     │
 │                   │                     ├─ Save task          │
 │                   │                     ├─ Group (15/batch)   │
 │                   │                     │                     │
 │                   │                     ├─ Batch 1            │
 │                   │                     │  └─────────────────→│
 │                   │                     │                     ├─ DeepSeek screening
 │                   │                     │                     ├─ Qwen3 screening
 │                   │                     │                     ├─ Compare results
 │                   │                     │←────────────────────┘
 │                   │                     ├─ Save results       │
 │                   │                     │                     │
 │                   │                     ├─ Batch 2...         │
 │                   │                     │                     │
 │                   │←────────────────────┘ Return full results │
 │←──────────────────┘ Show results        │                     │
 └─ Manual review    │                     │                     │
```

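The "compare results" step in the flow above is not specified further in this document. A minimal sketch, assuming each model returns one of three labels and that any disagreement is routed to manual review, might look like this. The `Decision` labels and the `compareVerdicts` function are assumptions for illustration.

```typescript
type Decision = 'include' | 'exclude' | 'uncertain';

interface ModelVerdict {
  model: string; // e.g. 'deepseek' or 'qwen3'
  decision: Decision;
}

interface Consensus {
  decision: Decision;
  needsReview: boolean; // true when a human should double-check the article
}

// Assumed policy: accept the label only when both models agree on a
// definite answer; everything else is flagged for manual review.
function compareVerdicts(a: ModelVerdict, b: ModelVerdict): Consensus {
  if (a.decision === b.decision && a.decision !== 'uncertain') {
    return { decision: a.decision, needsReview: false };
  }
  return { decision: 'uncertain', needsReview: true };
}
```

A stricter or looser policy (for example, trusting one model over the other) would only change the body of `compareVerdicts`.
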
---

## 📌 Scenarios 2 & 3: Full-Text Re-screening and Data Extraction

### 2.1 Technical Characteristics

- **Input format**: PDF files (English medical literature)
- **File characteristics**:
  - Scientific-paper layout (title, abstract, introduction, methods, results, discussion, references)
  - Contains complex tables, formulas, and figures
  - Typically 10-30 pages
- **Processing focus**: high-accuracy extraction that preserves structure and formatting

### 2.2 Technology Selection: PDF Extraction

#### Core Approach: Nougat + PyMuPDF Sequential Fallback Strategy ⭐

**Existing architecture** (already implemented, located in `extraction_service/`):

```python
# Sequential fallback strategy
def extract_pdf(file_path: str):
    # Step 1: detect the language
    language = detect_language(file_path)

    # Step 2: Chinese PDF → PyMuPDF (fast)
    if language == 'chinese':
        return extract_pdf_pymupdf(file_path)

    # Step 3: English PDF → try Nougat
    if check_nougat_available():
        result = extract_pdf_nougat(file_path)

        # Quality check (threshold 0.7)
        if result['quality_score'] >= 0.7:
            return result  # ✅ Nougat succeeded

    # Step 4: fall back to PyMuPDF
    return extract_pdf_pymupdf(file_path)
```

#### Technology Comparison

| Option | Advantages | Disadvantages | Suitable for |
|------|------|------|---------|
| **Nougat** ⭐ | • Designed for scientific literature<br>• High accuracy on formulas and tables<br>• Markdown output<br>• Preserves document structure | • Slow (1-2 minutes per 20 pages)<br>• Needs GPU acceleration<br>• High memory usage (4 GB) | Full-text extraction of English medical literature |
| **PyMuPDF** | • Fast (seconds)<br>• Low memory usage<br>• Simple deployment | • Formulas and tables are easily lost<br>• Plain-text output<br>• Layout is easily scrambled | Chinese literature, quick preview |
| **Adobe API** | • Commercial-grade accuracy<br>• Cloud processing | • Paid<br>• Network dependency<br>• Privacy risk | Not recommended (high cost) |
| **Tesseract OCR** | • Free and open source<br>• Multi-language support | • Requires image preprocessing<br>• Unstable accuracy | Scanned PDFs (fallback option) |

**Recommended: Nougat (primary) + PyMuPDF (fallback) ⭐**

#### Core Strengths of Nougat (Medical Literature)

```
✅ Designed for scientific literature
├─ Training data: arXiv papers + scientific journals
├─ Formula recognition: LaTeX output
├─ Table preservation: Markdown table format
└─ Structured output: clear sections and paragraphs

✅ Output format: Markdown
├─ Heading levels: # ## ###
├─ Tables: | Header | Data |
├─ Formulas: $$ formula $$
└─ Citations: [1] [2] [3]

✅ Quality assessment
├─ Automatic quality score (0-1)
├─ Low-quality output falls back to PyMuPDF automatically
└─ Keeps the extraction success rate high
```

#### Implementation Details

**Service architecture:**
```
Node.js Backend (Port 3001)
    │
    ├─ Calls ExtractionClient.ts
    │   └─ HTTP request → Python microservice
    │
Python Extraction Service (Port 8000)
    │
    ├─ /api/extract/pdf
    │   ├─ detect_language()
    │   ├─ extract_pdf_nougat() → Nougat Model
    │   └─ extract_pdf_pymupdf() → PyMuPDF
    │
    └─ /api/health
        └─ Checks Nougat availability
```

**Node.js calling code:**
```typescript
import { extractionClient } from '@common/document/ExtractionClient';

async function extractLiteraturePDF(file: Buffer, filename: string) {
  try {
    // Option 1: automatic selection (recommended)
    const result = await extractionClient.extractPdf(
      file,
      filename,
      'auto'
    );

    // Option 2: force Nougat
    // const result = await extractionClient.extractPdf(file, filename, 'nougat');

    return {
      text: result.text,
      method: result.method,  // "nougat" | "pymupdf"
      quality: result.metadata.quality_score,
      pageCount: result.metadata.page_count,
      hasTables: result.metadata.has_tables,
      hasFormulas: result.metadata.has_formulas
    };
  } catch (error) {
    console.error('PDF extraction failed:', error);
    throw error;
  }
}
```

**Python extraction code:**
```python
# extraction_service/services/nougat_extractor.py
import subprocess
from typing import Any, Dict

def extract_pdf_nougat(file_path: str) -> Dict[str, Any]:
    """
    Extract PDF text with Nougat.

    Command-line invocation:
    nougat <pdf_path> -o <output_dir> --markdown --no-skipping
    """
    # `output_dir` and the helpers used below (read_output_file,
    # evaluate_nougat_quality, detect_tables, detect_formulas) are defined
    # elsewhere in the service; this excerpt only shows the overall flow.
    cmd = [
        'nougat',
        file_path,
        '-o', output_dir,
        '--markdown',      # output Markdown
        '--no-skipping'    # do not skip any page
    ]

    # Run Nougat (5-minute timeout); the remaining Popen arguments are elided
    process = subprocess.Popen(cmd, ...)
    stdout, stderr = process.communicate(timeout=300)

    # Read the output file (.mmd)
    markdown_text = read_output_file()

    # Quality assessment
    quality_score = evaluate_nougat_quality(markdown_text)

    return {
        "success": True,
        "method": "nougat",
        "text": markdown_text,
        "format": "markdown",
        "metadata": {
            "quality_score": quality_score,
            "has_tables": detect_tables(markdown_text),
            "has_formulas": detect_formulas(markdown_text)
        }
    }
```

### 2.3 Text Post-processing

**Refining the Nougat output:**
```typescript
function postProcessNougatOutput(markdown: string): ProcessedText {
  return {
    // Original Markdown
    raw: markdown,

    // Section splitting
    sections: extractSections(markdown),  // {abstract, methods, results, ...}

    // Table extraction
    tables: extractTables(markdown),

    // Formula extraction
    formulas: extractFormulas(markdown),

    // Plain text (formatting removed)
    plainText: markdownToPlainText(markdown),

    // Structured data (used as LLM input)
    structured: {
      title: extractTitle(markdown),
      abstract: extractAbstract(markdown),
      methodology: extractMethodology(markdown),
      results: extractResults(markdown),
    }
  };
}
```

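The section-splitting step above is only named, not shown. As a rough illustration of one way to do it, the sketch below splits Markdown on heading lines and keys each section by its lowercased heading text. The `splitSections` name and the `'preamble'` key are hypothetical, and the sketch does not handle `#` lines inside fenced code blocks or repeated headings (a later section with the same heading overwrites the earlier one).

```typescript
// Splits Markdown into sections keyed by lowercased heading text.
// Text that appears before the first heading is stored under 'preamble'.
function splitSections(markdown: string): Record<string, string> {
  const sections: Record<string, string> = {};
  let current = 'preamble';
  const buffer: string[] = [];

  // Stores the buffered lines under the current heading, skipping empty sections.
  const flush = (): void => {
    const text = buffer.join('\n').trim();
    if (text) {
      sections[current] = text;
    }
    buffer.length = 0;
  };

  for (const line of markdown.split('\n')) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      current = heading[1].trim().toLowerCase();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return sections;
}
```

Mapping heading variants such as "Materials and Methods" to a canonical `methods` key would need an extra lookup table on top of this.
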
---

## 📌 Scenario 4: Literature Download (Unpaywall API) ⭐

### 3.1 Background

**Unpaywall** is a free Open Access literature API. It can:
- ✅ Look up by DOI whether an article has a free full text
- ✅ Return a legal PDF download link
- ✅ Be used entirely free of charge
- ✅ Cover a database of more than 30 million articles

**Website**: https://unpaywall.org/products/api

### 3.2 Technology Selection

#### Calling the API

**Basics:**
- **API endpoint**: `https://api.unpaywall.org/v2/{doi}?email={your_email}`
- **HTTP method**: GET
- **Authentication**: no API key; only an email address is required
- **Rate limit**: 100,000 requests/day (free)

**Example request:**
```bash
curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"
```

**Example response** (abridged and illustrative):
```json
{
  "doi": "10.1038/nature12373",
  "title": "Nanometre-scale thermometry in a living cell",
  "is_oa": true,
  "oa_status": "gold",
  "best_oa_location": {
    "url": "https://www.nature.com/articles/nature12373.pdf",
    "url_for_pdf": "https://www.nature.com/articles/nature12373.pdf",
    "url_for_landing_page": "https://www.nature.com/articles/nature12373",
    "license": "cc-by",
    "version": "publishedVersion"
  },
  "oa_locations": [...]
}
```

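In the response above, `url_for_pdf` of `best_oa_location` can be null even for an open-access article (for example when only a landing page is known). A small sketch of a fallback that scans the other entries in `oa_locations` is shown below; the `pickPdfUrl` helper and the trimmed-down interfaces are illustrative assumptions that model only the fields used here.

```typescript
// Only the response fields needed for this sketch are modelled.
interface OaLocation {
  url_for_pdf: string | null;
}

interface UnpaywallRecord {
  is_oa: boolean;
  best_oa_location: OaLocation | null;
  oa_locations?: OaLocation[];
}

// Returns the first available direct PDF link, preferring best_oa_location.
function pickPdfUrl(record: UnpaywallRecord): string | null {
  if (!record.is_oa) {
    return null;
  }
  const candidates: Array<OaLocation | null> = [record.best_oa_location];
  for (const loc of record.oa_locations || []) {
    candidates.push(loc);
  }
  for (const loc of candidates) {
    if (loc && loc.url_for_pdf) {
      return loc.url_for_pdf;
    }
  }
  return null;
}
```

When `pickPdfUrl` returns null for an open-access record, the UI can still offer the landing-page link instead of a direct download.
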
#### Node.js Implementation

**Service wrapper:**
```typescript
// backend/src/common/literature/UnpaywallClient.ts

import * as fs from 'fs';
import axios from 'axios';
import { config } from '../../config/env';

export interface UnpaywallResult {
  doi: string;
  title: string;
  isOA: boolean;              // whether the article is open access
  oaStatus: string;           // "gold" | "green" | "hybrid" | "bronze" | "closed"
  pdfUrl: string | null;      // PDF download link
  landingPageUrl: string;     // article landing page
  license: string | null;     // license
  version: string | null;     // "publishedVersion" | "acceptedVersion"
}

// `error` is `unknown` in a catch block, so narrow it before reading `.message`
function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

class UnpaywallClient {
  private baseUrl = 'https://api.unpaywall.org/v2';
  private email: string;

  constructor(email: string = config.unpaywallEmail) {
    this.email = email;
  }

  /**
   * Look up an article by DOI
   */
  async getByDoi(doi: string): Promise<UnpaywallResult> {
    try {
      const url = `${this.baseUrl}/${doi}?email=${encodeURIComponent(this.email)}`;
      const response = await axios.get(url, {
        timeout: 10000,  // 10-second timeout
      });

      const data = response.data;

      // Best open-access location (null when the article is not OA)
      const bestOA = data.best_oa_location;

      return {
        doi: data.doi,
        title: data.title,
        isOA: data.is_oa,
        oaStatus: data.oa_status,
        pdfUrl: bestOA?.url_for_pdf || null,
        landingPageUrl: bestOA?.url_for_landing_page || data.doi_url,
        license: bestOA?.license || null,
        version: bestOA?.version || null,
      };
    } catch (error) {
      if (axios.isAxiosError(error) && error.response?.status === 404) {
        throw new Error(`DOI not found: ${doi}`);
      }
      throw new Error(`Unpaywall API error: ${errorMessage(error)}`);
    }
  }

  /**
   * Batch lookup (rate-limited); DOIs that fail are logged and skipped
   */
  async getBatch(dois: string[]): Promise<UnpaywallResult[]> {
    const results: UnpaywallResult[] = [];

    for (const doi of dois) {
      try {
        const result = await this.getByDoi(doi);
        results.push(result);

        // Rate limiting: wait 100 ms between requests
        await new Promise(resolve => setTimeout(resolve, 100));
      } catch (error) {
        console.error(`Failed to fetch ${doi}:`, errorMessage(error));
      }
    }

    return results;
  }

  /**
   * Download a PDF file
   */
  async downloadPdf(pdfUrl: string, outputPath: string): Promise<void> {
    try {
      const response = await axios.get(pdfUrl, {
        responseType: 'arraybuffer',
        timeout: 60000,  // 1-minute timeout
      });

      fs.writeFileSync(outputPath, response.data);
    } catch (error) {
      throw new Error(`PDF download failed: ${errorMessage(error)}`);
    }
  }
}

export const unpaywallClient = new UnpaywallClient();
```

**Environment variables:**
```env
# .env
UNPAYWALL_EMAIL=your-email@example.com
```

#### Business Integration

**Case 1: batch-check whether articles can be downloaded**
```typescript
async function checkLiteratureAvailability(literatures: Literature[]) {
  const dois = literatures
    .map(lit => lit.doi)
    .filter(doi => doi);  // drop empty DOIs

  const results = await unpaywallClient.getBatch(dois);

  return literatures.map(lit => {
    // Look up the Unpaywall record once per article
    const match = results.find(r => r.doi === lit.doi);
    return {
      ...lit,
      downloadable: match?.isOA || false,
      pdfUrl: match?.pdfUrl || null,
    };
  });
}
```

**Case 2: the user clicks "download full text"**
```typescript
import * as fs from 'fs';
import { extractionClient } from '@common/document/ExtractionClient';
// `unpaywallClient` is the instance exported by UnpaywallClient.ts above

async function downloadLiteratureFullText(doi: string) {
  // Step 1: query Unpaywall
  const unpaywallResult = await unpaywallClient.getByDoi(doi);

  if (!unpaywallResult.pdfUrl) {
    throw new Error('No free full text is available for this article');
  }

  // Step 2: download the PDF
  const filename = `${doi.replace(/\//g, '_')}.pdf`;
  const outputPath = `./downloads/${filename}`;

  await unpaywallClient.downloadPdf(unpaywallResult.pdfUrl, outputPath);

  // Step 3: extract the text (calls extraction_service)
  const extractionResult = await extractionClient.extractPdf(
    fs.readFileSync(outputPath),
    filename,
    'auto'
  );

  return {
    pdfPath: outputPath,
    text: extractionResult.text,
    method: extractionResult.method,
  };
}
```

### 3.3 Frontend Integration

**Batch download button:**
```typescript
// Batch-check downloadability
async function checkDownloadable(selectedRows: Literature[]) {
  setLoading(true);

  const results = await api.checkLiteratureAvailability(selectedRows);

  const downloadableCount = results.filter(r => r.downloadable).length;

  message.success(`Found ${downloadableCount} articles with a downloadable full text`);
  setLiteratures(results);
  setLoading(false);
}

// Download the full text
async function downloadFullText(literature: Literature) {
  if (!literature.downloadable) {
    message.warning('No free full text is available for this article');
    return;
  }

  try {
    const result = await api.downloadLiteratureFullText(literature.doi);
    message.success('Download succeeded');

    // Open the PDF viewer
    openPdfViewer(result.pdfPath);
  } catch (error) {
    message.error(`Download failed: ${(error as Error).message}`);
  }
}
```

---

## 🔍 Supplementary Technical Points

### 4.1 Summary of the Technologies You Mentioned

| Technology | Status | Notes |
|--------|------|------|
| ✅ Nougat model | Implemented | `extraction_service/services/nougat_extractor.py` |
| ✅ PyMuPDF | Implemented | `extraction_service/services/pdf_extractor.py` |
| ✅ Sequential fallback strategy | Implemented | English → Nougat, Chinese → PyMuPDF |
| 🆕 Unpaywall API | To be added | This document provides the implementation plan |
| ✅ Excel parsing | To be added | Use the `xlsx` library (frontend) |

### 4.2 Technical Points That May Have Been Missed ⭐

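#### 1) Enhanced Table Extraction

**Problem**: Nougat preserves table structure, but an LLM working directly on Markdown tables may be inaccurate.

**Solution: Table Transformer**
```python
# Uses Microsoft's Table Transformer model
# https://github.com/microsoft/table-transformer

import fitz  # PyMuPDF, used here to render PDF pages to images
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

MODEL_NAME = "microsoft/table-transformer-detection"

def extract_tables_enhanced(pdf_path: str, threshold: float = 0.9):
    """
    Locate tables precisely with Table Transformer.

    Returns (page_index, [x0, y0, x1, y1]) boxes in rendered-image pixels.
    Cropping each box and recognizing its cells (OCR or the
    structure-recognition model) is a separate step not shown here.
    """
    processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
    model = TableTransformerForObjectDetection.from_pretrained(MODEL_NAME)

    table_boxes = []
    for page_index, page in enumerate(fitz.open(pdf_path)):
        pix = page.get_pixmap(dpi=150)
        image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

        # Detect table positions on this page
        inputs = processor(images=image, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        detections = processor.post_process_object_detection(
            outputs, threshold=threshold, target_sizes=torch.tensor([image.size[::-1]])
        )[0]

        for box in detections["boxes"]:
            table_boxes.append((page_index, box.tolist()))

    return table_boxes
```

**Priority: V2.0** (Nougat is sufficient for the MVP stage)
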
#### 2) Citation Parsing and Linking

**Problem**: scientific papers contain many citations such as `[1] [2] [3]`, which need to be parsed and linked to the reference list.

**Solution: GROBID**
```python
# GROBID: open-source tool for parsing scientific literature
# https://github.com/kermitt2/grobid

import requests

def parse_references(pdf_path: str) -> str:
    """
    Parse a paper, including its references, with GROBID.

    GROBID responds with TEI XML rather than JSON; the parsed references
    are the <biblStruct> entries inside the <listBibl> element.
    """
    with open(pdf_path, 'rb') as f:
        files = {'input': f}
        response = requests.post(
            'http://localhost:8070/api/processFulltextDocument',
            files=files
        )
    response.raise_for_status()

    # Return the TEI XML that contains the structured reference list
    return response.text
```

**Priority: V2.0** (not a core feature)

#### 3) Formula Recognition and Rendering

**Problem**: Nougat outputs formulas as LaTeX, which the frontend has to render.

**Solution: KaTeX / MathJax**
```typescript
// Render LaTeX formulas in the frontend
import katex from 'katex';
import 'katex/dist/katex.min.css';

function renderFormula(latex: string) {
  return katex.renderToString(latex, {
    throwOnError: false,
    displayMode: true,
  });
}
```

**Priority: MVP** (improves the user experience)

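#### 4) PDF Preview and Annotation

**Problem**: during manual review, reviewers need to read the original paper and highlight passages.

**Solution: PDF.js + Annotator.js**
```typescript
// React component built on react-pdf-viewer (a PDF.js wrapper);
// it must be rendered inside the library's <Worker> component.
import { Viewer } from '@react-pdf-viewer/core';
import { highlightPlugin, Trigger } from '@react-pdf-viewer/highlight';
import type { HighlightArea, RenderHighlightsProps } from '@react-pdf-viewer/highlight';
import '@react-pdf-viewer/core/lib/styles/index.css';

function PdfViewer({ pdfUrl, annotations }: { pdfUrl: string; annotations: HighlightArea[] }) {
  // Draw each stored highlight area on the page it belongs to
  const renderHighlights = (props: RenderHighlightsProps) => (
    <div>
      {annotations
        .filter((area) => area.pageIndex === props.pageIndex)
        .map((area, idx) => (
          <div
            key={idx}
            style={{ background: 'yellow', opacity: 0.4, ...props.getCssProperties(area, props.rotation) }}
          />
        ))}
    </div>
  );
  const highlightPluginInstance = highlightPlugin({ renderHighlights, trigger: Trigger.None });

  return <Viewer fileUrl={pdfUrl} plugins={[highlightPluginInstance]} />;
}
```

**Priority: MVP** (core feature)
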
#### 5) Literature Deduplication

**Problem**: an uploaded Excel file may contain duplicates (different versions of the same article).

**Solution: deduplicate by DOI and title**
```typescript
function deduplicateLiteratures(literatures: Literature[]) {
  const seen = new Set<string>();

  return literatures.filter(lit => {
    // Prefer the DOI (case-insensitive); otherwise use the normalized title.
    // The prefixes keep DOI keys and title keys from colliding.
    const key = lit.doi
      ? `doi:${lit.doi.trim().toLowerCase()}`
      : `title:${normalizeTitle(lit.title)}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    // Strip punctuation; \p{L}/\p{N} keep non-ASCII letters, so Chinese
    // titles are not reduced to an empty string (plain \w is ASCII-only)
    .replace(/[^\p{L}\p{N}\s]/gu, '')
    .replace(/\s+/g, ' ')  // collapse whitespace
    .trim();
}
```

**Priority: MVP** (required feature)

#### 6) Metadata Completion

**Problem**: data uploaded via Excel may be incomplete (missing DOI, year, and so on).

**Solution: Crossref API**
```typescript
import axios from 'axios';

// Look up the DOI by title
async function enrichMetadata(literature: Literature) {
  if (literature.doi) return literature;  // already has a DOI

  // Call the Crossref API; passing `params` lets axios URL-encode the title
  const response = await axios.get('https://api.crossref.org/works', {
    params: { 'query.title': literature.title, rows: 1 },
  });

  const match = response.data.message.items[0];
  if (!match) return literature;  // no candidate found

  return {
    ...literature,
    doi: match.DOI,
    year: match['published-print']?.['date-parts'][0][0],
    journal: match['container-title']?.[0],
  };
}
```

**Priority: V1.0** (enhancement)

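A title search returns the best-ranked candidate even when the article is not indexed, so the first hit is not guaranteed to be the same paper. One conservative safeguard, sketched below as an assumption rather than part of the existing design, is to accept the candidate only when the normalized titles are identical. The `isSameTitle` and `normalizeForMatch` names are hypothetical.

```typescript
// Lowercases, removes common ASCII punctuation, and collapses whitespace.
function normalizeForMatch(title: string): string {
  return title
    .toLowerCase()
    .replace(/[.,;:!?'"()\[\]{}\-_\/\\]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Accepts a candidate only on an exact match of the normalized titles.
function isSameTitle(query: string, candidate: string): boolean {
  const a = normalizeForMatch(query);
  const b = normalizeForMatch(candidate);
  return a.length > 0 && a === b;
}
```

An exact match is deliberately strict; a fuzzy similarity threshold would recover more DOIs at the cost of occasional wrong assignments.
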
#### 7) Persisting Batch Progress

**Problem**: batch screening takes a long time (1000 articles > 10 minutes), so interrupted runs must be resumable.

**Solution: Redis + a task queue**
```typescript
// Use a Bull queue
import Queue from 'bull';

const screeningQueue = new Queue('literature-screening', {
  redis: { host: 'localhost', port: 6379 }
});

// Add a job
screeningQueue.add({
  projectId: 'xxx',
  literatures: [...],
  protocol: {...}
});

// Process jobs
screeningQueue.process(async (job) => {
  const { projectId, literatures, protocol } = job.data;

  for (let i = 0; i < literatures.length; i++) {
    // Screen a single article
    await screenLiterature(literatures[i], protocol);

    // Update progress (percentage)
    await job.progress((i + 1) / literatures.length * 100);
  }
});
```

**Priority: V1.0** (experience improvement)

#### 8) Error Handling and Retries

**Problem**: LLM calls can fail (network errors, timeouts, rate limiting).

**Solution: retry with exponential backoff**
```typescript
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;

      // Exponential backoff: 1s, 2s, 4s
      const delay = Math.pow(2, i) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  // Only reachable when maxRetries <= 0; also satisfies the Promise<T> return type
  throw new Error('retryWithBackoff: maxRetries must be at least 1');
}
```

**Priority: MVP** (required feature)

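Not every failure is worth retrying: a rate-limit or timeout error may succeed on the next attempt, while a malformed request will not. The sketch below shows one way to classify errors before retrying. The error shape (`status`, `code`) and the list of transient codes are assumptions that depend on the HTTP client and LLM provider actually used.

```typescript
// Assumed minimal error shape; real clients expose these fields differently.
interface HttpLikeError {
  status?: number; // HTTP status code, if a response was received
  code?: string;   // network-level error code, if no response was received
}

const TRANSIENT_CODES = ['ETIMEDOUT', 'ECONNRESET', 'ECONNABORTED'];

// Returns true for failures that are likely to succeed on a later attempt.
function isRetryable(error: HttpLikeError): boolean {
  if (error.status === 429) return true;                             // rate limited
  if (error.status !== undefined && error.status >= 500) return true; // server error
  return error.code !== undefined && TRANSIENT_CODES.indexOf(error.code) !== -1;
}

// Delay before the given zero-based attempt: 1s, 2s, 4s, ... for baseMs = 1000.
function backoffDelayMs(attempt: number, baseMs: number = 1000): number {
  return Math.pow(2, attempt) * baseMs;
}
```

A retry wrapper could call `isRetryable` in its catch block and rethrow immediately when it returns false.
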
---

## 📊 Technology Selection Summary

### Required for the MVP

| Layer | Technology | Purpose |
|------|------|------|
| **Frontend** | `xlsx` | Excel parsing |
| **Frontend** | `PDF.js` | PDF preview |
| **Frontend** | `KaTeX` | Formula rendering |
| **Backend** | `ExtractionClient` | Calls the Python microservice |
| **Backend** | `UnpaywallClient` | Literature download |
| **Python** | `Nougat` | English PDF extraction |
| **Python** | `PyMuPDF` | Fast PDF extraction |
| **Database** | `asl_schema` | Data storage |

### V1.0 Enhancements

| Technology | Purpose |
|------|------|
| Crossref API | Metadata completion |
| Bull Queue | Task queue |
| Redis | Progress persistence |

### V2.0 Advanced Technologies

| Technology | Purpose |
|------|------|
| Table Transformer | Precise table extraction |
| GROBID | Citation parsing |
| Semantic Scholar API | Scholarly graph |

---

## 📁 Where to Store Test Data

Following the folder structure of the ASL module, test data should be placed in:

```
AIclinicalresearch/docs/03-业务模块/ASL-AI智能文献/
└── 05-测试文档/
    ├── 01-测试计划.md
    ├── 02-标题摘要初筛测试用例.md
    └── 03-测试数据/                        ← new folder
        ├── README.md                       ← description document
        ├── screening-test-data/
        │   ├── literature-list-199.xlsx    ← list of 199 articles
        │   ├── picos-criteria.txt          ← PICOS criteria
        │   └── expected-results.json       ← expected results (gold standard)
        ├── pdf-samples/
        │   ├── sample-rct-01.pdf
        │   ├── sample-cohort-01.pdf
        │   └── README.md
        └── extraction-test-data/
            └── README.md
```

**Recommended structure:**
```
05-测试文档/
├── 01-测试计划.md
├── 02-标题摘要初筛测试用例.md
└── 03-测试数据/
    ├── README.md                    ← Important: documents the source, copyright, and usage of the test data
    ├── screening/
    │   ├── literature-list-199.xlsx
    │   ├── picos-criteria.txt
    │   ├── inclusion-criteria.txt
    │   ├── exclusion-criteria.txt
    │   └── gold-standard.json       ← manually annotated correct answers
    └── pdf-extraction/
        ├── sample-01-high-quality.pdf
        ├── sample-02-with-tables.pdf
        └── sample-03-chinese.pdf
```

**README.md example:**
````markdown
# ASL Test Dataset

## 📋 Data Description

### 1. Title/abstract initial-screening test data
- **File**: `literature-list-199.xlsx`
- **Size**: 199 English medical articles
- **Fields**: title, abstract, DOI, authors, year, journal
- **Source**: [describe the data source]
- **Copyright**: [state the copyright information]

### 2. PICOS criteria
- **File**: `picos-criteria.txt`
- **Content**: Population, Intervention, Comparison, Outcome, Study Design
- **Inclusion criteria**: 5 items
- **Exclusion criteria**: 8 items

### 3. Gold standard (manual annotation results)
- **File**: `gold-standard.json`
- **Annotator**: [annotating expert information]
- **Annotation date**: [date]
- **Expected accuracy**: ≥ 90%

## 🎯 Usage

### Run the tests
```bash
npm run test:asl:screening
```

### Evaluate accuracy
```bash
npm run test:asl:evaluate -- --gold-standard gold-standard.json
```

## 📊 Expected Results
- Included: 45 articles
- Excluded: 132 articles
- Uncertain: 22 articles
````

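The README example sets an expected accuracy of at least 90% against the gold standard but does not define the computation. A minimal sketch, assuming both the predictions and the gold standard are maps from an article ID to one of three labels, is shown below. The `Label` type, the map layout, and the `accuracy` function are hypothetical, since the format of `gold-standard.json` is not specified in this document.

```typescript
type Label = 'include' | 'exclude' | 'uncertain';

// Fraction of gold-standard articles whose predicted label matches the
// annotation. Articles missing from `predicted` count as wrong.
function accuracy(
  predicted: Record<string, Label>,
  gold: Record<string, Label>
): number {
  const ids = Object.keys(gold);
  if (ids.length === 0) {
    return 0;
  }
  let correct = 0;
  for (const id of ids) {
    if (predicted[id] === gold[id]) {
      correct++;
    }
  }
  return correct / ids.length;
}
```

With the 199-article dataset described above, the 90% target corresponds to at least 180 matching labels.
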
---

## 📚 Related Documents

- [Quality Assurance and Traceability Strategy](./06-质量保障与可追溯策略.md)
- [Database Design](./01-数据库设计.md)
- [API Design Specification](./02-API设计规范.md)
- [Document Extraction Microservice](../../../../extraction_service/README.md)

---

**Changelog:**
- 2025-11-15: Created the document; defined the technology selection for initial screening, full-text processing, and literature download