ASL Literature Processing Technology Selection

Document version: V1.0
Created: 2025-11-15
Applicable module: AI Intelligent Literature (ASL)
Goal: Define the technology stack and implementation path for initial screening, re-screening, and data extraction


Document Overview

The ASL module covers three literature-processing scenarios, each with its own technical characteristics and implementation approach:

| Scenario | Input format | Core technology | Main challenge |
| --- | --- | --- | --- |
| Title/abstract initial screening | Excel file | Excel parsing + LLM screening | Batch-processing efficiency |
| Full-text re-screening | Full-text PDF | PDF extraction + LLM screening | PDF parsing accuracy |
| Full-text data extraction | Full-text PDF | PDF extraction + LLM structured extraction | Accurate extraction of tables and formulas |

Technical Architecture Overview

┌────────────────────────────────────────────────────────────┐
│               ASL literature processing flow               │
└────────────────────────────────────────────────────────────┘
           │
           ├─ Scenario 1: Title/abstract initial screening
           │   └─ User uploads Excel → parse → LLM batch screening → export results
           │
           ├─ Scenario 2: Full-text re-screening
           │   └─ User uploads PDF → PDF extraction → LLM screening → review
           │
           └─ Scenario 3: Full-text data extraction
               └─ PDF → extraction + structuring → LLM data extraction → manual review

┌────────────────────────────────────────────────────────────┐
│             Layered technology stack (shared)              │
├────────────────────────────────────────────────────────────┤
│ Frontend: React 19 + Ant Design 5 + xlsx/exceljs           │
├────────────────────────────────────────────────────────────┤
│ Backend: Node.js (Fastify) + TypeScript                    │
├────────────────────────────────────────────────────────────┤
│ Document layer: Python microservice (extraction_service)   │
│   ├─ PyMuPDF: fast PDF extraction                          │
│   ├─ Nougat: high-quality extraction for English papers    │
│   └─ Language Detector: automatic language detection       │
├────────────────────────────────────────────────────────────┤
│ LLM layer: DeepSeek-V3 + Qwen3 / GPT-5 + Claude-4.5        │
├────────────────────────────────────────────────────────────┤
│ Database: PostgreSQL 15 (asl_schema)                       │
└────────────────────────────────────────────────────────────┘

Scenario 1: Title/Abstract Initial Screening

1.1 Technical Characteristics

  • Input format: Excel files (.xlsx / .xls)
  • Data volume: 50-500 articles per batch
  • Main fields: title, abstract, DOI, authors, publication year, journal
  • Processing focus: efficient batch processing; no PDF parsing required

1.2 Technology Selection

Frontend: Excel upload and parsing

| Task | Library | Purpose | Advantages |
| --- | --- | --- | --- |
| Excel upload | antd Upload | File upload component | Drag-and-drop upload, progress bar |
| Excel parsing | xlsx / exceljs | Parse Excel in the frontend | Pure frontend processing, fast preview |
| Template validation | Custom logic | Check column names and data format | Catches format errors early |

Recommended: the `xlsx` library (SheetJS)

  • ✅ Supports both .xlsx and .xls
  • ✅ Pure JavaScript; parses directly in the frontend
  • ✅ Small bundle (600KB) with good performance
  • ✅ Handles large files (1000+ rows)

Code example:

import * as XLSX from 'xlsx';

function parseExcel(file: File): Promise<Literature[]> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    reader.onload = (e) => {
      try {
        const data = new Uint8Array(e.target?.result as ArrayBuffer);
        const workbook = XLSX.read(data, { type: 'array' });

        // Read the first worksheet
        const sheetName = workbook.SheetNames[0];
        const worksheet = workbook.Sheets[sheetName];

        // Convert to JSON (one object per row, keyed by column header)
        const jsonData = XLSX.utils.sheet_to_json(worksheet);

        // Map to the standard format; English and Chinese column headers are both accepted
        const literatures = jsonData.map((row: any) => ({
          title: row['Title'] || row['标题'],
          abstract: row['Abstract'] || row['摘要'],
          doi: row['DOI'],
          authors: row['Authors'] || row['作者'],
          year: row['Year'] || row['年份'],
          journal: row['Journal'] || row['期刊'],
        }));

        resolve(literatures);
      } catch (error) {
        reject(new Error('Failed to parse Excel file'));
      }
    };

    reader.onerror = () => reject(new Error('Failed to read file'));
    reader.readAsArrayBuffer(file);
  });
}
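The template-validation step listed in the table above is not shown in the source. A minimal sketch of what it might look like follows, assuming that only the title and abstract columns are mandatory and that rows arrive in the shape produced by `sheet_to_json`. The names `validateRows`, `REQUIRED_COLUMNS`, and `ValidationResult` are hypothetical.

```typescript
// Hypothetical sketch: validate parsed rows before submitting a screening task.
type SheetRow = Record<string, unknown>;

interface ValidationResult {
  valid: boolean;
  missingColumns: string[];   // required columns found in no row
  emptyTitleRows: number[];   // spreadsheet row numbers (header is row 1)
}

// Assumption: each required field may appear under an English or a Chinese header.
const REQUIRED_COLUMNS: string[][] = [
  ['Title', '标题'],
  ['Abstract', '摘要'],
];

function validateRows(rows: SheetRow[]): ValidationResult {
  // Collect every header that occurs in at least one row
  // (empty cells are typically omitted from a row object).
  const headers = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) headers.add(key);
  }

  const missingColumns = REQUIRED_COLUMNS
    .filter((aliases) => !aliases.some((name) => headers.has(name)))
    .map((aliases) => aliases[0]);

  const emptyTitleRows: number[] = [];
  rows.forEach((row, index) => {
    const title = row['Title'] ?? row['标题'];
    if (typeof title !== 'string' || title.trim() === '') {
      emptyTitleRows.push(index + 2); // +1 for the header row, +1 for 1-based numbering
    }
  });

  return {
    valid: missingColumns.length === 0 && emptyTitleRows.length === 0,
    missingColumns,
    emptyTitleRows,
  };
}
```

Reporting spreadsheet row numbers rather than array indices lets the user find the offending rows directly in Excel.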

Backend: batch screening

Processing flow:

Excel data → split into groups (10-20 articles per group) → call the LLM in parallel → aggregate results

Key technical points:

  1. Batch grouping: avoids oversized single requests; 10-20 articles per group works best
  2. Parallel processing: call the LLM in parallel with Promise.all
  3. Progress push: push processing progress in real time over WebSocket
  4. Resumable processing: a task can continue after an interruption

Code example:

async function batchScreening(
  literatures: Literature[],
  protocol: Protocol,
  progressCallback: (progress: number) => void
) {
  const batchSize = 15;
  // `chunk` splits an array into fixed-size groups (for example, lodash's `chunk`)
  const batches = chunk(literatures, batchSize);
  const results = [];

  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];

    // Process the current batch in parallel
    const batchResults = await Promise.all(
      batch.map(lit => dualModelScreening(lit, protocol))
    );

    results.push(...batchResults);

    // Report progress as a percentage
    const progress = Math.round(((i + 1) / batches.length) * 100);
    progressCallback(progress);
  }

  return results;
}
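The example above covers grouping and parallel calls but not resumable processing (key point 4). The sketch below shows one way the two could fit together, assuming each article carries a stable `id` and that completed ids are recorded in some store. The `id` field, `CheckpointStore`, and `screenResumable` are hypothetical names, not part of the existing code.

```typescript
// Split an array into groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical persistence interface (could be backed by Redis or PostgreSQL).
interface CheckpointStore {
  isDone(id: string): Promise<boolean>;
  markDone(id: string): Promise<void>;
}

// Screen items batch by batch, skipping any item already marked as done.
// Returns the number of items screened in this run.
async function screenResumable<T extends { id: string }>(
  items: T[],
  screenOne: (item: T) => Promise<void>,
  store: CheckpointStore,
  batchSize = 15,
): Promise<number> {
  let processed = 0;
  for (const batch of chunk(items, batchSize)) {
    await Promise.all(
      batch.map(async (item) => {
        if (await store.isDone(item.id)) return; // finished before the interruption
        await screenOne(item);
        await store.markDone(item.id);           // checkpoint after each success
        processed += 1;
      }),
    );
  }
  return processed;
}
```

Because the checkpoint is written per article, restarting an interrupted task repeats at most the articles that were in flight when it stopped.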

1.3 Data Flow

User action         Frontend                 Backend                       LLM
   │                    │                       │                            │
   ├─ Upload Excel      │                       │                            │
   │   └───────────────→│                       │                            │
   │                    ├─ Parse Excel          │                            │
   │                    ├─ Validate format      │                            │
   │                    ├─ Show preview         │                            │
   │                    │                       │                            │
   │                    ├─ Submit screening job │                            │
   │                    │   └──────────────────→│                            │
   │                    │                       ├─ Save task                 │
   │                    │                       ├─ Split into groups of 15   │
   │                    │                       │                            │
   │                    │                       ├─ Batch 1                   │
   │                    │                       │   └───────────────────────→│
   │                    │                       │                            ├─ DeepSeek screening
   │                    │                       │                            ├─ Qwen3 screening
   │                    │                       │                            ├─ Compare results
   │                    │                       │   ←────────────────────────┤
   │                    │                       ├─ Save results              │
   │                    │                       │                            │
   │                    │                       ├─ Batch 2...                │
   │                    │                       │                            │
   │                    │   ←───────────────────┤ Return full results        │
   │   ←────────────────┤ Show results          │                            │
   └─ Manual review     │                       │                            │
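The "compare results" step in the flow above is not spelled out in the source. A minimal sketch, assuming each model returns one of three decisions and that any disagreement is routed to manual review as "uncertain": the conflict policy and all names here (`ModelVerdict`, `compareVerdicts`, `dualModelScreeningSketch`) are assumptions, not the project's actual `dualModelScreening`.

```typescript
type ScreeningDecision = 'include' | 'exclude' | 'uncertain';

interface ModelVerdict {
  decision: ScreeningDecision;
  reason: string;
}

interface DualResult {
  final: ScreeningDecision;
  agreed: boolean;
  verdicts: [ModelVerdict, ModelVerdict];
}

// Assumed policy: agreement keeps the shared decision,
// disagreement is flagged as 'uncertain' for manual review.
function compareVerdicts(a: ModelVerdict, b: ModelVerdict): DualResult {
  const agreed = a.decision === b.decision;
  return { final: agreed ? a.decision : 'uncertain', agreed, verdicts: [a, b] };
}

// Run both models concurrently; the model callers are injected so the
// comparison logic can be unit-tested without network access.
async function dualModelScreeningSketch<T>(
  item: T,
  modelA: (item: T) => Promise<ModelVerdict>,
  modelB: (item: T) => Promise<ModelVerdict>,
): Promise<DualResult> {
  const [a, b] = await Promise.all([modelA(item), modelB(item)]);
  return compareVerdicts(a, b);
}
```

Keeping both verdicts in the result preserves the reasoning of each model for the reviewer who resolves the conflict.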

Scenarios 2 & 3: Full-Text Re-screening and Data Extraction

2.1 Technical Characteristics

  • Input format: PDF files (English-language medical literature)
  • File characteristics:
    • Scientific paper layout (title, abstract, introduction, methods, results, discussion, references)
    • Complex tables, formulas, and figures
    • Typically 10-30 pages
  • Processing focus: high-accuracy extraction that preserves structure and formatting

2.2 Technology Selection: PDF Extraction

Core approach: Nougat + PyMuPDF sequential fallback strategy ⭐

Existing architecture (already implemented, located in extraction_service/):

# Sequential fallback strategy
def extract_pdf(file_path: str):
    # Step 1: detect the language
    language = detect_language(file_path)

    # Step 2: Chinese PDF → PyMuPDF (fast)
    if language == 'chinese':
        return extract_pdf_pymupdf(file_path)

    # Step 3: English PDF → try Nougat
    if check_nougat_available():
        result = extract_pdf_nougat(file_path)

        # Quality check (threshold 0.7); the score lives under "metadata"
        if result['metadata']['quality_score'] >= 0.7:
            return result  # ✅ Nougat succeeded

    # Step 4: fall back to PyMuPDF
    return extract_pdf_pymupdf(file_path)
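The function above does not show what happens when Nougat itself fails, for example on a timeout. One possible policy, sketched here as a small testable function, is to treat a failed Nougat run the same way as a low-quality result and fall back. The extractors are injected as callbacks, and `extractWithFallback`, `Extraction`, and the error-handling policy are assumptions rather than the service's actual behavior.

```typescript
type ExtractionMethod = 'nougat' | 'pymupdf';

interface Extraction {
  method: ExtractionMethod;
  text: string;
  qualityScore?: number; // 0-1, only reported by the Nougat path
}

// Sequential fallback: Chinese documents go straight to PyMuPDF; otherwise
// Nougat is tried first and kept only if its quality score meets the threshold.
async function extractWithFallback(
  language: string,
  nougatAvailable: boolean,
  runNougat: () => Promise<Extraction>,
  runPymupdf: () => Promise<Extraction>,
  threshold = 0.7,
): Promise<Extraction> {
  if (language !== 'chinese' && nougatAvailable) {
    try {
      const result = await runNougat();
      if ((result.qualityScore ?? 0) >= threshold) return result;
    } catch (error) {
      // Assumption: a Nougat failure is handled like a low-quality result.
    }
  }
  return runPymupdf();
}
```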

Technology comparison

| Option | Advantages | Disadvantages | Suitable for |
| --- | --- | --- | --- |
| Nougat ⭐ | • Designed specifically for scientific literature<br>• High accuracy on formulas and tables<br>• Outputs Markdown<br>• Preserves document structure | • Slow (1-2 minutes per 20 pages)<br>• Needs GPU acceleration<br>• High memory usage (4GB) | Full-text extraction from English medical literature |
| PyMuPDF | • Fast (seconds)<br>• Low memory usage<br>• Simple to deploy | • Formulas and tables are easily lost<br>• Plain-text output<br>• Layout is easily scrambled | Chinese literature, quick preview |
| Adobe API | • Commercial-grade accuracy<br>• Cloud processing | • Paid<br>• Depends on network access<br>• Privacy risk | Not recommended (high cost) |
| Tesseract OCR | • Open source and free<br>• Supports many languages | • Requires image preprocessing<br>• Unstable accuracy | Scanned PDFs (backup option) |

Recommended approach: Nougat (primary) + PyMuPDF (fallback) ⭐

Nougat core advantages (medical literature scenario)

✅ Designed specifically for scientific literature
   ├─ Training data: arXiv papers + scientific journals
   ├─ Formula recognition: LaTeX output
   ├─ Table preservation: Markdown table format
   └─ Structured output: clear sections and paragraphs

✅ Output format: Markdown
   ├─ Heading levels: # ## ###
   ├─ Tables: | Header | Data |
   ├─ Formulas: $$ formula $$
   └─ Citations: [1] [2] [3]

✅ Quality assessment mechanism
   ├─ Automatic quality score (0-1)
   ├─ Automatic fallback to PyMuPDF on low quality
   └─ Safeguards the extraction success rate

Implementation Details

Service architecture:

Node.js Backend (Port 3001)
    │
    ├─ Calls ExtractionClient.ts
    │   └─ HTTP request → Python microservice
    │
Python Extraction Service (Port 8000)
    │
    ├─ /api/extract/pdf
    │   ├─ detect_language()
    │   ├─ extract_pdf_nougat() → Nougat Model
    │   └─ extract_pdf_pymupdf() → PyMuPDF
    │
    └─ /api/health
        └─ Checks Nougat availability

Node.js calling code:

import { extractionClient } from '@common/document/ExtractionClient';

async function extractLiteraturePDF(file: Buffer, filename: string) {
  try {
    // Option 1: automatic selection (recommended)
    const result = await extractionClient.extractPdf(
      file, 
      filename, 
      'auto'
    );
    
    // Option 2: force Nougat
    // const result = await extractionClient.extractPdf(file, filename, 'nougat');
    
    return {
      text: result.text,
      method: result.method,  // "nougat" | "pymupdf"
      quality: result.metadata.quality_score,
      pageCount: result.metadata.page_count,
      hasTables: result.metadata.has_tables,
      hasFormulas: result.metadata.has_formulas
    };
  } catch (error) {
    console.error('PDF extraction failed:', error);
    throw error;
  }
}

Python extraction code:

# extraction_service/services/nougat_extractor.py

import subprocess
import tempfile
from pathlib import Path
from typing import Any, Dict

def extract_pdf_nougat(file_path: str) -> Dict[str, Any]:
    """
    Extract PDF text with Nougat.

    Command-line invocation:
    nougat <pdf_path> -o <output_dir> --markdown --no-skipping
    """
    with tempfile.TemporaryDirectory() as output_dir:
        cmd = [
            'nougat',
            file_path,
            '-o', output_dir,
            '--markdown',      # output Markdown
            '--no-skipping'    # do not skip any pages
        ]

        # Run Nougat (5-minute timeout); raises on timeout or a non-zero exit code
        subprocess.run(cmd, capture_output=True, text=True, timeout=300, check=True)

        # Read the output file (.mmd), which is named after the input PDF
        mmd_path = Path(output_dir) / f"{Path(file_path).stem}.mmd"
        markdown_text = mmd_path.read_text(encoding='utf-8')

    # Quality score (evaluate_nougat_quality, detect_tables and detect_formulas
    # are helpers defined elsewhere in the service)
    quality_score = evaluate_nougat_quality(markdown_text)

    return {
        "success": True,
        "method": "nougat",
        "text": markdown_text,
        "format": "markdown",
        "metadata": {
            "quality_score": quality_score,
            "has_tables": detect_tables(markdown_text),
            "has_formulas": detect_formulas(markdown_text)
        }
    }

2.3 Text Post-processing

Nougat output optimization:

function postProcessNougatOutput(markdown: string): ProcessedText {
  return {
    // Raw Markdown
    raw: markdown,

    // Section splitting
    sections: extractSections(markdown),  // {abstract, methods, results, ...}

    // Table extraction
    tables: extractTables(markdown),

    // Formula extraction
    formulas: extractFormulas(markdown),

    // Plain text (formatting removed)
    plainText: markdownToPlainText(markdown),

    // Structured data (for the LLM)
    structured: {
      title: extractTitle(markdown),
      abstract: extractAbstract(markdown),
      methodology: extractMethodology(markdown),
      results: extractResults(markdown),
    }
  };
}
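The helpers called above are not shown in the source. As a rough sketch of two of them, the functions below split Markdown into sections by heading line and collect pipe-delimited tables, assuming the extractor output uses `#` headings and `| a | b |` tables as described earlier. The `Sketch` suffix marks these as hypothetical stand-ins for `extractSections` and `extractTables`.

```typescript
interface MarkdownSection {
  heading: string;
  level: number; // number of leading '#' characters
  body: string;
}

// Split Markdown into sections, one per heading line.
// Text before the first heading is ignored in this sketch.
function extractSectionsSketch(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  let current: MarkdownSection | null = null;
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      current = { heading: match[2].trim(), level: match[1].length, body: '' };
      sections.push(current);
    } else if (current) {
      current.body += (current.body ? '\n' : '') + line;
    }
  }
  return sections.map((s) => ({ ...s, body: s.body.trim() }));
}

// Collect pipe-delimited tables as arrays of rows of trimmed cells.
// Separator rows such as |---|:--:| are dropped.
function extractTablesSketch(markdown: string): string[][][] {
  const tables: string[][][] = [];
  let rows: string[][] = [];
  const flush = (): void => {
    if (rows.length > 0) {
      tables.push(rows);
      rows = [];
    }
  };
  for (const line of markdown.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.length > 1 && trimmed.startsWith('|') && trimmed.endsWith('|')) {
      const cells = trimmed.slice(1, -1).split('|').map((c) => c.trim());
      const isSeparator = cells.every((c) => /^:?-+:?$/.test(c));
      if (!isSeparator) rows.push(cells);
    } else {
      flush(); // any non-table line ends the current table
    }
  }
  flush();
  return tables;
}
```

Passing tables to the LLM as explicit row arrays, rather than raw Markdown, is one way to reduce misread cells, though whether it helps would need to be measured.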

Scenario 4: Literature Download (Unpaywall API) ⭐

3.1 Technical Background

Unpaywall is a free API for open-access (Open Access) literature. It can:

  • ✅ Check by DOI whether an article has a free full text
  • ✅ Provide a legal PDF download link
  • ✅ Be used entirely free of charge
  • ✅ Draw on a database covering more than 30 million articles

Official site: https://unpaywall.org/products/api

3.2 Technology Selection

API Usage

Basics:

  • API endpoint: https://api.unpaywall.org/v2/{doi}?email={your_email}
  • Request method: GET
  • Authentication: no API key required; only an email address
  • Rate limit: 100,000 requests per day (free)

Sample request:

curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=YOUR_EMAIL"

Sample response (field values are illustrative):

{
  "doi": "10.1038/nature12373",
  "title": "The genome of the woodland strawberry",
  "is_oa": true,
  "oa_status": "gold",
  "best_oa_location": {
    "url": "https://www.nature.com/articles/nature12373.pdf",
    "url_for_pdf": "https://www.nature.com/articles/nature12373.pdf",
    "url_for_landing_page": "https://www.nature.com/articles/nature12373",
    "license": "cc-by",
    "version": "publishedVersion"
  },
  "oa_locations": [...]
}
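DOIs in uploaded spreadsheets often carry a `https://doi.org/` or `doi:` prefix, or differ in letter case, so it may help to normalize them before building the request URL. The sketch below is one way to do that. `normalizeDoi` and `buildUnpaywallUrl` are hypothetical helpers, and the shape check is a common heuristic rather than a full DOI grammar.

```typescript
// Strip common prefixes, validate the basic "10.<registrant>/<suffix>" shape,
// and lower-case the result (DOIs are case-insensitive).
function normalizeDoi(raw: string): string | null {
  const stripped = raw
    .trim()
    .replace(/^(?:https?:\/\/(?:dx\.)?doi\.org\/|doi:\s*)/i, '');
  return /^10\.\d{4,9}\/\S+$/.test(stripped) ? stripped.toLowerCase() : null;
}

// Build the lookup URL, encoding each path segment and the email separately
// so that the slash inside the DOI is preserved.
function buildUnpaywallUrl(doi: string, email: string): string {
  const path = doi.split('/').map(encodeURIComponent).join('/');
  return `https://api.unpaywall.org/v2/${path}?email=${encodeURIComponent(email)}`;
}
```

Rows for which `normalizeDoi` returns `null` can be skipped or sent to metadata completion instead of producing a failed request.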

Node.js Implementation

Service wrapper:

// backend/src/common/literature/UnpaywallClient.ts

import axios from 'axios';
import { config } from '../../config/env';

export interface UnpaywallResult {
  doi: string;
  title: string;
  isOA: boolean;              // whether the article is open access
  oaStatus: string;           // "gold" | "green" | "hybrid" | "bronze" | "closed"
  pdfUrl: string | null;      // PDF download link
  landingPageUrl: string;     // article landing page link
  license: string | null;     // license
  version: string | null;     // "publishedVersion" | "acceptedVersion"
}

class UnpaywallClient {
  private baseUrl = 'https://api.unpaywall.org/v2';
  private email: string;

  constructor(email: string = config.unpaywallEmail) {
    this.email = email;
  }

  /**
   * Look up an article by DOI
   */
  async getByDoi(doi: string): Promise<UnpaywallResult> {
    try {
      const url = `${this.baseUrl}/${doi}`;
      const response = await axios.get(url, {
        params: { email: this.email },  // axios URL-encodes query parameters
        timeout: 10000,  // 10-second timeout
      });

      const data = response.data;

      // Best open-access location (null when the article is not open access)
      const bestOA = data.best_oa_location;

      return {
        doi: data.doi,
        title: data.title,
        isOA: data.is_oa,
        oaStatus: data.oa_status,
        pdfUrl: bestOA?.url_for_pdf || null,
        landingPageUrl: bestOA?.url_for_landing_page || data.doi_url,
        license: bestOA?.license || null,
        version: bestOA?.version || null,
      };
    } catch (error) {
      if (axios.isAxiosError(error) && error.response?.status === 404) {
        throw new Error(`DOI not found: ${doi}`);
      }
      // `error` is `unknown` in a TypeScript catch clause, so narrow it first
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Unpaywall API error: ${message}`);
    }
  }

  /**
   * Batch lookup (rate-limited)
   */
  async getBatch(dois: string[]): Promise<UnpaywallResult[]> {
    const results: UnpaywallResult[] = [];

    for (const doi of dois) {
      try {
        results.push(await this.getByDoi(doi));
      } catch (error) {
        // Failed lookups are logged and left out of the result
        const message = error instanceof Error ? error.message : String(error);
        console.error(`Failed to fetch ${doi}:`, message);
      }

      // Rate limiting: pause 100 ms between requests
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    return results;
  }

  /**
   * Download a PDF file
   */
  async downloadPdf(pdfUrl: string, outputPath: string): Promise<void> {
    try {
      const response = await axios.get(pdfUrl, {
        responseType: 'arraybuffer',
        timeout: 60000,  // 1-minute timeout
      });

      const { writeFile } = await import('node:fs/promises');
      await writeFile(outputPath, Buffer.from(response.data));
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`PDF download failed: ${message}`);
    }
  }
}

export const unpaywallClient = new UnpaywallClient();
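A quick arithmetic check relates the fixed 100 ms pause in `getBatch` to the daily quota of 100,000 requests. The helpers below are hypothetical and ignore request latency, so they give a worst-case bound for a client that runs continuously.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms

// Smallest pause between requests that keeps a continuously running
// client within a daily quota.
function minIntervalMs(dailyLimit: number): number {
  if (dailyLimit <= 0) throw new RangeError('dailyLimit must be positive');
  return Math.ceil(MS_PER_DAY / dailyLimit);
}

// Upper bound on requests per day for a given pause between requests.
function maxRequestsPerDay(intervalMs: number): number {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  return Math.floor(MS_PER_DAY / intervalMs);
}
```

With these definitions, `minIntervalMs(100000)` is 864 and `maxRequestsPerDay(100)` is 864000. A 100 ms pause is therefore comfortable for batches of 50-500 articles, but it would not by itself keep a job that runs all day under the quota; such a job would need a pause of roughly 864 ms or a daily request counter.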

Environment variable configuration:

# .env
UNPAYWALL_EMAIL=your-email@example.com

Business Integration

Scenario 1: Check in bulk whether articles can be downloaded

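async function checkLiteratureAvailability(literatures: Literature[]) {
  const dois = literatures
    .map(lit => lit.doi)
    .filter((doi): doi is string => Boolean(doi));  // drop empty DOIs

  const results = await unpaywallClient.getBatch(dois);

  // Index results by lower-cased DOI (DOIs are case-insensitive) so each
  // article is matched with a single lookup
  const byDoi = new Map(results.map(r => [r.doi.toLowerCase(), r] as const));

  return literatures.map(lit => {
    const match = lit.doi ? byDoi.get(lit.doi.toLowerCase()) : undefined;
    return {
      ...lit,
      downloadable: match?.isOA ?? false,
      pdfUrl: match?.pdfUrl ?? null,
    };
  });
}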

Scenario 2: The user clicks to download the full text

async function downloadLiteratureFullText(doi: string) {
  // Step 1: query Unpaywall
  const unpaywallResult = await unpaywallClient.getByDoi(doi);

  if (!unpaywallResult.pdfUrl) {
    throw new Error('No free full text is available for this article');
  }

  // Step 2: download the PDF
  const filename = `${doi.replace(/\//g, '_')}.pdf`;
  const outputPath = `./downloads/${filename}`;
  
  await unpaywallClient.downloadPdf(unpaywallResult.pdfUrl, outputPath);

  // Step 3: extract the text (calls extraction_service)
  const extractionResult = await extractionClient.extractPdf(
    fs.readFileSync(outputPath),
    filename,
    'auto'
  );

  return {
    pdfPath: outputPath,
    text: extractionResult.text,
    method: extractionResult.method,
  };
}

3.3 Frontend Integration

Batch download button:

// Check in bulk which articles can be downloaded
async function checkDownloadable(selectedRows: Literature[]) {
  setLoading(true);

  try {
    const results = await api.checkLiteratureAvailability(selectedRows);
    const downloadableCount = results.filter(r => r.downloadable).length;

    message.success(`Found ${downloadableCount} articles with downloadable full text`);
    setLiteratures(results);
  } finally {
    setLoading(false);  // clear the loading state even if the request fails
  }
}

// Download the full text
async function downloadFullText(literature: Literature) {
  if (!literature.downloadable) {
    message.warning('No free full text is available for this article');
    return;
  }

  try {
    const result = await api.downloadLiteratureFullText(literature.doi);
    message.success('Download succeeded');

    // Open the PDF viewer
    openPdfViewer(result.pdfPath);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    message.error(`Download failed: ${reason}`);
  }
}

Supplementary Technical Points

4.1 Summary of the Technical Points You Mentioned

| Technical point | Status | Notes |
| --- | --- | --- |
| ✅ Nougat model | Implemented | extraction_service/services/nougat_extractor.py |
| ✅ PyMuPDF | Implemented | extraction_service/services/pdf_extractor.py |
| ✅ Sequential fallback strategy | Implemented | English → Nougat, Chinese → PyMuPDF |
| 🆕 Unpaywall API | New | Implementation plan provided in this document |
| ✅ Excel parsing | New | Uses the xlsx library (frontend) |

4.2 Technical Points That May Have Been Overlooked ⭐

(1) Enhanced table extraction

Problem: Nougat preserves table structure, but an LLM that processes Markdown tables directly may not read them accurately.

Solution: Table Transformer

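# Uses Microsoft's Table Transformer model
# https://github.com/microsoft/table-transformer

import fitz  # PyMuPDF, used here to render PDF pages to images
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

MODEL_NAME = "microsoft/table-transformer-detection"

def extract_tables_enhanced(pdf_path: str, threshold: float = 0.9):
    """
    Locate tables precisely with Table Transformer.

    Returns one dict per detected table: page index, confidence score,
    and bounding box in pixels of the rendered page image.
    """
    processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
    model = TableTransformerForObjectDetection.from_pretrained(MODEL_NAME)

    tables = []
    with fitz.open(pdf_path) as doc:
        for page_index, page in enumerate(doc):
            # Render the page; the model works on images, not on PDF files
            pix = page.get_pixmap(dpi=150)
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

            # Detect table positions
            inputs = processor(images=image, return_tensors="pt")
            with torch.no_grad():
                outputs = model(**inputs)
            detections = processor.post_process_object_detection(
                outputs, threshold=threshold, target_sizes=torch.tensor([image.size[::-1]])
            )[0]

            for score, box in zip(detections["scores"], detections["boxes"]):
                tables.append({"page": page_index, "score": score.item(), "bbox": box.tolist()})

    # Cropping each box and recognizing its cells (OCR or the structure-recognition
    # model) is a separate step that is not shown here.
    return tables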

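Priority: V2.0 (Nougat is sufficient for the MVP stage)

(2) Citation parsing and linking

Problem: Scientific articles contain many citations such as [1] [2] [3], which need to be parsed and linked to the reference list.

Solution: GROBID

# GROBID: an open-source tool for parsing scientific literature
# https://github.com/kermitt2/grobid

import xml.etree.ElementTree as ET

import requests

TEI_NS = {'tei': 'http://www.tei-c.org/ns/1.0'}

def parse_references(pdf_path: str):
    """
    Parse the reference list with GROBID.
    """
    with open(pdf_path, 'rb') as f:
        response = requests.post(
            'http://localhost:8070/api/processReferences',
            files={'input': f},
            timeout=120,
        )
    response.raise_for_status()

    # GROBID responds with TEI XML (not JSON); each reference is a <biblStruct>
    root = ET.fromstring(response.content)
    references = []
    for bibl in root.iterfind('.//tei:biblStruct', TEI_NS):
        title = bibl.find('.//tei:title', TEI_NS)
        references.append({
            'id': bibl.get('{http://www.w3.org/XML/1998/namespace}id'),
            'title': ''.join(title.itertext()).strip() if title is not None else None,
        })
    return references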

Priority: V2.0 (non-core feature)

(3) Formula recognition and rendering

Problem: Nougat outputs formulas as LaTeX, which the frontend needs to render.

Solution: KaTeX / MathJax

// Render LaTeX formulas in the frontend
import katex from 'katex';
import 'katex/dist/katex.min.css';

function renderFormula(latex: string) {
  return katex.renderToString(latex, {
    throwOnError: false,
    displayMode: true,
  });
}

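Priority: MVP (improves the user experience)

(4) PDF preview and annotation

Problem: During manual review, reviewers need to see the original text with highlighted annotations.

Solution: PDF.js + Annotator.js

// React component
import { Viewer } from '@react-pdf-viewer/core';
import { highlightPlugin, RenderHighlightsProps } from '@react-pdf-viewer/highlight';
import '@react-pdf-viewer/core/lib/styles/index.css';
import '@react-pdf-viewer/highlight/lib/styles/index.css';

function PdfViewer({ pdfUrl, annotations }) {
  // annotations: highlight areas ({ pageIndex, left, top, width, height }, in percent)
  const renderHighlights = (props: RenderHighlightsProps) => (
    <div>
      {annotations
        .filter((area) => area.pageIndex === props.pageIndex)
        .map((area, index) => (
          <div
            key={index}
            style={{ background: 'yellow', opacity: 0.4, ...props.getCssProperties(area, props.rotation) }}
          />
        ))}
    </div>
  );
  const highlightPluginInstance = highlightPlugin({ renderHighlights });

  // The viewer must be rendered inside the library's <Worker> provider
  return <Viewer fileUrl={pdfUrl} plugins={[highlightPluginInstance]} />;
}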

Priority: MVP (core feature)

(5) Literature deduplication

Problem: An uploaded Excel file may contain duplicate articles (different versions of the same article).

Solution: Deduplicate by DOI and title

function deduplicateLiteratures(literatures: Literature[]) {
  const seen = new Set<string>();

  return literatures.filter(lit => {
    // Prefer the DOI (DOIs are case-insensitive); otherwise use the normalized title
    const key = lit.doi
      ? `doi:${lit.doi.trim().toLowerCase()}`
      : `title:${normalizeTitle(lit.title)}`;

    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, '')  // remove punctuation; keeps non-Latin letters such as Chinese
    .replace(/\s+/g, ' ')              // normalize whitespace
    .trim();
}

Priority: MVP (required feature)

(6) Metadata completion

Problem: Data uploaded via Excel may be incomplete (missing DOI, year, and so on).

Solution: Crossref API

// Look up the DOI by title
async function enrichMetadata(literature: Literature) {
  if (literature.doi) return literature;  // already has a DOI

  // Call the Crossref API; passing the title via `params` lets axios URL-encode it.
  // `query.bibliographic` is Crossref's field query for titles and other citation text.
  const response = await axios.get('https://api.crossref.org/works', {
    params: { 'query.bibliographic': literature.title, rows: 1 },
  });

  const match = response.data.message.items[0];
  if (!match) return literature;  // no candidate found

  return {
    ...literature,
    doi: match.DOI,
    year: match['published-print']?.['date-parts']?.[0]?.[0] ?? literature.year,
    journal: match['container-title']?.[0] ?? literature.journal,
  };
}
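Taking the first Crossref hit at face value risks attaching the DOI of a different article, since the search always returns its best guess. One possible safeguard is to compare the returned title with the query title and accept the match only above a similarity threshold. The sketch below uses token-set Jaccard similarity. The function names, the 0.8 threshold, and the ASCII-only tokenizer (chosen because the test set consists of English articles) are all assumptions.

```typescript
// Lower-case a title and split it into a set of alphanumeric tokens.
// Assumption: titles are English, so non-ASCII characters are treated as separators.
function titleTokens(title: string): Set<string> {
  const tokens = title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter((token) => token.length > 0);
  return new Set(tokens);
}

// Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|, in [0, 1].
function titleSimilarity(a: string, b: string): number {
  const tokensA = titleTokens(a);
  const tokensB = titleTokens(b);
  if (tokensA.size === 0 || tokensB.size === 0) return 0;
  let shared = 0;
  tokensA.forEach((token) => {
    if (tokensB.has(token)) shared += 1;
  });
  return shared / (tokensA.size + tokensB.size - shared);
}

// Accept a candidate only when its title is close enough to the query title.
function isConfidentMatch(queryTitle: string, candidateTitle: string, threshold = 0.8): boolean {
  return titleSimilarity(queryTitle, candidateTitle) >= threshold;
}
```

In `enrichMetadata`, such a check would sit between reading the first item and copying its DOI, with the Crossref title taken from the item's `title` array.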

Priority: V1.0 (enhancement)

(7) Persisting batch progress

Problem: Batch screening takes a long time (1000 articles take more than 10 minutes), so resumable processing is needed.

Solution: Redis + task queue

// Use a Bull queue
import Queue from 'bull';

const screeningQueue = new Queue('literature-screening', {
  redis: { host: 'localhost', port: 6379 }
});

// Add a job
screeningQueue.add({
  projectId: 'xxx',
  literatures: [...],
  protocol: {...}
});

// Process jobs
screeningQueue.process(async (job) => {
  const { projectId, literatures, protocol } = job.data;
  
  for (let i = 0; i < literatures.length; i++) {
    // Screen a single article
    await screenLiterature(literatures[i], protocol);
    
    // Update progress
    job.progress((i + 1) / literatures.length * 100);
  }
});

Priority: V1.0 (experience optimization)

(8) Error handling and retries

Problem: LLM calls can fail (network errors, timeouts, rate limiting).

Solution: Retry with exponential backoff

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  let lastError: unknown;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i === maxRetries - 1) break;

      // Exponential backoff: wait 1s, 2s, 4s, ... between attempts
      const delay = Math.pow(2, i) * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }

  // All attempts failed (also reached when maxRetries <= 0)
  throw lastError ?? new Error('retryWithBackoff: no attempts were made');
}

Priority: MVP (required feature)


Technology Selection Summary

Required technologies for the MVP stage

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | xlsx | Excel parsing |
| Frontend | PDF.js | PDF preview |
| Frontend | KaTeX | Formula rendering |
| Backend | ExtractionClient | Calls the Python microservice |
| Backend | UnpaywallClient | Literature download |
| Python | Nougat | English PDF extraction |
| Python | PyMuPDF | Fast PDF extraction |
| Database | asl_schema | Data storage |

V1.0 enhancements

| Technology | Purpose |
| --- | --- |
| Crossref API | Metadata completion |
| Bull Queue | Task queue |
| Redis | Progress persistence |

V2.0 advanced technologies

| Technology | Purpose |
| --- | --- |
| Table Transformer | Precise table extraction |
| GROBID | Citation parsing |
| Semantic Scholar API | Scholarly graph |

Recommended Location for Test Data

Following the folder structure of the ASL module, test data should be placed in:

AIclinicalresearch/docs/03-业务模块/ASL-AI智能文献/
└── 05-测试文档/
    ├── 01-测试计划.md
    ├── 02-标题摘要初筛测试用例.md
    └── 03-测试数据/  ← new folder
        ├── README.md  ← description
        ├── screening-test-data/
        │   ├── literature-list-199.xlsx  ← list of 199 articles
        │   ├── picos-criteria.txt        ← PICOS criteria
        │   └── expected-results.json     ← expected results (gold standard)
        ├── pdf-samples/
        │   ├── sample-rct-01.pdf
        │   ├── sample-cohort-01.pdf
        │   └── README.md
        └── extraction-test-data/
            └── README.md

Recommended structure:

05-测试文档/
├── 01-测试计划.md
├── 02-标题摘要初筛测试用例.md
└── 03-测试数据/
    ├── README.md  ← important: describes the source, copyright, and usage of the test data
    ├── screening/
    │   ├── literature-list-199.xlsx
    │   ├── picos-criteria.txt
    │   ├── inclusion-criteria.txt
    │   ├── exclusion-criteria.txt
    │   └── gold-standard.json  ← manually annotated correct answers
    └── pdf-extraction/
        ├── sample-01-high-quality.pdf
        ├── sample-02-with-tables.pdf
        └── sample-03-chinese.pdf

README.md example:

# ASL Test Dataset

## Data Description

### 1. Title/abstract initial screening test data
- **File**: `literature-list-199.xlsx`
- **Size**: 199 English-language medical articles
- **Fields**: title, abstract, DOI, authors, year, journal
- **Source**: [describe the data source]
- **Copyright**: [state the copyright information]

### 2. PICOS criteria
- **File**: `picos-criteria.txt`
- **Content**: Population, Intervention, Comparison, Outcome, Study Design
- **Inclusion criteria**: 5 items
- **Exclusion criteria**: 8 items

### 3. Gold standard (manual annotation results)
- **File**: `gold-standard.json`
- **Annotator**: [annotating expert information]
- **Annotation date**: [date]
- **Expected accuracy**: ≥ 90%

## Usage

### Run the tests
```bash
npm run test:asl:screening
```

### Evaluate accuracy
```bash
npm run test:asl:evaluate -- --gold-standard gold-standard.json
```

## Expected Results

- Included: 45 articles
- Excluded: 132 articles
- Uncertain: 22 articles
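The accuracy evaluation against the gold standard could be computed along the following lines. This is a sketch under the assumption that both the predictions and `gold-standard.json` map an article id to one of the three labels; the actual file format is not specified in this document, and `evaluateAccuracy` is a hypothetical name.

```typescript
type ScreeningLabel = 'include' | 'exclude' | 'uncertain';

interface AccuracyReport {
  total: number;      // number of gold-standard entries
  correct: number;    // predictions that match the gold label
  accuracy: number;   // correct / total, or 0 when there are no entries
  missing: string[];  // gold-standard ids with no prediction
}

// Compare predicted labels with gold-standard labels, id by id.
// A missing prediction counts as incorrect.
function evaluateAccuracy(
  predicted: Record<string, ScreeningLabel>,
  gold: Record<string, ScreeningLabel>,
): AccuracyReport {
  const ids = Object.keys(gold);
  const missing: string[] = [];
  let correct = 0;
  for (const id of ids) {
    if (!(id in predicted)) {
      missing.push(id);
    } else if (predicted[id] === gold[id]) {
      correct += 1;
    }
  }
  return {
    total: ids.length,
    correct,
    accuracy: ids.length > 0 ? correct / ids.length : 0,
    missing,
  };
}
```

For the 199-article set, an accuracy of at least 90% corresponds to at least 180 matching labels.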

---

## Related Documents

- [Quality assurance and traceability strategy](./06-质量保障与可追溯策略.md)
- [Database design](./01-数据库设计.md)
- [API design specification](./02-API设计规范.md)
- [Document extraction microservice](../../../../extraction_service/README.md)

---

**Changelog**:
- 2025-11-15: Created the document; defined the technology selection for initial screening, re-screening, document processing, and literature download