AIclinicalresearch/docs/05-部署文档/04-Python微服务-SAE容器部署指南.md
HaHafeng 1b53ab9d52 feat(aia): Complete AIA V2.0 with universal streaming capabilities
Major Changes:
- Add StreamingService with OpenAI Compatible format
- Upgrade Chat component V2 with Ant Design X integration
- Implement AIA module with 12 intelligent agents
- Update API routes to unified /api/v1 prefix
- Update system documentation

Backend (~1300 lines):
- common/streaming: OpenAI Compatible adapter
- modules/aia: 12 agents, conversation service, streaming integration
- Update route versions (RVW, PKB to v1)

Frontend (~3500 lines):
- modules/aia: AgentHub + ChatWorkspace (100% prototype restoration)
- shared/Chat: AIStreamChat, ThinkingBlock, useAIStream Hook
- Update API endpoints to v1

Documentation:
- AIA module status guide
- Universal capabilities catalog
- System overview updates
- All module documentation sync

Tested: Stream response verified, authentication working
Status: AIA V2.0 core completed (85%)
2026-01-14 19:15:01 +08:00


Python Microservice: SAE Container Deployment Guide

Document version: v1.1 (fix release)
Created: 2025-12-13
Last revised: 2025-12-13
Scope: AIclinicalresearch platform, Python microservice (extraction_service)
Audience: operations engineers and backend developers

v1.1 changes:

  • Fixed: the internal address must be the actual IP shown in the SAE console (do not guess domain names)
  • Improved: Dockerfile system-dependency notes (libmupdf-dev is optional)
  • Added: make sure the /tmp directory is writable (temporary file storage)
  • Improved: troubleshooting procedure and monitoring

Contents

  1. Why SAE container deployment
  2. Deployment architecture
  3. Prerequisites
  4. Python service inventory
  5. Dependency optimization strategy
  6. Building the Docker image
  7. Deploying to SAE
  8. Testing and verification
  9. Monitoring and operations
  10. Troubleshooting
  11. Caveats and required reading

Why SAE container deployment

SAE container deployment vs. the SAE Python runtime

| Dimension | SAE Python runtime | SAE container deployment (recommended) |
|-----------|--------------------|----------------------------------------|
| System dependencies | ❌ System libraries cannot be installed | ✅ Fully controllable |
| Complex dependencies | ❌ PyMuPDF/OpenCV raise errors | ✅ Fully supported |
| Environment consistency | ⚠️ Cloud and local environments may differ | ✅ Works locally = works in the cloud |
| Nougat (Torch) | ❌ High risk of failure | ✅ Easily supported |
| Deployment method | Upload a ZIP package | Push a Docker image |
| Startup speed | Fast (< 5 s) | Fairly fast (10-20 s) |
| Operational complexity | Low | Medium |
| Recommendation | ❌ Not recommended | ✅ Strongly recommended |

Key reasons

1. Missing system-level dependencies (the most critical problem):

# If the code uses these libraries:
import fitz  # PyMuPDF, depends on libmupdf.so and libfreetype.so
import cv2   # OpenCV, depends on libGL.so.1 and libgthread-2.0.so
import polars  # Polars, depends on libgomp.so

**SAE Python runtime**:
```bash
# ❌ Only a Python environment is provided
# ❌ apt-get install cannot be run
# ❌ The import fails at runtime:
ImportError: libGL.so.1: cannot open shared object file
```

**SAE container deployment**:
```dockerfile
# ✅ System libraries can be installed freely in the Dockerfile
# (libgl1-mesa-glx is not available on Debian 12 or later; use libgl1 there)
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libgomp1
```

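As a rough way to confirm this point inside a given environment, the sketch below checks whether the shared libraries named in the import comments can be located. It uses only the standard-library `ctypes.util.find_library`; the helper name `missing_shared_libraries` and the short library names are illustrative assumptions, not part of extraction_service, and `find_library` may miss libraries on very minimal images.

```python
import ctypes.util
from typing import Callable, Iterable, List, Optional


def missing_shared_libraries(
    names: Iterable[str],
    finder: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return the library names that the system lookup cannot locate.

    `names` are short names without the "lib" prefix or ".so" suffix,
    for example "GL" for libGL.so.1.
    """
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    # Short names are assumptions derived from the .so files listed above.
    wanted = ["GL", "gomp", "freetype"]
    missing = missing_shared_libraries(wanted)
    if missing:
        print("Missing shared libraries:", ", ".join(missing))
    else:
        print("All listed shared libraries were found")
```

Running the script in the managed runtime and again inside the container image should show whether the two environments really differ.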
2. Consistent, controllable environment

Local development environment = Docker image = SAE production environment
  • If it runs in Docker locally, it will run after being pushed to SAE
  • No more "works locally, fails in production" problems

3. Strong extensibility

Possible future additions:
  • Nougat OCR (needs PyTorch and GPU support)
  • Image processing (needs OpenCV)
  • Further features (need more system libraries)
  • Container deployment supports all of these easily

4. Unified operations

All services are deployed the same way:
  • Frontend Nginx → SAE container
  • Backend Node.js → SAE container
  • Python service → SAE container (managed uniformly)

Deployment architecture

```
Alibaba Cloud
│
├── SAE (backend)
│     Node.js Backend
│        ▲
│        │ internal network
│        ▼
├── SAE (Python microservice)
│     Docker container:
│       - FastAPI
│       - PyMuPDF
│       - Polars
│       - Mammoth
│
├── RDS PostgreSQL 15
└── OSS (document storage)
```

**Key points**:
- The Python microservice and the Node.js backend are both deployed on SAE (same VPC)
- They communicate over the internal network (latency < 5 ms)
- They share the RDS and OSS resources

---

## Prerequisites

### Required resources

| Resource type | Recommended configuration | Estimated cost | Purpose |
|---------|---------|---------|-----|
| **SAE application** | 1 core, 2 GB / 1 instance | ~100 CNY/month | Runs the Python service |
| **Container image registry** | Alibaba Cloud ACR Personal Edition | Free (?GB) | Stores Docker images |
| **OSS storage** | Existing (shared) | 0 CNY (no new cost) | Document storage |
| **RDS PostgreSQL** | Existing (shared) | 0 CNY | Database |

### Software requirements

```bash
# Install on the local development machine:
#   - Docker Desktop
#   - Alibaba Cloud CLI (optional)

# Nothing needs to be installed on SAE itself; the container already includes everything
```

Accounts and permissions

  • An Alibaba Cloud account with the service activated
  • Access permission for the container image registry
  • Permission to create SAE applications

Python service inventory

Current service overview

Service 1: extraction_service (document extraction)

Location: AIclinicalresearch/extraction_service/

**Purpose**:

  • PKB module: extracts text from documents before they are uploaded to Dify
  • ASL module: extracts the full text of PDFs for in-depth reading
  • DC module: extracts Excel/CSV data

Directory layout:

```
extraction_service/
├── main.py                  # FastAPI entry point
├── requirements.txt         # Dependency list
├── services/
│   ├── pdf_extractor.py     # PDF extraction (dispatcher)
│   ├── pymupdf_extractor.py # PyMuPDF implementation
│   ├── nougat_extractor.py  # Nougat OCR implementation
│   ├── docx_extractor.py    # Word extraction
│   └── txt_extractor.py     # Plain-text extraction
└── operations/
    └── fillna_operations.py # Data cleaning (Polars)
```

**Key endpoints**:
```text
POST /extract/pdf       # PDF extraction
POST /extract/docx      # Word extraction
POST /extract/txt       # Plain-text extraction
POST /operations/fillna # Data cleaning
```

Dependency analysis

Current requirements.txt contents:

fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4

Estimated dependency sizes:

| Package | Size | Purpose | Required? |
|---------|------|---------|-----------|
| PyMuPDF | ~50MB | PDF extraction (core) | ✅ Required |
| pdfplumber | ~10MB | PDF table extraction | ⚠️ Optional (currently unused) |
| nougat-ocr | ~300MB | Academic-paper OCR | ⚠️ Stage 2 |
| torch | ~800MB | Nougat dependency | ⚠️ Stage 2 |
| torchvision | ~100MB | Nougat dependency | ⚠️ Stage 2 |
| mammoth | ~5MB | Word extraction | ✅ Required |
| python-docx | ~3MB | Word extraction | ✅ Required |
| polars | ~50MB | Data cleaning | ✅ Required |
| numpy | ~20MB | Numerical computing | ✅ Required |
| fastapi | ~10MB | Web framework | ✅ Required |
| uvicorn | ~5MB | ASGI server | ✅ Required |
| Other | ~10MB | Helper libraries | ✅ Required |
| Total (with Nougat) | ~1.4GB | - | - |
| Total (without Nougat) | ~163MB | - | - |
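The two totals can be re-derived from the per-package estimates. The sketch below simply sums the figures from the table; the dictionary and function names are illustrative.

```python
# Approximate installed sizes in MB, copied from the table above.
SIZES_MB = {
    "PyMuPDF": 50, "pdfplumber": 10, "nougat-ocr": 300, "torch": 800,
    "torchvision": 100, "mammoth": 5, "python-docx": 3, "polars": 50,
    "numpy": 20, "fastapi": 10, "uvicorn": 5, "other": 10,
}
# Packages that are only needed once Nougat is enabled.
NOUGAT_STACK = frozenset({"nougat-ocr", "torch", "torchvision"})


def total_mb(sizes, exclude=frozenset()):
    """Sum the sizes of all packages that are not in `exclude`."""
    return sum(mb for name, mb in sizes.items() if name not in exclude)


full = total_mb(SIZES_MB)
slim = total_mb(SIZES_MB, exclude=NOUGAT_STACK)
print(f"with Nougat:    {full} MB (~{full / 1000:.1f} GB)")
print(f"without Nougat: {slim} MB")
```

The Nougat stack accounts for 1200 MB of the total, which is the motivation for leaving it out of the first deployment.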

Dependency optimization strategy

Stage 1: Minimal deployment (for the first deployment)

Goal: go live quickly and validate feasibility

Strategy:

  • ✅ Keep PyMuPDF (core PDF extraction)
  • ✅ Keep Mammoth/python-docx (Word extraction)
  • ✅ Keep Polars (data cleaning)
  • ❌ Temporarily remove Nougat (large footprint, rarely used)

Optimized requirements.txt:
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20

# Core document extraction
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2

# Data processing
polars==1.17.1
numpy==1.26.4

# Helper tools
langdetect==1.0.9
chardet==5.2.0

# Logging
python-json-logger==2.0.7

Estimated image size: ~500MB (including the Python base image)

Code changes:

# services/pdf_extractor.py

# Nougat imports are commented out for stage 1
# from .nougat_extractor import extract_pdf_nougat, check_nougat_available

async def extract_pdf(pdf_path: str, filename: str):
    """Extract PDF text (stage 1: PyMuPDF only)."""

    # Detect the language and the document type
    language = detect_language(pdf_path)
    is_academic = detect_academic_paper(pdf_path)

    # Stage 1: use PyMuPDF directly
    text = extract_pdf_pymupdf(pdf_path)

    # Stage 2: the Nougat path can be re-enabled here, with PyMuPDF as the fallback
    # if language == 'english' and is_academic:
    #     try:
    #         if check_nougat_available():
    #             text = extract_pdf_nougat(pdf_path)
    #     except Exception:
    #         text = extract_pdf_pymupdf(pdf_path)  # fallback
    
    return {
        'text': text,
        'method': 'pymupdf',
        'language': language,
        'is_academic': is_academic
    }
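The commented-out stage 2 logic is a "try the heavier extractor, fall back to the lighter one" pattern. A minimal self-contained sketch of that pattern follows; the function name `extract_with_fallback` and the injected callables are hypothetical and stand in for the Nougat and PyMuPDF extractors, which are not reproduced here.

```python
from typing import Callable, Dict


def extract_with_fallback(
    pdf_path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    use_primary: bool,
) -> Dict[str, str]:
    """Try `primary` when enabled; use `fallback` if it is disabled or fails."""
    if use_primary:
        try:
            return {"text": primary(pdf_path), "method": "primary"}
        except Exception:
            # Any failure in the heavier extractor degrades to the fallback
            pass
    return {"text": fallback(pdf_path), "method": "fallback"}
```

Passing the extractors in as arguments keeps the dispatcher importable even when the Nougat stack is not installed, which matches the stage 1 requirement set.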

Stage 2: Full deployment (when needed in the future)

When:

  • Users report that extraction accuracy is too low
  • Sufficient GPU resources are available

Strategy:

  • ✅ Enable Nougat + Torch
  • ✅ Use a GPU instance (SAE does not currently support GPUs, so ECS is required)

Full requirements.txt:
# All dependencies (including Nougat)
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4

Estimated image size: ~2GB


Building the Docker image

Step 1: Create an optimized Dockerfile

Create a Dockerfile in the extraction_service/ directory:

# ========================================
# Multi-stage build to minimise the image size
# ========================================

# Stage 1: build stage (install dependencies)
FROM python:3.11-slim AS builder

# Set the working directory
WORKDIR /app

# Install system packages needed only at build time
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    make \
    libffi-dev \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list
COPY requirements.txt .

# Install the Python dependencies into a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# ========================================
# Stage 2: runtime stage (minimal image)
# ========================================
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install runtime dependencies (system-level libraries + time zone data)
RUN apt-get update && apt-get install -y --no-install-recommends \
    # PyMuPDF dependencies
    # Note: libmupdf-dev is usually unnecessary; the PyMuPDF wheel installed by pip bundles its own MuPDF build
    # It is kept here as a safeguard and can be removed once that is verified
    libmupdf-dev \
    libfreetype6 \
    libjpeg62-turbo \
    libopenjp2-7 \
    # Polars dependency
    libgomp1 \
    # Utilities
    curl \
    # Time zone data
    tzdata \
    && rm -rf /var/lib/apt/lists/*

# Use one time zone everywhere: Asia/Shanghai
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Make sure the temporary directory is writable (needed for file uploads)
RUN mkdir -p /tmp && chmod 1777 /tmp

# Copy the virtual environment from the build stage
COPY --from=builder /opt/venv /opt/venv

# Copy the application code
COPY . .

# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=8000

# Expose the port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
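The HEALTHCHECK above relies on `curl` being installed in the runtime image. As a hedged alternative, the same probe can be written with the Python standard library, which the image already contains. The function name `is_healthy` is an assumption for illustration, and the sketch presumes the service answers `GET /health` with HTTP 200, as the Dockerfile implies.

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: a health probe should talk to the
# service directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True when `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status
        return False
```

A wrapper script that exits with status 0 when `is_healthy("http://localhost:8000/health")` is true and 1 otherwise could then replace the `curl -f ... || exit 1` command.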

Step 2: Create .dockerignore

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Tests and documentation
tests/
test_files/
*.md
README.md

# Git
.git/
.gitignore

# Logs
*.log

# Temporary files
tmp/
temp/

Step 3: Build the image locally

# Change to the extraction_service directory
cd d:\MyCursor\AIclinicalresearch\extraction_service

# Build the image (local test)
docker build -t extraction-service:latest .

# Check the image size
docker images extraction-service

Step 4: Test the image locally

# Start the container (local test)
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/dbname" \
  extraction-service:latest

# Follow the logs
docker logs -f extraction-test

# Test the health check
curl http://localhost:8000/health

# Test PDF extraction
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

# Stop and remove the test container
docker stop extraction-test
docker rm extraction-test

Step 5: Push the image to Alibaba Cloud Container Registry

5.1 Create the image repository (first deployment only)

  1. Log in to the Alibaba Cloud console → Container Registry (ACR)

  2. **Create a Personal Edition instance** (free):

    Instance name: extraction-service
    Region: China East 1 (Hangzhou)
    
  3. Create a namespace:

    Namespace: clinical-research
    
  4. Create the image repository:

    Repository name: extraction-service
    Code source: local repository
    

5.2 Push the image

# 1. Log in to the Alibaba Cloud container registry
# To find the login command: console → Container Registry → Access Credentials → set the Registry login password
docker login --username=<your-username> registry.cn-beijing.aliyuncs.com

# 2. Tag the image
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 3. Push it to Alibaba Cloud
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 4. Also tag it as latest (convenient for later updates)
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
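The tag and push commands repeat the same `registry/namespace/repository:tag` string several times. If the steps are scripted, a small helper can build that reference once; the function below is an illustrative sketch, not part of the project.

```python
def image_reference(registry: str, namespace: str, repository: str,
                    tag: str = "latest") -> str:
    """Build a fully qualified image reference: registry/namespace/repository:tag."""
    for label, value in (("registry", registry), ("namespace", namespace),
                         ("repository", repository), ("tag", tag)):
        # Reject empty values and values containing whitespace
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{registry}/{namespace}/{repository}:{tag}"


print(image_reference("registry.cn-beijing.aliyuncs.com",
                      "clinical-research", "extraction-service", "v1.0"))
```

Pushing a versioned tag such as `v1.0` alongside `latest` keeps a fixed reference available if a later image has to be rolled back.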

Deploying to SAE

Step 1: Create the SAE application

  1. Log in to the Alibaba Cloud console → Serverless App Engine (SAE)

  2. Create the application:

    Application name: extraction-service
    Namespace: the namespace used by the backend (same VPC)
    Deployment method: image
    
  3. Image settings:

    Image address: registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
    Image version: latest
    Image pull policy: Always (pull the image on every deployment)
    
  4. Instance specification:

    CPU: 1 core
    Memory: 2GB
    Instances: 1 (initial)
    Auto scaling:
      - Minimum instances: 1
      - Maximum instances: 3
      - CPU trigger threshold: 70%
    
  5. Network settings:

    VPC: the VPC used by the backend
    vSwitch: the vSwitch used by the backend
    Security group: allow access from within the VPC
    

Step 2: Configure environment variables

Add the following environment variables in the SAE application configuration:

# ========= Database =========
DATABASE_URL=postgresql://user:password@rm-xxxx.pg.rds.aliyuncs.com:5432/clinical_research

# ========= Storage =========
OSS_ENDPOINT=oss-cn-hangzhou-internal.aliyuncs.com
OSS_BUCKET=your-bucket-name
OSS_ACCESS_KEY_ID=<your-id>
OSS_ACCESS_KEY_SECRET=<your-secret>

# ========= Service =========
SERVICE_NAME=extraction-service
SERVICE_VERSION=v1.0
LOG_LEVEL=INFO

# ========= Performance =========
WORKERS=2
TIMEOUT=300
MAX_FILE_SIZE=52428800

# ========= Time zone =========
TZ=Asia/Shanghai
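How the service consumes these variables is not shown in this guide. The following is a minimal sketch of one way to read them, assuming the defaults listed above; `ServiceSettings` and `load_settings` are hypothetical names, and only the performance and service variables are covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceSettings:
    service_name: str
    log_level: str
    workers: int
    timeout_seconds: int
    max_file_size_bytes: int


def load_settings(env: Mapping[str, str] = os.environ) -> ServiceSettings:
    """Read the service settings, falling back to the values listed above."""
    return ServiceSettings(
        service_name=env.get("SERVICE_NAME", "extraction-service"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        workers=int(env.get("WORKERS", "2")),
        timeout_seconds=int(env.get("TIMEOUT", "300")),
        # 52428800 bytes is 50 MiB
        max_file_size_bytes=int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024))),
    )
```

Note that the Dockerfile start command passes `--workers 2` directly, so changing `WORKERS` only has an effect if the start command is adapted to read it.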

Step 3: Configure the health check

```
Health check path: /health
Health check port: 8000
Health check protocol: HTTP
Initial delay: 30 s
Check interval: 10 s
Timeout: 5 s
Healthy threshold: 2
Unhealthy threshold: 3
```

### Step 4: Configure logging

```bash
Log directory: /app/logs
Log file: extraction-service.log
Log level: INFO
Log retention: 7 days
```

### Step 5: Configure SLB (optional, only if public access is needed)

```bash
# A Python microservice normally needs no public access (it is only called by the backend)
# If public access is required (for example, for third-party callers):

Load balancer type: public
Listener port: 80
Backend port: 8000
Health check: enabled
```

Step 6: Deploy

  1. Click "Deploy application"

  2. **Wait for the deployment to finish** (about 2-3 minutes)

  3. Check the deployment log:

    [INFO] Pulling image...
    [INFO] Image pulled successfully
    [INFO] Starting container...
    [INFO] Container started successfully
    [INFO] Health check passed
    [INFO] Application is running
    

Testing and verification

Step 1: Obtain the internal address (critical step)

⚠️ Important: SAE instances run on different hosts, so you must use the internal address that SAE provides.

How to obtain the internal address correctly:

  1. Log in to the SAE console → Application list → click the extraction-service application

  2. On the application details page, find the "Application access settings" or "VPC internal access" section

  3. **Copy and save the "internal access address"**, which usually takes one of these forms:

    ```
    Form 1: internal IP + port (strongly recommended, most stable)
    172.17.x.x:8000

    Form 2: SAE internal Service domain (requires extra microservice configuration, not recommended)
    extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000

    Form 3: K8s Service domain (requires K8s service discovery, generally not recommended)
    extraction-service.namespace.svc.cluster.local:8000
    ```

  4. **❌ Wrong approaches (the connection will fail)**:

    ```bash
    # ❌ Do not guess domain names (a wrong guess always fails)
    http://extraction-service.sae:8000            # the .sae domain does not exist
    http://extraction-service.internal:8000       # the .internal domain does not exist
    http://extraction-service.cluster.local:8000  # requires K8s service discovery to be configured

    # ❌ Do not use localhost
    http://localhost:8000                         # SAE instances run on different hosts

    # ❌ Do not use a Docker service name
    http://extraction-service:8000                # this only works with Docker Compose
    ```

  5. **✅ Correct approaches (in order of preference)**:

    ```bash
    # Option A: use the internal IP directly (most stable)
    EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
    # Where to find it: SAE console > Python application > instance list > internal IP

    # Option B: use SAE service discovery (needs extra configuration; not recommended at first)
    # Requires a microservice registry to be configured in SAE
    EXTRACTION_SERVICE_URL=http://extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000
    ```

    

Step 2: Configure the backend environment variable

Add the following to the environment variables of the SAE backend application:

```bash
# ⚠️ Use the actual internal address shown in the SAE console
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000

# Notes:
# 1. Do not use a guessed domain name
# 2. The value must be copied from "Application access settings" in the SAE console
# 3. If the IP changes (for example after a redeployment), update this variable as well
```

**Restart the backend after the change**: SAE console → backend application → Restart
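The address rules above can be checked mechanically before a restart. The sketch below classifies a candidate `EXTRACTION_SERVICE_URL` value using only the standard library; the function name `check_service_url` and the three result labels are assumptions for illustration, and the check cannot tell whether an IP is the one currently shown in the SAE console.

```python
import ipaddress
from urllib.parse import urlsplit

# Suffixes this guide lists as guessed names that do not resolve on SAE
_GUESSED_SUFFIXES = (".sae", ".internal", ".cluster.local")


def check_service_url(url: str) -> str:
    """Classify an EXTRACTION_SERVICE_URL value as 'ok', 'warn', or 'reject'."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        return "reject"
    if host == "localhost":
        return "reject"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None
    if address is not None:
        if address.is_loopback:
            return "reject"
        # A private address matches the recommended "internal IP + port" form
        return "ok" if address.is_private else "warn"
    if host.endswith(_GUESSED_SUFFIXES) or "." not in host:
        # Guessed domain, or a bare Docker Compose style service name
        return "reject"
    # A real domain: acceptable only if it was copied from the SAE console
    return "warn"
```

For example, `check_service_url("http://extraction-service:8000")` returns `"reject"` because the host is a bare service name.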

### Step 3: Test from the backend service

A test script can be added to the Node.js backend service:

```typescript
// backend/src/tests/test-extraction-service.ts

import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

// Use the internal address copied from the SAE console. Guessed domains such as
// extraction-service.internal do not resolve, so no default value is provided.
const EXTRACTION_SERVICE_URL = process.env.EXTRACTION_SERVICE_URL;

export async function testExtractionService(): Promise<void> {
  if (!EXTRACTION_SERVICE_URL) {
    throw new Error('EXTRACTION_SERVICE_URL is not set');
  }

  try {
    // 1. Health check
    console.log('Testing health endpoint...');
    const healthRes = await axios.get(`${EXTRACTION_SERVICE_URL}/health`);
    console.log('Health check:', healthRes.data);

    // 2. Test PDF extraction
    console.log('Testing PDF extraction...');
    const form = new FormData();
    form.append('file', fs.createReadStream('./test.pdf'));

    const pdfRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/pdf`,
      form,
      { headers: form.getHeaders() }
    );
    console.log('PDF extraction result:', pdfRes.data);

    // 3. Test Word extraction
    console.log('Testing Word extraction...');
    const form2 = new FormData();
    form2.append('file', fs.createReadStream('./test.docx'));

    const docxRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/docx`,
      form2,
      { headers: form2.getHeaders() }
    );
    console.log('Word extraction result:', docxRes.data);

    console.log('✅ All tests passed!');
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow it before use
    const message = error instanceof Error ? error.message : String(error);
    console.error('❌ Test failed:', message);
    if (axios.isAxiosError(error) && error.response) {
      console.error('Response:', error.response.data);
    }
  }
}
```

Step 4: Verify the end-to-end flow (critical step)

Test the following business flows:

Scenario 1: PKB document upload

**Business flow**:
```
User uploads a PDF
→ Node.js backend receives it
→ HTTP POST forwards the file stream to the Python service (EXTRACTION_SERVICE_URL)
→ Python service parses the PDF and returns the text as JSON
→ Node.js backend receives the text
→ Uploads it to Dify
→ Returns the result to the frontend
```

**Test steps**:
1. Upload a PDF document from the frontend (a file under 5 MB is recommended)

2. **Check the Node.js backend logs** (SAE console → backend application → Logs):

[INFO] Calling extraction service: http://172.17.x.x:8000/extract/pdf
[INFO] Extraction completed in 2.3s
[INFO] Extracted text preview: "This is a test document..."

3. **Check the Python service logs** (SAE console → extraction-service application → Logs):

INFO: Request: POST /extract/pdf
INFO: File size: 1.2MB, filename: test.pdf
INFO: Using PyMuPDF extraction
INFO: Response: 200 (took 2.10s)

4. **Confirm in the Dify Web UI that the document was uploaded**

**If it fails, check**:
- Do the backend logs show "Connection refused"? → Check the EXTRACTION_SERVICE_URL setting
- Do the Python logs show "ImportError"? → Check the system dependencies in the Dockerfile
- Extraction timed out (> 300s) → The file is too large; the timeout configuration needs adjusting
#### <20>箸艶 2: ASL 瘛勗漲<E58B97><E6BCB2>

<EFBFBD><EFBFBD><EFBFBD>孵稬"瘛勗漲<E58B97><E6BCB2>粉" <20>?<3F>𡒊垢靚<E59EA2>鍂 Python <20>滚𦛚<E6BB9A>𣂼<EFBFBD><F0A382BC><EFBFBD> <20>?餈𥪜<E9A488> LLM <20><><EFBFBD>蝏𤘪<E89D8F>


**瘚贝<E7989A>甇仿炊**嚗?1. <20>?ASL 璅<E79285><E288AA>孵稬"瘛勗漲<E58B97><E6BCB2>粉"
2. <20><EFBFBD><E4BAA6>𡒊垢<F0A1928A><EFBFBD><EFBFBD>霈方<E99C88><E696B9>?Python <20>滚𦛚嚗?3. <20><EFBFBD> Python <20>滚𦛚<E6BB9A><EFBFBD><EFBFBD>霈斗<E99C88><E69697>𡝗<EFBFBD><F0A19D97><EFBFBD><EFBFBD>
4. <20>滨垢<E6BBA8>曄內<E69B84><E585A7><EFBFBD>蝏𤘪<E89D8F>

#### <20>箸艶 3: DC <20>唳旿皜<E697BF><E79A9C>

<EFBFBD><EFBFBD>銝𠹺<EFBFBD> Excel <20>?<3F>𡒊垢靚<E59EA2>鍂 Python <20>滚𦛚 fillna <20>?餈𥪜<E9A488><EFBFBD><E79A9C><EFBFBD>擧㺭<E693A7>?```

瘚贝<EFBFBD>甇仿炊嚗?1. <20>?DC 璅<E79285>銝𠹺<E98A9D> Excel <20><>辣 2. <20><EFBFBD> fillna <20><EFBFBD> 3. <20><EFBFBD> Python <20>滚𦛚<E6BB9A><EFBFBD> 4. 撉諹<E69289><EFBFBD><E79A9C>蝏𤘪<E89D8F>


<EFBFBD>烐綉銝𡒊輕<EFBFBD>?

<EFBFBD><EFBFBD> SAE <20>芸蒂<E88AB8>烐綉

#### 1. View application monitoring

```
SAE console → application details → Monitoring
```

**Key metrics**:
- **CPU usage** (< 70%): PDF extraction is a CPU-intensive task.
- **Memory usage** (< 80%): processing large files uses more memory.
- **Request QPS** (queries per second): shows the current load.
- **Average response time** (< 1000ms): small files < 2s, large files < 30s.
- **Error rate** (< 1%): tracks the share of files that fail to parse.

**Performance baseline (for reference)**:
- Small files (< 1MB PDF): response time 1-3s
- Medium files (1-10MB PDF): response time 5-15s
- Large files (10-50MB PDF): response time 20-60s
- Very large files (> 50MB): recommended to reject them up front

#### 2. View real-time logs

```
SAE console → application details → Logs → Real-time logs
```

**Log types**:
- Application log (stdout/stderr): uvicorn startup messages and request logs
- Access log (HTTP requests): request path, response time, status code
- Error log (exception stack traces): Python exception details

**Key log examples**:
```bash
# Normal startup
INFO: Started server process [1]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000

# Normal request
INFO: Request: POST /extract/pdf
INFO: File: test.pdf (1.2MB)
INFO: Response: 200 (took 2.10s)

# Error logs (need attention)
ERROR: ImportError: libGL.so.1: cannot open shared object file
ERROR: Timeout: PDF extraction took > 300s
ERROR: Memory error: Cannot allocate memory
```
#### 3. Autoscaling configuration

```
SAE console → application details → Autoscaling
```

**Recommended configuration**:
```
Minimum instances: 1 (keeps the service available)
Maximum instances: 3 (absorbs peak load)

Triggers:
  - CPU usage > 70% for 3 minutes → scale out by 1 instance
  - CPU usage < 30% for 5 minutes → scale in by 1 instance
```

**Notes**:
- PDF extraction is CPU-intensive, so scaling decisions mainly follow CPU usage.
- If scale-out happens often, consider raising the instance size directly (for example, to 4 cores).
- SAE load-balances across instances automatically; no manual setup is needed.

### Application-level monitoring

#### Add a health check endpoint

```python
# main.py

import os

import psutil
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "service": "extraction-service",
        "version": os.getenv("SERVICE_VERSION", "unknown")
    }

@app.get("/metrics")
def metrics():
    """Resource metrics endpoint."""
    # Declared with plain `def`: cpu_percent(interval=1) blocks for one second,
    # so FastAPI runs this handler in its threadpool instead of the event loop.
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage('/app')

    return {
        "cpu": {
            "percent": cpu_percent,
            "count": psutil.cpu_count()
        },
        "memory": {
            "total": memory.total,
            "available": memory.available,
            "percent": memory.percent
        },
        "disk": {
            "total": disk.total,
            "used": disk.used,
            "free": disk.free,
            "percent": disk.percent
        }
    }
```

#### Add request logging

```python
# main.py (continued; `app` is the FastAPI instance defined above)

import logging
import time

from fastapi import Request

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/app/logs/extraction-service.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    """Request logging middleware."""
    start_time = time.time()

    # Log the request
    logger.info(f"Request: {request.method} {request.url}")

    # Handle the request
    response = await call_next(request)

    # Log the response
    process_time = time.time() - start_time
    logger.info(
        f"Response: {response.status_code} "
        f"(took {process_time:.2f}s)"
    )

    return response
```

### Routine maintenance tasks

#### Weekly tasks

```bash
# 1. Check the log size
du -sh /app/logs

# 2. Review error logs
tail -n 100 /app/logs/extraction-service.log | grep ERROR

# 3. Restart the application (if needed)
# SAE console → application details → Restart
```

#### Monthly tasks

```bash
# 1. Check for outdated Python dependencies
pip list --outdated

# 2. Rebuild the image (picks up security updates)
docker build -t extraction-service:v1.1 .
# Tag with the full registry path so the push can find the image
docker tag extraction-service:v1.1 registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1

# 3. Update the image version in SAE
```

---

## Troubleshooting

### Common issues

#### Issue 1: Container fails to start

**Symptom**:
```
SAE shows: container failed to start
Log shows: ImportError: libXXX.so: cannot open shared object file
```

**Cause**: a system dependency is missing.

**Fix**:
```dockerfile
# Add the missing libraries in the Dockerfile.
# A comment cannot follow a line-continuation backslash inside RUN,
# so the purpose of each package is listed here:
#   libgl1-mesa-glx, libglib2.0-0  -> OpenCV
#   libgomp1                       -> Polars
#   libmupdf-dev                   -> PyMuPDF
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libgomp1 \
    libmupdf-dev \
    && rm -rf /var/lib/apt/lists/*
```

#### Issue 2: PDF extraction times out

**Symptom**:
```
Request times out (> 300 seconds)
Log shows: Timeout error
```

**Troubleshooting steps**:
```bash
# 1. Check the file size
# If the file is larger than 50MB, consider splitting it before processing

# 2. Increase the timeout
# SAE console → application configuration → environment variables
TIMEOUT=600

# 3. Optimize the extraction logic
# Skip pages that do not need processing, downscale images
```


#### Issue 3: Out of memory (OOM)

**Symptom**:
```
Container restarts on its own
Log shows: Killed (signal 9)
```

**Fix**:
```bash
# 1. Increase the memory allocation
# SAE console → application configuration → instance size
# Memory: 2GB → 4GB
```

```python
# 2. Optimize the code (process in chunks)
# Do not load the whole file into memory at once
with open(pdf_path, 'rb') as f:
    # Process chunk by chunk
    for chunk in read_in_chunks(f):
        process(chunk)
```
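
The chunked-processing snippet calls `read_in_chunks` without defining it. A minimal sketch of such a helper is shown here, assuming a plain binary file object and a hypothetical default chunk size of 1 MiB; `total_size` is an illustrative consumer, not part of the service. This only bounds memory while reading or copying bytes; whether the PDF parser itself can work incrementally depends on the library in use.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(f: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive blocks of at most chunk_size bytes until end of file."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        yield chunk


def total_size(f: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Example consumer: count bytes without holding the whole file in memory."""
    return sum(len(chunk) for chunk in read_in_chunks(f, chunk_size))
```

For instance, `[len(c) for c in read_in_chunks(io.BytesIO(b"x" * 2500), 1000)]` yields two full chunks followed by a shorter final one.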


#### Issue 4: Backend cannot reach the Python service (frequent issue)

**Symptom**:
```
Backend log: Connection refused
or ECONNREFUSED: connect ECONNREFUSED 172.17.x.x:8000
or Error: getaddrinfo ENOTFOUND extraction-service.internal
```

**Root-cause checks**:

**Cause 1: internal address misconfigured (most common)**
```bash
# Wrong (guessed service name)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000

# Right (actual address shown in the SAE console)
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Fix**:
```bash
# 1. Look up the actual internal address
# SAE console → extraction-service application → application details → Application Access Settings
# Copy the value shown under "VPC internal access address"

# 2. Update the backend environment variable
# SAE console → backend application → application configuration → environment variables
EXTRACTION_SERVICE_URL=http://<actual internal IP>:8000

# 3. Restart the backend application
# SAE console → backend application → Restart
```

**Cause 2: Python service not started**
```bash
# Check the Python service status
# SAE console → extraction-service application → instance list
# Confirm the instance status is "Running"

# Check the startup logs
# SAE console → extraction-service application → Logs
# Expected:
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
```

**Cause 3: security group restriction**
```bash
# By default, applications inside the same SAE VPC can reach each other
# If access still fails, check:
# SAE console → extraction-service application → network configuration → security group
# Confirm the security group allows port 8000 from inside the VPC
```

**Testing internal connectivity**:
```bash
# Option 1: test from the Webshell in the SAE console (if available)
curl http://<internal IP of the Python service>:8000/health

# Option 2: add a test to the backend application's startup script
echo "Testing extraction service connectivity..."
curl -f http://<internal IP of the Python service>:8000/health || echo "Cannot connect to extraction service"

# Option 3: test the port with telnet
telnet <internal IP of the Python service> 8000
```
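
The same connectivity check can be scripted. The following is a minimal sketch using only the Python standard library; it assumes the `/health` endpoint described in this guide, and the function names `check_health` and `wait_until_healthy`, as well as the retry counts, are illustrative choices rather than part of the service.

```python
import time
import urllib.error
import urllib.request


def check_health(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and non-2xx answers
        return False


def wait_until_healthy(base_url: str, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll the health endpoint a few times, e.g. right after a redeploy."""
    for attempt in range(attempts):
        if check_health(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call such as `wait_until_healthy("http://172.17.x.x:8000")` would use the internal address copied from the SAE console, with the placeholder replaced by the real IP.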


---

## Notes and Prohibitions

### Best practices

#### 1. **Image optimization**

```dockerfile
# Use a multi-stage build
FROM python:3.11-slim AS builder
# ... build ...
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv

# Clean up caches
RUN apt-get update && apt-get install -y ... \
    && rm -rf /var/lib/apt/lists/*

# Use .dockerignore
# Exclude unnecessary files to keep the image small
```

#### 2. **Version management**

```bash
# Use semantic versioning
# v1.0.0  ->  major.minor.patch

# Keep several tags
docker tag ... extraction-service:v1.0.0
docker tag ... extraction-service:v1.0
docker tag ... extraction-service:latest

# Record changes
# CHANGELOG.md
## v1.0.1 (2025-12-20)
- Fix: PDF extraction timeout issue
- Optimization: image size reduced by 30%
```
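#### 3. **Security hardening**

```python
from fastapi import HTTPException, UploadFile

# `app` is the FastAPI instance defined in main.py

# File size limit
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB

# File type validation
ALLOWED_TYPES = {
    'application/pdf',
    'application/msword',  # .doc
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',  # .docx
}

@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    # UploadFile.size can be None when the size is not known
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=413,
            detail="File too large"
        )
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(
            status_code=415,
            detail="Unsupported file type"
        )
```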
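
The `content_type` checked in the file-type validation is supplied by the client, so it may be worth also looking at the first bytes of the upload. The sketch below is an assumption-labelled illustration, not the service's actual validation: `sniff_file_type` is a hypothetical helper, and a ZIP signature only shows that a file could be a `.docx`, not that it is one.

```python
PDF_MAGIC = b"%PDF-"        # PDF files begin with this marker
ZIP_MAGIC = b"PK\x03\x04"   # .docx files are ZIP archives and begin with this marker


def sniff_file_type(head: bytes) -> str:
    """Classify a file by its leading bytes: 'pdf', 'zip' or 'unknown'."""
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

In a handler, the first few bytes could be read with `await file.read(8)` followed by `await file.seek(0)` before passing the upload on to extraction.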

#### 4. **Performance optimization**

```python
# Process large files asynchronously
import asyncio

import aiofiles

async def extract_large_pdf(pdf_path: str):
    # Use asynchronous I/O
    async with aiofiles.open(pdf_path, 'rb') as f:
        content = await f.read()

    # Run the CPU-bound task in a thread pool
    # (pymupdf_extract is the extraction function defined elsewhere in the service)
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(None, pymupdf_extract, content)

    return text

# Connection pool
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(
    DATABASE_URL,
    poolclass=NullPool,  # for the SAE environment: no client-side pooling
    echo=False
)
```

### Strictly forbidden

#### 1. **Never set the internal address from a service name (frequent mistake)**

```bash
# Wrong (the connection will fail)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000
EXTRACTION_SERVICE_URL=http://localhost:8000
EXTRACTION_SERVICE_URL=http://extraction-service:8000

# Right: take the exact internal address from the SAE console
# SAE console → extraction-service application → Application Access Settings
# Copy the value shown under "VPC internal access address"
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Why**:
- SAE instances run across hosts, so Docker service names cannot be used.
- K8s Service names in SAE need extra configuration and are not available by default.
- The most reliable option is the IP address shown in the SAE console.
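
A startup check can catch a service-name URL before the first request fails. The following sketch uses only the standard library; `host_is_ip` is a hypothetical helper name, and rejecting loopback addresses is an added assumption based on `localhost` being listed among the wrong values.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url: str) -> bool:
    """Return True if the URL's host is a literal, non-loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, e.g. a service name or an unfilled placeholder
        return False
    return not address.is_loopback
```

The backend could log a warning at startup when `host_is_ip(os.environ["EXTRACTION_SERVICE_URL"])` is false.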

#### 2. **Never hardcode sensitive information in the image**

```dockerfile
# Wrong
ENV DATABASE_PASSWORD=my-secret-password

# Right: configure it as an environment variable in SAE
```

#### 3. **Never use local file storage**

```python
# Wrong (the file is lost on restart)
output_path = '/app/output/result.txt'
with open(output_path, 'w') as f:
    f.write(result)

# Right: keep a temporary file under /tmp and upload the result to OSS
import tempfile
with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
    f.write(result)
# Upload to OSS (using the oss2 library)
# Delete the temporary file after the upload
```

#### 4. **Never use the :latest tag (in production)**

```bash
# Wrong (cannot be rolled back)
image: extraction-service:latest

# Right (pin an explicit version)
image: extraction-service:v1.0.0
```

#### 5. **Never modify code inside the container**

```bash
# Wrong (changes are lost on restart)
# SAE Webshell → vi /app/main.py

# Right:
# 1. Change the code locally
# 2. Rebuild the image
# 3. Push it to ACR
# 4. Update the image version in SAE
```

#### 6. **Never use an unbounded in-memory cache**

```python
# Wrong (memory leak)
CACHE = {}  # global cache that grows without bound

@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    key = file.filename
    if key not in CACHE:
        CACHE[key] = extract(file)  # may grow without bound
    return CACHE[key]

# Right: use a cache with bounded capacity
from functools import lru_cache

@lru_cache(maxsize=100)  # keep at most 100 results
def extract_with_cache(file_hash: str):
    return extract(file_hash)
```
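
The bounded-cache example keys on `file_hash` but does not show where the hash comes from or how the content reaches `extract`. One possible arrangement is sketched here, assuming the cache key is the SHA-256 of the uploaded bytes; `BoundedCache` and its `compute` callback are illustrative names, not part of the service.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class BoundedCache:
    """Least-recently-used cache keyed by the SHA-256 of the file content."""

    def __init__(self, max_entries: int = 100) -> None:
        self.max_entries = max_entries
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get_or_compute(self, content: bytes, compute: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = compute(content)
        self._items[key] = value
        if len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self) -> int:
        return len(self._items)
```

Keying on content rather than on the filename avoids returning a stale result when two different files share a name. Each instance or worker process holds its own copy of such a cache.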

#### 7. **Never ignore the size limit of the /tmp directory**

```python
# Note: /tmp in an SAE container usually has a size limit (for example 1-2GB)
# Temporary files must be cleaned up after large files are processed

import os
import tempfile

from fastapi import UploadFile

async def extract_large_pdf(file: UploadFile):
    # Save the upload to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    try:
        # Process the file
        result = extract_pdf_pymupdf(tmp_path)
        return result
    finally:
        # Important: always delete the temporary file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

---

## Appendix

### A. Complete requirements.txt

```
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20

# Document extraction
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2

# Data processing
polars==1.17.1
numpy==1.26.4

# Utilities
langdetect==1.0.9
chardet==5.2.0
aiofiles==23.2.1

# Database
sqlalchemy==2.0.25
asyncpg==0.29.0

# Alibaba Cloud OSS
oss2==2.18.3

# Logging and monitoring
python-json-logger==2.0.7
psutil==5.9.8
```

### B. Complete Dockerfile

See "Build the Docker image - Step 1" earlier in this guide.

### C. Local test script

```bash
#!/bin/bash
# test-local.sh

echo "Building Docker image..."
docker build -t extraction-service:test .

echo "Starting container..."
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/db" \
  extraction-service:test

echo "Waiting for service to start..."
sleep 10

echo "Testing health endpoint..."
curl http://localhost:8000/health

echo "Testing PDF extraction..."
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

echo "Cleaning up..."
docker stop extraction-test
docker rm extraction-test

echo "Done!"
```

### D. Related documentation

## Quick Reference

### Common commands

```bash
# Build the image
docker build -t extraction-service:v1.0 .

# Push the image (tag it with the full registry path first)
docker tag extraction-service:v1.0 registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# View SAE logs
# SAE console → application details → Logs

# Restart the SAE application
# SAE console → application details → Restart

# Test internal connectivity (use the internal IP shown in the SAE console, not a service name)
curl http://<internal IP of the Python service>:8000/health

# Check container resources
docker stats extraction-service
```

### Key settings

| Setting | Recommended value | Notes |
|---------|-------------------|-------|
| CPU | 1 core | Baseline configuration |
| Memory | 2GB | Without Nougat |
| Instances | 1-3 | Automatic scaling |
| Timeout | 300 seconds | Large file processing |
| Health check | 30 seconds | Initial delay |
| Workers | 2 | Uvicorn workers |

**Document maintenance**:
- For questions or suggestions, please contact the maintainers.
- Last updated: 2025-12-13
- Next review: 2025-03-13