# Python Microservice SAE Container Deployment Guide
**Version**: v1.1 (revised with fixes)
**Created**: 2025-12-13
**Last revised**: 2025-12-13
**Applies to**: AIclinicalresearch platform, Python microservice (extraction_service)
**Target audience**: operations engineers, backend developers
**v1.1 changes**:
- ✅ Fixed: internal addresses now use the real IP shown in the SAE console (do not guess domain names)
- ✅ Improved: notes on Dockerfile system dependencies (libmupdf-dev is optional)
- ✅ Added: ensure the /tmp directory is usable for temporary file storage
- ✅ Expanded: testing workflow and monitoring guidance
---
## Table of Contents
1. [Why Choose SAE Container Deployment](#why-choose-sae-container-deployment)
2. [Deployment Architecture](#deployment-architecture)
3. [Prerequisites](#prerequisites)
4. [Python Service Overview](#python-service-overview)
5. [Dependency Optimization Strategy](#dependency-optimization-strategy)
6. [Building the Docker Image](#building-the-docker-image)
7. [Deploying to SAE](#deploying-to-sae)
8. [Testing and Verification](#testing-and-verification)
9. [Monitoring and Operations](#monitoring-and-operations)
10. [Troubleshooting](#troubleshooting)
11. [Precautions and Notes](#precautions-and-notes)
---
## Why Choose SAE Container Deployment
### SAE Container Deployment vs. SAE Python Runtime
| Comparison | SAE Python Runtime | SAE Container Deployment (recommended) |
|---------|-----------------|------------------|
| **System dependencies** | ❌ System libraries cannot be installed | ✅ Fully controllable |
| **Complex dependencies** | ❌ PyMuPDF/OpenCV fail | ✅ Fully supported |
| **Environment consistency** | ⚠️ Cloud and local may differ | ✅ Runs locally = runs in the cloud |
| **Nougat (Torch)** | ❌ High risk | ✅ Easily supported |
| **Deployment method** | Upload a ZIP package | Push a Docker image |
| **Startup speed** | Fast (< 5 s) | Slower (10-20 s) |
| **Operational complexity** | Low | Medium |
| **Recommendation** | ❌ Not recommended | ✅ Strongly recommended |
### Key Advantages
#### 1. **Missing system-level dependencies (the most serious problem)**
```python
# Our code uses these libraries:
import fitz    # PyMuPDF, depends on libmupdf.so, libfreetype.so
import cv2     # OpenCV, depends on libGL.so.1, libgthread-2.0.so
import polars  # Polars, depends on libgomp.so
```
**SAE Python runtime**:
```
❌ Only a plain Python environment is provided
❌ apt-get install cannot be run
❌ Fails at runtime: ImportError: libGL.so.1: cannot open shared object file
```
**SAE container deployment**:
```dockerfile
# ✅ Install whatever is needed in the Dockerfile
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libgomp1
```
#### 2. **Fully controlled environment**
```
Local development environment = Docker image = SAE production environment
```
- If it runs in local Docker, it will run once pushed to SAE
- No more "works locally, fails in production" problems
#### 3. **Strong extensibility**
```
Future requirements:
  ├─ Add Nougat OCR (needs PyTorch + GPU support)
  ├─ Add image processing (needs OpenCV)
  ├─ Add more libraries (needs more system libraries)
  └─ Container deployment handles all of these easily
```
#### 4. **Unified operations**
```
All services:
  ├─ Frontend Nginx → SAE container
  ├─ Backend Node.js → SAE container
  └─ Python service → SAE container ✅ (managed uniformly)
```
---
## Deployment Architecture
```
Alibaba Cloud
├─ SAE (backend): Node.js Backend
│    ⇅ internal network
├─ SAE (Python microservice): Docker container
│    - FastAPI
│    - PyMuPDF
│    - Polars
│    - Mammoth
├─ RDS PostgreSQL 15
└─ OSS (document storage)
```
**Key points**:
- The Python microservice and the Node.js backend are both deployed on SAE (same VPC)
- They communicate over the internal network (latency < 5 ms)
- They share the RDS and OSS resources
---
## Prerequisites
### Required Resources
| Resource | Suggested configuration | Estimated cost | Purpose |
|---------|---------|---------|-----|
| **SAE application** | 1 core / 2 GB / 1 instance | ~100 CNY/month | Runs the Python service |
| **Container image registry** | Alibaba Cloud ACR Personal Edition | Free (limited storage) | Stores Docker images |
| **OSS storage** | Existing (shared) | 0 CNY (incremental usage only) | Document storage |
| **RDS PostgreSQL** | Existing (shared) | 0 CNY | Database |
### Software Requirements
```bash
# Install on the local development machine:
- Docker Desktop
- Alibaba Cloud CLI (optional)
# Nothing needs to be installed on SAE; the container already includes everything
```
### Accounts and Permissions
- An active Alibaba Cloud account
- Access to the container image registry
- Permission to create SAE applications
---
## Python Service Overview
### Current Services
#### Service 1: extraction_service (document extraction)
**Location**: `AIclinicalresearch/extraction_service/`
**Purpose**:
- PKB module: extracts text from uploaded documents before they are sent to Dify
- ASL module: extracts PDF full text for in-depth reading
- DC module: processes Excel/CSV data
**Core files**:
```
extraction_service/
├── main.py                    # FastAPI entry point
├── requirements.txt           # Dependency list
├── services/
│   ├── pdf_extractor.py       # PDF extraction (dispatcher)
│   ├── pymupdf_extractor.py   # PyMuPDF implementation
│   ├── nougat_extractor.py    # Nougat OCR implementation
│   ├── docx_extractor.py      # Word extraction
│   └── txt_extractor.py       # Plain-text extraction
└── operations/
    └── fillna_operations.py   # Data cleaning (Polars)
```
**Key endpoints**:
```python
POST /extract/pdf        # PDF extraction
POST /extract/docx       # Word extraction
POST /extract/txt        # Plain-text extraction
POST /operations/fillna  # Data cleaning
```
### Dependency Analysis
#### Current `requirements.txt` contents:
```txt
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```
#### Estimated dependency sizes:
| Package | Size | Purpose | Required? |
|-----|------|-----|---------|
| **PyMuPDF** | ~50MB | PDF extraction (core) | ✅ Required |
| **pdfplumber** | ~10MB | PDF table extraction | ⚠️ Optional |
| **nougat-ocr** | ~300MB | OCR for academic papers | ⚠️ Phase 2 |
| **torch** | ~800MB | Nougat dependency | ⚠️ Phase 2 |
| **torchvision** | ~100MB | Nougat dependency | ⚠️ Phase 2 |
| **mammoth** | ~5MB | Word extraction | ✅ Required |
| **python-docx** | ~3MB | Word extraction | ✅ Required |
| **polars** | ~50MB | Data cleaning | ✅ Required |
| **numpy** | ~20MB | Numerical computing | ✅ Required |
| **fastapi** | ~10MB | Web framework | ✅ Required |
| **uvicorn** | ~5MB | ASGI server | ✅ Required |
| **Other** | ~10MB | Helper libraries | ✅ Required |
| **Total (with Nougat)** | **~1.4GB** | - | - |
| **Total (without Nougat)** | **~163MB** | - | - |
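The two totals in the table can be re-derived with a few lines of Python. This is only a sanity check on the table's approximate figures; the sizes are copied from the table, not measured.

```python
# Approximate installed sizes in MB, copied from the table above.
SIZES_MB = {
    "PyMuPDF": 50,
    "pdfplumber": 10,
    "nougat-ocr": 300,
    "torch": 800,
    "torchvision": 100,
    "mammoth": 5,
    "python-docx": 3,
    "polars": 50,
    "numpy": 20,
    "fastapi": 10,
    "uvicorn": 5,
    "other": 10,
}

# Packages that are only needed for Nougat OCR.
NOUGAT_STACK = frozenset({"nougat-ocr", "torch", "torchvision"})


def total_mb(exclude=frozenset()):
    """Sum the estimated sizes, skipping any package named in `exclude`."""
    return sum(size for name, size in SIZES_MB.items() if name not in exclude)


if __name__ == "__main__":
    print(f"with Nougat:    ~{total_mb() / 1000:.1f} GB")
    print(f"without Nougat: ~{total_mb(NOUGAT_STACK)} MB")
```

Dropping the three Nougat-related packages removes roughly 1.2GB, which is the motivation for the phased rollout.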
---
## Dependency Optimization Strategy
### Phase 1: Minimal deployment (for the first deployment)
**Goal**: go live quickly and validate the core functionality
**Strategy**:
- ✅ Keep PyMuPDF (core PDF extraction)
- ✅ Keep Mammoth/python-docx (Word extraction)
- ✅ Keep Polars (data cleaning)
- ❌ Temporarily remove Nougat (large footprint)

**Optimized `requirements.txt`**:
```txt
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
# Document extraction (core)
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2
# Data processing
polars==1.17.1
numpy==1.26.4
# Utilities
langdetect==1.0.9
chardet==5.2.0
# Logging
python-json-logger==2.0.7
```
**Estimated image size**: ~500MB (including the Python base image)
**Code changes**:
```python
# services/pdf_extractor.py
# Comment out the Nougat imports
# from .nougat_extractor import extract_pdf_nougat, check_nougat_available
async def extract_pdf(pdf_path: str, filename: str):
    """PDF extraction (phase 1: PyMuPDF only)."""
    # Detect the language and document type
    language = detect_language(pdf_path)
    is_academic = detect_academic_paper(pdf_path)
    # Phase 1: use PyMuPDF directly
    text = extract_pdf_pymupdf(pdf_path)
    # Phase 2: Nougat can be enabled here, with PyMuPDF as the fallback
    # if language == 'english' and is_academic:
    #     try:
    #         if check_nougat_available():
    #             text = extract_pdf_nougat(pdf_path)
    #     except Exception:
    #         text = extract_pdf_pymupdf(pdf_path)  # fallback
    return {
        'text': text,
        'method': 'pymupdf',
        'language': language,
        'is_academic': is_academic
    }
```
### Phase 2: Full deployment (when needed later)
**When**:
- When users report that extraction quality is not good enough
- When sufficient GPU resources are available

**Strategy**:
- ✅ Enable Nougat + Torch
- ✅ Use GPU instances (SAE does not currently support GPUs, so this requires moving to ECS)

**Full `requirements.txt`**:
```txt
# Full dependencies (including Nougat)
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```
**Estimated image size**: ~2GB
---
## Building the Docker Image
### Step 1: Create an optimized Dockerfile
Create a `Dockerfile` in the `extraction_service/` directory:
```dockerfile
# ========================================
# Multi-stage build to minimize image size
# ========================================
# Stage 1: build stage (install dependencies)
FROM python:3.11-slim AS builder
# Set the working directory
WORKDIR /app
# Install system packages needed at build time
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    make \
    libffi-dev \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list
COPY requirements.txt .
# Install Python dependencies into a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
# ========================================
# Stage 2: runtime stage (minimal image)
# ========================================
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Install runtime dependencies (system libraries + time zone data)
RUN apt-get update && apt-get install -y --no-install-recommends \
    # PyMuPDF dependencies
    # Note: libmupdf-dev can usually be omitted; the PyMuPDF wheel installed by pip bundles its own libraries
    # It is kept here to be safe; try removing it and verify if image size matters
    libmupdf-dev \
    libfreetype6 \
    libjpeg62-turbo \
    libopenjp2-7 \
    # Polars dependency
    libgomp1 \
    # Tools
    curl \
    # Time zone data
    tzdata \
    && rm -rf /var/lib/apt/lists/*
# ⚠️ Use a single time zone everywhere: Asia/Shanghai
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
# Make sure the temp directory exists (needed for file uploads)
RUN mkdir -p /tmp && chmod 1777 /tmp
# Copy the virtual environment from the build stage
COPY --from=builder /opt/venv /opt/venv
# Copy the application code
COPY . .
# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=8000
# Expose the port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
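The `chmod 1777 /tmp` step matters because uploaded files are written to temporary storage before they are parsed. The following is a minimal sketch of that pattern using only the standard library. `saved_upload` is a hypothetical helper, and the real service may handle uploads differently.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def saved_upload(
    data: bytes, suffix: str = ".pdf", directory: Optional[str] = None
) -> Iterator[str]:
    """Write `data` to a temporary file, yield its path, then delete it.

    `directory=None` lets tempfile pick the platform default (/tmp on Linux).
    """
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Always clean up so the container's /tmp does not fill over time.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with saved_upload(b"%PDF-1.4 example") as tmp_path:
        print("stored at", tmp_path, "size", os.path.getsize(tmp_path))
    print("removed:", not os.path.exists(tmp_path))
```

Deleting the file in `finally` keeps disk usage bounded even when extraction raises.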
### Step 2: Create .dockerignore
```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
# IDE
.vscode/
.idea/
*.swp
*.swo
# Test files
tests/
test_files/
*.md
README.md
# Git
.git/
.gitignore
# Logs
*.log
# Temporary files
tmp/
temp/
```
### Step 3: Build the image locally
```bash
# Go to the extraction_service directory
cd d:\MyCursor\AIclinicalresearch\extraction_service
# Build the image (for local testing)
docker build -t extraction-service:latest .
# Check the image size
docker images extraction-service
```
### Step 4: Test the image locally
```bash
# Start the container (local test)
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/dbname" \
  extraction-service:latest
# Follow the logs
docker logs -f extraction-test
# Test the health check
curl http://localhost:8000/health
# Test PDF extraction
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf
# Stop and remove the test container
docker stop extraction-test
docker rm extraction-test
```
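Where `curl` is not available, the same health probe can be scripted. The following is a minimal sketch with the standard library, assuming only that `/health` answers HTTP 200 when the service is up, as the `HEALTHCHECK` in the Dockerfile expects. `is_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("healthy:", is_healthy("http://localhost:8000"))
```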
### Step 5: Push the image to the Alibaba Cloud container registry
#### 5.1 Create the image repository (first deployment)
1. **Log in to the Alibaba Cloud console** → **Container Registry (ACR)**
2. **Create a Personal Edition instance** (free):
   ```
   Instance name: extraction-service
   Region: East China 1 (Hangzhou)
   ```
3. **Create a namespace**:
   ```
   Namespace: clinical-research
   ```
4. **Create an image repository**:
   ```
   Repository name: extraction-service
   Code source: local repository
   ```
#### 5.2 Push the image
```bash
# 1. Log in to Alibaba Cloud Container Registry
# To get the login command: console → Container Registry → Access credentials → set the Registry login password
# (use the registry host that matches the region of your ACR instance)
docker login --username=<your-username> registry.cn-beijing.aliyuncs.com
# 2. Tag the image
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0
# 3. Push to Alibaba Cloud
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0
# 4. Also tag and push latest (convenient for later updates)
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
```
---
## Deploying to SAE
### Step 1: Create the SAE application
1. **Log in to the Alibaba Cloud console** → **Serverless App Engine (SAE)**
2. **Create an application**:
   ```
   Application name: extraction-service
   Namespace: choose the namespace the backend uses (same VPC)
   Deployment method: image
   ```
3. **Image settings**:
   ```
   Image address: registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
   Image version: latest
   Image pull policy: Always (pull the latest image on every deployment)
   ```
4. **Instance specification**:
   ```
   CPU: 1 core
   Memory: 2GB
   Instances: 1 (initial)
   Auto scaling:
     - Minimum instances: 1
     - Maximum instances: 3
     - CPU trigger threshold: 70%
   ```
5. **Network settings**:
   ```
   VPC: choose the VPC the backend uses
   vSwitch: choose the vSwitch the backend uses
   Security group: allow access from within the VPC
   ```
### Step 2: Configure environment variables
Add the following environment variables in the SAE application settings:
```bash
# ========= Database =========
DATABASE_URL=postgresql://user:password@rm-xxxx.pg.rds.aliyuncs.com:5432/clinical_research
# ========= Storage =========
OSS_ENDPOINT=oss-cn-hangzhou-internal.aliyuncs.com
OSS_BUCKET=your-bucket-name
OSS_ACCESS_KEY_ID=<your-id>
OSS_ACCESS_KEY_SECRET=<your-secret>
# ========= Service =========
SERVICE_NAME=extraction-service
SERVICE_VERSION=v1.0
LOG_LEVEL=INFO
# ========= Performance =========
WORKERS=2
TIMEOUT=300
MAX_FILE_SIZE=52428800
# ========= Time zone =========
TZ=Asia/Shanghai
```
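Inside the service these variables arrive as strings, so the numeric ones need explicit parsing. The following is a minimal sketch, assuming the variable names from the block above. The `Settings` class and its defaults are illustrative and are not the service's actual configuration code. Note that `MAX_FILE_SIZE=52428800` is 50 × 1024 × 1024 bytes, that is, 50 MiB.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    service_name: str
    log_level: str
    workers: int
    timeout_seconds: int
    max_file_size: int  # bytes

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings from `env`, defaulting to the values listed above."""
        return cls(
            service_name=env.get("SERVICE_NAME", "extraction-service"),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            workers=int(env.get("WORKERS", "2")),
            timeout_seconds=int(env.get("TIMEOUT", "300")),
            max_file_size=int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024))),
        )

    def accepts(self, size_bytes: int) -> bool:
        """True if an upload of `size_bytes` is within MAX_FILE_SIZE."""
        return 0 <= size_bytes <= self.max_file_size


if __name__ == "__main__":
    print(Settings.from_env())
```

Parsing once at startup means a malformed value such as `WORKERS=two` fails immediately rather than on the first request.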
### Step 3: Configure the health check
```bash
Health check path: /health
Health check port: 8000
Health check protocol: HTTP
Initial delay: 30 s
Check interval: 10 s
Timeout: 5 s
Healthy threshold: 2
Unhealthy threshold: 3
```
### Step 4: Configure logging
```bash
Log path: /app/logs
Log file: extraction-service.log
Log level: INFO
Log retention: 7 days
```
### Step 5: Configure SLB (optional, only if public access is needed)
```bash
# The Python microservice normally does not need public access (only the backend calls it)
# If public access is needed (for example, for third-party callers):
Load balancer type: public
Listener port: 80
Backend port: 8000
Health check: enabled
```
### Step 6: Deploy the application
1. **Click "Deploy application"**
2. **Wait for the deployment to finish** (about 2-3 minutes)
3. **Check the deployment log**:
```
[INFO] Pulling image...
[INFO] Image pulled successfully
[INFO] Starting container...
[INFO] Container started successfully
[INFO] Health check passed
[INFO] Application is running
```
---
## Testing and Verification
### Step 1: Get the internal address (critical step)
**⚠️ Important: SAE instances run on different hosts, so you must use the internal address provided by SAE**
#### The correct way to obtain the real internal address:
1. **Log in to the SAE console** → **Application list** → **click the extraction-service application**
2. **On the application details page, find the "Application access settings" or "VPC internal access" area**
3. **Copy and record the "internal access address"**, which usually has one of these formats:
   ```
   # Format 1: internal IP + port (⭐⭐⭐ strongly recommended, most stable)
   172.17.x.x:8000
   # Format 2: SAE internal Service domain (needs extra configuration, not recommended)
   extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000
   # Format 3: K8s Service domain (needs K8s service discovery configured, not recommended)
   extraction-service.namespace.svc.cluster.local:8000
   ```
4. **❌ Wrong approaches (these cause connection failures)**:
   ```bash
   # ❌ Do not guess domain names (this always fails)
   http://extraction-service.sae:8000            # the .sae domain does not exist
   http://extraction-service.internal:8000       # the .internal domain does not exist
   http://extraction-service.cluster.local:8000  # requires K8s service discovery
   # ❌ Do not use localhost
   http://localhost:8000                         # SAE instances run on different hosts
   # ❌ Do not use the Docker service name
   http://extraction-service:8000                # this only works with Docker Compose
   ```
5. **✅ Recommended approaches (in priority order)**:
   ```bash
   # ⭐⭐⭐ Option A: use the internal IP directly (most reliable)
   EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
   # Where to find it: SAE console > Python application > instance list > internal IP
   # ⭐⭐ Option B: use SAE service discovery (needs extra configuration, not recommended for beginners)
   # Requires configuring the "microservice registry" in the SAE console
   EXTRACTION_SERVICE_URL=http://extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000
   ```
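These rules can be checked at startup so that a guessed address fails fast instead of surfacing later as a connection error. The following is a minimal and deliberately strict sketch. `check_service_url` and the list of rejected suffixes are assumptions drawn from the examples above, and `172.17.0.10` is a placeholder IP.

```python
import ipaddress
from urllib.parse import urlparse

# Host suffixes that the examples above flag as guessed names.
# .cluster.local only resolves when K8s service discovery is configured,
# so this sketch rejects it as well.
GUESSED_SUFFIXES = (".sae", ".internal", ".cluster.local")


def check_service_url(url: str) -> str:
    """Return `url` if it looks like a usable internal address, else raise."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host:
        raise ValueError(f"not an HTTP URL: {url!r}")
    if host == "localhost":
        raise ValueError("localhost does not reach another SAE instance")
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # host is a name, not an IP literal
    if address is not None:
        if address.is_loopback:
            raise ValueError("loopback does not reach another SAE instance")
        return url
    if host.endswith(GUESSED_SUFFIXES):
        raise ValueError(f"{host} looks like a guessed domain")
    if "." not in host:
        raise ValueError(f"{host} is a bare name (Docker Compose style)")
    return url


if __name__ == "__main__":
    print(check_service_url("http://172.17.0.10:8000"))
```

The check only inspects the shape of the address; it cannot confirm that the IP is the one currently shown in the SAE console.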
### Step 2: Configure the backend environment variable
Add the following to the environment variables of the SAE backend application:
```bash
# ⚠️ Use the real internal address shown in the SAE console
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
# Notes:
# 1. Do not use a guessed domain name
# 2. The address must come from "Application access settings" in the SAE console
# 3. If the IP changes (for example, after a redeployment), update this variable as well
```
**Restart the backend application after configuring**:
- SAE console → backend application → Restart
### Step 3: Test from the backend service
Add a test function to the Node.js backend service:
```typescript
// backend/src/tests/test-extraction-service.ts
import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

// No fallback URL: guessed hosts such as *.internal do not resolve on SAE,
// so the address must come from the SAE console via the environment.
const EXTRACTION_SERVICE_URL = process.env.EXTRACTION_SERVICE_URL;

export async function testExtractionService() {
  if (!EXTRACTION_SERVICE_URL) {
    console.error('❌ EXTRACTION_SERVICE_URL is not set');
    return;
  }
  try {
    // 1. Health check
    console.log('Testing health endpoint...');
    const healthRes = await axios.get(`${EXTRACTION_SERVICE_URL}/health`);
    console.log('Health check:', healthRes.data);
    // 2. Test PDF extraction
    console.log('Testing PDF extraction...');
    const form = new FormData();
    form.append('file', fs.createReadStream('./test.pdf'));
    const pdfRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/pdf`,
      form,
      { headers: form.getHeaders() }
    );
    console.log('PDF extraction result:', pdfRes.data);
    // 3. Test Word extraction
    console.log('Testing Word extraction...');
    const form2 = new FormData();
    form2.append('file', fs.createReadStream('./test.docx'));
    const docxRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/docx`,
      form2,
      { headers: form2.getHeaders() }
    );
    console.log('Word extraction result:', docxRes.data);
    console.log('✅ All tests passed!');
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow before use
    const message = error instanceof Error ? error.message : String(error);
    console.error('❌ Test failed:', message);
    if (axios.isAxiosError(error) && error.response) {
      console.error('Response:', error.response.data);
    }
  }
}
```
### Step 4: End-to-end testing (business modules)

Test the following business workflows:

#### Scenario 1: PKB document upload

**Business flow**:
```
User uploads a PDF
  → Node.js backend receives it
  → HTTP POST forwards the file stream to the Python service (EXTRACTION_SERVICE_URL)
  → Python service parses the PDF and returns the text as JSON
  → Node.js backend processes the text
  → Uploads to Dify
  → Returns to the frontend
```

**Test steps**:
1. Upload a PDF document from the frontend (a test file under 5MB is recommended)
2. **Check the Node.js backend logs** (SAE console → backend application → Logs):
```
[INFO] Calling extraction service: http://172.17.x.x:8000/extract/pdf
[INFO] Extraction completed in 2.3s
[INFO] Extracted text preview: "This is a test document..."
```
3. **Check the Python service logs** (SAE console → extraction-service application → Logs):
```
INFO: Request: POST /extract/pdf
INFO: File size: 1.2MB, filename: test.pdf
INFO: Using PyMuPDF extraction
INFO: Response: 200 (took 2.10s)
```
4. **Confirm in the Dify Web UI that the document was uploaded**

**If it fails, check**:
- Do the backend logs show "Connection refused"? → Check the EXTRACTION_SERVICE_URL configuration
- Do the Python logs show "ImportError"? → Check the system dependencies in the Dockerfile
- Did the extraction time out (> 300s)? → The file is too large, or the timeout setting needs adjusting
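
The checklist above can be sketched as a small lookup that maps a symptom to the next thing to check. This is only an illustration: the function name `suggest_check` and the returned wording are assumptions, not part of the service.

```python
def suggest_check(log_line: str, elapsed_seconds: float = 0.0) -> str:
    """Map a symptom from the checklist to the suggested next check.

    Hypothetical helper for illustration; the 300-second threshold mirrors
    the timeout mentioned in the checklist.
    """
    text = log_line.lower()
    if "connection refused" in text or "econnrefused" in text:
        return "Check the EXTRACTION_SERVICE_URL configuration"
    if "importerror" in text:
        return "Check the system dependencies in the Dockerfile"
    if elapsed_seconds > 300:
        return "File may be too large, or the timeout setting needs adjusting"
    return "No known symptom matched"


if __name__ == "__main__":
    print(suggest_check("Error: connect ECONNREFUSED 172.17.x.x:8000"))
```

A helper like this is mainly useful in a smoke-test script, where it can turn a raw log line into an actionable hint.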
#### Scenario 2: ASL deep reading

```
User clicks "Deep reading" → backend calls the Python service to extract the full text → returns the LLM analysis result
```

**Test steps**:
1. Click "Deep reading" in the ASL module
2. Check the backend logs to confirm the Python service was called
3. Check the Python service logs to confirm the extraction succeeded
4. The frontend displays the analysis result

#### Scenario 3: DC data cleaning

```
User uploads an Excel file → backend calls the Python service fillna → returns the cleaned data
```

**Test steps**:
1. Upload an Excel file in the DC module
2. Run the fillna operation
3. Check the Python service logs
4. Verify the cleaning result
---
## Monitoring and maintenance

### SAE built-in monitoring

#### 1. View application monitoring

```
SAE console → Application details → Monitoring
```

**Key metrics**:
- **CPU usage**: < 70% (PDF extraction is a CPU-intensive task)
- **Memory usage**: < 80% (processing large files uses more memory)
- **Request QPS**: queries per second (shows the load)
- **Average response time**: < 1000ms (small files < 2s, large files < 30s)
- **Error rate**: < 1% (monitor the file parsing failure rate)

**Performance baseline (reference values)**:
```
Small files (< 1MB PDF): response time 1-3s
Medium files (1-10MB PDF): response time 5-15s
Large files (10-50MB PDF): response time 20-60s
Very large files (> 50MB): recommend limiting or rejecting
```
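
The baseline above can be expressed as a lookup, for example to flag requests that run well beyond the expected range. This is a sketch under stated assumptions: the function name and the exact boundary handling (whether 10MB counts as medium or large) are illustrative choices, not taken from the service.

```python
from typing import Optional, Tuple

MB = 1024 * 1024


def expected_response_range(size_bytes: int) -> Optional[Tuple[int, int]]:
    """Return the (min, max) expected response time in seconds for a PDF.

    Returns None for files above 50MB, which the baseline suggests
    limiting or rejecting rather than processing.
    """
    if size_bytes < 1 * MB:
        return (1, 3)
    if size_bytes <= 10 * MB:
        return (5, 15)
    if size_bytes <= 50 * MB:
        return (20, 60)
    return None


if __name__ == "__main__":
    print(expected_response_range(2 * MB))
```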
#### 2. View real-time logs

```
SAE console → Application details → Logs → Real-time logs
```

**Log types**:
- Application logs (stdout/stderr): uvicorn startup messages and request logs
- Access logs (HTTP requests): request path, response time, status code
- Error logs (exception stack traces): Python exception details

**Key log examples**:
```bash
# ✅ Normal startup
INFO: Started server process [1]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000
# ✅ Normal request
INFO: Request: POST /extract/pdf
INFO: File: test.pdf (1.2MB)
INFO: Response: 200 (took 2.10s)
# ❌ Error logs (need attention)
ERROR: ImportError: libGL.so.1: cannot open shared object file
ERROR: Timeout: PDF extraction took > 300s
ERROR: Memory error: Cannot allocate memory
```
#### 3. Auto-scaling configuration

```
SAE console → Application details → Auto scaling
```

**Recommended configuration**:
```
Minimum instances: 1 (keeps the service available)
Maximum instances: 3 (handles peak load)
Trigger conditions:
  - CPU usage > 70% for 3 minutes → scale out by 1 instance
  - CPU usage < 30% for 5 minutes → scale in by 1 instance
```

**Notes**:
- PDF extraction is CPU-intensive, so scaling decisions are driven mainly by CPU
- If scale-out happens often, consider raising the instance specification directly (for example, more CPU cores)
- SAE load-balances automatically; no manual configuration is needed
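
To make the trigger conditions concrete, the rule can be modelled as a pure function over per-minute CPU samples. This is a simplified sketch, not how SAE evaluates its policies internally; `ScalingPolicy` and `next_instance_count` are hypothetical names, and the one-sample-per-minute assumption is ours.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingPolicy:
    """Values taken from the recommended configuration."""
    min_instances: int = 1
    max_instances: int = 3
    scale_out_cpu: float = 70.0
    scale_out_minutes: int = 3
    scale_in_cpu: float = 30.0
    scale_in_minutes: int = 5


def next_instance_count(current: int, cpu_samples: Sequence[float],
                        policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Decide the instance count from CPU samples (one per minute, oldest first)."""
    recent_out = cpu_samples[-policy.scale_out_minutes:]
    recent_in = cpu_samples[-policy.scale_in_minutes:]
    # Scale out only when every sample in the window is above the threshold
    if (len(recent_out) == policy.scale_out_minutes
            and all(s > policy.scale_out_cpu for s in recent_out)):
        return min(current + 1, policy.max_instances)
    # Scale in only when every sample in the window is below the threshold
    if (len(recent_in) == policy.scale_in_minutes
            and all(s < policy.scale_in_cpu for s in recent_in)):
        return max(current - 1, policy.min_instances)
    return current


if __name__ == "__main__":
    print(next_instance_count(1, [80, 85, 90]))
```

The clamping to the minimum and maximum instance counts is what keeps one instance always running and caps growth at three.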
### Application-level monitoring

#### Add a health check endpoint
```python
# main.py
from fastapi import FastAPI
import psutil
import os

app = FastAPI()


@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "service": "extraction-service",
        "version": os.getenv("SERVICE_VERSION", "unknown")
    }


@app.get("/metrics")
async def metrics():
    """Performance metrics endpoint."""
    # Note: interval=1 samples CPU usage for one second before returning
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage('/app')
    return {
        "cpu": {
            "percent": cpu_percent,
            "count": psutil.cpu_count()
        },
        "memory": {
            "total": memory.total,
            "available": memory.available,
            "percent": memory.percent
        },
        "disk": {
            "total": disk.total,
            "used": disk.used,
            "free": disk.free,
            "percent": disk.percent
        }
    }
```
#### Add request logging
```python
# main.py
import logging
import os
import time

from fastapi import Request

# Make sure the log directory exists before attaching the file handler
os.makedirs('/app/logs', exist_ok=True)

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/app/logs/extraction-service.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)


# `app` is the FastAPI instance created earlier in main.py
@app.middleware("http")
async def log_requests(request: Request, call_next):
    """Request logging middleware."""
    start_time = time.time()
    # Log the request
    logger.info(f"Request: {request.method} {request.url}")
    # Handle the request
    response = await call_next(request)
    # Log the response
    process_time = time.time() - start_time
    logger.info(
        f"Response: {response.status_code} "
        f"(took {process_time:.2f}s)"
    )
    return response
```
### Regular maintenance tasks

#### Weekly tasks

```bash
# 1. Check the log size
du -sh /app/logs
# 2. Check the error logs
tail -n 100 /app/logs/extraction-service.log | grep ERROR
# 3. Restart the application (if there is a memory leak)
# SAE console → Application details → Restart
```

#### Monthly tasks

```bash
# 1. Update Python dependencies
pip list --outdated
# 2. Rebuild the image (security updates)
docker build -t extraction-service:v1.1 .
# Tag the local image with the registry path before pushing
docker tag extraction-service:v1.1 registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1
# 3. Update the image version in SAE
```
---
## Troubleshooting

### Common problems

#### Problem 1: Container fails to start

**Symptom**:
```
SAE shows: container failed to start
Logs show: ImportError: libXXX.so: cannot open shared object file
```

**Cause**: missing system dependencies

**Solution**:
```dockerfile
# Add the missing libraries in the Dockerfile.
# A comment cannot follow a line-continuation backslash, so the purpose of
# each package is listed here instead:
#   libgl1-mesa-glx, libglib2.0-0: OpenCV
#   libgomp1: Polars
#   libmupdf-dev: PyMuPDF
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libgomp1 \
    libmupdf-dev \
    && rm -rf /var/lib/apt/lists/*
```
#### Problem 2: PDF extraction timeout

**Symptom**:
```
Request timed out (> 300 seconds)
Logs show: Timeout error
```

**Troubleshooting steps**:
```bash
# 1. Check the file size
# If the file is > 50MB, consider processing it in parts
# 2. Increase the timeout
# SAE console → Application configuration → Environment variables
TIMEOUT=600
# 3. Optimize the extraction logic
# For example, skip or compress images
```
#### Problem 3: Out of memory (OOM)

**Symptom**:
```
Container restarts automatically
Logs show: Killed (signal 9)
```

**Solution**:
```bash
# 1. Increase the memory configuration
# SAE console → Application configuration → Specifications
Memory: 2GB → 4GB
# 2. Optimize the code (streaming processing)
# Do not load the whole file into memory at once.
# The lines below are Python; read_in_chunks and process are placeholders.
with open(pdf_path, 'rb') as f:
    # Process in chunks
    for chunk in read_in_chunks(f):
        process(chunk)
```
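
The snippet above calls a `read_in_chunks` helper without defining it. A minimal version might look like the following; the 1MB default chunk size is an assumption. Whether the PDF parser itself can consume data incrementally depends on the library and is not confirmed by this guide, so chunked reading is most clearly useful for steps such as copying, hashing, or size-checking an upload.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def total_size(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Measure a stream's size while holding only one chunk in memory."""
    return sum(len(chunk) for chunk in read_in_chunks(stream, chunk_size))


if __name__ == "__main__":
    print(total_size(io.BytesIO(b"x" * 10), chunk_size=4))
```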
#### Problem 4: Backend cannot reach the Python service (frequent error)

**Symptom**:
```
Backend logs: Connection refused
or ECONNREFUSED: connect ECONNREFUSED 172.17.x.x:8000
or Error: getaddrinfo ENOTFOUND extraction-service.internal
```

**Root cause investigation**:

**Cause 1: Internal network address misconfigured (most common)**
```bash
# ❌ Wrong (guessed address)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000
# ✅ Correct (actual address shown in the SAE console)
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Solution**:
```bash
# 1. Get the actual internal address
# SAE console → extraction-service application → Application details → Application access settings
# Copy the "VPC internal access address" shown there
# 2. Update the backend environment variable
# SAE console → backend application → Application configuration → Environment variables
EXTRACTION_SERVICE_URL=http://<actual internal IP>:8000
# 3. Restart the backend application
# SAE console → backend application → Restart
```

**Cause 2: Python service not started**
```bash
# Check the Python service status
# SAE console → extraction-service application → Instance list
# Confirm the instance status is "Running"
# Check the startup logs
# SAE console → extraction-service application → Logs
# You should see:
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
```

**Cause 3: Security group restrictions**
```bash
# By default, SAE applications in the same VPC can reach each other
# If access still fails, check:
# SAE console → extraction-service application → Network configuration → Security group
# Confirm the security group allows port 8000 within the VPC
```

**Test internal connectivity**:
```bash
# Method 1: Test from the SAE console "Webshell" (if available)
curl http://<Python service internal IP>:8000/health
# Method 2: Add a check to the backend application's startup script
echo "Testing extraction service connectivity..."
curl -f http://<Python service internal IP>:8000/health || echo "❌ Cannot connect to extraction service"
# Method 3: Test the port with telnet
telnet <Python service internal IP> 8000
```
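
Where `curl` or `telnet` is not installed in the container, the same port check can be done with the Python standard library. This is a minimal sketch; `can_connect` is a hypothetical helper name and the 3-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    # Replace with the internal address shown in the SAE console
    print(can_connect("127.0.0.1", 8000))
```

A successful TCP connection only shows that the port is reachable; the `/health` endpoint still needs to be requested to confirm that the service itself is responding.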
---
## Notes and prohibitions

### ✅ Best practices

#### 1. **Image optimization**
```dockerfile
# ✅ Use a multi-stage build
FROM python:3.11-slim AS builder
# ... build ...
FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
# ✅ Clean up caches
RUN apt-get update && apt-get install -y ... \
    && rm -rf /var/lib/apt/lists/*
# ✅ Use .dockerignore
# Exclude unnecessary files to reduce the image size
```
#### 2. **Version management**
```bash
# ✅ Use semantic versioning
v1.0.0 # major.minor.patch
# ✅ Keep multiple versions
docker tag ... extraction-service:v1.0.0
docker tag ... extraction-service:v1.0
docker tag ... extraction-service:latest
# ✅ Record changes
# CHANGELOG.md
## v1.0.1 (2025-12-20)
- Fix: PDF extraction timeout
- Improvement: image size reduced by 30%
```
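
The tagging convention above (full version, major.minor, and `latest`) can be derived from one version string, which avoids typing the three tags by hand in a release script. This is an illustrative sketch; `image_tags` is a hypothetical helper and the strict `vMAJOR.MINOR.PATCH` format is an assumption.

```python
import re
from typing import List

_SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def image_tags(version: str) -> List[str]:
    """Return the tags to apply for a release such as 'v1.0.0'."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"Not a semantic version of the form vX.Y.Z: {version!r}")
    major, minor, patch = match.groups()
    return [f"v{major}.{minor}.{patch}", f"v{major}.{minor}", "latest"]


if __name__ == "__main__":
    for tag in image_tags("v1.0.0"):
        print(f"docker tag ... extraction-service:{tag}")
```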
#### 3. **Security hardening**
```python
from fastapi import HTTPException, UploadFile

# ✅ File size limit
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB
# ✅ File type validation
ALLOWED_TYPES = {'application/pdf', 'application/msword'}


# `app` is the FastAPI instance created in main.py
@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    # file.size can be None when the client does not report a size
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=413,
            detail="File too large"
        )
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(
            status_code=415,
            detail="Unsupported file type"
        )
```
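
The `content_type` checked above is supplied by the client, so it may be worth also inspecting the first bytes of the upload. PDF files start with `%PDF-`, and `.docx` files are ZIP archives that start with `PK\x03\x04`. The sketch below is a strict check under those two facts; the function name is hypothetical, and some tolerant PDF readers accept files whose header appears slightly later, which this check would reject.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"  # .docx files are ZIP containers


def sniff_document_type(head: bytes) -> Optional[str]:
    """Guess the container type from the first bytes of an upload."""
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGIC):
        return "zip"  # possibly a .docx; a ZIP signature alone does not prove it
    return None


if __name__ == "__main__":
    print(sniff_document_type(b"%PDF-1.7\n"))
```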
#### 4. **Performance optimization**
```python
# ✅ Process large files asynchronously
import asyncio

import aiofiles


async def extract_large_pdf(pdf_path: str):
    # Use asynchronous I/O
    async with aiofiles.open(pdf_path, 'rb') as f:
        content = await f.read()
    # Run the CPU-intensive task in a thread pool
    # (pymupdf_extract is the extraction function defined elsewhere)
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(None, pymupdf_extract, content)
    return text


# ✅ Connection pool
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(
    DATABASE_URL,  # defined elsewhere, e.g. read from the environment
    poolclass=NullPool,  # for the SAE environment; NullPool opens a connection per use
    echo=False
)
```
### ❌ Absolute prohibitions

#### 1. **Never guess or assume the internal network address (frequent error)**
```bash
# ❌ Wrong (causes connection failures)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000
EXTRACTION_SERVICE_URL=http://localhost:8000
EXTRACTION_SERVICE_URL=http://extraction-service:8000
# ✅ Correct: get the actual address from the SAE console
# SAE console → extraction-service application → Application access settings
# Copy the "VPC internal access address" shown there
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```
**Reasons**:
- SAE instances run across hosts, so Docker service names cannot be used
- SAE's K8s Service internal addresses require additional configuration and are not available by default
- The most reliable approach is to use the IP address shown in the SAE console
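
A startup check can catch a guessed address before the first request fails. The sketch below accepts only URLs whose host is a literal private IP address and rejects hostnames and loopback addresses; the function name is hypothetical, and treating "private, non-loopback IP" as the acceptance rule is our assumption about what a VPC internal address looks like.

```python
import ipaddress
from urllib.parse import urlparse


def is_explicit_private_ip_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is a private, non-loopback IP."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.hostname is None:
        return False
    try:
        address = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # The host is a name (for example a guessed service name), not an IP
        return False
    return address.is_private and not address.is_loopback


if __name__ == "__main__":
    print(is_explicit_private_ip_url("http://extraction-service.internal:8000"))
```

Failing fast at startup with a clear message is usually easier to diagnose than an `ENOTFOUND` error buried in request logs.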
#### 2. **Never hard-code sensitive information in the image**
```dockerfile
# ❌ Wrong
ENV DATABASE_PASSWORD=my-secret-password
# ✅ Correct: configure it in the SAE environment variables
```
#### 3. **Never use the local filesystem for persistence**
```python
# ❌ Wrong (lost after a restart)
output_path = '/app/output/result.txt'
with open(output_path, 'w') as f:
    f.write(result)

# ✅ Correct: keep temporary files in /tmp and upload the result to OSS
import tempfile
with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
    f.write(result)
# Upload to OSS (using the oss2 library)
# Then delete the temporary file
```
#### 4. **Never use the :latest tag in production**
```bash
# ❌ Wrong (cannot be rolled back)
image: extraction-service:latest
# ✅ Correct (explicit version)
image: extraction-service:v1.0.0
```
#### 5. **Never modify code directly inside the container**
```bash
# ❌ Wrong (lost after a restart)
# SAE Webshell → vi /app/main.py
# ✅ Correct workflow:
# 1. Modify the code locally
# 2. Rebuild the image
# 3. Push to ACR
# 4. Update the image version in SAE
```
#### 6. **Never use an in-memory cache that grows without bound**
```python
# ❌ Wrong (memory leak)
CACHE = {}  # Global cache with no size limit

@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    key = file.filename
    if key not in CACHE:
        CACHE[key] = extract(file)  # Eventually exhausts memory
    return CACHE[key]

# ✅ Correct: use a bounded cache
from functools import lru_cache

@lru_cache(maxsize=100)  # Cache at most 100 results
def extract_with_cache(file_hash: str):
    return extract(file_hash)
```
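
One way to key a bounded cache on the file's content rather than its name is to hash the bytes and evict the least recently used entry once the limit is reached. This is a sketch of that idea, not the service's implementation: `BoundedResultCache` and the `compute` callback are hypothetical, and the cache is per-process, so each SAE instance would hold its own copy.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class BoundedResultCache:
    """LRU cache keyed by the SHA-256 of the file content."""

    def __init__(self, max_entries: int = 100) -> None:
        self._max_entries = max_entries
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get_or_compute(self, content: bytes, compute: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        result = compute(content)
        self._items[key] = result
        if len(self._items) > self._max_entries:
            self._items.popitem(last=False)  # evict the least recently used entry
        return result

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = BoundedResultCache(max_entries=2)
    print(cache.get_or_compute(b"hello", lambda data: data.decode().upper()))
```

Hashing the content means two uploads with the same filename but different bytes never share a cached result.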
#### 7. **Never ignore the size limit of the /tmp directory**
```python
# ⚠️ Note: the /tmp directory in an SAE container usually has a size limit (e.g. 1-2GB)
# Temporary files must be cleaned up after processing large files
import os
import tempfile


async def extract_large_pdf(file: UploadFile):
    # Save to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name
    try:
        # Process the file
        result = extract_pdf_pymupdf(tmp_path)
        return result
    finally:
        # ✅ Key point: always delete the temporary file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```
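
The save, process, and delete pattern can be wrapped in a context manager so the cleanup cannot be forgotten at individual call sites. This is a minimal sketch with a hypothetical name, `temporary_upload`; it writes the whole content in one call, so it does not by itself reduce peak memory use.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_upload(content: bytes, suffix: str = ".pdf") -> Iterator[str]:
    """Write content to a temporary file and always remove it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        yield path
    finally:
        # Runs on success and on error, so /tmp does not fill up
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    with temporary_upload(b"%PDF-1.7 example") as tmp_path:
        print(os.path.getsize(tmp_path))
```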
---
## Appendix

### A. Complete requirements.txt (stage-specific)

```txt
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
# Document extraction
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2
# Data processing
polars==1.17.1
numpy==1.26.4
# Utilities
langdetect==1.0.9
chardet==5.2.0
aiofiles==23.2.1
# Database
sqlalchemy==2.0.25
asyncpg==0.29.0
# Alibaba Cloud OSS
oss2==2.18.3
# Logging and monitoring
python-json-logger==2.0.7
psutil==5.9.8
```

### B. Complete Dockerfile

See Step 1 of the "Build the Docker image" section (creating the optimized Dockerfile).

### C. Local test script
```bash
#!/bin/bash
# test-local.sh
echo "Building Docker image..."
docker build -t extraction-service:test .
echo "Starting container..."
docker run -d \
--name extraction-test \
-p 8000:8000 \
-e DATABASE_URL="postgresql://user:pass@host:5432/db" \
extraction-service:test
echo "Waiting for service to start..."
sleep 10
echo "Testing health endpoint..."
curl http://localhost:8000/health
echo "Testing PDF extraction..."
curl -X POST \
-F "file=@test.pdf" \
http://localhost:8000/extract/pdf
echo "Cleaning up..."
docker stop extraction-test
docker rm extraction-test
echo "Done!"
```
### D. Related links

- [Alibaba Cloud SAE documentation](https://help.aliyun.com/product/134532.html)
- [Docker documentation](https://docs.docker.com/)
- [FastAPI documentation](https://fastapi.tiangolo.com/)
- [PyMuPDF documentation](https://pymupdf.readthedocs.io/)
- [Polars documentation](https://pola-rs.github.io/polars/)
---
## Quick reference

### Common commands

```bash
# Build the image
docker build -t extraction-service:v1.0 .
# Tag and push the image
docker tag extraction-service:v1.0 registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0
# View SAE logs
# SAE console → Application details → Logs
# Restart the SAE application
# SAE console → Application details → Restart
# Test internal connectivity (use the internal address shown in the SAE console)
curl http://<Python service internal IP>:8000/health
# View container resources
docker stats extraction-service
```
### Key configuration

| Setting | Recommended value | Notes |
|-------|--------|------|
| CPU | 1 core | Basic configuration |
| Memory | 2GB | Excluding Nougat |
| Instances | 1-3 | Automatic scaling |
| Timeout | 300s | Large file processing |
| Health check | 30s | Startup delay |
| Worker count | 2 | Uvicorn workers |
---
**Document maintenance**:
- For questions or suggestions, contact the technical team
- Last updated: 2025-12-13
- Next review: 2025-03-13