# Python Microservice SAE Container Deployment Guide

**Document version**: v1.1 (fixes the internal network address and other issues)

**Created**: 2025-12-13

**Last updated**: 2025-12-13

**Applies to**: AIclinicalresearch platform - Python microservice (extraction_service)

**Audience**: DevOps engineers, backend developers

**v1.1 changes**:

- ✅ Fixed: after deployment, use the internal IP shown in the SAE console (not a guessed domain name)
- ✅ Optimized: clarified the Dockerfile system dependencies (libmupdf-dev is optional)
- ✅ Added: ensure the /tmp directory is writable (temporary storage for large files)
- ✅ Improved: testing workflow and monitoring guidance

---
## Table of Contents

1. [Why Choose SAE Container Deployment](#why-choose-sae-container-deployment)
2. [Deployment Architecture](#deployment-architecture)
3. [Prerequisites Checklist](#prerequisites-checklist)
4. [Python Service Analysis](#python-service-analysis)
5. [Dependency Optimization Strategy](#dependency-optimization-strategy)
6. [Building the Docker Image](#building-the-docker-image)
7. [Deploying to SAE](#deploying-to-sae)
8. [Testing and Verification](#testing-and-verification)
9. [Monitoring and Operations](#monitoring-and-operations)
10. [Troubleshooting](#troubleshooting)
11. [Important Notes](#important-notes)

---
## Why Choose SAE Container Deployment

### SAE Container Deployment vs. SAE Python Runtime

| Dimension | SAE Python runtime | SAE container deployment (recommended) |
|---------|-----------------|------------------|
| **System dependencies** | ❌ Cannot install system libraries | ✅ Fully under your control |
| **Complex dependencies** | ❌ PyMuPDF/OpenCV are limited | ✅ Fully supported |
| **Environment consistency** | ⚠️ Cloud may behave differently from local | ✅ Works locally = works in the cloud |
| **Nougat (Torch)** | ❌ High risk of version conflicts | ✅ Easily supported |
| **Deployment method** | Upload a ZIP package | Push a Docker image |
| **Startup time** | Fast (< 5 s) | Slower (10-20 s) |
| **Operational complexity** | Low | Medium |
| **Recommendation** | ❌ Not recommended | ✅ Strongly recommended |
### Key Reasons

#### 1. **Missing system-level dependencies (the most critical issue)**

```python
# Your code uses the following libraries:
import fitz    # PyMuPDF -> MuPDF/FreeType native code (libmupdf.so, libfreetype.so)
import cv2     # OpenCV  -> depends on libGL.so.1, libgthread-2.0.so
import polars  # Polars  -> depends on libgomp.so
```

**SAE Python runtime**:

```
❌ Only a standard Python environment is provided
❌ Cannot run apt-get install
❌ Runtime error: ImportError: libGL.so.1: cannot open shared object file
```

**SAE container deployment**:

```dockerfile
# ✅ Install whatever system libraries you need in the Dockerfile
# (libgl1 replaces libgl1-mesa-glx, which Debian 12 and later no longer ship)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 \
    libglib2.0-0 \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*
```
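To confirm which of these native libraries the dynamic loader can actually find on a given machine or inside a built image, a small preflight script can help. This is a sketch: the short library names in `REQUIRED_LIBS` are assumptions derived from the comments above, and `ctypes.util.find_library` only reports what the loader can locate, not whether a wheel bundles its own copy.

```python
import ctypes.util

# Short names as understood by ctypes.util.find_library (assumed mapping:
# "GL" -> libGL.so.1, "gthread-2.0" -> libgthread-2.0.so, "gomp" -> libgomp.so).
REQUIRED_LIBS = ["GL", "gthread-2.0", "gomp", "freetype"]


def check_shared_libs(names):
    """Map each short library name to the soname the loader finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def missing_libs(names):
    """Return the names that cannot be located on this system."""
    return [name for name, found in check_shared_libs(names).items() if found is None]


if __name__ == "__main__":
    # Print a report only; a CI step could turn a non-empty result into a failure.
    for name, found in check_shared_libs(REQUIRED_LIBS).items():
        print(f"{name:<12} {found or 'MISSING'}")
```

Saved as a hypothetical `preflight.py` in the build context, it could be run inside the image with `docker run --rm extraction-service:latest python preflight.py`.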
#### 2. **Fully controlled environment**

```
Local development environment = Docker image = SAE production environment
```

- If it runs in your local Docker container, it should run the same way once pushed to SAE
- No more "works locally, fails in the cloud" problems

#### 3. **Extensibility**

```
Future needs:
├── Add Nougat OCR (requires PyTorch + GPU support)
├── Add image preprocessing (requires OpenCV)
├── Add more document formats (requires more system libraries)
└── Container deployment handles all of these easily
```
#### 4. **Unified operations**

```
Your overall architecture:
├── Frontend Nginx  → SAE container
├── Backend Node.js → SAE container
└── Python service  → SAE container ✅ (managed the same way)
```

---
## Deployment Architecture

```
Alibaba Cloud
│
├── SAE (backend)
│     └── Node.js
│           │  internal network
│           ▼
├── SAE (Python microservice)
│     └── Docker container:
│           - FastAPI
│           - PyMuPDF
│           - Polars
│           - Mammoth
│
├── RDS PostgreSQL 15
└── OSS (document storage)
```

**Key points**:

- The Python microservice and the Node.js backend are both deployed on SAE, in the same VPC
- They communicate over the internal network (latency < 5 ms)
- RDS and OSS resources are shared

---
## Prerequisites Checklist

### Required Resources

| Resource type | Suggested configuration | Estimated cost | Purpose |
|---------|---------|---------|-----|
| **SAE application** | 1 core, 2 GB / 1 instance | ~100 CNY/month | Runs the Python service |
| **Container image registry** | Alibaba Cloud ACR Personal Edition | Free (within the storage quota) | Stores the Docker image |
| **OSS storage** | Existing (shared) | 0 (no extra cost) | Document storage |
| **RDS PostgreSQL** | Existing (shared) | 0 | Database |

### Software Requirements

```bash
# Install on your local development machine:
#   - Docker Desktop
#   - Alibaba Cloud CLI (optional)

# Nothing needs to be installed on SAE (the container already includes everything)
```

### Accounts and Permissions

- Alibaba Cloud account (already available)
- Access permission for the container image registry
- Permission to create SAE applications

---
## Python Service Analysis

### Current Service Overview

#### Service 1: extraction_service (document extraction)

**Location**: `AIclinicalresearch/extraction_service/`

**Used by**:

- PKB module: text extraction for documents uploaded to Dify
- ASL module: extracts full PDF text for in-depth analysis
- DC module: extracts Excel/CSV data

**Key files**:

```
extraction_service/
├── main.py                    # FastAPI entry point
├── requirements.txt           # Dependency list
├── services/
│   ├── pdf_extractor.py       # PDF extraction (dispatcher)
│   ├── pymupdf_extractor.py   # PyMuPDF implementation
│   ├── nougat_extractor.py    # Nougat OCR implementation
│   ├── docx_extractor.py      # Word extraction
│   └── txt_extractor.py       # Plain-text extraction
└── operations/
    └── fillna_operations.py   # Data cleaning (Polars)
```

**Key endpoints**:

```
POST /extract/pdf         # PDF extraction
POST /extract/docx        # Word extraction
POST /extract/txt         # Text extraction
POST /operations/fillna   # Data cleaning
```
### Dependency Analysis

#### Current `requirements.txt`:

```txt
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```

#### Estimated dependency sizes:

| Package | Size | Purpose | Required? |
|-----|------|-----|---------|
| **PyMuPDF** | ~50MB | PDF extraction (core) | ✅ Required |
| **pdfplumber** | ~10MB | PDF table extraction | ⚠️ Optional (currently unused) |
| **nougat-ocr** | ~300MB | Academic paper OCR | ⚠️ Phase 2 |
| **torch** | ~800MB | Nougat dependency | ⚠️ Phase 2 |
| **torchvision** | ~100MB | Nougat dependency | ⚠️ Phase 2 |
| **mammoth** | ~5MB | Word extraction | ✅ Required |
| **python-docx** | ~3MB | Word extraction | ✅ Required |
| **polars** | ~50MB | Data cleaning | ✅ Required |
| **numpy** | ~20MB | Numerical computing | ✅ Required |
| **fastapi** | ~10MB | Web framework | ✅ Required |
| **uvicorn** | ~5MB | ASGI server | ✅ Required |
| **Others** | ~10MB | Helper libraries | ✅ Required |
| **Total (with Nougat)** | **~1.4GB** | - | - |
| **Total (without Nougat)** | **~163MB** | - | - |
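Both totals follow directly from the per-package estimates; the Nougat stack (nougat-ocr, torch, torchvision) alone accounts for about 1.2 GB. A quick tally using the table's approximate figures (estimates, not measured sizes):

```python
# Approximate installed sizes in MB, copied from the table above.
SIZES_MB = {
    "PyMuPDF": 50, "pdfplumber": 10, "nougat-ocr": 300, "torch": 800,
    "torchvision": 100, "mammoth": 5, "python-docx": 3, "polars": 50,
    "numpy": 20, "fastapi": 10, "uvicorn": 5, "other": 10,
}
# Packages that exist only to support Nougat (phase 2).
NOUGAT_STACK = {"nougat-ocr", "torch", "torchvision"}

total_with_nougat = sum(SIZES_MB.values())
total_without_nougat = sum(v for k, v in SIZES_MB.items() if k not in NOUGAT_STACK)

print(total_with_nougat, total_without_nougat)  # 1363 163
```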
---

## Dependency Optimization Strategy

### Phase 1: Minimal deployment (recommended for the first deployment)

**Goal**: go live quickly and validate the core functionality

**Strategy**:

- ✅ Keep PyMuPDF (core PDF extraction)
- ✅ Keep Mammoth/python-docx (Word extraction)
- ✅ Keep Polars (data cleaning)
- ❌ Temporarily remove Nougat (large footprint, rarely used)
**Optimized `requirements.txt`**:

```txt
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20

# Document extraction (core)
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2

# Data processing
polars==1.17.1
numpy==1.26.4

# Helper tools
langdetect==1.0.9
chardet==5.2.0

# Logging
python-json-logger==2.0.7
```

**Estimated image size**: ~500MB (including the Python base image)
**Code changes**:

```python
# services/pdf_extractor.py

# Comment out the Nougat import
# from .nougat_extractor import extract_pdf_nougat, check_nougat_available

# Assumed location of the PyMuPDF helper (see services/pymupdf_extractor.py);
# detect_language and detect_academic_paper are defined elsewhere in the service.
from .pymupdf_extractor import extract_pdf_pymupdf


async def extract_pdf(pdf_path: str, filename: str):
    """Extract text from a PDF (phase 1: PyMuPDF only)."""

    # Detect the language and whether the document looks like an academic paper
    language = detect_language(pdf_path)
    is_academic = detect_academic_paper(pdf_path)

    # Phase 1: use PyMuPDF directly
    text = extract_pdf_pymupdf(pdf_path)
    method = 'pymupdf'

    # Phase 2: Nougat can be re-enabled as the preferred method
    # if language == 'english' and is_academic:
    #     try:
    #         if check_nougat_available():
    #             text = extract_pdf_nougat(pdf_path)
    #             method = 'nougat'
    #     except Exception:
    #         pass  # fall back to the PyMuPDF text extracted above

    return {
        'text': text,
        'method': method,
        'language': language,
        'is_academic': is_academic,
    }
```
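Commenting Nougat code in and out works, but the phase 1 / phase 2 switch can also be driven by configuration so the same source runs in both images. A minimal sketch, assuming a hypothetical `ENABLE_NOUGAT` environment variable and that the `nougat-ocr` package is importable as `nougat`:

```python
import importlib.util
import os


def nougat_enabled(env=None):
    """Read the hypothetical ENABLE_NOUGAT flag (off unless set to 1/true/yes)."""
    env = os.environ if env is None else env
    return env.get("ENABLE_NOUGAT", "false").strip().lower() in {"1", "true", "yes"}


def nougat_installed():
    """True if the nougat package can be found in this image (phase 2 only)."""
    return importlib.util.find_spec("nougat") is not None


def choose_pdf_method(language, is_academic, enabled, installed):
    """Use Nougat only for English academic papers when it is enabled and installed."""
    if enabled and installed and language == "english" and is_academic:
        return "nougat"
    return "pymupdf"
```

Inside `extract_pdf`, `choose_pdf_method(language, is_academic, nougat_enabled(), nougat_installed())` would pick the extractor; keeping a try/except around the Nougat call preserves the PyMuPDF fallback.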
### Phase 2: Full deployment (when needed later)

**When**:

- Users report poor extraction quality for English academic papers
- Sufficient GPU resources are available

**Strategy**:

- ✅ Restore Nougat + Torch
- ✅ Use GPU instances (SAE does not currently support GPUs, so this means migrating to ECS)

**Full `requirements.txt`**:

```txt
# All dependencies (including Nougat)
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```

**Estimated image size**: ~2GB

---
## Building the Docker Image

### Step 1: Create an optimized Dockerfile

Create a `Dockerfile` in the `extraction_service/` directory:

```dockerfile
# ========================================
# Multi-stage build: keeps the final image small
# ========================================

# Stage 1: build stage (install dependencies)
# The Debian release is pinned (bookworm) so the apt package names below stay valid
FROM python:3.11-slim-bookworm AS builder

# Set the working directory
WORKDIR /app

# Install system dependencies (needed only at build time)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    make \
    libffi-dev \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list
COPY requirements.txt .

# Install Python dependencies into a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# ========================================
# Stage 2: runtime stage (minimal image)
# ========================================
FROM python:3.11-slim-bookworm

# Set the working directory
WORKDIR /app

# Install runtime dependencies (system libraries + time zone data)
RUN apt-get update && apt-get install -y --no-install-recommends \
    # PyMuPDF dependencies
    # Note: libmupdf-dev is usually unnecessary, because the PyMuPDF wheel installed by pip bundles its own libraries.
    # It is kept as a safeguard; if image size matters, remove it and re-run the local tests.
    libmupdf-dev \
    libfreetype6 \
    libjpeg62-turbo \
    libopenjp2-7 \
    # Polars dependency
    libgomp1 \
    # Other tools (curl is used by HEALTHCHECK)
    curl \
    # Time zone data
    tzdata \
    && rm -rf /var/lib/apt/lists/*

# ⚠️ Use a single time zone: Asia/Shanghai
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Make sure the temp directory is writable (needed for large file uploads)
RUN mkdir -p /tmp && chmod 1777 /tmp

# Copy the virtual environment from the build stage
COPY --from=builder /opt/venv /opt/venv

# Copy the application code
COPY . .

# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=8000

# Expose the port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Startup command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
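The HEALTHCHECK is the only reason `curl` is installed in the runtime image. If you would rather drop it, a standard-library probe can do the same job. A sketch, assuming it is saved as a hypothetical `healthcheck.py` next to `main.py`:

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8000/health", timeout=5.0):
    """Return True if a GET on url answers with a 2xx status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # HTTPError (non-2xx) is a URLError subclass; refused connections and
        # timeouts surface as URLError/OSError; malformed replies as HTTPException.
        return False
```

The Dockerfile line would then become `CMD python -c "import sys, healthcheck; sys.exit(0 if healthcheck.is_healthy() else 1)"` after the `HEALTHCHECK` options, and `curl` could be removed from the apt-get list (or kept for debugging inside the container).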
### Step 2: Create .dockerignore

```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Tests and documentation
tests/
test_files/
*.md
README.md

# Git
.git/
.gitignore

# Logs
*.log

# Temporary files
tmp/
temp/
```
### Step 3: Build the image locally

```bash
# Go to the extraction_service directory
# (Git Bash form of d:\MyCursor\AIclinicalresearch\extraction_service; adjust to your checkout)
cd /d/MyCursor/AIclinicalresearch/extraction_service

# Build the image (for local testing)
docker build -t extraction-service:latest .

# Check the image size
docker images extraction-service
```
### Step 4: Test run locally

```bash
# Start the container (local test)
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/dbname" \
  extraction-service:latest

# Follow the logs (press Ctrl+C to stop following)
docker logs -f extraction-test

# Test the health check
curl http://localhost:8000/health

# Test PDF extraction
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

# Stop and remove the test container
docker stop extraction-test
docker rm extraction-test
```
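The PDF upload check can also be scripted without curl, for example in CI after `docker run`. A standard-library sketch; it assumes `/extract/pdf` accepts a multipart field named `file` (as in the curl call) and returns JSON, and the function names are illustrative:

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, content, content_type="application/pdf"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(base_url, pdf_path, timeout=120):
    """POST a local PDF to {base_url}/extract/pdf and return the decoded JSON reply."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/extract/pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage against the local container: `upload_pdf("http://localhost:8000", "test.pdf")`.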
### Step 5: Push to Alibaba Cloud Container Registry

#### 5.1 Create the image repository (first-time setup)

1. **Log in to the Alibaba Cloud console** → **Container Registry (ACR)**

2. **Create a Personal Edition instance** (free):

   ```
   Instance name: extraction-service
   Region: East China 1 (Hangzhou)
   ```

3. **Create a namespace**:

   ```
   Namespace: clinical-research
   ```

4. **Create an image repository**:

   ```
   Repository name: extraction-service
   Code source: Local repository
   ```
#### 5.2 Push the image

```bash
# 1. Log in to Alibaba Cloud Container Registry
# Get the login command: console → Container Registry → Access Credentials → set the Registry login password
# The registry domain is registry.<region-id>.aliyuncs.com; use the region where the ACR instance was created
docker login --username=<your-username> registry.cn-beijing.aliyuncs.com

# 2. Tag the image
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 3. Push to Alibaba Cloud
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 4. Push the latest tag (makes later updates easier)
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
```
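If these tag-and-push steps move into a release script, the image reference can be assembled from its parts so the region, namespace and tag are not retyped by hand. A sketch; `acr_image_ref`, `push_commands` and `release` are illustrative helpers, and the `registry.<region>.aliyuncs.com/<namespace>/<repository>:<tag>` layout is the one used in the commands above:

```python
import subprocess


def acr_image_ref(region, namespace, repo, tag="latest"):
    """Build registry.<region>.aliyuncs.com/<namespace>/<repo>:<tag>."""
    for part in (region, namespace, repo, tag):
        if not part or "/" in part or ":" in part:
            raise ValueError(f"invalid image reference component: {part!r}")
    return f"registry.{region}.aliyuncs.com/{namespace}/{repo}:{tag}"


def push_commands(local_image, remote_ref):
    """The docker tag + docker push pair for one remote reference."""
    return [["docker", "tag", local_image, remote_ref], ["docker", "push", remote_ref]]


def release(local_image, region, namespace, repo, version):
    """Push the versioned tag and latest, mirroring steps 2-4 above (requires docker login)."""
    for tag in (version, "latest"):
        for cmd in push_commands(local_image, acr_image_ref(region, namespace, repo, tag)):
            subprocess.run(cmd, check=True)
```

For example: `release("extraction-service:latest", "cn-beijing", "clinical-research", "extraction-service", "v1.0")`.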
---

## Deploying to SAE

### Step 1: Create the SAE application

1. **Log in to the Alibaba Cloud console** → **Serverless App Engine (SAE)**

2. **Create an application**:

   ```
   Application name: extraction-service
   Namespace: choose the namespace of the backend application (same VPC)
   Deployment method: Image
   ```

3. **Image configuration**:

   ```
   Image address: registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
   Image version: latest
   Image pull policy: Always (pull the latest image on every deployment)
   ```

4. **Instance specification**:

   ```
   CPU: 1 core
   Memory: 2GB
   Instances: 1 (initial)
   Auto scaling:
     - Minimum instances: 1
     - Maximum instances: 3
     - CPU trigger threshold: 70%
   ```

5. **Network configuration**:

   ```
   VPC: choose the VPC of the backend application
   vSwitch: choose the vSwitch of the backend application
   Security group: allow access from within the VPC
   ```
### Step 2: Configure environment variables

Add the following environment variables in the SAE application configuration:

```bash
# ========= Database =========
DATABASE_URL=postgresql://user:password@rm-xxxx.pg.rds.aliyuncs.com:5432/clinical_research

# ========= Storage =========
OSS_ENDPOINT=oss-cn-hangzhou-internal.aliyuncs.com
OSS_BUCKET=your-bucket-name
OSS_ACCESS_KEY_ID=<your-id>
OSS_ACCESS_KEY_SECRET=<your-secret>

# ========= Service =========
SERVICE_NAME=extraction-service
SERVICE_VERSION=v1.0
LOG_LEVEL=INFO

# ========= Performance =========
# The Dockerfile CMD hardcodes --workers 2; change it there as well if you change WORKERS
WORKERS=2
TIMEOUT=300
# MAX_FILE_SIZE is in bytes: 52428800 = 50 * 1024 * 1024 (50 MB)
MAX_FILE_SIZE=52428800

# ========= Time zone =========
TZ=Asia/Shanghai
```
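On the service side, reading these variables once at startup and failing fast on missing values avoids confusing errors later. A standard-library sketch; which variables count as required, and the defaults (copied from the values above), are assumptions:

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "OSS_ENDPOINT", "OSS_BUCKET")


@dataclass(frozen=True)
class Settings:
    # Secrets are excluded from repr() so they do not leak into logs.
    database_url: str = field(repr=False)
    oss_endpoint: str
    oss_bucket: str
    oss_access_key_id: str = field(default="", repr=False)
    oss_access_key_secret: str = field(default="", repr=False)
    service_name: str = "extraction-service"
    service_version: str = "v1.0"
    log_level: str = "INFO"
    workers: int = 2
    timeout: int = 300
    max_file_size: int = 50 * 1024 * 1024  # 52428800 bytes

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        env = os.environ if env is None else env
        missing = [name for name in REQUIRED if not env.get(name)]
        if missing:
            raise RuntimeError("missing environment variables: " + ", ".join(missing))
        return cls(
            database_url=env["DATABASE_URL"],
            oss_endpoint=env["OSS_ENDPOINT"],
            oss_bucket=env["OSS_BUCKET"],
            oss_access_key_id=env.get("OSS_ACCESS_KEY_ID", ""),
            oss_access_key_secret=env.get("OSS_ACCESS_KEY_SECRET", ""),
            service_name=env.get("SERVICE_NAME", "extraction-service"),
            service_version=env.get("SERVICE_VERSION", "v1.0"),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
            workers=int(env.get("WORKERS", "2")),
            timeout=int(env.get("TIMEOUT", "300")),
            max_file_size=int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024))),
        )
```

Calling `Settings.from_env()` once in `main.py` would surface a misconfigured SAE application at startup, where it shows up in the deployment log.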
### Step 3: Configure the health check

```
Health check path: /health
Health check port: 8000
Health check protocol: HTTP
Initial delay: 30 s
Check interval: 10 s
Timeout: 5 s
Healthy threshold: 2 times
Unhealthy threshold: 3 times
```
### Step 4: Configure logging

```
Log directory: /app/logs
Log file: extraction-service.log
Log level: INFO
Log retention: 7 days
```
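If the service writes this file itself, Python's standard `TimedRotatingFileHandler` can implement daily rotation with 7 retained files. A sketch, assuming the application (not SAE) is responsible for writing `/app/logs/extraction-service.log`; the Dockerfile does not create `/app/logs`, so the function creates it:

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler


def setup_logging(log_dir="/app/logs", filename="extraction-service.log",
                  level="INFO", retention_days=7):
    """Log to a file rotated at midnight (keeping retention_days old files) and to stderr."""
    os.makedirs(log_dir, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    file_handler = TimedRotatingFileHandler(
        os.path.join(log_dir, filename),
        when="midnight",          # rotates on local time, i.e. the container's TZ
        backupCount=retention_days,
        encoding="utf-8",
    )
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler()  # stderr, still visible in `docker logs`
    console_handler.setFormatter(formatter)

    logger = logging.getLogger("extraction_service")
    logger.setLevel(level)
    for old in logger.handlers:  # make repeated calls idempotent
        old.close()
    logger.handlers = [file_handler, console_handler]
    return logger
```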
### Step 5: Configure SLB (optional, only if public access is needed)

```
# A Python microservice usually needs only internal access (it is called by the backend)
# If public access is needed (e.g. for debugging or third-party integration):

Load balancer type: Public
Listener port: 80
Backend port: 8000
Health check: Enabled
```
### Step 6: Deploy the application

1. **Click "Deploy Application"**

2. **Wait for the deployment to finish** (about 2-3 minutes)

3. **Check the deployment log**:

   ```
   [INFO] Pulling image...
   [INFO] Image pulled successfully
   [INFO] Starting container...
   [INFO] Container started successfully
   [INFO] Health check passed
   [INFO] Application is running
   ```

---
## Testing and Verification

### Step 1: Get the internal address (critical step)

**⚠️ Important: SAE instances run on different hosts, so you must use the internal address provided by SAE**

#### How to find the correct internal address:

1. **Log in to the SAE console** → **Application list** → **click the extraction-service application**

2. **On the application details page, find the "Application Access Settings" or "VPC Internal Access" section**

3. **View and copy the "internal access address"**; it usually has one of these formats:

   ```
   # Format 1: internal IP + port (strongly recommended, most stable)
   172.17.x.x:8000

   # Format 2: SAE internal service domain name (requires extra service-discovery configuration; not recommended)
   extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000

   # Format 3: K8s Service domain name (requires K8s service-discovery configuration; not recommended)
   extraction-service.namespace.svc.cluster.local:8000
   ```

4. **❌ Common mistakes (these cause connection failures)**:

   ```bash
   # ❌ Do not guess domain names (these always fail)
   http://extraction-service.sae:8000            # the .sae domain does not exist
   http://extraction-service.internal:8000       # the .internal domain does not exist
   http://extraction-service.cluster.local:8000  # requires K8s service-discovery configuration

   # ❌ Do not use localhost
   http://localhost:8000                         # SAE instances run on different hosts

   # ❌ Do not use the Docker service name
   http://extraction-service:8000                # this only works with Docker Compose
   ```
5. **<2A>?<3F>刻<EFBFBD><E588BB>𡁏<EFBFBD>嚗<EFBFBD><E59A97>隡睃<E99AA1>蝥扳<E89DA5>摨𧶏<E691A8>**嚗? ```bash
|
||
# 潃鐥<E6BD83>潃鐥<E6BD83>潃?<3F>寞<EFBFBD>A嚗𡁶凒<F0A181B6>乩蝙<E4B9A9>典<EFBFBD>蝵飡P嚗<50>撩<EFBFBD><E692A9>綫<EFBFBD>琜<EFBFBD>
|
||
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
|
||
# <20>瑕<EFBFBD><E79195>孵<EFBFBD>嚗锭AE<41>批<EFBFBD><E689B9>?> Python摨𠉛鍂 > 摰硺<E691B0><E7A1BA>𡑒” > <20>亦<EFBFBD><E4BAA6><EFBFBD><EFBFBD>IP
|
||
|
||
# 潃鐥<E6BD83>潃?<3F>寞<EFBFBD>B嚗帋蝙<E5B88B>沒AE<41>滚𦛚<E6BB9A>𤑳緵嚗<E7B7B5><E59A97>閬<EFBFBD><E996AC>憭㚚<E686AD>蝵殷<E89DB5>銝齿綫<E9BDBF>𣂼<EFBFBD><F0A382BC>煺蝙<E785BA>剁<EFBFBD>
|
||
# <20><>閬<EFBFBD>銁SAE<41>批<EFBFBD><E689B9>圈<EFBFBD>蝵?敺格<E695BA><E6A0BC>⊥釣<E28AA5>䔶葉敹?
|
||
EXTRACTION_SERVICE_URL=http://extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000
|
||
```
|
||
|
||
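The rules in items 4 and 5 can also be checked mechanically, for example in a deployment or CI script that runs before the backend is restarted. The following is a minimal Python sketch. The function name `check_extraction_url` and the list of rejected host patterns are illustrative assumptions derived from the examples above. They are not part of SAE or of the existing codebase.

```python
import ipaddress
from urllib.parse import urlsplit

# Host patterns this guide lists as wrong on SAE (illustrative, not exhaustive)
GUESSED_SUFFIXES = (".internal", ".sae", ".cluster.local")
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_extraction_url(url: str) -> str:
    """Return "ok" if the URL looks like a usable SAE internal address, else a reason."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lower-cases the host and strips [] from IPv6
    if parts.scheme not in ("http", "https") or not host:
        return "malformed URL"
    if host in LOCAL_HOSTS:
        return "localhost: SAE instances run on different hosts"
    if host.endswith(GUESSED_SUFFIXES):
        return "guessed domain: copy the address from the SAE console"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        if "." not in host:
            # A bare name such as "extraction-service" only resolves inside Docker Compose
            return "Docker service name: only valid with Docker Compose"
        return "ok"  # a fully qualified domain, e.g. one shown in the SAE console
    if not ip.is_private:
        return "public IP: expected a VPC-internal address"
    return "ok"
```

A console-provided domain (Option B) is accepted because it cannot be verified offline. A real connectivity test is still needed after deployment.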
### Step 2: Configure the backend environment variable

Add the following to the environment variables of the SAE backend application:

```bash
# ⚠️ Use the actual internal address shown in the SAE console
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000

# Notes:
# 1. Do not use a guessed domain name
# 2. The address must come from "Application access configuration" in the SAE console
# 3. If the IP changes (e.g. after a redeployment), update it in every environment that uses it
```

**Restart the backend after changing the configuration**:
- SAE console → Backend application → Restart

### Step 3: Test from the backend service

Add a connectivity test to the existing Node.js backend service:

```typescript
// backend/src/tests/test-extraction-service.ts

import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

// Must be the internal address copied from the SAE console (e.g. http://172.17.x.x:8000).
// No fallback: a guessed default such as http://extraction-service.internal:8000 never resolves on SAE.
const EXTRACTION_SERVICE_URL = process.env.EXTRACTION_SERVICE_URL;

export async function testExtractionService(): Promise<void> {
  if (!EXTRACTION_SERVICE_URL) {
    throw new Error('EXTRACTION_SERVICE_URL is not set');
  }

  try {
    // 1. Health check
    console.log('Testing health endpoint...');
    const healthRes = await axios.get(`${EXTRACTION_SERVICE_URL}/health`);
    console.log('Health check:', healthRes.data);

    // 2. Test PDF extraction
    console.log('Testing PDF extraction...');
    const form = new FormData();
    form.append('file', fs.createReadStream('./test.pdf'));

    const pdfRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/pdf`,
      form,
      { headers: form.getHeaders() }
    );
    console.log('PDF extraction result:', pdfRes.data);

    // 3. Test Word extraction
    console.log('Testing Word extraction...');
    const form2 = new FormData();
    form2.append('file', fs.createReadStream('./test.docx'));

    const docxRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/docx`,
      form2,
      { headers: form2.getHeaders() }
    );
    console.log('Word extraction result:', docxRes.data);

    console.log('✅ All tests passed!');
  } catch (error) {
    // In strict TypeScript `error` is `unknown`, so narrow it before reading properties
    if (axios.isAxiosError(error)) {
      console.error('❌ Test failed:', error.message);
      if (error.response) {
        console.error('Response:', error.response.data);
      }
    } else {
      console.error('❌ Test failed:', error);
    }
  }
}
```

### Step 4: End-to-end testing (via the business workflows)

Test the following business workflows:

#### Scenario 1: PKB document upload

**Business flow**:
```
User uploads a PDF
  ↓ Node.js backend receives it
  ↓ HTTP POST forwards the file stream to the Python service (EXTRACTION_SERVICE_URL)
  ↓ Python service parses the PDF and returns the text as JSON
  ↓ Node.js backend processes the text
  ↓ Uploads it to Dify
  ↓ Returns to the frontend
```

**Test steps**:
1. Upload a PDF document from the frontend (a test file < 5MB is recommended)

2. **Check the Node.js backend logs** (SAE console → Backend application → Logs):
   ```
   [INFO] Calling extraction service: http://172.17.x.x:8000/extract/pdf
   [INFO] Extraction completed in 2.3s
   [INFO] Extracted text preview: "This is a test document..."
   ```

3. **Check the Python service logs** (SAE console → extraction-service application → Logs):
   ```
   INFO: Request: POST /extract/pdf
   INFO: File size: 1.2MB, filename: test.pdf
   INFO: Using PyMuPDF extraction
   INFO: Response: 200 (took 2.10s)
   ```

4. **Confirm in the Dify Web UI that the document has been uploaded**

**If it fails, check**:
- Backend log shows "Connection refused" → check the EXTRACTION_SERVICE_URL setting
- Python log shows "ImportError" → check the system dependencies in the Dockerfile
- Extraction times out (> 300s) → the file is too large; adjust the timeout settings

#### Scenario 2: ASL Deep Reading

```
User clicks "Deep Reading" → backend calls the Python service to extract the full text → LLM analysis results are returned
```

**Test steps**:
1. Click "Deep Reading" in the ASL module
2. Check the backend logs (confirm that the Python service is called)
3. Check the Python service logs (confirm that extraction succeeded)
4. The frontend displays the analysis results

#### Scenario 3: DC data cleaning

```
User uploads Excel → backend calls the Python service's fillna → cleaned data is returned
```

**Test steps**:
1. Upload an Excel file in the DC module
2. Run the fillna operation
3. Check the Python service logs
4. Verify the cleaning result

---

## Monitoring and Operations

### SAE Built-in Monitoring

#### 1. View application monitoring

```
SAE console → Application details → Monitoring
```

**Key metrics**:
- **CPU usage**: < 70% (PDF extraction is a CPU-intensive task)
- **Memory usage**: < 80% (processing large files uses a lot of memory)
- **Request QPS** (queries per second): shows the current load
- **Average response time**: < 1000ms (small files < 2s, large files < 30s)
- **Error rate**: < 1% (track the file-parsing failure rate)

**Performance baseline (reference values)**:
```
Small files (< 1MB PDF): response time 1-3s
Medium files (1-10MB PDF): response time 5-15s
Large files (10-50MB PDF): response time 20-60s
Very large files (> 50MB): consider rejecting them up front
```

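For alerting or log analysis, the baseline table can be turned into a lookup that flags requests slower than their size class. This is a minimal sketch with some assumptions. The function names are hypothetical. Size boundaries are treated as inclusive upper bounds, which the table does not specify. The numbers are only the reference values above, not measured guarantees.

```python
# Reference ranges from the baseline table: (upper size bound in MB, (min s, max s))
REFERENCE_RANGES = [
    (1, (1, 3)),
    (10, (5, 15)),
    (50, (20, 60)),
]


def expected_response_range(size_mb: float):
    """Return the reference (min, max) response time in seconds, or None above 50 MB."""
    for upper_mb, seconds in REFERENCE_RANGES:
        if size_mb <= upper_mb:
            return seconds
    return None


def is_slower_than_baseline(size_mb: float, elapsed_s: float) -> bool:
    """True if a request took longer than the reference maximum for its size class."""
    expected = expected_response_range(size_mb)
    if expected is None:
        return True  # no baseline above 50 MB: always worth a look
    return elapsed_s > expected[1]
```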
#### 2. Real-time logs

```
SAE console → Application details → Logs → Real-time logs
```

**Log types**:
- Application logs (stdout/stderr): uvicorn startup messages, request logs
- Access logs (HTTP requests): request path, response time, status code
- Error logs (exception stacks): Python exception details

**Key log examples**:
```
# ✅ Normal startup
INFO: Started server process [1]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000

# ✅ Normal request
INFO: Request: POST /extract/pdf
INFO: File: test.pdf (1.2MB)
INFO: Response: 200 (took 2.10s)

# ❌ Error logs (need attention)
ERROR: ImportError: libGL.so.1: cannot open shared object file
ERROR: Timeout: PDF extraction took > 300s
ERROR: Memory error: Cannot allocate memory
```

#### 3. Auto-scaling configuration

```
SAE console → Application details → Auto scaling
```

**Recommended configuration**:
```
Minimum instances: 1 (keeps the service available)
Maximum instances: 3 (adjusted automatically with load)

Trigger conditions:
- CPU usage > 70% for 3 minutes → scale out by 1 instance
- CPU usage < 30% for 5 minutes → scale in by 1 instance
```

**Notes**:
- PDF extraction is CPU-intensive, so scaling is driven mainly by CPU
- If the service scales out frequently, consider a larger instance specification instead (e.g. 4 cores)
- SAE load-balances across instances automatically; no extra configuration is needed

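To see how the trigger rules behave on a given CPU trace, they can be simulated offline. The sketch below simplifies things in two ways: it assumes one averaged CPU sample per minute, and it allows at most one scaling step per evaluation. SAE evaluates its own policies internally, and this hypothetical function is not its API.

```python
from typing import Sequence


def scaling_decision(cpu_per_minute: Sequence[float], current: int,
                     min_instances: int = 1, max_instances: int = 3) -> int:
    """Apply the trigger rules above and return the new instance count.

    cpu_per_minute holds one average CPU-usage sample (percent) per minute, oldest first.
    """
    last3 = list(cpu_per_minute[-3:])
    last5 = list(cpu_per_minute[-5:])
    # > 70% for 3 consecutive minutes → scale out by one instance
    if len(last3) == 3 and all(c > 70 for c in last3):
        return min(current + 1, max_instances)
    # < 30% for 5 consecutive minutes → scale in by one instance
    if len(last5) == 5 and all(c < 30 for c in last5):
        return max(current - 1, min_instances)
    return current
```

For example, a trace ending in 80%, 85%, 90% moves one instance to two. The same trace leaves three instances at three, because the configured maximum is 3.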
### Application-Level Monitoring

#### Add health-check endpoints

```python
# main.py

import os

import psutil
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
async def health_check():
    """Health-check endpoint."""
    return {
        "status": "healthy",
        "service": "extraction-service",
        "version": os.getenv("SERVICE_VERSION", "unknown"),
    }


@app.get("/metrics")
def metrics():
    """Performance-metrics endpoint.

    Declared with a plain `def` so FastAPI runs it in its thread pool:
    `cpu_percent(interval=1)` blocks for one second and would otherwise
    stall the event loop for every other request.
    """
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage('/app')

    return {
        "cpu": {
            "percent": cpu_percent,
            "count": psutil.cpu_count(),
        },
        "memory": {
            "total": memory.total,
            "available": memory.available,
            "percent": memory.percent,
        },
        "disk": {
            "total": disk.total,
            "used": disk.used,
            "free": disk.free,
            "percent": disk.percent,
        },
    }
```

#### Add request logging

```python
# main.py (continued; `app` is the FastAPI instance defined above)

import logging
import os
import time

from fastapi import Request

LOG_DIR = '/app/logs'
os.makedirs(LOG_DIR, exist_ok=True)  # FileHandler fails if the directory does not exist

# Configure logging. The file copy is lost when the container restarts;
# stdout is what the SAE log console shows.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(os.path.join(LOG_DIR, 'extraction-service.log')),
        logging.StreamHandler(),
    ],
)
logger = logging.getLogger(__name__)


@app.middleware("http")
async def log_requests(request: Request, call_next):
    """Request-logging middleware."""
    start_time = time.perf_counter()

    # Log the incoming request
    logger.info(f"Request: {request.method} {request.url.path}")

    # Handle the request
    response = await call_next(request)

    # Log the response status and elapsed time
    process_time = time.perf_counter() - start_time
    logger.info(f"Response: {response.status_code} (took {process_time:.2f}s)")

    return response
```

### Scheduled Maintenance Tasks

#### Weekly tasks

```bash
# 1. Check the log size
du -sh /app/logs

# 2. Look for recent errors
tail -n 100 /app/logs/extraction-service.log | grep ERROR

# 3. Restart the application (if memory is leaking)
# SAE console → Application details → Restart
```

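If the weekly checks should run from a scheduled script, or no shell is available, the first two commands can be approximated in Python. This is a sketch only. The function names are illustrative, and the log path they would be called with is the one configured in the request-logging setup of this guide.

```python
import os
from collections import deque


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path` (apparent size, roughly GNU `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def recent_errors(log_file: str, last_n: int = 100) -> list:
    """Return the ERROR lines among the last `last_n` lines (like `tail -n 100 | grep ERROR`)."""
    with open(log_file, encoding="utf-8", errors="replace") as f:
        tail = deque(f, maxlen=last_n)  # keeps only the last N lines in memory
    return [line.rstrip("\n") for line in tail if "ERROR" in line]
```

Usage: `recent_errors("/app/logs/extraction-service.log")` returns an empty list when the last 100 lines contain no errors.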
#### Monthly tasks

```bash
# 1. Update Python dependencies (list the outdated ones first)
pip list --outdated

# 2. Rebuild the image (pulls the latest base image to pick up security updates)
docker build --pull -t registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1 .
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1

# 3. Update the image version in SAE
```

---

## Troubleshooting

### Common Problems

#### Problem 1: Container fails to start

**Symptoms**:
```
SAE shows: container failed to start
Logs show: ImportError: libXXX.so: cannot open shared object file
```

**Cause**: missing system dependencies

**Fix**:
```dockerfile
# Add the missing libraries to the Dockerfile:
#   libgl1        → OpenCV (replaces libgl1-mesa-glx, which recent Debian releases no longer ship)
#   libglib2.0-0  → OpenCV
#   libgomp1      → Polars
#   libmupdf-dev  → PyMuPDF (optional: the PyMuPDF wheels already bundle MuPDF)
# An inline comment after "\" breaks the line continuation, so comments stay above the RUN.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*
```

#### Problem 2: PDF extraction times out

**Symptoms**:
```
Request timed out (> 300 s)
Logs show: Timeout error
```

**Troubleshooting steps**:
```bash
# 1. Check the file size
# If the file is > 50MB, consider processing it in smaller parts

# 2. Increase the timeout
# SAE console → Application configuration → Environment variables
TIMEOUT=600

# 3. Optimize the extraction logic
# e.g. skip image pages, downscale images
```

#### Problem 3: Out of memory (OOM)

**Symptoms**:
```
The container restarts automatically
Logs show: Killed (signal 9)
```

**Fix**:
```python
# 1. Increase the memory allocation
#    SAE console → Application configuration → Specification
#    Memory: 2GB → 4GB

# 2. Optimize the code (stream instead of loading everything)
#    Do not load the whole file into memory at once

def read_in_chunks(f, chunk_size=1024 * 1024):
    """Yield the file in fixed-size chunks (1 MB by default)."""
    while chunk := f.read(chunk_size):
        yield chunk


with open(pdf_path, 'rb') as f:
    # Process chunk by chunk; `process` stands for your own per-chunk handler
    for chunk in read_in_chunks(f):
        process(chunk)
```

#### Problem 4: Backend cannot reach the Python service (most common)

**Symptoms**:
```
Backend log: Connection refused
❌ ECONNREFUSED: connect ECONNREFUSED 172.17.x.x:8000
❌ Error: getaddrinfo ENOTFOUND extraction-service.internal
```

**Root-cause analysis**: `ENOTFOUND` means the host name does not resolve, which typically points to a guessed domain. `ECONNREFUSED` means the address was reached but nothing accepted the connection on that port, which points to a wrong IP or port, or to a service that is down.

**Cause 1: wrong internal address (most common)**

```bash
# ❌ Wrong configuration (guessed domain name)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000

# ✅ Correct configuration (internal address shown in the SAE console)
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Solution**:
```bash
# 1. Get the internal address
# SAE console → extraction-service application → Application details → Application access configuration
# Copy the displayed "VPC internal access address"

# 2. Update the backend environment variable
# SAE console → Backend application → Application configuration → Environment variables
EXTRACTION_SERVICE_URL=http://<internal-IP>:8000

# 3. Restart the backend application
# SAE console → Backend application → Restart
```

**Cause 2: the Python service is not running**

```bash
# Check the Python service status
# SAE console → extraction-service application → Instance list
# Confirm that the instance status is "Running"

# Check the startup logs
# SAE console → extraction-service application → Logs
# You should see:
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
```

**Cause 3: security-group rules**

```bash
# By default, SAE applications in the same VPC can reach each other
# If access still fails, check:
# SAE console → extraction-service application → Network configuration → Security group
# Confirm that an inbound rule allows port 8000 from the VPC address range
```

**Test network connectivity**:
```bash
# Method 1: test from the backend application's "Webshell" in the SAE console (if available)
curl http://<python-service-internal-IP>:8000/health

# Method 2: add a test to the backend application's startup script
echo "Testing extraction service connectivity..."
curl -f http://<python-service-internal-IP>:8000/health || echo "❌ Cannot connect to extraction service"

# Method 3: test the port with telnet
telnet <python-service-internal-IP> 8000
```

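If `telnet` is not installed in the image, the same port test can be done with Python's standard library from inside the backend's network. This is a minimal sketch, and `can_connect` is a hypothetical helper. A successful TCP connection only shows that the port is reachable, not that `/health` responds.

```python
import socket


def can_connect(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like `telnet host port`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

Usage: `can_connect("172.17.x.x")`, replacing the placeholder with the address from the console. A `False` result combined with a running instance usually points back to Cause 1 or Cause 3.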
---

## Precautions and Best Practices

### ✅ Best Practices

#### 1. **Image optimization**

```dockerfile
# ✅ Use a multi-stage build
FROM python:3.11-slim AS builder
# ... build steps ...

FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv

# ✅ Clean the apt cache in the same layer
RUN apt-get update && apt-get install -y ... \
    && rm -rf /var/lib/apt/lists/*

# ✅ Use a .dockerignore file
# to avoid copying unnecessary files into the image
```

#### 2. **Version management**

```bash
# ✅ Use semantic versioning
# v1.0.0 → MAJOR.MINOR.PATCH

# ✅ Keep several version tags
docker tag ... extraction-service:v1.0.0
docker tag ... extraction-service:v1.0
docker tag ... extraction-service:latest

# ✅ Record changes in CHANGELOG.md, for example:
# ## v1.0.1 (2025-12-20)
# - Fix: PDF extraction timeouts
# - Improvement: image size reduced by 30%
```

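The tag scheme can be derived from a single version string so that the three `docker tag` commands never drift apart. This sketch assumes versions always follow the `vMAJOR.MINOR.PATCH` form shown here, and the helper names are illustrative. SAE deployments should still reference the full version tag rather than `latest`.

```python
import re

# vMAJOR.MINOR.PATCH, as in the examples above
SEMVER_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def image_tags(version: str) -> list:
    """Expand e.g. 'v1.0.1' into ['v1.0.1', 'v1.0', 'latest']."""
    match = SEMVER_TAG.fullmatch(version)
    if match is None:
        raise ValueError(f"expected vMAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = match.groups()
    return [version, f"v{major}.{minor}", "latest"]


def tag_commands(source_image: str, version: str, repo: str = "extraction-service") -> list:
    """Build the matching `docker tag` command lines."""
    return [f"docker tag {source_image} {repo}:{tag}" for tag in image_tags(version)]
```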
#### 3. **Security hardening**

```python
# (continuing main.py, where `app` is the FastAPI instance)
from fastapi import HTTPException, UploadFile

# ✅ File size limit
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB

# ✅ File type whitelist (PDF, legacy .doc and .docx)
ALLOWED_TYPES = {
    'application/pdf',
    'application/msword',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
}


@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    # `file.size` can be None (older Starlette versions, unknown length), so check it first
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="File too large")

    # content_type is declared by the client: treat it as a first filter, not as proof
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(status_code=415, detail="Unsupported file type")

    ...  # extraction continues here
```

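Because `content_type` is supplied by the client, it can be complemented by a check of the file's leading bytes. This sketch assumes that only PDF, DOCX and legacy DOC uploads are accepted, and `sniff_document_type` is a hypothetical helper. In a FastAPI handler, the first bytes can be read with `await file.read(8)`, followed by `await file.seek(0)` before extraction.

```python
from typing import Optional

# Leading "magic" bytes of the accepted formats
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"                         # .docx files are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc (OLE2 compound file)


def sniff_document_type(head: bytes) -> Optional[str]:
    """Guess the document type from the first bytes of an upload (None if unknown)."""
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGIC):
        # Any ZIP archive passes here; a stricter check would look for word/document.xml
        return "docx"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    return None
```

The PDF check is strict: it requires `%PDF-` at offset 0, although some readers tolerate a header slightly later in the file.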
#### 4. **Performance optimization**

```python
# ✅ Process large files asynchronously
import asyncio
import os

import aiofiles
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool


async def extract_large_pdf(pdf_path: str):
    # Non-blocking file I/O
    async with aiofiles.open(pdf_path, 'rb') as f:
        content = await f.read()

    # Run the CPU-heavy extraction in a worker thread so the event loop keeps serving requests
    # (`pymupdf_extract` is a placeholder for the service's own extraction function)
    text = await asyncio.to_thread(pymupdf_extract, content)

    return text


# ✅ Database connections
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(
    DATABASE_URL,
    poolclass=NullPool,  # no pooling: each checkout opens a new connection (recommended for SAE)
    echo=False,
)
# Note: for a `postgresql+asyncpg://` URL, use
# `sqlalchemy.ext.asyncio.create_async_engine` instead of `create_engine`.
```

### ❌ Never Do This

#### 1. **Never configure the internal address with a guessed domain name (most common mistake)**

```bash
# ❌ Wrong (causes connection failures)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000
EXTRACTION_SERVICE_URL=http://localhost:8000
EXTRACTION_SERVICE_URL=http://extraction-service:8000

# ✅ Correct: get the actual internal address from the SAE console
# SAE console → extraction-service application → Application access configuration
# Copy the displayed "VPC internal access address"
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Why**:
- SAE instances run on different hosts, so Docker service names cannot be used
- SAE's K8s Service domains require extra configuration and cannot be used directly
- The most reliable option is the IP address shown in the SAE console

#### 2. **Never hard-code secrets in the image**

```dockerfile
# ❌ Wrong
ENV DATABASE_PASSWORD=my-secret-password

# ✅ Correct: set it as an SAE environment variable instead
```

#### 3. **Never persist data to the container's local filesystem**

```python
import os
import tempfile

result = "..."  # the extracted text (placeholder)

# ❌ Wrong (the file is lost when the container restarts)
output_path = '/app/output/result.txt'
with open(output_path, 'w') as f:
    f.write(result)

# ✅ Correct: write temporary files to /tmp and upload the result to OSS
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
    f.write(result)
    tmp_path = f.name

try:
    ...  # upload tmp_path to OSS (e.g. with the oss2 library)
finally:
    os.unlink(tmp_path)  # then delete the temporary file
```

#### 4. **Never deploy production from the :latest tag**

```yaml
# ❌ Wrong (cannot be rolled back)
image: extraction-service:latest

# ✅ Correct (explicit version)
image: extraction-service:v1.0.0
```

#### 5. **Never modify code inside a running container**

```bash
# ❌ Wrong (changes are lost when the container restarts)
# SAE Webshell → vi /app/main.py

# ✅ Correct workflow:
# 1. Change the code locally
# 2. Rebuild the image
# 3. Push it to ACR
# 4. Update the image version in SAE
```

#### 6. **Never use global variables that grow without limit**

```python
from fastapi import UploadFile

# ❌ Wrong (memory leak)
CACHE = {}  # global cache that grows without limit


@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    key = file.filename  # also risky: different files can share a filename
    if key not in CACHE:
        CACHE[key] = extract(file)  # memory keeps growing!
    return CACHE[key]


# ✅ Correct: use a cache with a size limit
from functools import lru_cache


@lru_cache(maxsize=100)  # keep at most 100 results; least recently used entries are evicted
def extract_with_cache(file_hash: str):
    # `extract` is a placeholder that must locate the file from its content hash
    return extract(file_hash)
```

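`lru_cache` needs a hashable argument and keeps every result alive until it is evicted. When the cache key should come from the upload itself, a small LRU keyed by the SHA-256 of the content is a common alternative. This is a sketch with stated limitations. `BoundedResultCache` is a hypothetical name, the cache lives in one process (each SAE instance and Uvicorn worker keeps its own copy), and it is not thread-safe.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class BoundedResultCache:
    """LRU cache keyed by the SHA-256 of the file content, holding at most `maxsize` results."""

    def __init__(self, maxsize: int = 100):
        self.maxsize = maxsize
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get_or_compute(self, content: bytes, compute: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        result = compute(content)
        self._data[key] = result
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return result

    def __len__(self) -> int:
        return len(self._data)
```

Keying by content hash means two uploads with the same filename but different bytes no longer collide.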
#### 7. **Never ignore the size limit of /tmp**

```python
# ⚠️ Note: /tmp in an SAE container usually has a size limit (e.g. 1-2GB),
# so temporary files must be cleaned up when processing large files.

import os
import tempfile

from fastapi import UploadFile


async def extract_large_pdf(file: UploadFile):
    # Save the upload to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    try:
        # Process the file (`extract_pdf_pymupdf` is the service's extraction function)
        result = extract_pdf_pymupdf(tmp_path)
        return result
    finally:
        # ✅ Key point: always delete the temporary file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

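To fail fast instead of filling `/tmp` halfway through a write, the free space can be checked before the upload is saved. This is a sketch under two assumptions: the 20% safety margin and the helper name are both illustrative. Note that `shutil.disk_usage` reports on the filesystem containing the directory, which in a container may be shared with other paths.

```python
import shutil
import tempfile
from typing import Optional


def has_tmp_space(required_bytes: int, safety_margin: float = 0.2,
                  tmp_dir: Optional[str] = None) -> bool:
    """True if the temp directory's filesystem can hold required_bytes plus a margin."""
    directory = tmp_dir or tempfile.gettempdir()  # honours TMPDIR, usually /tmp
    free = shutil.disk_usage(directory).free
    return free >= required_bytes * (1 + safety_margin)
```

When `file.size` is known and the check fails, the handler can reject the request early, for example with HTTP 507 (Insufficient Storage), instead of writing a partial file.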
---

## Appendix

### A. Complete requirements.txt

```txt
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20

# Document extraction
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2

# Data processing
polars==1.17.1
numpy==1.26.4

# Utilities
langdetect==1.0.9
chardet==5.2.0
aiofiles==23.2.1

# Database
sqlalchemy==2.0.25
asyncpg==0.29.0

# Alibaba Cloud OSS
oss2==2.18.3

# Logging and monitoring
python-json-logger==2.0.7
psutil==5.9.8
```

### B. Complete Dockerfile

See [Build the Docker image - Step 1](#step-1-create-an-optimized-dockerfile)

### C. Local test script

```bash
#!/bin/bash
# test-local.sh

echo "Building Docker image..."
docker build -t extraction-service:test .

echo "Starting container..."
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/db" \
  extraction-service:test

echo "Waiting for service to start..."
sleep 10

echo "Testing health endpoint..."
curl http://localhost:8000/health

echo "Testing PDF extraction..."
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

echo "Cleaning up..."
docker stop extraction-test
docker rm extraction-test

echo "Done!"
```

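The fixed `sleep 10` in the script either wastes time or is too short on a slow build machine. Polling `/health` until it answers is more robust. Below is a stdlib-only Python sketch; the helper name and the 60-second default are assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200, or give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused/reset connections, timeouts and HTTP errors (URLError and
            # HTTPError are OSError subclasses): the service is not ready yet
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_health("http://localhost:8000/health")` and exiting non-zero when it returns `False` lets the script stop early with a clear failure instead of running the extraction test against a container that never came up.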
### D. Related documentation

- [Alibaba Cloud SAE documentation](https://help.aliyun.com/product/134532.html)
- [Docker documentation](https://docs.docker.com/)
- [FastAPI documentation](https://fastapi.tiangolo.com/)
- [PyMuPDF documentation](https://pymupdf.readthedocs.io/)
- [Polars documentation](https://pola-rs.github.io/polars/)

---

## Quick Reference

### Common commands

```bash
# Build the image (tagged with the registry path so it can be pushed)
docker build -t registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0 .

# Push the image
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# View SAE logs
# SAE console → Application details → Logs

# Restart the SAE application
# SAE console → Application details → Restart

# Test internal connectivity (use the internal IP shown in the SAE console)
curl http://<internal-IP>:8000/health

# Check container resource usage (local Docker only)
docker stats extraction-service
```

### Key Settings

| Setting | Recommended value | Notes |
|-------|--------|------|
| CPU | 1 core | Initial configuration |
| Memory | 2GB | Without Nougat |
| Instances | 1-3 | Automatic scaling |
| Timeout | 300 s | Large-file processing |
| Health check | 30 s | Initial delay |
| Worker count | 2 | Uvicorn workers |

---

**Document maintenance**:
- For questions or suggestions, contact the technical lead
- Last updated: 2025-12-13
- Next review: 2026-03-13