# Python Microservice SAE Container Deployment Guide

**Document version**: v1.1 (fixes the internal address issue and related problems)
**Created**: 2025-12-13
**Last updated**: 2025-12-13
**Applies to**: AIclinicalresearch platform - Python microservice (extraction_service)
**Target audience**: DevOps engineers, backend developers

**v1.1 changelog**:
- ✅ Fixed: the internal address now uses the real IP shown in the SAE console instead of an assumed domain name
- ✅ Improved: clarified the Dockerfile system dependencies (libmupdf-dev is optional)
- ✅ Added: ensure the /tmp directory is writable (temporary storage for large files)
- ✅ Refined: the verification process and monitoring
## 📋 Table of Contents

- Why SAE Container Deployment
- Deployment Architecture
- Configuration Checklist
- Python Service Inventory
- Dependency Optimization Strategy
- Building the Docker Image
- Deploying to SAE
- Testing and Verification
- Monitoring and Operations
- Troubleshooting
- Important Notes
## Why SAE Container Deployment

### ✅ SAE Container Deployment vs. SAE Python Runtime

| Dimension | SAE Python runtime | SAE container deployment (recommended) |
|---|---|---|
| System dependencies | ❌ Cannot install system libraries | ✅ Fully controllable |
| Complex dependencies | ❌ PyMuPDF/OpenCV are hard to install | ✅ Fully supported |
| **Environment consistency** | ⚠️ The cloud may differ from local | ✅ Works locally = works in the cloud |
| Nougat (Torch) | ❌ High risk of version conflicts | ✅ Easily supported |
| Deployment method | Upload a ZIP package | Push a Docker image |
| Startup speed | Fast (< 5 s) | Fairly fast (10-20 s) |
| **Operations complexity** | Low | Medium |
| **Recommendation** | ❌ Not recommended | ✅ Strongly recommended |
### 🎯 Core Reasons

#### 1. Missing system-level dependencies (the most critical issue)

```python
# Your code uses the following libraries:
import fitz    # PyMuPDF → depends on libmupdf.so, libfreetype.so
import cv2     # OpenCV → depends on libGL.so.1, libgthread-2.0.so
import polars  # Polars → depends on libgomp.so
```

**SAE Python runtime**:

```text
❌ Only Python packages can be installed
❌ apt-get install cannot be run
❌ Runtime error: ImportError: libGL.so.1: cannot open shared object file
```

**SAE container deployment**:

```dockerfile
# ✅ Install whatever you need in the Dockerfile
# libgl1 replaces libgl1-mesa-glx, which is not available on Debian bookworm-based
# images such as python:3.11-slim
RUN apt-get update && apt-get install -y \
    libgl1 \
    libglib2.0-0 \
    libgomp1
```
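A missing system library only shows up when the module is imported, so it can be worth checking this inside the built image before deploying. The sketch below is a hypothetical `check_native_deps.py`; the module list is an assumption that mirrors the libraries above and should be adjusted to what the service actually imports.

```python
"""Hypothetical check_native_deps.py: report which modules fail to import."""
import importlib

# Modules whose wheels load shared system libraries (assumed list; adjust as needed).
DEFAULT_MODULES = ["fitz", "polars"]


def check_imports(modules):
    """Try to import each module and return {name: error message, or None if it imported}."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # includes ModuleNotFoundError and "libGL.so.1: cannot open ..."
            results[name] = str(exc)
    return results


def main(argv):
    """Print one line per module and return an exit code (1 if any import failed)."""
    report = check_imports(argv or DEFAULT_MODULES)
    for name, error in report.items():
        print(f"OK   {name}" if error is None else f"FAIL {name}: {error}")
    return 1 if any(err is not None for err in report.values()) else 0
```

Usage example, assuming the file is copied into `/app` with the rest of the code: `docker run --rm extraction-service:latest python -c "import sys, check_native_deps as c; sys.exit(c.main(sys.argv[1:]))"`. A non-zero exit code means at least one native dependency is missing from the image.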
#### 2. Fully controllable environment

**Local development environment = Docker image = SAE production environment**

- Once the service runs correctly in local Docker, it will run the same way after being pushed to SAE
- No more "works locally, fails in production" problems

#### 3. Extensibility

Future requirements:
- Adding Nougat OCR (requires PyTorch + GPU support)
- Adding image processing (requires OpenCV)
- Adding more document formats (requires more system libraries)

Container deployment handles all of these easily.

#### 4. Unified operations

Your overall architecture:
- Frontend Nginx → SAE container
- Backend Node.js → SAE container
- Python service → SAE container ✅ (managed uniformly)
## Deployment Architecture

```text
Alibaba Cloud architecture (single VPC)
├── SAE (backend): Node.js
│        │
│        │ internal network call
│        ▼
├── SAE (Python microservice): Docker container
│        - FastAPI
│        - PyMuPDF
│        - Polars
│        - Mammoth
├── RDS PostgreSQL 15
└── OSS (document storage)
```

**Key points**:
- The Python microservice and the Node.js backend are both deployed on SAE in the same VPC
- They communicate over the internal network (latency < 5 ms)
- They share the RDS and OSS resources
---
## Configuration Checklist

### ✅ Required resources

| Resource type | Recommended configuration | Estimated cost | Purpose |
|---|---|---|---|
| **SAE application** | 1 vCPU / 2 GB, 1 instance | ~¥100/month | Runs the Python service |
| **Container image registry** | Alibaba Cloud ACR Personal Edition | Free | Stores the Docker image |
| **OSS storage** | Existing (shared) | 0 (no additional cost) | Document storage |
| **RDS PostgreSQL** | Existing (shared) | 0 | Database |
### ✅ Software requirements

The local development environment needs:
- Docker Desktop
- Alibaba Cloud CLI (optional)

Nothing needs to be installed on SAE itself (the container already includes everything).

### ✅ Account permissions

- Alibaba Cloud account (already available)
- Access to Container Registry (ACR)
- Permission to create SAE applications
## Python Service Inventory

### 📦 Current service overview

#### Service 1: extraction_service (document extraction)

**Location**: AIclinicalresearch/extraction_service/

**Purpose**:
- PKB module: extracts text from documents before they are uploaded to Dify
- ASL module: extracts full PDF text for in-depth reading
- DC module: extracts Excel/CSV data

**Core files**:

```text
extraction_service/
├── main.py                    # FastAPI entry point
├── requirements.txt           # Dependency list
├── services/
│   ├── pdf_extractor.py       # PDF extraction (main dispatcher)
│   ├── pymupdf_extractor.py   # PyMuPDF implementation
│   ├── nougat_extractor.py    # Nougat OCR implementation
│   ├── docx_extractor.py      # Word extraction
│   └── txt_extractor.py       # Plain-text extraction
└── operations/
    └── fillna_operations.py   # Data cleaning (Polars)
```
**Key endpoints**:

```text
GET  /health               # Health check (used by the Dockerfile HEALTHCHECK and SAE)
POST /extract/pdf          # PDF extraction
POST /extract/docx         # Word extraction
POST /extract/txt          # Plain-text extraction
POST /operations/fillna    # Data cleaning
```
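To call these endpoints from a Python script without extra packages, a `multipart/form-data` request can be built with the standard library. This sketch assumes the service reads the upload from a form field named `file` (as in this guide's `curl -F "file=@test.pdf"` example) and answers with JSON; `build_multipart` and `post_pdf` are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary, practically never present in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(base_url, path, timeout=300.0):
    """POST a local PDF to {base_url}/extract/pdf and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract/pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_pdf("http://localhost:8000", "test.pdf")` during local testing. The 300 s default timeout is chosen to match the `TIMEOUT=300` value used later in this guide.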
### 📦 Dependency analysis

Current requirements.txt:

```text
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```

Estimated dependency sizes:

| Package | Size | Purpose | Required? |
|---|---|---|---|
| PyMuPDF | ~50 MB | PDF extraction (core) | ✅ Required |
| pdfplumber | ~10 MB | PDF table extraction | ⚠️ Optional (currently unused) |
| nougat-ocr | ~300 MB | OCR for academic papers | ⚠️ Phase 2 (see below) |
| torch | ~800 MB | Nougat dependency | ⚠️ Phase 2 |
| torchvision | ~100 MB | Nougat dependency | ⚠️ Phase 2 |
| mammoth | ~5 MB | Word extraction | ✅ Required |
| python-docx | ~3 MB | Word extraction | ✅ Required |
| polars | ~50 MB | Data cleaning | ✅ Required |
| numpy | ~20 MB | Numerical computing | ✅ Required |
| fastapi | ~10 MB | Web framework | ✅ Required |
| uvicorn | ~5 MB | ASGI server | ✅ Required |
| Others | ~10 MB | Auxiliary libraries | ✅ Required |
| **Total (with Nougat)** | ~1.4 GB | - | - |
| **Total (without Nougat)** | ~163 MB | - | - |
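The two totals can be reproduced from the per-package estimates. The sketch below only sums the approximate sizes from the table (in MB), with and without the three Phase 2 packages.

```python
# Approximate sizes in MB, copied from the table above.
SIZES_MB = {
    "PyMuPDF": 50, "pdfplumber": 10, "nougat-ocr": 300, "torch": 800,
    "torchvision": 100, "mammoth": 5, "python-docx": 3, "polars": 50,
    "numpy": 20, "fastapi": 10, "uvicorn": 5, "others": 10,
}
# Packages that are only needed in Phase 2 (Nougat OCR).
NOUGAT_STACK = {"nougat-ocr", "torch", "torchvision"}


def total_mb(sizes, exclude=frozenset()):
    """Sum the package sizes, skipping any package names listed in `exclude`."""
    return sum(size for name, size in sizes.items() if name not in exclude)


with_nougat = total_mb(SIZES_MB)                   # 1363 MB, i.e. ~1.4 GB
without_nougat = total_mb(SIZES_MB, NOUGAT_STACK)  # 163 MB
```

The 163 MB figure still counts pdfplumber (~10 MB), which the Phase 1 requirements.txt drops.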
## Dependency Optimization Strategy

### 🎯 Phase 1: Minimal deployment (recommended for the first deployment)

**Goal**: go live quickly and validate the core functionality

**Strategy**:
- ✅ Keep PyMuPDF (core PDF extraction)
- ✅ Keep Mammoth/python-docx (Word extraction)
- ✅ Keep Polars (data cleaning)
- ❌ Temporarily remove Nougat (large footprint, rarely used)

**Optimized configuration**

requirements.txt:

```text
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20

# Document extraction (core)
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2

# Data processing
polars==1.17.1
numpy==1.26.4

# Auxiliary tools
langdetect==1.0.9
chardet==5.2.0

# Logging
python-json-logger==2.0.7
```
**Estimated image size**: ~500 MB (including the Python base image)

**Code changes**:

```python
# services/pdf_extractor.py

# Comment out the Nougat-related imports
# from .nougat_extractor import extract_pdf_nougat, check_nougat_available

# extract_pdf_pymupdf, detect_language and detect_academic_paper are the module's
# existing helpers; their imports are not shown here.


async def extract_pdf(pdf_path: str, filename: str):
    """PDF extraction (Phase 1: PyMuPDF only)."""
    # Detect the language and document type
    language = detect_language(pdf_path)
    is_academic = detect_academic_paper(pdf_path)

    # Phase 1: use PyMuPDF directly
    text = extract_pdf_pymupdf(pdf_path)

    # Phase 2: the Nougat path with PyMuPDF fallback can be re-enabled
    # (also report method='nougat' when that path produced the text)
    # if language == 'english' and is_academic:
    #     try:
    #         if check_nougat_available():
    #             text = extract_pdf_nougat(pdf_path)
    #     except Exception:
    #         text = extract_pdf_pymupdf(pdf_path)  # fallback

    return {
        'text': text,
        'method': 'pymupdf',
        'language': language,
        'is_academic': is_academic,
    }
```
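The commented-out branch encodes a simple policy: try Nougat for English academic papers, and fall back to PyMuPDF on any failure. The sketch below expresses that policy as a pure function with the extractors passed in. This lets it be unit-tested without PyMuPDF or Nougat installed. `choose_and_extract` and its parameters are hypothetical names, not part of the existing service, and it also reports which method actually produced the text.

```python
from typing import Callable, Dict


def choose_and_extract(
    pdf_path: str,
    language: str,
    is_academic: bool,
    pymupdf: Callable[[str], str],
    nougat: Callable[[str], str],
    nougat_available: Callable[[], bool],
) -> Dict[str, str]:
    """Prefer Nougat for English academic PDFs; fall back to PyMuPDF otherwise or on failure."""
    if language == "english" and is_academic:
        try:
            if nougat_available():
                return {"text": nougat(pdf_path), "method": "nougat"}
        except Exception:
            pass  # any Nougat error falls through to the PyMuPDF path below
    return {"text": pymupdf(pdf_path), "method": "pymupdf"}
```

In the service, the real `extract_pdf_pymupdf`, `extract_pdf_nougat` and `check_nougat_available` would be passed in. Tests can pass stubs instead, including one that raises, to confirm the fallback.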
### 🎯 Phase 2: Full deployment (when needed in the future)

**When**:
- When user feedback shows that extraction quality for academic papers is not good enough
- When sufficient GPU resources are available

**Strategy**:
- ✅ Enable Nougat + Torch
- ✅ Use GPU instances (SAE does not currently support GPUs, so this requires migrating to ECS)

**Full configuration**

requirements.txt:

```text
# All dependencies (including Nougat)
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
PyMuPDF==1.24.14
pdfplumber==0.11.4
nougat-ocr==0.1.17
torch==2.1.0
torchvision==0.16.0
mammoth==1.8.0
python-docx==1.1.2
langdetect==1.0.9
chardet==5.2.0
polars==1.17.1
numpy==1.26.4
```

**Estimated image size**: ~2 GB
## Building the Docker Image

### Step 1: Create an optimized Dockerfile

Create a Dockerfile in the extraction_service/ directory:

```dockerfile
# ========================================
# Multi-stage build: reduces the image size
# ========================================

# Stage 1: build stage (installs dependencies)
FROM python:3.11-slim AS builder

# Set the working directory
WORKDIR /app

# Install system dependencies (needed only at build time)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    make \
    libffi-dev \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency file
COPY requirements.txt .

# Install the Python dependencies into a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# ========================================
# Stage 2: runtime stage (minimal image)
# ========================================
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install runtime dependencies (system libraries + time zone data)
RUN apt-get update && apt-get install -y --no-install-recommends \
    # PyMuPDF dependencies
    # Note: libmupdf-dev is usually unnecessary, because the PyMuPDF wheel installed by pip bundles its own libraries.
    # It is kept here as a safeguard; remove this package if image size matters.
    libmupdf-dev \
    libfreetype6 \
    libjpeg62-turbo \
    libopenjp2-7 \
    # Polars dependency
    libgomp1 \
    # Other tools (curl is used by HEALTHCHECK below)
    curl \
    # Time zone data
    tzdata \
    && rm -rf /var/lib/apt/lists/*

# ⚠️ Unify the time zone: Asia/Shanghai
ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

# Make sure the temporary directory is writable (needed for large file uploads)
RUN mkdir -p /tmp && chmod 1777 /tmp

# Copy the virtual environment from the build stage
COPY --from=builder /opt/venv /opt/venv

# Copy the application code
COPY . .

# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=8000

# Expose the port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Startup command (port 8000 and 2 workers are hard-coded here; this exec-form
# CMD does not read the PORT or WORKERS environment variables)
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
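The `HEALTHCHECK` above relies on curl being installed in the runtime image. If you would rather drop curl, a similar check can be written with the standard library, since Python is already in the image. It is slightly stricter than `curl -f`, because only HTTP 200 counts as healthy. This is a sketch: the file name `healthcheck.py` and the `probe` function are hypothetical.

```python
import http.client
import urllib.request


def probe(url="http://localhost:8000/health", timeout=10.0):
    """Return 0 if `url` answers HTTP 200, otherwise 1 (like `curl -f ... || exit 1`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses.
        return 1
```

The Dockerfile line could then become `CMD python -c "import sys, healthcheck; sys.exit(healthcheck.probe())"`, with the `curl` package removed from the runtime stage. This assumes `healthcheck.py` sits in `/app`, the working directory the check runs from.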
### Step 2: Create .dockerignore

```text
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Tests and documentation
tests/
test_files/
*.md
README.md

# Git
.git/
.gitignore

# Logs
*.log

# Temporary files
tmp/
temp/
```
### Step 3: Build the image locally

```bash
# Enter the extraction_service directory
cd d:\MyCursor\AIclinicalresearch\extraction_service

# Build the image (for local testing)
docker build -t extraction-service:latest .

# Check the image size
docker images extraction-service
```
### Step 4: Test locally

```bash
# Start the container (local test)
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/dbname" \
  extraction-service:latest

# View the logs (-f keeps following; press Ctrl+C before running the next commands)
docker logs -f extraction-test

# Test the health check
curl http://localhost:8000/health

# Test PDF extraction
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

# Stop and remove the test container
docker stop extraction-test
docker rm extraction-test
```
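The container needs some time to start before it can serve requests. The Dockerfile gives it a 40 s start period. Scripted local tests are therefore more reliable if they wait for `/health` before sending files. The sketch below is a minimal polling loop using only the standard library; `wait_for_healthy` and its default timings are assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_healthy(url, deadline_s=60.0, interval_s=2.0):
    """Poll `url` until it returns HTTP 200 (True) or `deadline_s` seconds pass (False)."""
    stop_at = time.monotonic() + deadline_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not up yet: connection refused, reset, timeout or an HTTP error status
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)
```

For example, run `wait_for_healthy("http://localhost:8000/health")` right after `docker run -d ...`, and only start the extraction tests once it returns `True`.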
### Step 5: Push to Alibaba Cloud Container Registry

#### 5.1 Create an image repository (first-time setup)

1. Log in to the Alibaba Cloud console → Container Registry (ACR)
2. **Create a Personal Edition instance** (free):
   - Instance name: extraction-service
   - Region: China East 1 (Hangzhou)
3. Create a namespace:
   - Namespace: clinical-research
4. Create an image repository:
   - Repository name: extraction-service
   - Code source: local repository
#### 5.2 Push the image

The registry host must match the region of your ACR instance. The commands below use `registry.cn-beijing.aliyuncs.com`. If your instance was created in China East 1 (Hangzhou) as above, use the host shown on that instance's access credentials page instead.

```bash
# 1. Log in to Alibaba Cloud Container Registry
# Get the login command: console → Container Registry → Access Credentials → Set Registry login password
docker login --username=<your-username> registry.cn-beijing.aliyuncs.com

# 2. Tag the image
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 3. Push to Alibaba Cloud
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# 4. Also push the latest tag (makes later updates easier)
docker tag extraction-service:latest \
  registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
```
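Every tag and push command above follows the pattern `registry.<region>.aliyuncs.com/<namespace>/<repository>:<tag>`. Generating the references from a single region value keeps the login, tag and push hosts consistent. This is a small sketch: `image_ref` and `push_commands` are hypothetical helpers, and the host pattern is simply the one used in these commands, so check it against the address shown for your ACR instance.

```python
def image_ref(region, namespace, repo, tag="latest"):
    """Build an image reference like registry.cn-hangzhou.aliyuncs.com/ns/repo:tag."""
    for part, value in (("region", region), ("namespace", namespace), ("repo", repo), ("tag", tag)):
        # Slashes, colons or spaces would silently produce a different reference.
        if not value or any(c in value for c in "/: "):
            raise ValueError(f"invalid {part}: {value!r}")
    return f"registry.{region}.aliyuncs.com/{namespace}/{repo}:{tag}"


def push_commands(local_image, region, namespace, repo, tags):
    """Return the `docker tag` / `docker push` command lines for each tag, in order."""
    commands = []
    for tag in tags:
        ref = image_ref(region, namespace, repo, tag)
        commands.append(f"docker tag {local_image} {ref}")
        commands.append(f"docker push {ref}")
    return commands
```

For instance, `push_commands("extraction-service:latest", "cn-hangzhou", "clinical-research", "extraction-service", ["v1.0", "latest"])` yields the four commands of steps 2-4 for the Hangzhou host.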
## Deploying to SAE

### Step 1: Create the SAE application

1. Log in to the Alibaba Cloud console → Serverless App Engine (SAE)
2. Create the application with the following settings:

```text
Create application:
  Application name: extraction-service
  Namespace: the namespace where the backend runs (same VPC)
  Deployment method: image

Image settings:
  Image address: registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:latest
  Image version: latest
  Image pull policy: Always (pull the latest image on every deployment)

Instance specification:
  CPU: 1 core
  Memory: 2 GB
  Instances: 1 (initially)
  Auto scaling:
    - Minimum instances: 1
    - Maximum instances: 3
    - CPU trigger threshold: 70%

Network settings:
  VPC: the VPC where the backend runs
  vSwitch: the vSwitch where the backend runs
  Security group: allow access from within the VPC
```
### Step 2: Configure environment variables

Add the following environment variables in the SAE application configuration:

```bash
# ========= Database =========
DATABASE_URL=postgresql://user:password@rm-xxxx.pg.rds.aliyuncs.com:5432/clinical_research

# ========= Storage =========
OSS_ENDPOINT=oss-cn-hangzhou-internal.aliyuncs.com
OSS_BUCKET=your-bucket-name
OSS_ACCESS_KEY_ID=<your-id>
OSS_ACCESS_KEY_SECRET=<your-secret>

# ========= Service =========
SERVICE_NAME=extraction-service
SERVICE_VERSION=v1.0
LOG_LEVEL=INFO

# ========= Performance =========
WORKERS=2
TIMEOUT=300
# 52428800 bytes = 50 MB (50 × 1024 × 1024)
MAX_FILE_SIZE=52428800

# ========= Time zone =========
TZ=Asia/Shanghai
```
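On the Python side, these variables have to be read and validated somewhere. This guide does not show how `main.py` does that. The sketch below is one possible approach, using hypothetical `Settings` and `load_settings` names, with defaults that match the values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    service_name: str
    service_version: str
    log_level: str
    workers: int
    timeout_s: int
    max_file_size: int  # bytes


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (default: os.environ), falling back to the documented values."""
    env = os.environ if env is None else env
    settings = Settings(
        service_name=env.get("SERVICE_NAME", "extraction-service"),
        service_version=env.get("SERVICE_VERSION", "v1.0"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        workers=int(env.get("WORKERS", "2")),            # int() raises ValueError on bad input
        timeout_s=int(env.get("TIMEOUT", "300")),
        max_file_size=int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024))),
    )
    if settings.workers < 1 or settings.timeout_s < 1 or settings.max_file_size < 1:
        raise ValueError("WORKERS, TIMEOUT and MAX_FILE_SIZE must be positive integers")
    return settings
```

Failing at startup on a malformed value is usually easier to diagnose in the SAE deployment log than a failure on the first large upload.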
### Step 3: Configure the health check

```text
Health check path: /health
Health check port: 8000
Health check protocol: HTTP
Initial delay: 30 s
Check interval: 10 s
Timeout: 5 s
Healthy threshold: 2 times
Unhealthy threshold: 3 times
```
### Step 4: Configure logging

```text
Log directory: /app/logs
Log file: extraction-service.log
Log level: INFO
Log retention: 7 days
```
### Step 5: Configure SLB (optional, only if public access is needed)

```text
# Python microservices usually do not need public access (they are called by the backend)
# If public access is needed (e.g. for debugging or third-party callers):
Load balancer type: Internet-facing
Listener port: 80
Backend port: 8000
Health check: enabled
```
### Step 6: Deploy the application

1. Click "Deploy application"
2. **Wait for the deployment to finish** (about 2-3 minutes)
3. Check the deployment log:

```text
[INFO] Pulling image...
[INFO] Image pulled successfully
[INFO] Starting container...
[INFO] Container started successfully
[INFO] Health check passed
[INFO] Application is running
```
## Testing and Verification

### Step 1: Get the internal address (critical step)

⚠️ **Important**: SAE instances run on different hosts, so you must use the internal address provided by SAE.

**How to get the internal address correctly**:

1. Log in to the SAE console → Application list → click the extraction-service application
2. On the application details page, find the "Application access settings" or "VPC internal access" section
3. **View and copy the "internal access address"**. It usually takes one of these forms:

   ```text
   Form 1: internal IP + port (⭐⭐⭐ strongly recommended, most stable; note that the IP can change after a redeployment)
   172.17.x.x:8000

   Form 2: SAE internal Service domain name (requires additional service-discovery configuration, not recommended)
   extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000

   Form 3: K8s Service domain name (requires K8s service discovery, not recommended)
   extraction-service.namespace.svc.cluster.local:8000
   ```

**❌ Wrong approaches (they cause connection failures)**:

```text
❌ Do not use guessed domain names (they always fail)
http://extraction-service.sae:8000            # the .sae domain does not exist
http://extraction-service.internal:8000       # the .internal domain does not exist
http://extraction-service.cluster.local:8000  # requires K8s service discovery to be configured

❌ Do not use localhost
http://localhost:8000                         # SAE instances run on different hosts

❌ Do not use the Docker service name
http://extraction-service:8000                # this only works with Docker Compose
```

**✅ Correct approaches (in order of preference)**:

```bash
# ⭐⭐⭐ Option A: use the internal IP directly (simplest and most reliable)
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
# Where to find it: SAE console > Python application > Instance list > view the internal IP

# ⭐⭐ Option B: use SAE service discovery (needs extra configuration; not recommended for beginners)
# Requires configuring the "microservice registry" in the SAE console
EXTRACTION_SERVICE_URL=http://extraction-service-xxxxx.cn-hangzhou.sae.aliyuncs.com:8000
```
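The rules above can be checked mechanically: no guessed `.sae`/`.internal`/`.cluster.local` names, no `localhost`, and no bare Docker service names. This allows a bad `EXTRACTION_SERVICE_URL` value to be caught before it is pasted into the backend configuration. The backend itself is Node.js, so the Python lint helper below (`check_service_url`, a hypothetical name) is only an illustration. The patterns it rejects are exactly the ones listed above.

```python
import ipaddress
from urllib.parse import urlsplit

# Host suffixes this guide lists as guessed or unconfigured names on SAE.
GUESSED_SUFFIXES = (".sae", ".internal", ".cluster.local")
LOOPBACK_MSG = "localhost points at the calling instance, not the Python service"


def check_service_url(url):
    """Return a description of the problem with `url`, or "" if it looks acceptable."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme not in ("http", "https") or not host:
        return "must be an http(s) URL with a host"
    if parts.port is None:
        return "missing port (the service listens on 8000)"
    if host == "localhost":
        return LOOPBACK_MSG
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so it is a host name
    if ip is not None:
        # A literal internal IP is the option this guide recommends.
        return LOOPBACK_MSG if ip.is_loopback else ""
    if host.endswith(GUESSED_SUFFIXES):
        return "guessed or unconfigured domain; copy the address from the SAE console"
    if "." not in host:
        return "bare service name; this only resolves inside Docker Compose"
    return ""  # other fully qualified names, e.g. a configured SAE service domain
```

An empty string only means the value is not one of the known-bad forms. It does not guarantee the address is reachable, so the health check in Step 3 is still needed.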
### Step 2: Configure the backend environment variables

Add the following to the environment variables of the SAE backend application:

```bash
# ⚠️ Use the real internal address shown in the SAE console
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000

# Notes:
# 1. Do not use guessed domain names
# 2. Always get the address from "Application access settings" in the SAE console
# 3. If the IP changes (e.g. after a redeployment), update every environment variable that references it
```

**Configuration path (backend)**:
- SAE console → backend application → environment variables
### Step 3: Test from the backend service

Add a test script to the Node.js backend service:

```typescript
// backend/src/tests/test-extraction-service.ts
import axios from 'axios';
import FormData from 'form-data';
import fs from 'fs';

// Use the real internal address from the SAE console. There is deliberately no fallback:
// guessed hosts such as http://extraction-service.internal:8000 do not resolve on SAE.
const EXTRACTION_SERVICE_URL = process.env.EXTRACTION_SERVICE_URL;

export async function testExtractionService(): Promise<void> {
  if (!EXTRACTION_SERVICE_URL) {
    throw new Error('EXTRACTION_SERVICE_URL is not set; copy it from the SAE console');
  }

  try {
    // 1. Health check
    console.log('Testing health endpoint...');
    const healthRes = await axios.get(`${EXTRACTION_SERVICE_URL}/health`);
    console.log('Health check:', healthRes.data);

    // 2. Test PDF extraction
    console.log('Testing PDF extraction...');
    const form = new FormData();
    form.append('file', fs.createReadStream('./test.pdf'));
    const pdfRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/pdf`,
      form,
      { headers: form.getHeaders() }
    );
    console.log('PDF extraction result:', pdfRes.data);

    // 3. Test Word extraction
    console.log('Testing Word extraction...');
    const form2 = new FormData();
    form2.append('file', fs.createReadStream('./test.docx'));
    const docxRes = await axios.post(
      `${EXTRACTION_SERVICE_URL}/extract/docx`,
      form2,
      { headers: form2.getHeaders() }
    );
    console.log('Word extraction result:', docxRes.data);

    console.log('✅ All tests passed!');
  } catch (error) {
    // Under strict TypeScript, `error` is `unknown`, so narrow it before reading its fields.
    if (axios.isAxiosError(error)) {
      console.error('❌ Test failed:', error.message);
      if (error.response) {
        console.error('Response:', error.response.data);
      }
    } else {
      console.error('❌ Test failed:', error);
    }
    throw error; // rethrow so the caller (or CI) sees the failure
  }
}
```
### Step 4: End-to-end testing (real business flows)

Test the following business flows:

#### Scenario 1: PKB document upload

**Business flow**:

```text
User uploads a PDF
→ Node.js backend receives it
→ HTTP POST forwards the file stream to the Python service (EXTRACTION_SERVICE_URL)
→ Python service parses the PDF and returns the text as JSON
→ Node.js backend uploads the extracted text to Dify
→ Result is returned to the frontend
```

**Test steps**:
1. Upload a PDF document from the frontend (a test document under 5 MB is recommended)
2. **Check the Node.js backend log** (SAE console → backend application → Logs):
   ```text
   [INFO] Calling extraction service: http://172.17.x.x:8000/extract/pdf
   [INFO] Extraction completed in 2.3s
   [INFO] Extracted text preview: "This is a test document..."
   ```
3. **Check the Python service log** (SAE console → extraction-service application → Logs):
   ```text
   INFO: Request: POST /extract/pdf
   INFO: File size: 1.2MB, filename: test.pdf
   INFO: Using PyMuPDF extraction
   INFO: Response: 200 (took 2.10s)
   ```
4. **Confirm in the Dify Web UI that the document was uploaded**

**If it fails, check**:
- Backend log shows "Connection refused" → check the EXTRACTION_SERVICE_URL setting
- Python log shows "ImportError" → check the system dependencies in the Dockerfile
- Extraction times out (> 300 s) → the file is too large, or the timeout needs to be increased
#### <20>箸艶 2: ASL 瘛勗漲<E58B97><E6BCB2>粉
<EFBFBD>冽<EFBFBD><EFBFBD>孵稬"瘛勗漲<E58B97><E6BCB2>粉" <20>?<3F>𡒊垢靚<E59EA2>鍂 Python <20>滚𦛚<E6BB9A>𣂼<EFBFBD><F0A382BC>冽<EFBFBD> <20>?餈𥪜<E9A488> LLM <20><><EFBFBD>蝏𤘪<E89D8F>
**瘚贝<E7989A>甇仿炊**嚗?1. <20>?ASL 璅∪<E79285><E288AA>孵稬"瘛勗漲<E58B97><E6BCB2>粉"
2. <20>亦<EFBFBD><E4BAA6>𡒊垢<F0A1928A>亙<EFBFBD>嚗<EFBFBD>&霈方<E99C88><E696B9>?Python <20>滚𦛚嚗?3. <20>亦<EFBFBD> Python <20>滚𦛚<E6BB9A>亙<EFBFBD>嚗<EFBFBD>&霈斗<E99C88><E69697>𡝗<EFBFBD><F0A19D97><EFBFBD><EFBFBD>
4. <20>滨垢<E6BBA8>曄內<E69B84><E585A7><EFBFBD>蝏𤘪<E89D8F>
#### <20>箸艶 3: DC <20>唳旿皜<E697BF><E79A9C>
<EFBFBD>冽<EFBFBD>銝𠹺<EFBFBD> Excel <20>?<3F>𡒊垢靚<E59EA2>鍂 Python <20>滚𦛚 fillna <20>?餈𥪜<E9A488>皜<EFBFBD><E79A9C><EFBFBD>擧㺭<E693A7>?```
瘚贝<EFBFBD>甇仿炊嚗?1. <20>?DC 璅∪<E79285>銝𠹺<E98A9D> Excel <20><>辣 2. <20>扯<EFBFBD> fillna <20>滢<EFBFBD> 3. <20>亦<EFBFBD> Python <20>滚𦛚<E6BB9A>亙<EFBFBD> 4. 撉諹<E69289>皜<EFBFBD><E79A9C>蝏𤘪<E89D8F>
## Monitoring and Operations

### Built-in SAE monitoring

#### 1. Application monitoring

SAE console → Application details → Monitoring

**Key metrics**:
- **CPU utilization** (< 70%): PDF extraction is a CPU-intensive task
- **Memory utilization** (< 80%): processing large files uses a lot of memory
- Request QPS (queries per second): shows the current load
- Average response time (< 1000ms): small files < 2s, large files < 30s
- **Error rate** (< 1%): tracks the file-parsing failure rate

**Performance baseline (for reference)**:

| File size | Typical response time |
|---|---|
| Small (< 1MB PDF) | 1-3s |
| Medium (1-10MB PDF) | 5-15s |
| Large (10-50MB PDF) | 20-60s |
| Very large (> 50MB) | Recommended: reject before extraction |
#### 2. Real-time logs

SAE console → Application details → Logs → Real-time logs

**Log types**:
- Application logs (stdout/stderr): uvicorn startup messages, request logs
- Access logs (HTTP requests): request path, response time, status code
- Error logs (exception stacks): Python exception details

**Key log examples**:

```
# ✅ Normal startup
INFO: Started server process [1]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000

# ✅ Normal request
INFO: Request: POST /extract/pdf
INFO: File: test.pdf (1.2MB)
INFO: Response: 200 (took 2.10s)

# ❌ Error logs (need attention)
ERROR: ImportError: libGL.so.1: cannot open shared object file
ERROR: Timeout: PDF extraction took > 300s
ERROR: Memory error: Cannot allocate memory
```

#### 3. Auto-scaling configuration

SAE console → Application details → Auto Scaling

**Recommended configuration**:

```
Minimum instances: 1 (keeps the service available at all times)
Maximum instances: 3 (scales out automatically under load)

Triggers:
- CPU utilization > 70% for 3 minutes → scale out by 1 instance
- CPU utilization < 30% for 5 minutes → scale in by 1 instance
```

**Notes**:
- PDF extraction is CPU-bound, so scaling decisions should be driven mainly by CPU.
- If scale-outs happen frequently, consider upgrading the instance spec directly (2 cores → 4 cores).
- SAE load-balances across instances automatically; no extra configuration is needed.
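To make the trigger rules concrete, here is a minimal Python sketch that replays one-minute CPU samples through them: scale out after 3 consecutive minutes above 70%, scale in after 5 consecutive minutes below 30%, and stay between 1 and 3 instances. It only illustrates the rules under simplifying assumptions and is not SAE's actual algorithm. The function name `simulate_scaling` is hypothetical, and the real scaler also applies cooldowns and aggregates metrics across instances.

```python
from typing import Iterable


def simulate_scaling(cpu_samples: Iterable[float],
                     min_instances: int = 1,
                     max_instances: int = 3,
                     scale_out_pct: float = 70.0,
                     scale_out_minutes: int = 3,
                     scale_in_pct: float = 30.0,
                     scale_in_minutes: int = 5) -> list:
    """Return the instance count after each one-minute CPU sample (in percent)."""
    instances = min_instances
    high_streak = 0  # consecutive minutes above scale_out_pct
    low_streak = 0   # consecutive minutes below scale_in_pct
    history = []
    for cpu in cpu_samples:
        high_streak = high_streak + 1 if cpu > scale_out_pct else 0
        low_streak = low_streak + 1 if cpu < scale_in_pct else 0
        if high_streak >= scale_out_minutes and instances < max_instances:
            instances += 1
            high_streak = 0  # start a new observation window after scaling
        elif low_streak >= scale_in_minutes and instances > min_instances:
            instances -= 1
            low_streak = 0
        history.append(instances)
    return history
```

For example, `simulate_scaling([80, 80, 80] + [20] * 5)` returns `[1, 1, 2, 2, 2, 2, 2, 1]`. One instance is added on the third hot minute and removed again after five quiet minutes.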
### Application-level monitoring

#### Add health-check endpoints

```python
# main.py
import os

import psutil
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
async def health_check():
    """Health-check endpoint."""
    return {
        "status": "healthy",
        "service": "extraction-service",
        "version": os.getenv("SERVICE_VERSION", "unknown"),
    }


@app.get("/metrics")
def metrics():
    """Performance-metrics endpoint.

    Declared with `def` (not `async def`) so FastAPI runs it in a worker thread:
    psutil.cpu_percent(interval=1) blocks for one second while it samples.
    """
    cpu_percent = psutil.cpu_percent(interval=1)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage('/app')
    return {
        "cpu": {"percent": cpu_percent, "count": psutil.cpu_count()},
        "memory": {
            "total": memory.total,
            "available": memory.available,
            "percent": memory.percent,
        },
        "disk": {
            "total": disk.total,
            "used": disk.used,
            "free": disk.free,
            "percent": disk.percent,
        },
    }
```
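The `/metrics` payload can be checked against the key-metric targets from the SAE monitoring section (CPU < 70%, memory < 80%). The sketch below assumes the JSON shape returned by the endpoint above. `fetch_metrics` and `check_thresholds` are hypothetical helper names. Each `/metrics` call takes at least one second because of `psutil.cpu_percent(interval=1)`.

```python
import json
import urllib.request


def fetch_metrics(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/metrics and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/metrics", timeout=timeout) as resp:
        return json.load(resp)


def check_thresholds(metrics: dict, cpu_max: float = 70.0, memory_max: float = 80.0) -> list:
    """Return human-readable warnings for metrics at or above the targets."""
    warnings = []
    cpu = metrics["cpu"]["percent"]
    memory = metrics["memory"]["percent"]
    if cpu >= cpu_max:
        warnings.append(f"CPU usage {cpu:.1f}% >= {cpu_max:.0f}%")
    if memory >= memory_max:
        warnings.append(f"Memory usage {memory:.1f}% >= {memory_max:.0f}%")
    return warnings
```

Typical use from a machine in the same VPC is `check_thresholds(fetch_metrics("http://<internal IP>:8000"))`. An empty list means both metrics are within target.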
#### Add request logging

```python
# main.py (continued; `app` is the FastAPI instance created above)
import logging
import os
import time

from fastapi import Request

# Configure logging: write to a file and to stdout (SAE collects stdout)
os.makedirs('/app/logs', exist_ok=True)  # FileHandler fails if the directory is missing
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/app/logs/extraction-service.log'),
        logging.StreamHandler(),
    ],
)
logger = logging.getLogger(__name__)


@app.middleware("http")
async def log_requests(request: Request, call_next):
    """Request-logging middleware."""
    start_time = time.perf_counter()
    # Log the incoming request
    logger.info(f"Request: {request.method} {request.url}")
    # Handle the request
    response = await call_next(request)
    # Log the response status and processing time
    process_time = time.perf_counter() - start_time
    logger.info(
        f"Response: {response.status_code} "
        f"(took {process_time:.2f}s)"
    )
    return response
```
### Routine maintenance

#### Weekly tasks

```bash
# Run inside the container (e.g. via the SAE Webshell)
# 1. Check the size of the log directory
du -sh /app/logs

# 2. Review recent errors
tail -n 100 /app/logs/extraction-service.log | grep ERROR

# 3. Restart the application if needed (e.g. to clear a suspected memory leak)
# SAE console → Application details → Restart
```
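For the weekly review, a small script can do more than `grep ERROR`. It can summarize the `Response: <status> (took <seconds>s)` lines written by the request-logging middleware. This sketch assumes that exact message format, and `summarize_log` is a hypothetical helper. Its numbers only cover the requests still in the log file, because the file lives in the container and does not survive a restart.

```python
import re
from statistics import mean

# Matches the message written by the log_requests middleware, e.g. "Response: 200 (took 2.10s)"
RESPONSE_RE = re.compile(r"Response: (\d{3}) \(took ([\d.]+)s\)")


def summarize_log(lines) -> dict:
    """Count requests, ERROR lines and 5xx responses, and report durations in seconds."""
    durations = []
    error_lines = 0
    server_errors = 0
    for line in lines:
        if "ERROR" in line:
            error_lines += 1
        match = RESPONSE_RE.search(line)
        if match:
            durations.append(float(match.group(2)))
            if match.group(1).startswith("5"):
                server_errors += 1
    return {
        "requests": len(durations),
        "error_lines": error_lines,
        "server_errors": server_errors,
        "avg_seconds": round(mean(durations), 3) if durations else 0.0,
        "max_seconds": max(durations, default=0.0),
    }
```

Inside the container, pass it the open log file. Compare `avg_seconds` with the 1000ms average-response target and `server_errors / requests` with the 1% error-rate target.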
#### Monthly tasks

```bash
# 1. Check for outdated Python dependencies (then update the pins in requirements.txt)
pip list --outdated

# 2. Rebuild the image (if there are security updates)
#    Build with the full registry name so that the push below finds the image
docker build -t registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1 .
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.1

# 3. Update the image version in SAE
```
---
## Troubleshooting

### Common issues
#### Issue 1: Container fails to start

**Symptoms**:

```
SAE shows: container failed to start
Logs show: ImportError: libXXX.so: cannot open shared object file
```

**Cause**: missing system libraries.

**Fix**:

```dockerfile
# Add the missing libraries to the Dockerfile.
# Comments must sit on their own lines: a "# ..." after a trailing "\" breaks the line continuation.
RUN apt-get update && apt-get install -y \
    # OpenCV (provides libGL.so.1; called libgl1-mesa-glx before Debian 12)
    libgl1 \
    # OpenCV
    libglib2.0-0 \
    # Polars
    libgomp1 \
    # PyMuPDF (optional: the PyMuPDF wheels on PyPI bundle MuPDF)
    libmupdf-dev \
    && rm -rf /var/lib/apt/lists/*
```
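Before rebuilding, it can help to confirm which shared libraries are actually missing inside the running container. The sketch below uses only the standard library: `ctypes.util.find_library` plus a trial `ctypes.CDLL` load. The mapping from library to package is taken from the Dockerfile comments above, and `check_shared_libs` is a hypothetical helper name. Importing the package itself (for example `import cv2`) remains the definitive test.

```python
import ctypes
import ctypes.util

# Short names as expected by find_library ("GL" -> libGL.so.1), mapped to the
# Python package that needs each library (per the Dockerfile comments above)
REQUIRED_LIBS = {"GL": "OpenCV", "glib-2.0": "OpenCV", "gomp": "Polars"}


def check_shared_libs(libs=REQUIRED_LIBS) -> dict:
    """Return {library name: True if it can be found and loaded}."""
    results = {}
    for name in libs:
        path = ctypes.util.find_library(name)
        loadable = False
        if path:
            try:
                ctypes.CDLL(path)  # attempt a real dlopen, not just a lookup
                loadable = True
            except OSError:
                pass
        results[name] = loadable
    return results


if __name__ == "__main__":
    for lib, ok in check_shared_libs().items():
        print(f"{lib:10} {'OK' if ok else 'MISSING'}  (needed by {REQUIRED_LIBS[lib]})")
```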
#### Issue 2: PDF extraction times out

**Symptoms**:

```
Request times out (> 300 seconds)
Logs show: Timeout error
```

**Troubleshooting steps**:

1. **Check the file size.** If the file is larger than 50MB, consider splitting it before processing.
2. **Increase the timeout.** SAE console → Application configuration → Environment variables:
   ```
   TIMEOUT=600
   ```
3. **Optimize the extraction logic.** Skip processing that is not needed, such as embedded images and thumbnails.
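Raising `TIMEOUT` only helps if the service enforces its own deadline. Otherwise a request can hang until SAE or the Node.js backend gives up. Here is a minimal sketch that assumes the service reads `TIMEOUT` (in seconds) from the environment; the variable name comes from the step above, but how the real service consumes it is not shown in this guide. The sketch runs the blocking extraction in a worker thread with `asyncio.to_thread` and caps the wait with `asyncio.wait_for`. One caveat applies: a Python thread cannot be killed, so after a timeout the extraction keeps running in the background until it finishes. Only the request is released.

```python
import asyncio
import os

# Assumption: the extraction deadline comes from the TIMEOUT env var (seconds)
EXTRACTION_TIMEOUT = float(os.getenv("TIMEOUT", "300"))


async def run_with_timeout(func, *args, timeout: float = EXTRACTION_TIMEOUT):
    """Run a blocking function in a worker thread; raise asyncio.TimeoutError past the deadline."""
    return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout=timeout)


# Inside a FastAPI endpoint (extract_pdf_pymupdf is your own extraction function):
#     try:
#         text = await run_with_timeout(extract_pdf_pymupdf, tmp_path)
#     except asyncio.TimeoutError:
#         raise HTTPException(status_code=504, detail="PDF extraction timed out")
```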
#### Issue 3: Out of memory (OOM)

**Symptoms**:

```
The container restarts on its own
Logs show: Killed (signal 9)
```

**Fix**:

1. **Increase the memory allocation.** SAE console → Application configuration → Specification:
   ```
   Memory: 2GB → 4GB
   ```
2. **Optimize the code (streaming processing).** Do not load the entire file into memory at once:
   ```python
   # Streaming: process the file chunk by chunk
   # (read_in_chunks and process are placeholders for your own helpers)
   with open(pdf_path, 'rb') as f:
       for chunk in read_in_chunks(f):
           process(chunk)
   ```
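`read_in_chunks` in the snippet above is not a library function. One minimal way to write it with the standard library is a generator that yields fixed-size blocks until EOF. Memory use then stays bounded by the chunk size, which is 1 MiB here as an arbitrary choice. PDF parsers generally need random access to the whole file, so byte-level chunking suits jobs such as copying an upload to /tmp or hashing it. For the parsing itself, processing page by page is the usual way to bound memory.

```python
import hashlib
from typing import BinaryIO, Iterator


def read_in_chunks(f: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunk_size-byte blocks from a binary file object until EOF."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # b"" signals end of file
            break
        yield chunk


def sha256_of_file(path: str) -> str:
    """Example consumer: hash a file of any size without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in read_in_chunks(f):
            digest.update(chunk)
    return digest.hexdigest()
```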
#### Issue 4: Backend cannot connect to the Python service (frequent error)

**Symptoms**:

```
Backend logs: Connection refused
or: ECONNREFUSED: connect ECONNREFUSED 172.17.x.x:8000
or: Error: getaddrinfo ENOTFOUND extraction-service.internal
```

**Root-cause analysis**:

**Cause 1: Wrong internal address configured (most common)**

```bash
# ❌ Wrong: a hostname that does not resolve
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000

# ✅ Correct: the internal address shown in the SAE console
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```
**Fix**:

1. **Get the internal address.** SAE console → extraction-service application → Application details → Application access settings, then copy the displayed "VPC internal access address".
2. **Update the backend environment variable.** SAE console → backend application → Application configuration → Environment variables:
   ```bash
   EXTRACTION_SERVICE_URL=http://<internal IP>:8000
   ```
3. **Restart the backend application.** SAE console → backend application → Restart.
**Cause 2: The Python service is not running**

```bash
# Check the Python service status
# SAE console → extraction-service application → Instance list
# Confirm that the instance status is "Running"

# Check the startup logs
# SAE console → extraction-service application → Logs
# You should see:
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
```
**Cause 3: Network isolation (security group rules)**

```bash
# By default, SAE applications in the same VPC can reach each other.
# If access still fails, check:
# SAE console → extraction-service application → Network configuration → Security group
# Confirm that the inbound rules allow access to port 8000 from within the VPC
```
**Testing internal connectivity**:

Method 1: test from the SAE console "Webshell" (if available):

```bash
curl http://<Python service internal IP>:8000/health
```

Method 2: add a check to the backend application's startup script:

```bash
echo "Testing extraction service connectivity..."
curl -f http://<Python service internal IP>:8000/health || echo "❌ Cannot connect to extraction service"
```

Method 3: test the port with telnet:

```bash
telnet <Python service internal IP> 8000
```
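If the container has no `telnet`, the same port check can be done with the Python standard library from any container or machine in the same VPC that has Python 3. Slim base images such as `python:3.11-slim` usually do not include telnet. This is a minimal sketch, and `port_is_open` is a hypothetical helper. A successful TCP connection only proves that the port is reachable, not that `/health` reports `healthy`.

```python
import socket
import sys


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python port_check.py <Python service internal IP> 8000
    host, port = sys.argv[1], int(sys.argv[2])
    print("open" if port_is_open(host, port) else "closed / unreachable")
```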
---
## Notes and Best Practices

### ✅ Best practices

#### 1. **Image optimization**

```dockerfile
# ✅ Use a multi-stage build
FROM python:3.11-slim AS builder
# ... build ...

FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
# Put the copied virtualenv first on PATH so its python/pip are used
ENV PATH="/opt/venv/bin:$PATH"

# ✅ Clean the apt cache in the same layer
RUN apt-get update && apt-get install -y ... \
    && rm -rf /var/lib/apt/lists/*

# ✅ Use a .dockerignore file
# to keep unnecessary files out of the build context and the image
```
#### 2. **Version management**

```bash
# ✅ Use semantic versioning: v1.0.0 = MAJOR.MINOR.PATCH

# ✅ Keep several tags
docker tag ... extraction-service:v1.0.0
docker tag ... extraction-service:v1.0
docker tag ... extraction-service:latest
```

Record every change in `CHANGELOG.md`:

```markdown
## v1.0.1 (2025-12-20)
- Fix: PDF extraction timeout issue
- Improve: image size reduced by 30%
```
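The three tags follow mechanically from the version number. A small helper can therefore generate them and reject versions that are not MAJOR.MINOR.PATCH. This is a sketch, and `image_tags` is a hypothetical name. Production deployments should pin an explicit version, so `latest` is only a convenience alias and never the tag SAE deploys.

```python
import re

SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def image_tags(image: str, version: str, include_latest: bool = True) -> list:
    """Return [image:vX.Y.Z, image:vX.Y(, image:latest)] for a MAJOR.MINOR.PATCH version."""
    match = SEMVER_RE.match(version)
    if not match:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = match.groups()
    tags = [f"{image}:v{major}.{minor}.{patch}", f"{image}:v{major}.{minor}"]
    if include_latest:
        tags.append(f"{image}:latest")  # convenience alias only; deploy the pinned tag
    return tags
```

Each returned tag becomes the target of one `docker tag <local image> <tag>` command, followed by a `docker push` of the same tag.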
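#### 3. **Security hardening**

```python
from fastapi import HTTPException, UploadFile

# ✅ File size limit
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB

# ✅ File type validation (PDF, .doc, .docx)
ALLOWED_TYPES = {
    'application/pdf',
    'application/msword',  # .doc
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',  # .docx
}


@app.post("/extract/pdf")  # app: the FastAPI instance from main.py
async def extract_pdf(file: UploadFile):
    # UploadFile.size can be None when the size is unknown
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="File too large")

    # content_type is sent by the client, so treat it as a hint rather than proof
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(status_code=415, detail="Unsupported file type")
    # ... extraction logic ...
```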
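`file.content_type` is whatever the client sends, so a renamed file passes the check above. A common complement is to inspect the first bytes of the upload. The sketch below uses well-known file signatures: `%PDF-` for PDF, the ZIP header `PK\x03\x04` for .docx, and the OLE2 header for legacy .doc. `sniff_content_type` is a hypothetical helper. A ZIP or OLE2 header only identifies the container format (an .xlsx or .xls would match too), so this check narrows the type down rather than proving it.

```python
from typing import Optional

# Leading bytes ("magic numbers") of the accepted formats
SIGNATURES = {
    b"%PDF-": "application/pdf",
    # ZIP container: .docx (but also .xlsx, .pptx, .zip)
    b"PK\x03\x04": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    # OLE2 compound file: legacy .doc (but also .xls, .ppt)
    bytes.fromhex("D0CF11E0A1B11AE1"): "application/msword",
}


def sniff_content_type(head: bytes) -> Optional[str]:
    """Map the first bytes of a file to a MIME type, or None if no signature matches."""
    for magic, mime in SIGNATURES.items():
        if head.startswith(magic):
            return mime
    return None


# Inside the endpoint (FastAPI's UploadFile supports async read/seek):
#     head = await file.read(8)
#     await file.seek(0)  # rewind so the extractor sees the whole file
#     if sniff_content_type(head) != file.content_type:
#         raise HTTPException(status_code=415, detail="File content does not match its type")
```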
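#### 4. **Performance optimization**

```python
import asyncio
import os

import aiofiles
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import NullPool


# ✅ Handle large files asynchronously
async def extract_large_pdf(pdf_path: str):
    # Async file I/O: the event loop keeps serving other requests while reading
    async with aiofiles.open(pdf_path, 'rb') as f:
        content = await f.read()
    # Run the CPU-bound extraction in the default thread pool
    # (pymupdf_extract is your own extraction function)
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(None, pymupdf_extract, content)
    return text


# ✅ Database engine
# requirements.txt ships asyncpg (not psycopg2), so use the async engine
# with a postgresql+asyncpg:// URL
engine = create_async_engine(
    os.environ["DATABASE_URL"],
    poolclass=NullPool,  # no pooling: each checkout opens a new connection; recommended on SAE
    echo=False,
)
```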
### ❌ Never do this

#### 1. **Never use made-up internal addresses (critical error)**

```bash
# ❌ Wrong (connections will fail)
EXTRACTION_SERVICE_URL=http://extraction-service.internal:8000
EXTRACTION_SERVICE_URL=http://localhost:8000
EXTRACTION_SERVICE_URL=http://extraction-service:8000

# ✅ Correct: take the real internal address from the SAE console
# SAE console → extraction-service application → Application access settings
# Copy the displayed "VPC internal access address"
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000
```

**Why**:
- SAE instances run across different hosts, so Docker service names do not resolve.
- SAE's Kubernetes Service domain names need extra configuration and cannot be assumed to exist.
- The most reliable option is the IP address shown in the SAE console.
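The rules above can also be checked automatically, for example at backend startup or in CI, before a bad `EXTRACTION_SERVICE_URL` causes `ECONNREFUSED` or `ENOTFOUND` at request time. The backend itself is Node.js, so treat this Python sketch as an illustration of the checks rather than drop-in code. `check_extraction_url` is a hypothetical helper. It flags localhost, hostnames used instead of IP addresses, non-private IPs, and a missing port.

```python
import ipaddress
from urllib.parse import urlparse


def check_extraction_url(url: str) -> list:
    """Return a list of problems with an EXTRACTION_SERVICE_URL value (empty list = OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parsed.scheme!r}")
    host = parsed.hostname or ""
    if host in ("localhost", "127.0.0.1", "::1"):
        problems.append("localhost points at the caller's own container, not the Python service")
    else:
        try:
            if not ipaddress.ip_address(host).is_private:
                problems.append(f"{host} is not a private (VPC) address")
        except ValueError:
            problems.append(f"{host!r} is a hostname; use the VPC internal IP from the SAE console")
    try:
        if parsed.port is None:
            problems.append("no explicit port (the service listens on 8000)")
    except ValueError:
        problems.append("invalid port")
    return problems
```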
#### 2. **Never hard-code secrets in the image**

```dockerfile
# ❌ Wrong
ENV DATABASE_PASSWORD=my-secret-password

# ✅ Correct: configure secrets as SAE environment variables
```
#### 3. **Never rely on the local filesystem for persistent results**

```python
# ❌ Wrong (lost when the container restarts)
output_path = '/app/output/result.txt'
with open(output_path, 'w') as f:
    f.write(result)

# ✅ Correct: keep temporary files in /tmp and upload the result to OSS
import os
import tempfile

with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
    f.write(result)
    tmp_path = f.name
try:
    ...  # upload tmp_path to OSS (e.g. with the oss2 library)
finally:
    os.unlink(tmp_path)  # delete the temporary file after uploading
```
#### 4. **Never deploy the :latest tag to production**

```
# ❌ Wrong (you cannot roll back)
image: extraction-service:latest

# ✅ Correct (pin an explicit version)
image: extraction-service:v1.0.0
```
#### 5. **Never edit code inside a running container**

```bash
# ❌ Wrong (changes are lost when the container restarts)
# SAE Webshell → vi /app/main.py

# ✅ Correct workflow:
# 1. Change the code locally
# 2. Rebuild the image
# 3. Push it to ACR
# 4. Update the image version in SAE
```
#### 6. **Never use global variables that grow without bound**

```python
# ❌ Wrong (memory leak)
CACHE = {}  # global cache that grows forever


@app.post("/extract/pdf")
async def extract_pdf(file: UploadFile):
    key = file.filename  # also wrong: two different files with the same name share one entry
    if key not in CACHE:
        CACHE[key] = extract(file)  # memory keeps growing!
    return CACHE[key]


# ✅ Correct: use a cache with bounded capacity
from functools import lru_cache


@lru_cache(maxsize=100)  # keeps at most 100 results, evicting the least recently used
def extract_with_cache(file_hash: str):
    # file_hash: a hash of the file content; extract must be able to find the file from it
    return extract(file_hash)
```
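`lru_cache` works when the function argument is a small hashable key. When the cache needs to key on the uploaded content itself, a small LRU keyed by a SHA-256 digest of the bytes avoids both the unbounded growth and the filename bug of the wrong example. The sketch below uses only the standard library, and `BoundedCache` is a hypothetical class. It is not thread-safe. Each Uvicorn worker and each SAE instance also keeps its own copy, so hit rates drop as you scale out.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class BoundedCache:
    """LRU cache keyed by the SHA-256 of the content; holds at most max_items results."""

    def __init__(self, max_items: int = 100):
        self.max_items = max_items
        self._data = OrderedDict()

    def get_or_compute(self, content: bytes, compute: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = compute(content)
        self._data[key] = value
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self) -> int:
        return len(self._data)
```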
#### 7. **Never ignore the size limit of /tmp**

```python
# ⚠️ Note: /tmp in SAE containers usually has a size limit (e.g. 1-2GB).
# Temporary files must be cleaned up after processing large files.
import os
import shutil
import tempfile

from fastapi import UploadFile


async def extract_large_pdf(file: UploadFile):
    # Save the upload to a temporary file, copying in chunks
    # instead of reading the whole upload into memory
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp:
        shutil.copyfileobj(file.file, tmp)
        tmp_path = tmp.name

    try:
        # Process the file (extract_pdf_pymupdf is your own extraction function)
        result = extract_pdf_pymupdf(tmp_path)
        return result
    finally:
        # ✅ Key point: always delete the temporary file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```
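Cleaning up after each request keeps /tmp from filling over time, but a single large upload can still exhaust it. Here is a minimal sketch of a pre-flight check with `shutil.disk_usage`. `ensure_tmp_space` is a hypothetical helper. The factor of 2 is an assumed safety margin for the upload plus intermediate output, not a measured value.

```python
import errno
import shutil
import tempfile
from typing import Optional


def ensure_tmp_space(needed_bytes: int, margin: float = 2.0, tmp_dir: Optional[str] = None) -> None:
    """Raise OSError(ENOSPC) unless the temp directory has needed_bytes * margin free."""
    directory = tmp_dir or tempfile.gettempdir()
    free = shutil.disk_usage(directory).free
    required = int(needed_bytes * margin)
    if free < required:
        raise OSError(errno.ENOSPC, f"only {free} bytes free in {directory}, need {required}")
```

In the endpoint, a call such as `ensure_tmp_space(file.size or MAX_FILE_SIZE)` would run before the temporary file is created. The resulting `OSError` can then be mapped to an HTTP error of your choice.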
## Appendix

### A. Complete requirements.txt

```
# Web framework
fastapi==0.115.5
uvicorn[standard]==0.32.1
python-multipart==0.0.20
# Document extraction
PyMuPDF==1.24.14
mammoth==1.8.0
python-docx==1.1.2
# Data processing
polars==1.17.1
numpy==1.26.4
# Utilities
langdetect==1.0.9
chardet==5.2.0
aiofiles==23.2.1
# Database
sqlalchemy==2.0.25
asyncpg==0.29.0
# Alibaba Cloud OSS
oss2==2.18.3
# Logging and monitoring
python-json-logger==2.0.7
psutil==5.9.8
```
### B. Complete Dockerfile

See the Dockerfile in "Build the Docker image - Step 1" above.
### C. Local test script

```bash
#!/bin/bash
# test-local.sh
echo "Building Docker image..."
docker build -t extraction-service:test .

echo "Starting container..."
docker run -d \
  --name extraction-test \
  -p 8000:8000 \
  -e DATABASE_URL="postgresql://user:pass@host:5432/db" \
  extraction-service:test

echo "Waiting for service to start..."
sleep 10

echo "Testing health endpoint..."
curl http://localhost:8000/health

echo "Testing PDF extraction..."
curl -X POST \
  -F "file=@test.pdf" \
  http://localhost:8000/extract/pdf

echo "Cleaning up..."
docker stop extraction-test
docker rm extraction-test
echo "Done!"
```
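The fixed `sleep 10` in the script can be too long or too short, depending on how fast the container starts. A polling helper that waits until `/health` reports `healthy` is more robust. The sketch below uses only the standard library, and `wait_for_health` is a hypothetical name. It assumes the `{"status": "healthy"}` body returned by the health endpoint in this guide. It also bypasses any HTTP proxy configured in the environment, because internal addresses should be reached directly.

```python
import http.client
import json
import time
import urllib.request

# Talk to the service directly, ignoring any HTTP(S)_PROXY settings in the environment
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until it answers 200 with {"status": "healthy"}; False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _DIRECT.open(url, timeout=interval) as resp:
                if resp.status == 200 and json.load(resp).get("status") == "healthy":
                    return True
        except (OSError, http.client.HTTPException, ValueError):
            # Not ready yet: connection refused, timeout (URLError and socket timeouts
            # are OSError subclasses), a malformed response, or a non-JSON body
            pass
        time.sleep(interval)
    return False
```

In a local test flow, `wait_for_health("http://localhost:8000/health", timeout=60)` would replace the fixed sleep. A `False` result means the container never became healthy.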
### D. Related documentation links
## Quick Reference

### Common commands

```bash
# Build the image (use the full registry name so that docker push finds it)
docker build -t registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0 .

# Push the image
docker push registry.cn-beijing.aliyuncs.com/clinical-research/extraction-service:v1.0

# View SAE logs
# SAE console → Application details → Logs

# Restart the SAE application
# SAE console → Application details → Restart

# Test internal connectivity (use the internal IP shown in the SAE console)
curl http://<internal IP>:8000/health

# Check container resource usage (local Docker only)
docker stats extraction-service
```
### Key configuration

| Setting | Recommended value | Notes |
|---|---|---|
| CPU | 1 core | Initial configuration |
| Memory | 2GB | Without Nougat |
| Instances | 1-3 | Auto-scaling |
| Timeout | 300 s | For large-file processing |
| Health check | 30 s | Initial delay |
| Workers | 2 | Uvicorn workers |
**Document maintenance**:
- For questions or suggestions, contact the system owner.
- Last updated: 2025-12-13
- Next review: 2026-03-13