
# Document Processing Engine
> **Capability tier:** Common capability layer
> **Reuse rate:** 86% (6 dependent modules)
> **Priority:** P0
> **Status:** ✅ Implemented (Python microservice)
---
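## 📋 Capability Overview
The document processing engine is a core foundational capability of the platform. It is responsible for:
- Multi-format document text extraction (PDF, Docx, Txt, Excel)
- OCR processing
- Table extraction
- Language detection
- Quality assessment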
---
## 📊 Dependent Modules
**6 dependent modules (86% reuse rate):**
1. **ASL** - AI Intelligent Literature (literature PDF extraction)
2. **PKB** - Personal Knowledge Base (knowledge base document upload)
3. **DC** - Data Cleaning (Excel/Docx data import)
4. **SSA** - Smart Statistical Analysis (data import)
5. **ST** - Statistical Analysis Tools (data import)
6. **RVW** - Manuscript Review (manuscript document extraction)
---
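## 💡 Core Features
### 1. PDF Extraction
- **Nougat**: English academic papers (high quality)
- **PyMuPDF**: Chinese PDFs, plus the fallback path (fast)
- **Language detection**: automatically distinguishes Chinese from English
- **Quality assessment**: scores the quality of the extracted text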
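
The routing described above (Nougat for English papers, PyMuPDF for Chinese PDFs and as the fallback) can be sketched roughly as follows. This is an illustration only, not the code in `pdf_extractor.py` or `language_detector.py`: the function names, the CJK-ratio heuristic, and the `0.3` threshold are all assumptions.

```python
import re

# Hypothetical threshold: share of CJK characters among all letters at or
# above which a text sample is treated as Chinese. Not taken from the service.
CJK_RATIO_THRESHOLD = 0.3

_CJK = re.compile(r"[\u4e00-\u9fff]")


def detect_language(sample: str) -> str:
    """Return 'zh', 'en', or 'unknown' based on the share of CJK letters."""
    letters = [ch for ch in sample if ch.isalpha()]
    if not letters:
        return "unknown"
    cjk_count = sum(1 for ch in letters if _CJK.match(ch))
    return "zh" if cjk_count / len(letters) >= CJK_RATIO_THRESHOLD else "en"


def choose_extractor(sample: str, nougat_available: bool = True) -> str:
    """Pick an extractor name: Nougat for English, PyMuPDF for everything else."""
    if detect_language(sample) == "en" and nougat_available:
        return "nougat"
    # Chinese text, undetectable text, or Nougat not installed: fall back.
    return "pymupdf"
```

For example, `choose_extractor("Deep learning for medical imaging")` selects `"nougat"`, while a Chinese sample or `nougat_available=False` selects `"pymupdf"`.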
### 2. Docx Extraction
- **Mammoth**: converts to Markdown
- **python-docx**: structured reading
### 3. Txt Extraction
- **Multi-encoding support**: UTF-8, GBK, and others
- **chardet**: automatic encoding detection
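
Multi-encoding support can be illustrated with a standard-library sketch that tries a fixed list of encodings in order. This is not the `txt_extractor.py` implementation: the real service uses chardet for detection, whereas this version swaps in a simple try-in-order strategy so that it runs without third-party packages. The candidate list and the `latin-1` last resort are assumptions.

```python
from typing import Tuple

# Assumed candidate order; the README only names UTF-8 and GBK.
# "utf-8-sig" decodes plain UTF-8 and also strips a leading BOM if present.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gbk")


def decode_text(raw: bytes) -> Tuple[str, str]:
    """Decode raw bytes, returning (text, name of the encoding that worked)."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never raises;
    # the result may be wrong, but the caller still gets text back.
    return raw.decode("latin-1"), "latin-1"
```

Trying UTF-8 first matters because strict UTF-8 decoding rejects most GBK byte sequences, while GBK decoding will often accept UTF-8 bytes and silently produce garbled text.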
### 4. Excel Processing
- **openpyxl**: reads Excel files
- **pandas**: data processing
---
## 🏗️ Technical Architecture
**Python microservice (FastAPI):**
```
extraction_service/
├── main.py (509 lines) - FastAPI main service
├── services/
│   ├── pdf_extractor.py (242 lines) - PDF extraction coordinator
│   ├── pdf_processor.py (280 lines) - PyMuPDF implementation
│   ├── language_detector.py (120 lines) - Language detection
│   ├── nougat_extractor.py (242 lines) - Nougat implementation
│   ├── docx_extractor.py (253 lines) - Docx extraction
│   └── txt_extractor.py (316 lines) - Txt extraction (multi-encoding)
└── requirements.txt
```
---
## 📚 API Endpoints
```
POST /api/extract/pdf - PDF text extraction
POST /api/extract/docx - Docx text extraction
POST /api/extract/txt - Txt text extraction
POST /api/extract/excel - Excel table extraction
GET /health - Health check
```
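
A client for the endpoints listed above might build its upload request as sketched below, using only the standard library. The endpoint paths come from the list; everything else is an assumption for illustration: the form field name `file`, the mapping of `.xlsx` to the Excel endpoint, the base URL, and the helper names. The response format is not documented here, so the sketch stops at building the request.

```python
import urllib.request
import uuid
from pathlib import PurePosixPath

# Endpoint paths as listed in this README; the extension mapping is assumed.
ENDPOINTS = {
    ".pdf": "/api/extract/pdf",
    ".docx": "/api/extract/docx",
    ".txt": "/api/extract/txt",
    ".xlsx": "/api/extract/excel",
}


def endpoint_for(filename: str) -> str:
    """Map a file name to its extraction endpoint by extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ENDPOINTS:
        raise ValueError(f"unsupported file type: {filename!r}")
    return ENDPOINTS[suffix]


def build_extract_request(base_url: str, filename: str, content: bytes,
                          field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one file.

    `field_name` is a guess at the upload field the service expects.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint_for(filename),
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

The returned request can be passed to `urllib.request.urlopen` once the actual host, port, and field name of the deployed service are known.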
---
## 🔗 Related Documents
- [Common Capability Layer Overview](../README.md)
- [Python microservice code](../../../extraction_service/)
---
**Last updated:** 2025-11-06
**Maintainer:** Technical Architect