feat(aia): Complete AIA V2.0 with universal streaming capabilities

Major Changes:
- Add StreamingService with OpenAI Compatible format
- Upgrade Chat component V2 with Ant Design X integration
- Implement AIA module with 12 intelligent agents
- Update API routes to unified /api/v1 prefix
- Update system documentation

Backend (~1300 lines):
- common/streaming: OpenAI Compatible adapter
- modules/aia: 12 agents, conversation service, streaming integration
- Update route versions (RVW, PKB to v1)

Frontend (~3500 lines):
- modules/aia: AgentHub + ChatWorkspace (100% prototype restoration)
- shared/Chat: AIStreamChat, ThinkingBlock, useAIStream Hook
- Update API endpoints to v1

Documentation:
- AIA module status guide
- Universal capabilities catalog
- System overview updates
- All module documentation sync

Tested: Stream response verified, authentication working
Status: AIA V2.0 core completed (85%)

@@ -1,29 +1,29 @@
# Document Processing Engine

> **Capability tier:** Shared capability layer
> **Reuse rate:** 86% (6 dependent modules)
> **Priority:** P0
> **Status:** ✅ Implemented (Python microservice)

---

## 📋 Capability Overview

The document processing engine is a core foundational capability of the platform. It is responsible for:
- Multi-format document text extraction (PDF, Docx, Txt, Excel)
- OCR processing
- Table extraction
- Language detection
- Quality assessment

---

## 📊 Dependent Modules

**6 dependent modules (86% reuse rate):**
1. **ASL** - AI Intelligent Literature (literature PDF extraction)
2. **PKB** - Personal Knowledge Base (knowledge base document upload)
3. **DC** - Data Cleaning (Excel/Docx data import)
4. **SSA** - Intelligent Statistical Analysis (data import)
5. **ST** - Statistical Analysis Tools (data import)
6. **RVW** - Manuscript Review (manuscript document extraction)
@@ -35,36 +35,36 @@
### 1. PDF Extraction
- **Nougat**: English academic papers (high quality)
- **PyMuPDF**: Chinese PDFs + fallback option (fast)
- **Language detection**: automatically distinguishes Chinese from English
- **Quality assessment**: scores extraction quality

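
The routing described in the PDF list (language detection first, Nougat for English papers, PyMuPDF for Chinese PDFs and as the fallback) might look roughly like the sketch below. It is only an illustration under stated assumptions: `CJK_THRESHOLD`, `cjk_ratio`, `choose_engine`, and `extract_with_fallback` are hypothetical names, the 30% threshold is invented, and the extractors are passed in as plain callables rather than taken from the real service.

```python
from typing import Callable

# Hypothetical threshold: treat a document as Chinese when at least 30% of its
# letters are CJK ideographs. The source does not state the real rule.
CJK_THRESHOLD = 0.3


def cjk_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are CJK ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(letters)


def choose_engine(sample_text: str) -> str:
    """Pick 'nougat' for English text and 'pymupdf' for Chinese text."""
    return "pymupdf" if cjk_ratio(sample_text) >= CJK_THRESHOLD else "nougat"


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Run the primary extractor; use the fallback if it fails or returns nothing."""
    try:
        text = primary(path)
        if text.strip():
            return text
    except Exception:
        pass  # fall through to the fallback extractor
    return fallback(path)
```

Passing the extractors as callables keeps the routing logic testable without installing Nougat or PyMuPDF.
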
### 2. Docx Extraction
- **Mammoth**: conversion to Markdown
- **python-docx**: structured reading

### 3. Txt Extraction
- **Multi-encoding support**: UTF-8, GBK, and others
- **chardet**: automatic encoding detection

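
As a rough illustration of multi-encoding support, the sketch below tries a fixed list of candidate encodings in order. This swaps in a simpler technique than the chardet-based detection named above, so that it runs with the standard library alone; `decode_with_fallback` and the candidate order are assumptions, not the service's actual code.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    data: bytes,
    encodings: Iterable[str] = ("utf-8-sig", "gbk"),
) -> Tuple[str, str]:
    """Decode bytes with the first candidate encoding that succeeds.

    Returns the decoded text and the name of the encoding that worked.
    "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM if present.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Strict UTF-8 is tried first because GBK accepts many byte sequences that are not really GBK text, so trying it first could silently produce garbled output.
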
### 4. Excel Processing
- **openpyxl**: reads Excel files
- **pandas**: data processing

---

## 🏗️ Technical Architecture

**Python microservice (FastAPI):**
```
extraction_service/
├── main.py (509 lines) - FastAPI main service
├── services/
│   ├── pdf_extractor.py (242 lines) - PDF extraction coordinator
│   ├── pdf_processor.py (280 lines) - PyMuPDF implementation
│   ├── language_detector.py (120 lines) - Language detection
│   ├── nougat_extractor.py (242 lines) - Nougat implementation
│   ├── docx_extractor.py (253 lines) - Docx extraction
│   └── txt_extractor.py (316 lines) - Txt extraction (multi-encoding)
└── requirements.txt
```

@@ -77,7 +77,7 @@ POST /api/extract/pdf - PDF text extraction
POST /api/extract/docx - Docx text extraction
POST /api/extract/txt - Txt text extraction
POST /api/extract/excel - Excel table extraction
GET /health - Health check
```

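
A caller could pick one of the extraction endpoints listed above from a file's extension, as in the minimal sketch below. The endpoint paths come from the list; everything else is an assumption made for illustration: the `endpoint_for` helper, the `http://localhost:8000` base URL, and the mapping of `.xlsx` to the Excel endpoint. The request body format is not documented here, so the sketch stops at building the URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Endpoint paths as listed in the document; the extension keys are assumptions.
EXTRACT_ENDPOINTS = {
    ".pdf": "/api/extract/pdf",
    ".docx": "/api/extract/docx",
    ".txt": "/api/extract/txt",
    ".xlsx": "/api/extract/excel",
}


def endpoint_for(filename: str, base_url: str = "http://localhost:8000") -> str:
    """Return the full extraction URL for a file, chosen by its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    try:
        path = EXTRACT_ENDPOINTS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or filename!r}") from None
    return urljoin(base_url, path)
```
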
---