docs: complete documentation system (250+ files)
- System architecture and design documentation
- Business module docs (ASL/AIA/PKB/RVW/DC/SSA/ST)
- ASL module complete design (quality assurance, tech selection)
- Platform layer and common capabilities docs
- Development standards and API specifications
- Deployment and operations guides
- Project management and milestone tracking
- Architecture implementation reports
- Documentation templates and guides
88
docs/02-通用能力层/04-数据ETL引擎/README.md
Normal file
@@ -0,0 +1,88 @@
# Data ETL Engine

> **Capability tier:** Common capability layer
> **Reuse rate:** 29% (2 dependent modules)
> **Priority:** P2
> **Status:** ⏳ Not yet implemented

---

## 📋 Capability Overview

The Data ETL Engine is responsible for:
- Joining multiple Excel sheets
- Data cleaning
- Data transformation
- Data validation

---

## 📊 Dependent Modules

**2 dependent modules (29% reuse rate):**
1. **DC** - Data cleaning and organization (core dependency)
2. **SSA** - Intelligent statistical analysis (data preprocessing)

---

## 💡 Core Features

### 1. Multi-Sheet Excel Processing
- Read multiple Excel files
- Automatic JOIN operations
- GROUP BY aggregation
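
The JOIN and GROUP BY steps listed above can be sketched without any particular library. The following is a minimal illustration using only the Python standard library, assuming each sheet has already been read into a list of dicts; the names `inner_join`, `group_sum`, and the sample columns are hypothetical and are not part of the planned engine.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical row type: one dict per spreadsheet row


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hash join: index the right side by `key`, then probe it with each left row."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group_sum(rows: List[Row], by: str, value: str) -> Dict[Any, float]:
    """GROUP BY `by`, summing the `value` column within each group."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[value]
    return dict(totals)


# Two "sheets" with made-up sample data
patients = [{"patient_id": 1, "site": "A"}, {"patient_id": 2, "site": "B"}]
visits = [
    {"patient_id": 1, "cost": 10.0},
    {"patient_id": 1, "cost": 5.0},
    {"patient_id": 2, "cost": 20.0},
]

joined = inner_join(visits, patients, "patient_id")
cost_by_site = group_sum(joined, "site", "cost")
```

Indexing one side by the join key keeps the join roughly linear in the number of rows, which is the same idea a DataFrame library or database engine applies internally.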

### 2. Data Cleaning
- Missing-value handling
- Duplicate handling
- Outlier detection
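
As an illustration of the three cleaning steps above, here is a small standard-library sketch. It is only one possible reading of the requirements: the function names are hypothetical, rows are assumed to be plain dicts, and the outlier rule (values outside 1.5 × IQR fences) is an assumed choice that the design does not specify.

```python
from statistics import quantiles
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical row type: one dict per spreadsheet row


def fill_missing(rows: List[Row], column: str, default: Any) -> List[Row]:
    """Replace None or empty-string values in `column` with `default`."""
    return [
        {**row, column: default} if row.get(column) in (None, "") else dict(row)
        for row in rows
    ]


def drop_duplicates(rows: List[Row], keys: List[str]) -> List[Row]:
    """Keep the first row seen for each combination of `keys`."""
    seen = set()
    unique = []
    for row in rows:
        marker = tuple(row.get(k) for k in keys)
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


def flag_outliers(rows: List[Row], column: str, k: float = 1.5) -> List[Row]:
    """Return rows whose numeric `column` value lies outside the IQR fences."""
    numeric = [r for r in rows if isinstance(r.get(column), (int, float))]
    q1, _, q3 = quantiles([r[column] for r in numeric], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in numeric if not (low <= r[column] <= high)]
```

For example, `flag_outliers([{"x": v} for v in [10, 11, 11, 12, 12, 12, 13, 95]], "x")` flags only the row with `95`. Note that `statistics.quantiles` needs at least two numeric values.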

### 3. Data Transformation
- Type conversion
- Format standardization
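
The two transformation steps can be sketched as follows, again with the standard library only. The helper names and the list of accepted date layouts are assumptions made for illustration; the design does not state which input formats must be supported.

```python
from datetime import datetime
from typing import Optional

# Assumed input date layouts; a real deployment would configure its own list.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def to_float(text: str) -> Optional[float]:
    """Type conversion: parse a numeric string such as ' 1,234.5 '; None if not numeric."""
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def to_iso_date(text: str) -> Optional[str]:
    """Format standardization: normalize a date string to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known layout
    return None
```

Returning `None` for unparseable input, rather than raising, lets a later validation step report all bad cells at once; whether the engine should behave this way is a design decision left open here.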

---

## 🏗️ Technical Approach

### Cloud Edition (Optimal)
```python
# Built on Polars (very high performance)
from typing import BinaryIO, Dict, List
from polars import DataFrame

File = BinaryIO  # assumed stand-in; the design does not define File

class ETLEngine:  # interface outline only, not yet implemented
    def read_excel(self, files: List[File]) -> List[DataFrame]: ...
    def join(self, dfs: List[DataFrame], keys: List[str]) -> DataFrame: ...
    def clean(self, df: DataFrame, rules: Dict) -> DataFrame: ...
    def export(self, df: DataFrame, format: str) -> bytes: ...
```

### Standalone Edition (Compatible)
```python
# Built on SQLite (memory-friendly)
# Read in chunks; let the database engine handle the JOIN
```
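
The chunked-read approach noted in the comments above could look roughly like this. It is a sketch under stated assumptions: table and column names are invented, the rows are supplied as Python tuples (reading them from Excel is not shown), and an in-memory database is used only so the example runs as-is.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunks(rows: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most `size` rows so the full input is never held at once."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load(conn: sqlite3.Connection, table: str, rows: Iterable[Tuple], size: int = 1000) -> None:
    """Insert rows into a two-column table chunk by chunk."""
    for batch in chunks(rows, size):
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
    conn.commit()


conn = sqlite3.connect(":memory:")  # a file path would keep large data off the heap
conn.execute("CREATE TABLE patients (patient_id INTEGER, site TEXT)")
conn.execute("CREATE TABLE visits (patient_id INTEGER, cost REAL)")

load(conn, "patients", [(1, "A"), (2, "A"), (3, "B")], size=2)
load(conn, "visits", [(1, 10.0), (1, 5.0), (2, 7.5), (3, 20.0)], size=2)

# The database engine performs the JOIN and the aggregation
totals = conn.execute(
    """
    SELECT p.site, SUM(v.cost)
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.patient_id
    GROUP BY p.site
    ORDER BY p.site
    """
).fetchall()
```

Because only one chunk is in Python memory at a time and SQLite executes the JOIN itself, peak memory stays bounded by the chunk size rather than by the size of the spreadsheets.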

---

## 🔗 Related Documents

- [Common Capability Layer Overview](../README.md)
- [DC Module Requirements](../../03-业务模块/DC-数据清洗整理/README.md)

---

**Last updated:** 2025-11-06
**Maintainer:** Technical Architect