HaHafeng 2e8699c217 feat(asl): Week 2 Day 2 - Excel import with template download and intelligent dedup

Features:
- feat: Excel template generation and download (with examples)
- feat: Excel file parsing in memory (cloud-native, no disk write)
- feat: Field validation (title + abstract required)
- feat: Smart deduplication (DOI priority + Title fallback)
- feat: Literature preview table with statistics
- feat: Complete submission flow (create project + import literatures)

Components:
- feat: Create excelUtils.ts with full Excel processing toolkit
- feat: Enhance TitleScreeningSettings page with upload/preview/submit
- feat: Update API interface signatures and export unified aslApi object

Dependencies:
- chore: Add xlsx library for Excel file processing

Ref: Week 2 Frontend Development - Day 2
Scope: ASL Module MVP - Title Abstract Screening
Cloud-Native: Memory parsing, no file persistence

2025-11-19 10:24:47 +08:00

Data ETL Engine

Capability tier: General capability layer
Reuse rate: 29% (2 dependent modules)
Priority: P2
Status: ⏳ Not yet implemented

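📋 Capability Overview

The Data ETL Engine is responsible for:

Multi-table JOINs across Excel files
Data cleaning
Data transformation
Data validation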

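📊 Dependent Modules

2 modules depend on this engine (29% reuse rate):

DC - Data Cleaning and Organization (core dependency)
SSA - Smart Statistical Analysis (data preprocessing)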

💡 Core Features

1. Excel multi-table processing

Read multiple Excel files
Automatic JOIN operations
GROUP BY aggregation
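
The JOIN and GROUP BY steps can be sketched without any library, treating each sheet as a list of plain dicts once it has been read. This is only an illustration of the idea, not the engine's planned implementation: the helper names `inner_join` and `group_sum` and the sample columns are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def inner_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Hash join: index the right table by key, then probe it with each left row."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    joined = []
    for left_row in left:
        # .get() avoids creating empty entries for unmatched keys
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def group_sum(rows: Iterable[Row], by: str, value: str) -> Dict[object, float]:
    """GROUP BY `by`, summing the `value` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[by]] += float(row[value])
    return dict(totals)


# Hypothetical sample data standing in for two Excel sheets
patients = [
    {"patient_id": 1, "site": "A"},
    {"patient_id": 2, "site": "B"},
    {"patient_id": 3, "site": "A"},
]
visits = [
    {"patient_id": 1, "cost": 10.0},
    {"patient_id": 1, "cost": 5.0},
    {"patient_id": 3, "cost": 7.5},
]

joined = inner_join(patients, visits, key="patient_id")
cost_by_site = group_sum(joined, by="site", value="cost")
```

Patient 2 has no visits, so an inner join drops that row; a real engine would likely let the caller choose the join type.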

2. Data cleaning

Missing-value handling
Duplicate handling
Outlier detection
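
A minimal sketch of the three cleaning steps, using only the Python standard library. The function names, the fill-with-default strategy, and the 1.5 × IQR outlier rule are assumptions chosen for illustration; the document does not specify which strategies the engine will use.

```python
import statistics
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def fill_missing(rows: Iterable[Row], column: str, default: object) -> List[Row]:
    """Replace None (or absent) values in `column` with `default`."""
    return [
        {**row, column: default if row.get(column) is None else row[column]}
        for row in rows
    ]


def drop_duplicates(rows: Iterable[Row], keys: Sequence[str]) -> List[Row]:
    """Keep the first row seen for each combination of `keys`."""
    seen = set()
    unique = []
    for row in rows:
        identity = tuple(row[k] for k in keys)
        if identity not in seen:
            seen.add(identity)
            unique.append(row)
    return unique


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample rows
rows = [
    {"id": 1, "score": 34},
    {"id": 2, "score": None},
    {"id": 1, "score": 34},
]
filled = fill_missing(rows, "score", default=0)
deduped = drop_duplicates(filled, keys=["id"])
outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 100])
```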

3. Data transformation

Type conversion
Format standardization
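
As a rough illustration of type conversion and format standardization, the sketch below turns loosely typed cell values into integers and ISO 8601 dates. The helper names and the list of accepted date formats are assumptions, not part of the engine's specification.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; a real engine would make this configurable
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def to_int(value: object) -> Optional[int]:
    """Convert cell values such as ' 42 ', '42.0' or 42.0 to int; None if not possible."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    # Reject fractional values (and NaN/inf) instead of silently truncating
    return int(number) if number.is_integer() else None


def to_iso_date(value: str) -> Optional[str]:
    """Standardize a date string to YYYY-MM-DD; None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` rather than raising lets a later validation step report every unconvertible cell at once; raising on the first failure would be an equally reasonable design.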

🏗️ Technical Approach

Cloud edition (preferred)

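# Built on Polars (very high performance)
from typing import BinaryIO, Dict, List
import polars as pl

File = BinaryIO  # assumption: uploaded files arrive as in-memory binary streams

class ETLEngine:
    def read_excel(self, files: List[File]) -> List[pl.DataFrame]: ...
    def join(self, dfs: List[pl.DataFrame], keys: List[str]) -> pl.DataFrame: ...
    def clean(self, df: pl.DataFrame, rules: Dict) -> pl.DataFrame: ...
    def export(self, df: pl.DataFrame, format: str) -> bytes: ...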

Standalone edition (compatible)

# Built on SQLite (memory-friendly)
# Read in chunks and let the database engine handle the JOIN
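
The chunked-read idea can be sketched with Python's built-in `sqlite3` module: rows are inserted in fixed-size batches, and the JOIN and aggregation run inside SQLite. The table layout, the helper names `chunks` and `load_in_chunks`, and the sample rows are hypothetical. The demo uses `:memory:` so it is self-contained; pointing `connect` at a file path would keep the data on disk instead of in RAM.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunks(rows: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Yield successive batches of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_in_chunks(conn: sqlite3.Connection, table: str,
                   rows: Iterable[Tuple], chunk_size: int = 1000) -> None:
    """Insert rows batch by batch so only one chunk is held in memory at a time."""
    # `table` is interpolated directly, so it must come from trusted code, not user input
    for batch in chunks(rows, chunk_size):
        placeholders = ",".join("?" * len(batch[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, site TEXT)")
conn.execute("CREATE TABLE visits (patient_id INTEGER, cost REAL)")

load_in_chunks(conn, "patients", [(1, "A"), (2, "B"), (3, "A")], chunk_size=2)
load_in_chunks(conn, "visits", [(1, 10.0), (1, 5.0), (3, 7.5)], chunk_size=2)

# The database engine performs the JOIN and the GROUP BY
result = conn.execute(
    """
    SELECT p.site, SUM(v.cost)
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.patient_id
    GROUP BY p.site
    ORDER BY p.site
    """
).fetchall()
```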

🔗 Related Documents

Last updated: 2025-11-06
Maintainer: Technical Architect
