feat(admin): Add user management and upgrade to module permission system

Features - User Management (Phase 4.1):
- Database: Add user_modules table for fine-grained module permissions
- Database: Add 4 user permissions (view/create/edit/delete) to role_permissions
- Backend: UserService (780 lines) - CRUD with tenant isolation
- Backend: UserController + UserRoutes (648 lines) - 13 API endpoints
- Backend: Batch import users from Excel
- Frontend: UserListPage (412 lines) - list/filter/search/pagination
- Frontend: UserFormPage (341 lines) - create/edit with module config
- Frontend: UserDetailPage (393 lines) - details/tenant/module management
- Frontend: 3 modal components (592 lines) - import/assign/configure
- API: GET/POST/PUT/DELETE /api/admin/users/* endpoints

Architecture Upgrade - Module Permission System:
- Backend: Add getUserModules() method in auth.service
- Backend: Login API returns modules array in user object
- Frontend: AuthContext adds hasModule() method
- Frontend: Navigation filters modules based on user.modules
- Frontend: RouteGuard checks requiredModule instead of requiredVersion
- Frontend: Remove deprecated version-based permission system
- UX: Only show accessible modules in navigation (clean UI)
- UX: Smart redirect after login (avoid 403 for regular users)

Fixes:
- Fix UTF-8 encoding corruption in ~100 documentation files
- Fix pageSize type conversion in userService (String to Number)
- Fix authUser undefined error in TopNavigation
- Fix login redirect logic with role-based access check
- Update Git commit guidelines v1.2 with UTF-8 safety rules

Database Changes:
- CREATE TABLE user_modules (user_id, tenant_id, module_code, is_enabled)
- ADD UNIQUE CONSTRAINT (user_id, tenant_id, module_code)
- INSERT 4 permissions + role assignments
- UPDATE PUBLIC tenant with 8 module subscriptions

Technical:
- Backend: 5 new files (~2400 lines)
- Frontend: 10 new files (~2500 lines)
- Docs: 1 development record + 2 status updates + 1 guideline update
- Total: ~4900 lines of code

Status: User management 100% complete, module permission system operational
2026-01-16 13:42:10 +08:00
parent 98d862dbd4
commit 66255368b7
560 changed files with 70424 additions and 52353 deletions


@@ -1,48 +1,48 @@
# Tool C - AI Copilot Few-shot Example Library
> **Document version**: V1.0
> **Created**: 2025-12-06
> **Purpose**: Few-shot examples for the System Prompt
> **Coverage**: 10 core scenarios, from basic cleaning to advanced imputation
---
## 📋 Example Overview
| No. | Scenario | Level | Key technique | Medical value |
|------|---------|------|---------|---------|
| 1 | Standardize missing-value markers | Level 1 | replace | Data standardization ⭐⭐⭐ |
| 2 | Clean numeric columns | Level 1 | Regex + type conversion | Lab-value handling ⭐⭐⭐⭐ |
| 3 | Encode categorical variables | Level 2 | map | Statistical modeling ⭐⭐⭐⭐⭐ |
| 4 | Bin continuous variables | Level 2 | cut | Stratified analysis ⭐⭐⭐⭐ |
| 5 | BMI calculation and classification | Level 3 | Formula + condition | Clinical indicators ⭐⭐⭐⭐⭐ |
| 6 | Date calculation | Level 3 | datetime | Time intervals ⭐⭐⭐⭐⭐ |
| 7 | Conditional filtering | Level 3 | Multi-condition filter | Inclusion criteria ⭐⭐⭐⭐⭐ |
| 8 | Simple missing-value imputation | Level 4 | fillna | Missing-data handling ⭐⭐⭐⭐ |
| 9 | Multiple imputation (MICE) | Level 4 | IterativeImputer | Advanced imputation ⭐⭐⭐⭐⭐ |
| 10 | Smart deduplication | Level 4 | sort+drop_duplicates | Data quality ⭐⭐⭐⭐ |
---
## 🎯 Level 1: Basic Data Cleaning (2 examples)
### Example 1: Standardize Missing-Value Markers
**User instruction**:
```
Replace all symbols that represent missing values (-, 不详 "unknown", NA, N/A) with a standard null value
```
**AI-generated code**:
```python
# Standardize missing-value markers
# 不详 = "unknown", 未查 = "not tested"; '\\' in the source is a single backslash
df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)
```
**Explanation**:
- Medical data often encodes missing values in several different ways
- Converting them all to the standard NaN simplifies downstream statistical analysis
- When to use: as the first step of data cleaning
**Expected result**:
- All non-standard missing-value markers become NaN
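A minimal check on hypothetical toy data (the column values below are invented for illustration). `DataFrame.replace` with a list matches whole cell values only, so a diagnosis such as 'NAFLD' is left alone even though it begins with 'NA'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mixing real values with ad-hoc missing markers
df = pd.DataFrame({
    'diagnosis': ['NAFLD', 'NA', '不详', 'hypertension'],  # 不详 = "unknown"
    'creatinine': ['88', '-', 'N/A', '未查'],               # 未查 = "not tested"
})

markers = ['-', '不详', 'NA', 'N/A', '\\', '未查']
cleaned = df.replace(markers, np.nan)

# Whole-cell matching: 'NAFLD' survives, the bare 'NA' becomes NaN
print(cleaned)
print(cleaned.isna().sum().to_dict())
```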
@@ -54,30 +54,30 @@ df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)
**User instruction**:
```
Strip the non-numeric symbols from the creatinine column, treat <0.1 as 0.05, and convert the column to a numeric type
```
**AI-generated code**:
```python
# Clean the symbols in a numeric column
s = df['creatinine'].astype(str).str.strip()
s = s.replace('<0.1', '0.05')                # map "<0.1" first, before the "<" is stripped
s = s.str.replace(r'[<>+]', '', regex=True)  # then drop the remaining >, <, + symbols
df['creatinine'] = pd.to_numeric(s, errors='coerce')
```
**Explanation**:
- Data exported by the clinical lab often contains symbols (>, <, +, etc.)
- They must be removed before any numeric calculation or statistical analysis
- `errors='coerce'` sets values that cannot be converted to NaN
**Expected result**:
- The creatinine column becomes purely numeric
- Symbols are handled correctly
- Values that cannot be parsed become NaN
**Medical scenarios**:
- Lab reports: creatinine, white blood cell count, hemoglobin, etc.
- Common symbols: >100, <0.1, 12.5+
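A minimal sketch of the symbol handling on hypothetical creatinine strings (values invented for illustration). The ordering choice that matters is mapping the censored '<0.1' to 0.05 before any symbol is stripped, so it never collides with a genuine reading of exactly 0.1; anything still unparseable afterwards becomes NaN.

```python
import pandas as pd

# Hypothetical raw lab strings as exported by a lab system
raw = pd.Series(['>100', '<0.1', '12.5+', '0.1', '85', 'N/A'])

s = raw.astype(str).str.strip()
s = s.replace('<0.1', '0.05')                # censored value mapped before '<' is stripped
s = s.str.replace(r'[<>+]', '', regex=True)  # remove the remaining symbols
creatinine = pd.to_numeric(s, errors='coerce')

print(creatinine.tolist())
```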
---
@@ -87,28 +87,28 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
**User instruction**:
```
Convert the gender column to numbers: 男 (male) = 1, 女 (female) = 0
```
**AI-generated code**:
```python
# Gender encoding (男 = male, 女 = female)
df['gender_code'] = df['gender'].map({'男': 1, '女': 0})
```
**Explanation**:
- Converts a text categorical variable into numbers
- Makes it usable in models such as logistic regression and neural networks
- `map` is concise and efficient
- Values missing from the mapping dict (blanks, typos, other spellings) become NaN
**Expected result**:
- A new gender_code column is added
- 男 (male) → 1, 女 (female) → 0
- The original gender column is unchanged
**Extended scenarios**:
- Marital status: married = 1, unmarried = 0
- Smoking history: yes = 1, no = 0
- Education level: primary school = 1, junior high = 2, senior high = 3, university = 4
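A small sketch applying the education-level mapping from the list above to hypothetical survey values, then listing the raw values the dictionary did not cover. The value 大专 (junior college) is an invented example of an unmapped category.

```python
import pandas as pd

# Hypothetical survey data; 大专 (junior college) is deliberately absent from the mapping
df = pd.DataFrame({'education': ['小学', '高中', '大学', '大专', None]})

# Ordinal codes: 小学 primary = 1, 初中 junior high = 2, 高中 senior high = 3, 大学 university = 4
edu_map = {'小学': 1, '初中': 2, '高中': 3, '大学': 4}
df['education_code'] = df['education'].map(edu_map)

# Raw values that were present but got no code (genuine nulls are excluded)
unmapped = df.loc[df['education_code'].isna() & df['education'].notna(), 'education']
print(df)
print('Unmapped values:', unmapped.unique().tolist())
```

Checking for unmapped values right after `map` catches spelling variants early, before they silently turn into missing data in a model.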
---
@@ -117,7 +117,7 @@ df['gender_code'] = df['gender'].map({'
**User instruction**:
```
Split age at 18 and 60 into three groups: minor, adult, and elderly
```
**AI-generated code**:
@@ -125,35 +125,35 @@ df['gender_code'] = df['gender'].map({'
```python
# Age grouping
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 60, 120],
                         labels=['未成年', '成年', '老年'],  # minor, adult, elderly
                         right=False)
```
**Explanation**:
- Discretizes a continuous variable into an ordered categorical variable
- `bins` defines the interval edges and `labels` the group names
- `right=False` makes the intervals left-closed and right-open ([0, 18), [18, 60), [60, 120)), so an age of exactly 120 falls outside every bin and becomes NaN
**Expected result**:
- A new age_group column is added
- Ages 0-17 → 未成年 (minor)
- Ages 18-59 → 成年 (adult)
- Ages 60+ (below 120) → 老年 (elderly)
**Medical applications**:
- Stratified analysis (incidence by age group)
- Chi-square test (age group vs. disease)
- Descriptive statistics (baseline characteristics table)
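A quick sketch of the interval edges with `right=False`, using hypothetical ages chosen to sit on the boundaries. An open upper edge (`float('inf')`) is one way to keep patients aged 120 or older in the elderly group; that variant is a suggestion, not part of the original example.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
ages = pd.Series([0, 17, 17.9, 18, 59, 60, 119, 120])
labels = ['未成年', '成年', '老年']  # minor, adult, elderly

# Same bins as the example: [0, 18), [18, 60), [60, 120)
groups = pd.cut(ages, bins=[0, 18, 60, 120], labels=labels, right=False)

# Variant with an open upper edge so nobody aged 120+ drops out
open_ended = pd.cut(ages, bins=[0, 18, 60, float('inf')], labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'closed_top': groups, 'open_top': open_ended}))
```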
---
## 🏥 Level 3: Medical Calculations and Filtering (3 examples)
### Example 5: BMI Calculation and Classification
**User instruction**:
```
Calculate BMI from height (cm) and weight (kg), and flag BMI ≥ 28 as obese
```
**AI-generated code**:
@@ -167,23 +167,23 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')
**Explanation**:
- BMI formula: weight (kg) / height (m)²
- Chinese standard: BMI ≥ 28 is obese
- The BMI calculation is vectorized; no explicit loop is needed
**Expected result**:
- A new numeric BMI column
- A new categorical obesity column
**Clinical standards (Chinese adult criteria)**:
- Underweight: BMI < 18.5
- Normal: 18.5 ≤ BMI < 24
- Overweight: 24 ≤ BMI < 28
- Obese: BMI ≥ 28
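A sketch that extends the single obesity flag to all four categories of the clinical standards above using `pd.cut`, on hypothetical measurements. Unlike the flag shown in the hunk header, `df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')`, which labels a missing BMI as 正常 (normal) because `NaN >= 28` evaluates to False, `pd.cut` leaves a missing BMI as NaN.

```python
import pandas as pd

# Hypothetical measurements; the last patient has no recorded weight
df = pd.DataFrame({'height_cm': [170, 160, 175, 165, 180],
                   'weight_kg': [50, 60, 80, 85, None]})

# BMI = weight (kg) / height (m)^2
df['BMI'] = df['weight_kg'] / (df['height_cm'] / 100) ** 2

# Chinese adult cut-offs; right=False gives [18.5, 24), [24, 28), [28, inf)
df['BMI_class'] = pd.cut(df['BMI'],
                         bins=[0, 18.5, 24, 28, float('inf')],
                         labels=['偏瘦', '正常', '超重', '肥胖'],  # underweight, normal, overweight, obese
                         right=False)
print(df)
```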
**Extended scenarios**:
- Body surface area (BSA): chemotherapy dose calculation
- eGFR: kidney function assessment
- APACHE score: critical illness assessment
---
@@ -191,7 +191,7 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')
**User instruction**:
```
Calculate the length of stay from the admission date and the discharge date
```
**AI-generated code**:
@@ -208,13 +208,13 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days
- `.dt.days` extracts the number of days
**Expected result**:
- A new length_of_stay column is added
- Numeric, in days
**Medical applications**:
- Length-of-stay analysis
- Average length of stay (ALOS) statistics
- Correlation analysis with hospitalization costs
**Extended scenarios**:
- Age calculation: (today - date of birth) in days / 365.25
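A sketch of both date calculations on hypothetical records. A fixed admission date stands in for "today" so the output is reproducible; substituting `pd.Timestamp.today()` gives current age instead. Dividing by 365.25 approximates the average year length including leap years, so the result is a decimal age rather than completed years.

```python
import pandas as pd

# Hypothetical records
df = pd.DataFrame({'birth_date': ['1950-03-15', '1985-11-02'],
                   'admission_date': ['2025-01-10', '2025-02-20'],
                   'discharge_date': ['2025-01-17', '2025-03-01']})
for col in ['birth_date', 'admission_date', 'discharge_date']:
    df[col] = pd.to_datetime(df[col])

# Length of stay in whole days (the Example 6 formula)
df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days

# Age at admission in years; swap admission_date for pd.Timestamp.today() to get current age
df['age_at_admission'] = ((df['admission_date'] - df['birth_date']).dt.days / 365.25).round(1)

print(df[['length_of_stay', 'age_at_admission']])
```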
@@ -223,34 +223,34 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days
---
### Example 7: Conditional Filtering (Inclusion Criteria)
**User instruction**:
```
Select patients aged ≥ 18, diagnosed with diabetes, and with blood glucose ≥ 7.0
```
**AI-generated code**:
```python
# Multi-condition filtering
df_selected = df[
    (df['age'] >= 18) &
    (df['diagnosis'] == '糖尿病') &  # 糖尿病 = diabetes
    (df['glucose'] >= 7.0)
]
```
**Explanation**:
- Boolean indexing; multiple conditions are combined with `&`
- Each condition must be wrapped in parentheses, because `&` binds more tightly than `>=` and `==`
- Returns the rows that satisfy every condition
**Expected result**:
- A new DataFrame containing the matching cases
- The original df is unchanged
**Clinical research applications**:
- Applying inclusion criteria
- Removing cases that meet exclusion criteria
- Subgroup analysis (e.g., elderly diabetic patients)
@@ -260,7 +260,7 @@ df_selected = df[
```python
included = df[
    (df['age'] >= 18) &
    (df['age'] <= 75) &
    (df['diagnosis'].isin(['肺腺癌', '肺鳞癌'])) &  # lung adenocarcinoma, lung squamous cell carcinoma
    (df['stage'].isin(['III', 'IV']))
]
```
@@ -274,43 +274,43 @@ excluded = df[
---
## <EFBFBD>𥈡 Level 4: 擃条漣<EFBFBD>唳旿瘝餌<EFBFBD>嚗?銝迎<E98A9D>
## 🔬 Level 4: Advanced Data Governance (3 examples)
### Example 8: Simple Missing-Value Imputation
**User instruction**:
```
Fill the missing values in the BMI column with the median
```
**AI-generated code**:
```python
# Fill missing values with the median
bmi_median = df['BMI'].median()
df['BMI'] = df['BMI'].fillna(bmi_median)
```
**Explanation**:
- `median()` computes the median (ignoring NaN)
- `fillna()` fills in the missing values
- The median is robust to outliers
**Expected result**:
- The BMI column has no missing values
- Missing positions are replaced by the median
**Choosing an imputation method**:
| Method | Use case | Pros | Cons |
|------|---------|------|------|
| Mean | Normally distributed data | Simple | Sensitive to outliers |
| Median | Skewed data | Robust | Loses information |
| Mode | Categorical variables | Preserves the distribution | May produce implausible values |
| Forward fill | Time series | Preserves trends | Time series only |
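A sketch of the four methods in the table, on hypothetical columns (values invented; the 50.0 outlier is deliberate so the mean and median visibly differ).

```python
import numpy as np
import pandas as pd

# Hypothetical data: a skewed lab value, a categorical variable, and a daily series
df = pd.DataFrame({
    'crp': [1.0, 2.0, np.nan, 3.0, 50.0],           # skewed by one high value
    'blood_type': ['A', 'O', 'O', None, 'B'],
    'daily_temp': [36.5, np.nan, 37.0, np.nan, 37.2],
})

mean_filled = df['crp'].fillna(df['crp'].mean())                  # mean: pulled up by the outlier
median_filled = df['crp'].fillna(df['crp'].median())              # median: robust to the outlier
mode_filled = df['blood_type'].fillna(df['blood_type'].mode()[0])  # mode: most frequent category
ffill_filled = df['daily_temp'].ffill()                           # forward fill: carry last value forward

print(mean_filled[2], median_filled[2], mode_filled[3], ffill_filled.tolist())
```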
**Caveats**:
- ⚠️ Only suitable when the missing rate is below 5%
- ⚠️ Assumes the data are missing completely at random (MCAR)
- ⚠️ May underestimate the standard deviation
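A small numeric illustration of the last caveat, using hypothetical BMI values with 3 of 8 entries missing (far above the 5% guideline, on purpose, to make the effect obvious). Filling with the median adds values at the center of the distribution: here the mean is unchanged (the observed values are symmetric), but the standard deviation shrinks.

```python
import numpy as np
import pandas as pd

# Hypothetical BMI values, 3 of 8 missing (exaggerated on purpose)
bmi = pd.Series([18.0, 21.0, 24.0, 27.0, 30.0, np.nan, np.nan, np.nan])
filled = bmi.fillna(bmi.median())

# The three imputed 24.0 values add no spread, so the sample SD drops (about 4.74 -> 3.59)
print('mean:', bmi.mean(), '->', filled.mean())
print('std :', round(bmi.std(), 2), '->', round(filled.std(), 2))
```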
---
@@ -318,7 +318,7 @@ df['BMI'] = df['BMI'].fillna(bmi_median)
**User instruction**:
```
Use multiple imputation to fill the missing values in the BMI, age, and creatinine columns
```
**AI-generated code**:
@@ -336,42 +336,42 @@ df[cols] = imputer.fit_transform(df[cols])
**Explanation**:
- MICE (Multivariate Imputation by Chained Equations)
- Predicts missing values from the correlations between variables
- `max_iter=10`: at most 10 iterations
- `random_state=0`: reproducible results
**How the algorithm works**:
1. Initial fill (e.g., with the mean)
2. Iterate:
   - For each variable with missing values, predict it from the other variables
   - Update the imputed values
3. Stop once the imputed values converge
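The steps above can be sketched from scratch with NumPy least squares. This is a deterministic, single-imputation illustration of the chained-equations loop under simplifying assumptions (linear relationships, numeric columns only). It is not how scikit-learn's `IterativeImputer` is implemented (that class uses `BayesianRidge` as its default estimator), and `chained_equations_impute` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def chained_equations_impute(df, max_iter=10, tol=1e-4):
    """Deterministic single-imputation sketch of the chained-equations loop (linear models)."""
    data = df.astype(float)
    missing = data.isna()
    filled = data.fillna(data.mean())  # step 1: initial fill with column means

    for _ in range(max_iter):  # step 2: cycle through the incomplete variables
        previous = filled.copy()
        for col in data.columns[missing.any().to_numpy()]:
            others = filled.drop(columns=col).to_numpy()
            X = np.column_stack([np.ones(len(filled)), others])  # intercept + predictors
            observed = ~missing[col].to_numpy()
            # Fit col ~ other variables on rows where col was actually observed
            coef, *_ = np.linalg.lstsq(X[observed], filled[col].to_numpy()[observed], rcond=None)
            # Overwrite only the originally missing entries with the predictions
            filled.loc[~observed, col] = X[~observed] @ coef
        if (filled - previous).abs().to_numpy().max() < tol:  # step 3: stop at convergence
            break
    return filled


# Usage on hypothetical data: BMI loosely depends on age, and both columns have gaps
rng = np.random.default_rng(0)
age = rng.uniform(30, 80, 200)
demo = pd.DataFrame({'age': age, 'BMI': 18 + 0.1 * age + rng.normal(0, 1, 200)})
demo.loc[rng.choice(200, 30, replace=False), 'BMI'] = np.nan
demo.loc[rng.choice(200, 20, replace=False), 'age'] = np.nan
result = chained_equations_impute(demo)
print(result.isna().sum().to_dict())
```

Because each pass reuses the previous pass's imputations as predictors, the loop needs several iterations when more than one column has gaps; with a single incomplete column it converges after the second pass.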
**When to use**:
- ✅ Missing rate of 5%-30%
- ✅ Missingness mechanism is MAR (missing at random)
- ✅ The variables are correlated with one another
- ✅ The data distribution needs to be preserved
**Advantages**:
- Uses the relationships between variables
- Preserves the data distribution
- Reduces bias
- More statistically sound
**vs. simple imputation**:
| Criterion | Simple imputation | Multiple imputation |
|------|---------|---------|
| Complexity | Low | Medium |
| Computation time | Fast | Slower |
| Bias | Possibly large | Smaller |
| Standard error | Underestimated | Accurate |
| Distribution preservation | Poor | Good |
**Notes**:
- Numeric variables only
- Categorical variables must be encoded first
- Date/time variables must be converted first
- A single `IterativeImputer` run returns one imputed dataset; for true multiple imputation, repeat it with `sample_posterior=True` and different `random_state` values, then pool the results
---
@@ -384,7 +384,7 @@ df[cols] = imputer.fit_transform(df[cols])
**AI-generated code**:
```python
# Smart deduplication: sort by date, then deduplicate by ID and keep the last row
df['check_date'] = pd.to_datetime(df['check_date'])
df = df.sort_values('check_date').drop_duplicates(subset=['patient_id'], keep='last')
```
@@ -395,19 +395,19 @@ df = df.sort_values('check_date').drop_duplicates(subset=['patient_id'], keep='l
- `keep='last'` keeps the last row (i.e., the most recent date)
**Expected result**:
- Only one record is kept per patient
- The kept record is the one with the most recent check date
**Extended scenarios**:
**Scenario 1: Keep the most complete record**
```python
# Completeness of each row = number of non-null fields
df['completeness'] = df.notna().sum(axis=1)
df = df.sort_values('completeness', ascending=False).drop_duplicates(subset=['patient_id'], keep='first')
```
**Scenario 2: Deduplicate on a combination of fields**
```python
# Deduplicate by patient ID + visit date
df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')
```
@@ -415,13 +415,13 @@ df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')
**Scenario 3: Deduplicate with compound priority rules**
```python
# Priority: most recent date first, then highest completeness (uses the completeness column from Scenario 1)
df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).drop_duplicates(subset=['patient_id'], keep='first')
```
**Medical scenarios**:
- Removing cases entered more than once
- Keeping the first or last of multiple visits
- Deduplicating lab results (keep the most recent)
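A sketch of Scenario 3 on hypothetical visit records: one patient has two rows on the same latest date, so the completeness tie-breaker decides which row survives. `completeness` is computed first, as in Scenario 1.

```python
import pandas as pd

# Hypothetical visits: P1 has two rows on its latest date; P2 has an older and a newer row
df = pd.DataFrame({
    'patient_id': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'check_date': ['2025-01-05', '2025-02-01', '2025-02-01', '2025-01-10', '2025-03-03'],
    'creatinine': [80.0, 85.0, None, 90.0, 95.0],
    'glucose': [5.6, 6.1, 6.0, None, 7.2],
})
df['check_date'] = pd.to_datetime(df['check_date'])

# Scenario 1 step: number of non-null fields per row
df['completeness'] = df.notna().sum(axis=1)

# Scenario 3: newest date first, then the more complete row; keep one row per patient
latest = (df.sort_values(['check_date', 'completeness'], ascending=[False, False])
            .drop_duplicates(subset=['patient_id'], keep='first')
            .sort_values('patient_id'))
print(latest)
```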
---
@@ -432,44 +432,44 @@ df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).dr
````python
system_prompt = f"""
You are a data-cleaning expert for medical research, responsible for generating Pandas code that cleans and organizes data.
## Current dataset information
- File name: {session.fileName}
- Rows: {session.totalRows}
- Columns: {session.totalCols}
- Column names: {', '.join(session.columns)}
## Safety rules (mandatory)
1. Operate only on the df variable
2. Importing dangerous modules such as os and sys is forbidden
3. Using dangerous functions such as eval and exec is forbidden
4. Exception handling is required
5. Return format: {{"code": "...", "explanation": "..."}}
## Few-shot examples
### Example 1: Standardize missing-value markers
User: Replace all symbols that represent missing values with a standard null value
Code:
```python
df = df.replace(['-', '不详', 'NA', 'N/A'], np.nan)
```
### Example 2: Clean numeric columns
User: Strip the non-numeric symbols from the creatinine column and convert it to a numeric type
Code:
```python
df['creatinine'] = df['creatinine'].astype(str).str.replace('>', '').str.replace('<', '')
df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
```
[... the other 8 examples ...]
## Current user request
{user_message}
Please generate the code and explain it.
"""
````
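Safety rules 2 and 3 can also be checked on the generated code before it runs. The sketch below uses Python's standard `ast` module; the deny-lists and the `check_generated_code` name are hypothetical. A static check like this can be bypassed (for example through `getattr` or attribute access on builtins), so it complements, rather than replaces, an isolated execution environment.

```python
from __future__ import annotations

import ast

# Hypothetical deny-lists mirroring safety rules 2 and 3
FORBIDDEN_MODULES = {'os', 'sys', 'subprocess', 'shutil'}
FORBIDDEN_CALLS = {'eval', 'exec', 'compile', '__import__'}


def check_generated_code(code: str) -> list[str]:
    """Return the rule violations found in generated code; an empty list means none were found."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f'syntax error: {exc.msg} (line {exc.lineno})']

    violations = []
    for node in ast.walk(tree):
        # import os / import os.path
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        # from os import system (node.module is None for "from . import x")
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or '']
        else:
            modules = []
        for name in modules:
            if name.split('.')[0] in FORBIDDEN_MODULES:
                violations.append(f'forbidden import: {name} (line {node.lineno})')
        # Direct calls by name, e.g. exec("...") or eval(expr)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            violations.append(f'forbidden call: {node.func.id}() (line {node.lineno})')
    return violations


print(check_generated_code("df = df.replace(['-', 'NA'], np.nan)"))
print(check_generated_code("import os\nfiles = os.listdir('.')\nexec('x = 1')"))
```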
@@ -477,56 +477,55 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
## 🎯 Quality Standards
Every example must meet the following criteria:
- ✅ The code runs as-is
- ✅ Detailed comments
- ✅ Clearly defined inputs and outputs
- ✅ Follows Python best practices
- ✅ Handles edge cases and exceptions
- ✅ Describes the medical scenario
---
## 📊 Test Case Design
Based on these 10 examples, the Day 3 tests should include:
**Basic tests (4)**:
1. Example 1 test (missing-value standardization)
2. Example 2 test (numeric cleaning)
3. Example 3 test (gender encoding)
4. Example 4 test (age grouping)
**Intermediate tests (3)**:
5. Example 5 test (BMI calculation)
6. Example 6 test (length of stay)
7. Example 7 test (conditional filtering)
**Advanced tests (3)**:
8. Example 8 test (median imputation)
9. Example 9 test (multiple imputation)
10. Example 10 test (smart deduplication)
**Extended tests (5)**:
11. Mixed-scenario test (clean first, then calculate)
12. Error-scenario test (column does not exist)
13. Edge-case test (all values missing)
14. Self-correction test (retry after the code raises an error)
15. End-to-end test (upload → AI processing → result verification)
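A sketch of extended tests 12 and 13 against Example 8, written as plain functions with `assert` so it runs with or without pytest. `fill_median` is a hypothetical wrapper around the Example 8 code, not an existing service function.

```python
import numpy as np
import pandas as pd


def fill_median(df, col):
    """Hypothetical wrapper around the Example 8 code so it can be tested in isolation."""
    out = df.copy()
    out[col] = out[col].fillna(out[col].median())
    return out


def test_missing_column_raises_keyerror():
    # Test 12: the requested column does not exist
    df = pd.DataFrame({'age': [30, 40]})
    try:
        fill_median(df, 'BMI')
    except KeyError:
        return
    raise AssertionError('expected KeyError for a missing column')


def test_all_missing_column_stays_missing():
    # Test 13: median() of an all-NaN column is NaN, so fillna() has nothing to fill with
    df = pd.DataFrame({'BMI': [np.nan, np.nan, np.nan]})
    assert fill_median(df, 'BMI')['BMI'].isna().all()


test_missing_column_raises_keyerror()
test_all_missing_column_stays_missing()
print('edge-case tests passed')
```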
---
## 🔄 Change Log
| Date | Version | Changes | Author |
|------|------|---------|--------|
| 2025-12-06 | V1.0 | Initial version with 10 core examples | AI Assistant |
---
**Document status**: ✅ Confirmed
**Next step**: Start Day 3 development (implement AICodeService)