feat(admin): Add user management and upgrade to module permission system

Features - User Management (Phase 4.1):
- Database: Add user_modules table for fine-grained module permissions
- Database: Add 4 user permissions (view/create/edit/delete) to role_permissions
- Backend: UserService (780 lines) - CRUD with tenant isolation
- Backend: UserController + UserRoutes (648 lines) - 13 API endpoints
- Backend: Batch import users from Excel
- Frontend: UserListPage (412 lines) - list/filter/search/pagination
- Frontend: UserFormPage (341 lines) - create/edit with module config
- Frontend: UserDetailPage (393 lines) - details/tenant/module management
- Frontend: 3 modal components (592 lines) - import/assign/configure
- API: GET/POST/PUT/DELETE /api/admin/users/* endpoints

Architecture Upgrade - Module Permission System:
- Backend: Add getUserModules() method in auth.service
- Backend: Login API returns modules array in user object
- Frontend: AuthContext adds hasModule() method
- Frontend: Navigation filters modules based on user.modules
- Frontend: RouteGuard checks requiredModule instead of requiredVersion
- Frontend: Remove deprecated version-based permission system
- UX: Only show accessible modules in navigation (clean UI)
- UX: Smart redirect after login (avoid 403 for regular users)

Fixes:
- Fix UTF-8 encoding corruption in ~100 docs files
- Fix pageSize type conversion in userService (String to Number)
- Fix authUser undefined error in TopNavigation
- Fix login redirect logic with role-based access check
- Update Git commit guidelines v1.2 with UTF-8 safety rules

Database Changes:
- CREATE TABLE user_modules (user_id, tenant_id, module_code, is_enabled)
- ADD UNIQUE CONSTRAINT (user_id, tenant_id, module_code)
- INSERT 4 permissions + role assignments
- UPDATE PUBLIC tenant with 8 module subscriptions

Technical:
- Backend: 5 new files (~2400 lines)
- Frontend: 10 new files (~2500 lines)
- Docs: 1 development record + 2 status updates + 1 guideline update
- Total: ~4900 lines of code

Status: User management 100% complete, module permission system operational
@@ -1,48 +1,48 @@

# Tool C - AI Copilot Few-shot Example Library

> **Document version**: V1.0
> **Created**: 2025-12-06
> **Purpose**: Few-shot examples for the System Prompt
> **Coverage**: 10 core scenarios, from basic cleaning to advanced imputation

---

## 📋 Example Overview

| No. | Scenario | Level | Technique | Medical value |
|------|---------|------|---------|---------|
| 1 | Unify missing-value markers | Level 1 | replace | Data standardization ⭐⭐⭐ |
| 2 | Numeric column cleaning | Level 1 | regex + type conversion | Lab value handling ⭐⭐⭐⭐ |
| 3 | Categorical variable encoding | Level 2 | map | Statistical modeling ⭐⭐⭐⭐⭐ |
| 4 | Binning continuous variables | Level 2 | cut | Stratified analysis ⭐⭐⭐⭐ |
| 5 | BMI calculation and classification | Level 3 | formula + conditions | Clinical indicators ⭐⭐⭐⭐⭐ |
| 6 | Date calculation | Level 3 | datetime | Time intervals ⭐⭐⭐⭐⭐ |
| 7 | Conditional filtering | Level 3 | multi-condition filtering | Inclusion criteria ⭐⭐⭐⭐⭐ |
| 8 | Simple missing-value imputation | Level 4 | fillna | Missing-data handling ⭐⭐⭐⭐ |
| 9 | Multiple imputation (MICE) | Level 4 | IterativeImputer | Advanced imputation ⭐⭐⭐⭐⭐ |
| 10 | Smart deduplication | Level 4 | sort+drop_duplicates | Data quality ⭐⭐⭐⭐ |

---

## 🎯 Level 1: Basic Data Cleaning (2 examples)

### Example 1: Unify Missing-Value Markers

**User instruction**:
```
Replace every symbol that denotes a missing value (-, 不详 "unknown", NA, N/A) with a standard null value
```

**AI-generated code**:
```python
# Unify missing-value markers
# '不详' = "unknown", '未查' = "not tested", '\\' = a literal backslash
df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)
```

**Code explanation**:
- Medical data often represents missing values in several different ways
- Unifying them as standard NaN simplifies downstream statistical analysis
- Applicable scenario: the first step of data cleaning

**Expected result**:
- All non-standard missing-value markers are unified as NaN
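A quick way to check this behavior is to run the same `replace` call on a small DataFrame. The column names and values below are invented for illustration. Note that `replace` with a list matches whole cell values only, so a string such as `'NA (lab closed)'` is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mixing real values with non-standard missing markers
df = pd.DataFrame({
    "creatinine": ["88", "-", "不详", "102"],
    "hb": ["N/A", "135", "NA", "未查"],
    "note": ["\\", "ok", "NA (lab closed)", "ok"],
})

df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)

# Count the cells now recognized as missing, per column
missing_per_column = df.isna().sum().to_dict()
print(missing_per_column)
```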
@@ -54,30 +54,30 @@ df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)

**User instruction**:
```
Strip the non-numeric symbols from the creatinine column, treat <0.1 as 0.05, and convert the column to a numeric type
```

**AI-generated code**:
```python
# Clean the symbols in a numeric lab column
s = df['creatinine'].astype(str).str.strip()
s = s.replace('<0.1', '0.05')                         # "<0.1" (below detection limit) -> 0.05
s = s.str.replace(r'[<>+]', '', regex=True)           # strip the remaining >, < and + symbols
df['creatinine'] = pd.to_numeric(s, errors='coerce')  # anything still unparseable -> NaN
```

**Code explanation**:
- Data exported by the laboratory often contains symbols (>, <, +, etc.)
- The column must be cleaned before numeric calculations and statistical analysis are possible
- `<0.1` is mapped to 0.05 before the symbols are stripped, so a genuine value of 0.1 is not altered
- errors='coerce' sets values that cannot be converted to NaN

**Expected result**:
- The creatinine column becomes a purely numeric type
- Symbols are handled correctly
- Values that cannot be parsed become NaN

**Medical scenarios**:
- Lab reports: creatinine, white blood cells, hemoglobin, etc.
- Common symbols: >100, <0.1, 12.5+
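To see how the cleaning handles the common symbols listed above, the same steps can be wrapped in a small helper and run on a toy Series. `clean_lab_values` and its `below_limit` parameter are hypothetical names used only for this sketch. The rule that `<0.1` becomes 0.05 comes from the instruction above. Any other value simply loses its symbol.

```python
import pandas as pd

def clean_lab_values(values: pd.Series, below_limit: dict) -> pd.Series:
    """Convert lab strings such as '>100', '<0.1', '12.5+' to floats.

    below_limit maps exact strings (e.g. '<0.1') to substitute values;
    it is applied before the symbols are stripped.
    """
    s = values.astype(str).str.strip()
    s = s.replace(below_limit)                    # exact-match substitutions first
    s = s.str.replace(r'[<>+]', '', regex=True)   # then drop the remaining symbols
    return pd.to_numeric(s, errors='coerce')      # unparseable text becomes NaN

raw = pd.Series(['>100', '<0.1', '12.5+', '0.1', 'hemolyzed'])
cleaned = clean_lab_values(raw, below_limit={'<0.1': '0.05'})
print(cleaned.tolist())
```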

---

@@ -87,28 +87,28 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')

**User instruction**:
```
Convert the gender column to numbers: 男 (male) = 1, 女 (female) = 0
```

**AI-generated code**:
```python
# Gender encoding: '男' = male -> 1, '女' = female -> 0
df['gender_code'] = df['gender'].map({'男': 1, '女': 0})
```

**Code explanation**:
- Converts a text categorical variable into numbers
- Makes it usable in models such as logistic regression and neural networks
- The map method is concise and efficient
- Any value that is not in the mapping becomes NaN

**Expected result**:
- A new gender_code column is added
- 男 → 1, 女 → 0
- The original gender column is unchanged

**Extension scenarios**:
- Marital status: married = 1, unmarried = 0
- Smoking history: yes = 1, no = 0
- Education level: primary school = 1, junior high = 2, senior high = 3, university = 4

---

@@ -117,7 +117,7 @@ df['gender_code'] = df['gender'].map({'

**User instruction**:
```
Split age at 18 and 60 into three groups: minor, adult, and senior
```

**AI-generated code**:

@@ -125,35 +125,35 @@ df['gender_code'] = df['gender'].map({'
```python
# Age grouping: '未成年' = minor, '成年' = adult, '老年' = senior
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 60, 120],
                         labels=['未成年', '成年', '老年'],
                         right=False)
```

**Code explanation**:
- Discretizes a continuous variable into an ordered categorical variable
- bins defines the interval boundaries and labels defines the group names
- right=False makes the intervals left-closed and right-open, e.g. [0, 18)

**Expected result**:
- A new age_group column is added
- Ages 0-17 → 未成年 (minor)
- Ages 18-59 → 成年 (adult)
- Ages 60-119 → 老年 (senior); ages outside [0, 120) become NaN
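Because `right=False` decides which group a boundary age falls into, it is worth checking the edges explicitly. The ages below are made up for this sketch. The bins and labels are the ones from the example above.

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
df = pd.DataFrame({'age': [0, 17, 18, 59, 60, 119, 120]})

df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 60, 120],
                         labels=['未成年', '成年', '老年'],  # minor, adult, senior
                         right=False)  # intervals [0, 18), [18, 60), [60, 120)

print(df)
# 120 lies outside the last interval [60, 120), so its group is NaN
```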

**Medical applications**:
- Stratified analysis (incidence in each age group)
- Chi-square test (age group vs. disease)
- Descriptive statistics (baseline characteristics table)

---

## 🏥 Level 3: Medical Calculations and Filtering (3 examples)

### Example 5: BMI Calculation and Classification

**User instruction**:
```
Calculate BMI from height (cm) and weight (kg), and flag BMI ≥ 28 as obese
```

**AI-generated code**:

@@ -167,23 +167,23 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')

**Code explanation**:
- BMI formula: weight (kg) / height (m)²; a height given in cm is divided by 100 first
- Chinese standard: BMI ≥ 28 is obese
- The BMI formula is applied to whole columns at once (vectorized), without an explicit loop

**Expected result**:
- A new BMI column (numeric)
- A new obesity column (categorical)

**Clinical standards**:
- Underweight: BMI < 18.5
- Normal: 18.5 ≤ BMI < 24
- Overweight: 24 ≤ BMI < 28
- Obese: BMI ≥ 28
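The four clinical categories above can be assigned in one vectorized step with `pd.cut`. This approach also keeps a missing BMI as NaN instead of giving it a label. This is a sketch that assumes the columns are named `height_cm` and `weight_kg` (hypothetical names). English labels are used for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical input: height in cm, weight in kg; one record has no weight
df = pd.DataFrame({
    'height_cm': [170, 160, 175, 165, 180],
    'weight_kg': [50.0, 60.0, 80.0, 80.0, np.nan],
})

# BMI = weight (kg) / height (m)^2, computed for the whole column at once
df['BMI'] = df['weight_kg'] / (df['height_cm'] / 100) ** 2

# Chinese criteria as left-closed intervals: [0, 18.5), [18.5, 24), [24, 28), [28, inf)
df['BMI_class'] = pd.cut(df['BMI'],
                         bins=[0, 18.5, 24, 28, np.inf],
                         labels=['underweight', 'normal', 'overweight', 'obese'],
                         right=False)

print(df)
```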

**Extension scenarios**:
- Body surface area (BSA): chemotherapy dose calculation
- eGFR: renal function assessment
- APACHE score: critical illness assessment

---

@@ -191,7 +191,7 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')

**User instruction**:
```
Calculate the length of stay from the admission date and the discharge date
```

**AI-generated code**:

@@ -208,13 +208,13 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days
- `.dt.days` extracts the number of days

**Expected result**:
- A new length_of_stay column is added
- Numeric type, in days

**Medical applications**:
- Length-of-stay analysis
- Average length of stay (ALOS) statistics
- Correlation analysis with hospitalization costs

**Extension scenarios**:
- Calculate age: (today - date of birth), in days, divided by 365.25

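The age formula can be written the same way as the length-of-stay calculation. This sketch assumes a `birth_date` column (a hypothetical name). It fixes the reference date so the result is reproducible; in practice `pd.Timestamp.today().normalize()` could stand in for it. Dividing by 365.25 is an approximation and can be off by one year for a birthday that falls within a day or so of the reference date.

```python
import numpy as np
import pandas as pd

# Hypothetical data; a fixed reference date replaces "today" so the result is reproducible
df = pd.DataFrame({'birth_date': ['1960-05-01', '2000-01-01', None]})
reference_date = pd.Timestamp('2025-12-06')

df['birth_date'] = pd.to_datetime(df['birth_date'], errors='coerce')  # missing or invalid -> NaT

# (reference date - date of birth) in days / 365.25 = age in years; round down to whole years
days_alive = (reference_date - df['birth_date']).dt.days
df['age'] = np.floor(days_alive / 365.25)

print(df)
```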
@@ -223,34 +223,34 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days

---

### Example 7: Conditional Filtering (Inclusion Criteria)

**User instruction**:
```
Select patients aged ≥ 18, diagnosed with diabetes, and with blood glucose ≥ 7.0
```

**AI-generated code**:
```python
# Multi-condition filtering
df_selected = df[
    (df['age'] >= 18) &
    (df['diagnosis'] == '糖尿病') &  # '糖尿病' = diabetes
    (df['glucose'] >= 7.0)
]
```

**Code explanation**:
- Boolean indexing, with multiple conditions joined by &
- Each condition must be wrapped in parentheses, because & binds more tightly than comparison operators
- Returns the rows that satisfy all conditions

**Expected result**:
- A new DataFrame containing the matching cases
- The original df is unchanged

**Clinical research applications**:
- Screening by inclusion criteria
- Removing cases by exclusion criteria
- Subgroup analysis (e.g., elderly patients with diabetes)

@@ -260,7 +260,7 @@ df_selected = df[
```python
included = df[
    (df['age'] >= 18) &
    (df['age'] <= 75) &
    (df['diagnosis'].isin(['肺腺癌', '肺鳞癌'])) &  # lung adenocarcinoma, lung squamous cell carcinoma
    (df['stage'].isin(['III', 'IV']))
]
```

@@ -274,43 +274,43 @@ excluded = df[

---

## 🔬 Level 4: Advanced Data Governance (3 examples)

### Example 8: Simple Missing-Value Imputation

**User instruction**:
```
Fill the missing values in the BMI column with the median
```

**AI-generated code**:
```python
# Fill missing values with the median
bmi_median = df['BMI'].median()
df['BMI'] = df['BMI'].fillna(bmi_median)
```

**Code explanation**:
- median() computes the median (NaN values are excluded)
- fillna() fills the missing values
- The median is less affected by outliers than the mean

**Expected result**:
- The BMI column has no missing values
- Missing positions are replaced by the median

**Choosing an imputation method**:

| Method | Use case | Pros | Cons |
|------|---------|------|------|
| Mean | Normally distributed data | Simple | Sensitive to outliers |
| Median | Skewed data | Robust | Loses information |
| Mode | Categorical variables | Imputed values are existing categories | May be implausible |
| Forward fill | Time series | Preserves the trend | Time series only |

**Notes**:
- ⚠️ Generally suitable only when the missing rate is below 5%
- ⚠️ Assumes the data are MCAR (missing completely at random)
- ⚠️ May underestimate the standard deviation
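The last note can be checked numerically. Filling gaps with a single constant adds values with no spread, so the sample standard deviation usually shrinks. The BMI values below are invented. They were chosen so that the mean and median coincide, which makes the effect easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical BMI values with two missing entries
bmi = pd.Series([20.0, 22.0, 24.0, 26.0, 28.0, np.nan, np.nan])

std_before = bmi.std()                      # sample SD of the 5 observed values (NaN skipped)
std_after = bmi.fillna(bmi.median()).std()  # sample SD after median imputation (7 values)

print(f"SD before: {std_before:.3f}, SD after: {std_after:.3f}")
# SD before: 3.162, SD after: 2.582
```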

---

@@ -318,7 +318,7 @@ df['BMI'] = df['BMI'].fillna(bmi_median)

**User instruction**:
```
Use multiple imputation to fill the missing values in the BMI, age, and creatinine columns
```

**AI-generated code**:

@@ -336,42 +336,42 @@ df[cols] = imputer.fit_transform(df[cols])

**Code explanation**:
- MICE (Multivariate Imputation by Chained Equations)
- Uses the correlations between variables to predict missing values
- max_iter=10: at most 10 iterations
- random_state=0: reproducible results
- A single fit_transform call produces one imputed dataset. Proper multiple imputation repeats the imputation several times (e.g., with sample_posterior=True and different random_state values) and pools the results

**Algorithm principle**:
1. Initial imputation (e.g., with the mean)
2. Iterate in a loop:
   - For each variable with missing values, predict it from the other variables
   - Update the imputed values
3. Stop once the imputations converge (or max_iter is reached)
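A few lines of NumPy can sketch the loop above to make the steps concrete. This is a simplified, single-imputation illustration. It uses ordinary least squares as the per-column model and a hypothetical `chained_impute` function. It is not scikit-learn's `IterativeImputer`, which adds options such as posterior sampling, other estimators, and value clipping.

```python
import numpy as np
import pandas as pd

def chained_impute(df: pd.DataFrame, max_iter: int = 10, tol: float = 1e-4) -> pd.DataFrame:
    """Simplified chained-equations imputation for numeric columns (single imputation)."""
    data = df.to_numpy(dtype=float)
    missing = np.isnan(data)

    # Step 1: initial fill with each column's mean
    col_means = np.nanmean(data, axis=0)
    filled = np.where(missing, col_means, data)

    for _ in range(max_iter):
        previous = filled.copy()
        # Step 2: for each column with gaps, regress it on all the other columns
        for j in range(data.shape[1]):
            rows_missing = missing[:, j]
            if not rows_missing.any():
                continue
            others = np.delete(filled, j, axis=1)
            X = np.column_stack([np.ones(len(filled)), others])  # add an intercept
            observed = ~rows_missing
            coef, *_ = np.linalg.lstsq(X[observed], filled[observed, j], rcond=None)
            filled[rows_missing, j] = X[rows_missing] @ coef     # update imputed cells only
        # Step 3: stop when the imputed values barely change
        if np.max(np.abs(filled - previous)) < tol:
            break

    return pd.DataFrame(filled, index=df.index, columns=df.columns)

# Hypothetical toy data where score = 2 * age; each column has one gap
toy = pd.DataFrame({'age': [30.0, 40.0, 50.0, np.nan, 70.0],
                    'score': [60.0, 80.0, np.nan, 120.0, 140.0]})
result = chained_impute(toy)
print(result)
```

In this toy data the two columns are perfectly related. The first pass already recovers age 60 and score 100, and the second pass confirms convergence. Real clinical data will need more iterations and will not be fitted exactly.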

**Applicable scenarios**:
- ✅ Missing rate of 5%-30%
- ✅ The missing mechanism is MAR (missing at random)
- ✅ The variables are correlated with one another
- ✅ The distribution of the data needs to be preserved

**Advantages**:
- Uses the relationships between variables
- Preserves the data distribution
- Reduces bias
- More sound from a statistical standpoint

**vs. simple imputation**:

| Metric | Simple imputation | Multiple imputation |
|------|---------|---------|
| Complexity | Low | Medium |
| Compute time | Fast | Slower |
| Bias | Possibly large | Smaller |
| Standard error | Underestimated | More accurate (when several imputations are pooled) |
| Distribution preservation | Poor | Good |

**Notes**:
- Applies to numeric variables only
- Categorical variables must be encoded first
- Time variables must be converted first

---

@@ -384,7 +384,7 @@ df[cols] = imputer.fit_transform(df[cols])

**AI-generated code**:
```python
# Smart deduplication: sort by date, then keep the last (latest) record per patient ID
df['check_date'] = pd.to_datetime(df['check_date'], errors='coerce')  # unparseable dates -> NaT
# na_position='first' keeps records with a missing date from being treated as the latest
df = df.sort_values('check_date', na_position='first').drop_duplicates(subset=['patient_id'], keep='last')
```

@@ -395,19 +395,19 @@ df = df.sort_values('check_date').drop_duplicates(subset=['patient_id'], keep='l
- keep='last' keeps the last record (i.e., the one with the latest date)

**Expected result**:
- Only one record is kept per patient
- The kept record is the one with the latest check date
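A small example shows what the deduplication keeps, including a record whose date is missing. The patient IDs, dates, and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'patient_id': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'check_date': ['2025-01-05', '2025-03-10', None, '2025-02-01', '2024-12-20'],
    'creatinine': [80, 95, 70, 110, 105],
})

df['check_date'] = pd.to_datetime(df['check_date'], errors='coerce')
latest = (df.sort_values('check_date', na_position='first')
            .drop_duplicates(subset=['patient_id'], keep='last'))

print(latest.sort_values('patient_id'))
```

With the default `na_position='last'`, P1's undated record (creatinine 70) would be kept instead of the 2025-03-10 record.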

**Extension scenarios**:

**Scenario 1: Keep the most complete record**
```python
# Count the non-missing values in each row (completeness)
df['completeness'] = df.notna().sum(axis=1)
df = df.sort_values('completeness', ascending=False).drop_duplicates(subset=['patient_id'], keep='first')
```

**Scenario 2: Deduplicate on a combination of fields**
```python
# Deduplicate by patient ID + visit date
df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')
```

@@ -415,13 +415,13 @@ df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')

**Scenario 3: Deduplicate with more complex logic**
```python
# Priority: latest date first, then highest completeness (uses the completeness column from Scenario 1)
df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).drop_duplicates(subset=['patient_id'], keep='first')
```

**Medical scenarios**:
- Removing cases that were entered more than once
- Keeping the first or the last of several visits
- Deduplicating lab results (keeping the latest)

---

@@ -432,44 +432,44 @@ df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).dr

````python
system_prompt = f"""
You are a medical research data cleaning expert, responsible for generating Pandas code to clean and organize data.

## Current dataset information
- File name: {session.fileName}
- Rows: {session.totalRows}
- Columns: {session.totalCols}
- Column names: {', '.join(session.columns)}

## Security rules (mandatory)
1. Only operate on the df variable
2. Do not import dangerous modules such as os or sys
3. Do not use dangerous functions such as eval or exec
4. Exception handling is required
5. Return format: {{"code": "...", "explanation": "..."}}

## Few-shot examples

### Example 1: Unify missing-value markers
User: Replace every symbol that denotes a missing value with a standard null value
Code:
```python
df = df.replace(['-', '不详', 'NA', 'N/A'], np.nan)
```

### Example 2: Numeric column cleaning
User: Strip the non-numeric symbols from the creatinine column and convert it to a numeric type
Code:
```python
df['creatinine'] = df['creatinine'].astype(str).str.replace('>', '').str.replace('<', '')
df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
```

[... the other 8 examples ...]

## Current user request
{user_message}

Please generate the code and explain it.
"""
````

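Security rules 2 and 3 only ask the model to behave. The generated code can also be checked before it runs. The sketch below is one possible static check using Python's standard `ast` module. The function name `violates_security_rules` is hypothetical, and the blocked sets contain only the names the rules mention.

```python
import ast
from typing import List

# Minimum sets taken from the prompt's rules; the rules say "such as", so extend as needed
BLOCKED_MODULES = {"os", "sys"}
BLOCKED_CALLS = {"eval", "exec"}

def violates_security_rules(code: str) -> List[str]:
    """Return the rule violations found in generated code (an empty list means none found)."""
    problems = []
    tree = ast.parse(code)  # raises SyntaxError if the code is not valid Python
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    problems.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                problems.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}()")
    return problems

print(violates_security_rules("df = df.replace(['-', 'NA'], np.nan)"))  # []
print(violates_security_rules("import os\nexec('print(1)')"))         # ['import of os', 'call to exec()']
```

A static check like this can be bypassed (for example through `getattr` or `__import__` tricks). It works as a first filter alongside an isolated execution environment, not as a replacement for one.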
@@ -477,56 +477,55 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')

## 🎯 Quality Standards

Every example must satisfy the following:
- ✅ The code runs as-is
- ✅ It has detailed comments
- ✅ Its input and output are clearly defined
- ✅ It follows Python best practices
- ✅ It handles exceptional cases
- ✅ It includes a description of the medical scenario

---

## 📊 Test Case Design

Based on these 10 examples, Day 3 testing should include:

**Basic tests (4)**:
1. Example 1 test (unified missing values)
2. Example 2 test (numeric cleaning)
3. Example 3 test (gender encoding)
4. Example 4 test (age grouping)

**Intermediate tests (3)**:
5. Example 5 test (BMI calculation)
6. Example 6 test (length of stay)
7. Example 7 test (conditional filtering)

**Advanced tests (3)**:
8. Example 8 test (median imputation)
9. Example 9 test (multiple imputation) ⭐
10. Example 10 test (smart deduplication)

**Extended tests (5)**:
11. Mixed scenario test (clean first, then calculate)
12. Error scenario test (column does not exist)
13. Boundary scenario test (all values missing)
14. Self-correction test (retry after the code raises an error)
15. End-to-end test (upload → AI processing → result verification)
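Test 13 (all values missing) can be illustrated by running the Example 8 code on a column with no observed values at all. In pandas the median of an all-NaN column is NaN, so `fillna` leaves every value missing. A test like the sketch below makes that behavior explicit, which helps decide how the generated code should handle the case. The test function name is hypothetical; it uses plain `assert` statements and also runs under pytest.

```python
import numpy as np
import pandas as pd

def test_median_fill_with_all_values_missing():
    df = pd.DataFrame({'BMI': [np.nan, np.nan, np.nan]})

    # Example 8 code, unchanged
    bmi_median = df['BMI'].median()
    df['BMI'] = df['BMI'].fillna(bmi_median)

    # There is nothing to compute a median from, so the column stays entirely missing
    assert np.isnan(bmi_median)
    assert df['BMI'].isna().all()

test_median_fill_with_all_values_missing()
```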

---

## 🔄 Maintenance Record

| Date | Version | Changes | Author |
|------|------|---------|--------|
| 2025-12-06 | V1.0 | Initial creation, 10 core examples | AI Assistant |

---

**Document status**: ✅ Confirmed
**Next step**: Start Day 3 development (AICodeService implementation)
