feat(aia): Complete AIA V2.0 with universal streaming capabilities

Major Changes:
- Add StreamingService with OpenAI Compatible format
- Upgrade Chat component V2 with Ant Design X integration
- Implement AIA module with 12 intelligent agents
- Update API routes to unified /api/v1 prefix
- Update system documentation

Backend (~1300 lines):
- common/streaming: OpenAI Compatible adapter
- modules/aia: 12 agents, conversation service, streaming integration
- Update route versions (RVW, PKB to v1)

Frontend (~3500 lines):
- modules/aia: AgentHub + ChatWorkspace (100% prototype restoration)
- shared/Chat: AIStreamChat, ThinkingBlock, useAIStream Hook
- Update API endpoints to v1

Documentation:
- AIA module status guide
- Universal capabilities catalog
- System overview updates
- All module documentation sync

Tested: Stream response verified, authentication working
Status: AIA V2.0 core completed (85%)
2026-01-14 19:15:01 +08:00
parent 3d35e9c58b
commit 1b53ab9d52
386 changed files with 52096 additions and 65238 deletions


@@ -1,48 +1,48 @@
# Tool C - AI Copilot Few-shot Example Library
> **Document version**: V1.0
> **Created**: 2025-12-06
> **Purpose**: Few-shot examples for the System Prompt
> **Scenarios covered**: 10 core scenarios, from basic cleaning to advanced imputation
---
## Example Overview
| No. | Scenario | Level | Technique | Medical value |
|------|---------|------|---------|---------|
| 1 | Unify missing-value markers | Level 1 | replace | Data standardization ⭐⭐⭐ |
| 2 | Numeric column cleaning | Level 1 | regex + type conversion | Lab value processing ⭐⭐⭐⭐ |
| 3 | Categorical variable encoding | Level 2 | map | Statistical modeling ⭐⭐⭐⭐⭐ |
| 4 | Continuous variable binning | Level 2 | cut | […] ⭐⭐⭐⭐ |
| 5 | BMI calculation and classification | Level 3 | formula + condition | Clinical indicator ⭐⭐⭐⭐⭐ |
| 6 | Date calculation | Level 3 | datetime | Time interval ⭐⭐⭐⭐⭐ |
| 7 | Conditional filtering | Level 3 | multi-condition filter | Inclusion criteria ⭐⭐⭐⭐⭐ |
| 8 | Simple missing-value imputation | Level 4 | fillna | Missing-data handling ⭐⭐⭐⭐ |
| 9 | Multiple imputation (MICE) | Level 4 | IterativeImputer | Advanced imputation ⭐⭐⭐⭐⭐ |
| 10 | Smart deduplication | Level 4 | sort+drop_duplicates | Data quality ⭐⭐⭐⭐ |
---
## 🎯 Level 1: Basic Data Cleaning (2 examples)
### Example 1: Unify missing-value markers
**User instruction**:
```
Replace every symbol that stands for a missing value (-, 不详, NA, N/A) with the standard null value
```
**AI-generated code**:
```python
# Unify missing-value markers ('不详' = "unknown", '未查' = "not examined")
df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)
```
**Code explanation**:
- Medical data commonly represents missing values in several different ways
- Unifying them as the standard NaN simplifies downstream statistical analysis
- Applicable scenario: the first step of data cleaning
**Expected result**:
- All missing-value markers are unified to NaN
@@ -54,30 +54,30 @@ df = df.replace(['-', '不详', 'NA', 'N/A', '\\', '未查'], np.nan)
**User instruction**:
```
Strip the non-numeric symbols from the creatinine column, treat <0.1 as 0.05, and convert to a numeric type
```
**AI-generated code**:
```python
# Handle symbols in the numeric column
df['creatinine'] = df['creatinine'].astype(str).str.replace('>', '').str.replace('<', '')
df.loc[df['creatinine'] == '0.1', 'creatinine'] = '0.05'
df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
```
**Code explanation**:
- Data exported by clinical labs often contains symbols (>, <, +, etc.)
- The values must be cleaned before numeric computation and statistical analysis
- errors='coerce' sets values that cannot be converted to NaN
**Expected result**:
- The creatinine column becomes a purely numeric type
- Symbols are handled correctly
- Invalid values are converted to NaN
**Applicable scenarios**:
- Lab test reports […]
- Common symbols: >100, <0.1, 12.5+
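The example above strips `<` before comparing with `'0.1'`, so a genuinely measured 0.1 would also be rewritten to 0.05, and a trailing `+` is only handled by `errors='coerce'` turning the value into NaN. A minimal sketch of one alternative follows, assuming hypothetical sample values and assuming `<0.1` is the only censored form that needs a substitute:

```python
import pandas as pd

# Hypothetical raw lab values as exported text.
raw = pd.Series(['>100', '<0.1', '0.1', '12.5+', 'abc'])

# Map the censored value first, while the '<' is still present,
# so that a measured 0.1 is left alone.
cleaned = raw.str.strip().replace({'<0.1': '0.05'})

# Then strip the remaining comparison and flag symbols.
cleaned = cleaned.str.replace(r'[<>+]', '', regex=True)

# Anything that still is not a number becomes NaN.
creatinine = pd.to_numeric(cleaned, errors='coerce')
```

Whether `>100` should really be stored as 100 is a clinical decision rather than a coding one; the sketch only mirrors the stripping behavior of the original example.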
---
@@ -87,28 +87,28 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
**User instruction**:
```
Convert the gender column to numbers: 男 (male) = 1, 女 (female) = 0
```
**AI-generated code**:
```python
# Gender encoding ('男' = male, '女' = female)
df['gender_code'] = df['gender'].map({'男': 1, '女': 0})
```
**Code explanation**:
- Converts a text categorical variable to numeric values
- Facilitates modeling with logistic regression, neural networks, and similar methods
- The map method is concise and efficient
**Expected result**:
- A new gender_code column is added
- 男 (male) → 1, 女 (female) → 0
- The original gender column is left unchanged
**Extended scenarios**:
- Marital status: married=1, unmarried=0
- Smoking history: yes=1, no=0
- Education level: primary school=1, junior high=2, senior high=3, university=4
---
@@ -117,7 +117,7 @@ df['gender_code'] = df['gender'].map({'男': 1, '女': 0})
**User instruction**:
```
Split age at 18 and 60 into three groups: minor, adult, and elderly
```
**AI-generated code**:
@@ -125,35 +125,35 @@ df['gender_code'] = df['gender'].map({'男': 1, '女': 0})
# Age grouping (labels: minor / adult / elderly)
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 60, 120],
                         labels=['未成年', '成年', '老年'],
                         right=False)
```
**Code explanation**:
- Discretizes a continuous variable into an ordered categorical variable
- bins defines the interval boundaries; labels defines the labels
- right=False means left-closed, right-open intervals such as [0,18)
**Expected result**:
- A new age_group column is added
- Ages 0-17 → 未成年 (minor)
- Ages 18-59 → 成年 (adult)
- Ages 60+ → 老年 (elderly)
**Applications**:
- Stratified analysis (incidence by age band)
- Chi-square test (age group vs. disease)
- Descriptive statistics (baseline characteristics table)
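The effect of `right=False` is easiest to see on values that sit exactly on a bin edge. The sketch below uses hypothetical ages and English labels in place of the Chinese ones; it only illustrates the interval semantics and is not part of the example library itself:

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges.
ages = pd.Series([0, 17, 18, 59, 60, 119, 120])

groups = pd.cut(
    ages,
    bins=[0, 18, 60, 120],
    labels=['minor', 'adult', 'elderly'],
    right=False,  # intervals are [0, 18), [18, 60), [60, 120)
)

# Ages that fall outside every interval (here 120) become NaN instead of raising.
unassigned = ages[groups.isna()].tolist()
```

Because the last interval is open on the right, an age of exactly 120 is left unassigned; widening the final edge is one way to avoid that.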
---
## Level 3: Medical Calculation and Filtering (3 examples)
### Example 5: BMI calculation and classification
**User instruction**:
```
Compute BMI from height (cm) and weight (kg), and flag BMI ≥ 28 as obese
```
**AI-generated code**:
@@ -167,23 +167,23 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')
**Code explanation**:
- BMI formula: weight (kg) / height (m)²
- Chinese standard: BMI ≥ 28 is obese
- Vectorized computation avoids loops
**Expected result**:
- A new BMI column is added (numeric)
- A new obesity column is added (categorical)
**Clinical classification**:
- Underweight: BMI < 18.5
- Normal: 18.5 ≤ BMI < 24
- Overweight: 24 ≤ BMI < 28
- Obese: BMI ≥ 28
**Extended scenarios**:
- Body surface area (BSA): […] calculation
- eGFR: renal function assessment
- APACHE score: critical illness assessment
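The four-band clinical classification listed above can be expressed with the same `pd.cut` technique used for age grouping. This is a sketch under assumptions: the column names `height_cm`, `weight_kg`, and `bmi_class` and the sample patients are hypothetical, and the labels are in English for readability:

```python
import pandas as pd

# Hypothetical patients; height in cm and weight in kg, as in the instruction.
df = pd.DataFrame({'height_cm': [170, 160, 175, 165],
                   'weight_kg': [50, 56, 80, 80]})

# BMI = weight (kg) / height (m) squared, computed column-wise.
df['BMI'] = df['weight_kg'] / (df['height_cm'] / 100) ** 2

# Left-closed bins reproduce: <18.5, [18.5, 24), [24, 28), >=28.
df['bmi_class'] = pd.cut(
    df['BMI'],
    bins=[0, 18.5, 24, 28, float('inf')],
    labels=['underweight', 'normal', 'overweight', 'obese'],
    right=False,
)
```

Using `right=False` keeps a BMI of exactly 28 in the obese band, which matches the "BMI ≥ 28" rule.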
---
@@ -191,7 +191,7 @@ df['obesity'] = df['BMI'].apply(lambda x: '肥胖' if x >= 28 else '正常')
**User instruction**:
```
Compute the length of stay in days from the admission date and the discharge date
```
**AI-generated code**:
@@ -208,13 +208,13 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days
- .dt.days extracts the number of days
**Expected result**:
- A new length_of_stay column is added
- Numeric type, measured in days
**Applications**:
- Length-of-stay analysis
- Average length of stay (ALOS) statistics
- Correlation analysis with hospitalization costs
**Extended scenarios**:
- Compute age: (today - date of birth) / 365.25
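The age formula can be sketched as follows. The birth dates are hypothetical, and the reference date is fixed only so the result is reproducible; in practice it would presumably be `pd.Timestamp.today()`:

```python
import pandas as pd

# Hypothetical dates of birth.
birth = pd.to_datetime(pd.Series(['1960-01-14', '2000-06-30']))

# Fixed reference date for reproducibility (an assumption of this sketch).
reference_date = pd.Timestamp('2026-01-14')

# Elapsed days divided by the average year length, truncated to whole years.
age_years = ((reference_date - birth).dt.days / 365.25).astype(int)
```

Dividing by 365.25 is an approximation, so the result can be off by one for dates very close to a birthday; comparing year, month, and day directly is more exact when that matters.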
@@ -223,34 +223,34 @@ df['length_of_stay'] = (df['discharge_date'] - df['admission_date']).dt.days
---
### Example 7: Conditional filtering (inclusion criteria)
**User instruction**:
```
Select patients aged ≥ 18 whose diagnosis is diabetes and whose blood glucose is ≥ 7.0
```
**AI-generated code**:
```python
# Multi-condition filtering ('糖尿病' = diabetes)
df_selected = df[
    (df['age'] >= 18) &
    (df['diagnosis'] == '糖尿病') &
    (df['glucose'] >= 7.0)
]
```
**Code explanation**:
- Boolean indexing; multiple conditions are joined with &
- Each condition must be wrapped in parentheses
- Returns the rows that satisfy all conditions
**Expected result**:
- A new DataFrame containing only the patients who meet the conditions
- The original df is unchanged
**Clinical research applications**:
- Screening by inclusion criteria
- Removal by exclusion criteria
- Subgroup analysis (for example, elderly diabetic patients)
@@ -260,7 +260,7 @@ df_selected = df[
included = df[
(df['age'] >= 18) &
(df['age'] <= 75) &
(df['diagnosis'].isin(['肺腺癌', '肺鳞癌'])) &
(df['stage'].isin(['III', 'IV']))
]
@@ -274,43 +274,43 @@ excluded = df[
---
## 🔬 Level 4: Advanced Data Governance (3 examples)
### Example 8: Simple missing-value imputation
**User instruction**:
```
Fill the missing values in the BMI column with the median
```
**AI-generated code**:
```python
# Fill missing values with the median
bmi_median = df['BMI'].median()
df['BMI'] = df['BMI'].fillna(bmi_median)
```
**Code explanation**:
- median() computes the median (excluding NaN)
- fillna() fills the missing values
- The median is robust to outliers
**Expected result**:
- The BMI column has no missing values
- Missing positions are replaced by the median
**Choosing an imputation method**:
| Method | Applicable scenario | Advantage | Disadvantage |
|------|---------|------|------|
| Mean | Normal distribution | Simple | Affected by outliers |
| Median | Skewed distribution | Robust | Information loss |
| Mode | Categorical variable | Preserves distribution | May be implausible |
| Forward fill | Time series | Preserves trend | Only applicable to […] |
**Notes**:
- ⚠️ Only suitable when the missing rate is < 5%
- ⚠️ Assumes the data are MCAR (missing completely at random)
- ⚠️ May underestimate the standard deviation
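The four strategies in the method table can be compared side by side. The following is a minimal sketch; the column names and values are hypothetical and chosen so that each column suits one strategy:

```python
import numpy as np
import pandas as pd

# Hypothetical columns, one per imputation strategy.
df = pd.DataFrame({
    'sbp': [120.0, np.nan, 130.0, 140.0],    # roughly symmetric -> mean
    'los': [3.0, 4.0, np.nan, 40.0],         # skewed by an outlier -> median
    'smoker': ['no', 'no', None, 'yes'],     # categorical -> mode
    'weight': [70.0, np.nan, np.nan, 72.0],  # repeated measurements -> forward fill
})

filled = df.copy()
filled['sbp'] = df['sbp'].fillna(df['sbp'].mean())
filled['los'] = df['los'].fillna(df['los'].median())
# mode() returns a Series because ties are possible; take the first value.
filled['smoker'] = df['smoker'].fillna(df['smoker'].mode().iloc[0])
filled['weight'] = df['weight'].ffill()
```

For the skewed `los` column the mean of the observed values would be about 15.7, while the median is 4, which is why the median is the more robust choice there.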
---
@@ -318,7 +318,7 @@ df['BMI'] = df['BMI'].fillna(bmi_median)
**User instruction**:
```
Use multiple imputation to fill the missing values in the BMI, age, and creatinine columns
```
**AI-generated code**:
@@ -336,42 +336,42 @@ df[cols] = imputer.fit_transform(df[cols])
**Code explanation**:
- MICE (Multivariate Imputation by Chained Equations)
- Uses the correlations between variables to predict missing values
- max_iter=10: at most 10 iterations
- random_state=0: makes the result reproducible
**Algorithm principle**:
1. Initial imputation […]
2. Iterate in a loop:
   - For each variable with missing values, predict it from the other variables
   - Update the imputed values
3. Stop after convergence
**Applicable scenarios**:
- ✅ Missing rate of 5%-30%
- ✅ Missingness mechanism is MAR (missing at random)
- ✅ The variables are correlated with one another
- ✅ The distributional characteristics of the data need to be preserved
**Advantages**:
- Exploits the relationships between variables
- Preserves the data distribution
- Reduces bias
- More sound statistically
**vs simple imputation**:
| Metric | Simple imputation | Multiple imputation |
|------|---------|---------|
| Complexity | Low | Medium |
| Computation time | Fast | Slower |
| Bias | Larger | Smaller |
| Standard error | Underestimated | Accurate |
| Distribution preservation | Poor | Good |
**Notes**:
- Applies only to numeric variables
- Categorical variables must be encoded first
- Any time variables must be converted first
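The three steps of the algorithm principle (initial fill, loop over variables, stop at convergence) can be illustrated with a toy NumPy routine. This is a deliberately simplified sketch, not scikit-learn's `IterativeImputer`: the function name `chained_impute` is hypothetical, the initial fill uses column means as an assumption, each variable is predicted with plain least squares, and a single deterministic completion is produced rather than the multiple randomized draws of full MICE.

```python
import numpy as np

def chained_impute(X, max_iter=10, tol=1e-6):
    """Toy chained-equations imputation using linear regressions."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 1: initial fill with column means (an assumption of this sketch).
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        previous = filled.copy()
        # Step 2: predict each incomplete column from all the other columns.
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = design[rows] @ coef
        # Step 3: stop once the imputed values no longer change.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Hypothetical data in which the second column is exactly twice the first.
data = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, np.nan]])
completed = chained_impute(data)
```

On this data the mean fill would give 4 for the missing cell, whereas the regression step recovers a value close to 8, which shows how relationships between variables inform the imputation.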
---
@@ -384,7 +384,7 @@ df[cols] = imputer.fit_transform(df[cols])
**AI-generated code**:
```python
# Smart deduplication: sort by date first, then deduplicate by ID, keeping the last record
df['check_date'] = pd.to_datetime(df['check_date'])
df = df.sort_values('check_date').drop_duplicates(subset=['patient_id'], keep='last')
```
@@ -395,19 +395,19 @@ df = df.sort_values('check_date').drop_duplicates(subset=['patient_id'], keep='l
- keep='last' keeps the last record (that is, the one with the latest date)
**Expected result**:
- Only one record is kept per patient
- The kept record is the one with the latest check date
**Extended scenarios**:
**Scenario 1: Keep the most complete record**
```python
# Compute the completeness of each row
df['completeness'] = df.notna().sum(axis=1)
df = df.sort_values('completeness', ascending=False).drop_duplicates(subset=['patient_id'], keep='first')
```
**Scenario 2: Deduplicate on a combination of fields**
```python
# Deduplicate by patient ID + visit date
df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')
@@ -415,13 +415,13 @@ df = df.drop_duplicates(subset=['patient_id', 'visit_date'], keep='first')
**Scenario 3: Complex deduplication**
```python
# Priority: latest date > highest completeness
df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).drop_duplicates(subset=['patient_id'], keep='first')
```
**Applicable scenarios**:
- Removing cases that were entered more than once
- Taking the first or last of multiple visits
- Lab result deduplication […]
---
@@ -432,44 +432,44 @@ df = df.sort_values(['check_date', 'completeness'], ascending=[False, False]).dr
```python
system_prompt = f"""
You are an expert in cleaning medical research data, responsible for generating Pandas code that cleans and organizes data.
## Current dataset information
- File name: {session.fileName}
- Rows: {session.totalRows}
- Columns: {session.totalCols}
- Column names: {', '.join(session.columns)}
## Security rules
1. Operate only on the df variable
2. Do not import os, sys, or other dangerous modules
3. Do not use eval, exec, or other dangerous functions
4. Exceptions must be handled
5. Return format: {{"code": "...", "explanation": "..."}}
## Few-shot examples
### Example 1: Unify missing-value markers
User: Replace every symbol that stands for a missing value with the standard null value
Code:
```python
df = df.replace(['-', '不详', 'NA', 'N/A'], np.nan)
```
### Example 2: Numeric column cleaning
User: Strip the non-numeric symbols from the creatinine column and convert it to a numeric type
Code:
```python
df['creatinine'] = df['creatinine'].astype(str).str.replace('>', '').str.replace('<', '')
df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
```
[... the other 8 examples ...]
## Current user request
{user_message}
Please generate the code and explain it.
"""
```
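Rules 2 and 3 of the prompt rely on the model complying, so a service would presumably also verify the generated code before running it. One possible static check is sketched below with Python's standard `ast` module. The helper name `find_unsafe_constructs` and the banned lists are hypothetical, and this is not the project's actual validation logic:

```python
import ast

# Illustrative lists taken from the prompt's rules; a real service would likely extend them.
BANNED_MODULES = {'os', 'sys'}
BANNED_CALLS = {'eval', 'exec'}

def find_unsafe_constructs(code: str) -> list[str]:
    """Return a description of each banned import or call found in `code`."""
    problems = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or '']
        else:
            names = []
        for name in names:
            # 'os.path' counts as an import of 'os'.
            if name.split('.')[0] in BANNED_MODULES:
                problems.append(f'import of {name}')
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            problems.append(f'call to {node.func.id}')
    return problems

flagged = find_unsafe_constructs("import os\nvalue = eval('1 + 1')")
clean = find_unsafe_constructs("df = df.dropna()")
```

A static scan like this is only a first filter: indirect access such as `__import__('os')` would pass it, so it complements rather than replaces running the code in an isolated environment.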
@@ -477,55 +477,56 @@ df['creatinine'] = pd.to_numeric(df['creatinine'], errors='coerce')
## Quality Criteria
Every example must satisfy:
- ✅ The code runs as-is
- ✅ It has detailed comments
- ✅ It has clearly defined inputs and outputs
- ✅ It follows Python best practices
- ✅ It accounts for exceptional cases
- ✅ It includes a note on the medical scenario
---
## Test Case Design
Based on these 10 examples, Day 3 testing should include:
**Basic tests (4)**:
1. Example 1 test (unifying missing values)
2. Example 2 test (numeric cleaning)
3. Example 3 test (gender encoding)
4. Example 4 test (age grouping)
**Intermediate tests (3)**:
5. Example 5 test (BMI calculation)
6. Example 6 test (date calculation)
7. Example 7 test (conditional filtering)
**Advanced tests (3)**:
8. Example 8 test (median imputation)
9. Example 9 test (multiple imputation) ⭐
10. Example 10 test (smart deduplication)
**Extended tests (5)**:
11. Mixed-scenario test (clean first, then compute)
12. Error-scenario test (column name does not exist)
13. Edge-case test (all values missing)
14. Self-correction test (retry after the code raises an error)
15. End-to-end test (upload → AI processing → […])
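Test 14 implies a loop in which a failed run is reported back to the model so it can produce a corrected attempt. A minimal sketch of such a loop follows. The function `run_with_retries` and the `generate_code` callback are hypothetical stand-ins, not the AICodeService interface, and the bare `exec` is used only for illustration:

```python
from typing import Callable, Optional

def run_with_retries(generate_code: Callable[[Optional[str]], str],
                     namespace: dict,
                     max_attempts: int = 3) -> int:
    """Run generated code, feeding each error message back to the generator.

    Returns the attempt number that succeeded and re-raises the last error
    if every attempt fails.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(last_error)
        try:
            exec(code, namespace)  # a real service would run this in a sandbox
            return attempt
        except Exception as exc:
            last_error = f'{type(exc).__name__}: {exc}'
            if attempt == max_attempts:
                raise

# Stand-in generator: the first attempt is broken, the retry is corrected.
def fake_generator(error: Optional[str]) -> str:
    return "result = 1" if error else "result = undefined_name"

scope: dict = {}
attempts_used = run_with_retries(fake_generator, scope)
```

In a test, the assertion of interest is that the second attempt succeeds and that the error text from the first attempt was what triggered the corrected code.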
---
## Maintenance Log
| Date | Version | Changes | Author |
|------|------|---------|--------|
| 2025-12-06 | V1.0 | Initial creation, 10 core examples | AI Assistant |
---
**Document status**: ✅ Confirmed
**Next step**: Begin Day 3 development (AICodeService implementation)