AIclinicalresearch/python-microservice/operations/filter.py
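HaHafeng 5f089516cb feat(iit-manager): Day 3 WeCom (WeChat Work) integration completed
- Added WechatService (WeCom push service supporting text, card, and Markdown messages)
- Added WechatCallbackController (asynchronous reply mode, responds within 5 seconds)
- Improved the iit_quality_check Worker (now calls WechatService to push notifications)
- Added WeCom callback routes (GET for URL verification, POST for receiving messages)
- Implemented LLM intent recognition (query_weekly_summary, query_patient_info, etc.)
- Installed dependencies: @wecom/crypto, xml2js
- Updated the development log and the MVP development plan

Technical notes:
- Uses asynchronous reply mode to work around the WeCom 5-second timeout
- Uses the official @wecom/crypto library for XML encryption and decryption
- Uses setImmediate for background asynchronous processing
- Supports proactive push messages to deliver LLM processing results
- Improved audit logging (WECHAT_NOTIFICATION_SENT / WECHAT_INTERACTION)

Related documents: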
- docs/03-业务模块/IIT Manager Agent/06-开发记录/Day3-企业微信集成开发完成记录.md
- docs/03-业务模块/IIT Manager Agent/04-开发计划/最小MVP闭环开发计划.md
- docs/03-业务模块/IIT Manager Agent/00-模块当前状态与开发指南.md
2026-01-03 09:39:39 +08:00

142 lines
3.4 KiB
Python

"""
Advanced filtering operations.

Provides multi-condition row filtering with AND/OR logic.
"""

import pandas as pd
from typing import List, Dict, Any, Literal


def apply_filter(
    df: pd.DataFrame,
    conditions: List[Dict[str, Any]],
    logic: Literal['and', 'or'] = 'and'
) -> pd.DataFrame:
    """
    Apply filter conditions to a DataFrame.

    Args:
        df: Input DataFrame.
        conditions: List of filter conditions. Each condition contains:
            - column: column name
            - operator: one of =, !=, >, <, >=, <=, contains, not_contains,
              starts_with, ends_with, is_null, not_null
            - value: comparison value (not required for is_null and not_null)
        logic: How to combine the conditions ('and' or 'or').

    Returns:
        The filtered DataFrame (a copy).

    Examples:
        >>> df = pd.DataFrame({'age': [25, 35, 45], 'sex': ['M', 'F', 'M']})
        >>> conditions = [
        ...     {'column': 'age', 'operator': '>', 'value': 30},
        ...     {'column': 'sex', 'operator': '=', 'value': 'M'}
        ... ]
        >>> result = apply_filter(df, conditions, logic='and')
        Original rows: 3
        After filtering: 1
        Removed: 2 rows (66.7%)
        >>> len(result)
        1
    """
    if not conditions:
        raise ValueError('Filter conditions must not be empty')

    if df.empty:
        return df

    # Build one boolean mask per condition
    masks = []
    for cond in conditions:
        column = cond['column']
        operator = cond['operator']
        value = cond.get('value')

        # Make sure the column exists
        if column not in df.columns:
            raise KeyError(f"Column '{column}' does not exist")

        # Build the mask for this operator
        if operator == '=':
            mask = df[column] == value
        elif operator == '!=':
            mask = df[column] != value
        elif operator == '>':
            mask = df[column] > value
        elif operator == '<':
            mask = df[column] < value
        elif operator == '>=':
            mask = df[column] >= value
        elif operator == '<=':
            mask = df[column] <= value
        elif operator == 'contains':
            # regex=False: match the value as a literal substring, not a regex
            mask = df[column].astype(str).str.contains(str(value), na=False, regex=False)
        elif operator == 'not_contains':
            mask = ~df[column].astype(str).str.contains(str(value), na=False, regex=False)
        elif operator == 'starts_with':
            mask = df[column].astype(str).str.startswith(str(value), na=False)
        elif operator == 'ends_with':
            mask = df[column].astype(str).str.endswith(str(value), na=False)
        elif operator == 'is_null':
            mask = df[column].isna()
        elif operator == 'not_null':
            mask = df[column].notna()
        else:
            raise ValueError(f"Unsupported operator: {operator}")

        masks.append(mask)

    # Combine all conditions
    if logic == 'and':
        final_mask = pd.concat(masks, axis=1).all(axis=1)
    elif logic == 'or':
        final_mask = pd.concat(masks, axis=1).any(axis=1)
    else:
        raise ValueError(f"Unsupported logic operator: {logic}")

    # Apply the filter
    result = df[final_mask].copy()

    # Print summary statistics (df is non-empty here, so the division is safe)
    original_rows = len(df)
    filtered_rows = len(result)
    removed_rows = original_rows - filtered_rows
    print(f'Original rows: {original_rows}')
    print(f'After filtering: {filtered_rows}')
    print(f'Removed: {removed_rows} rows ({removed_rows/original_rows*100:.1f}%)')

    return result
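The if/elif chain in `apply_filter` maps each operator name to a pandas expression, and the per-condition masks are then combined with `pd.concat(...).all(axis=1)` or `.any(axis=1)`. As an illustrative alternative, not the module's actual code, the same mapping can be written as a dispatch table and the masks folded with `functools.reduce`. `OPERATORS` and `build_mask` are hypothetical names; the sketch assumes the same condition-dict format documented above and returns only the combined mask, leaving filtering and printing to the caller.

```python
import operator
from functools import reduce
from typing import Any, Callable, Dict, List

import pandas as pd

# Hypothetical dispatch table: operator name -> function(series, value) -> boolean mask.
# It mirrors the operator names documented in apply_filter.
OPERATORS: Dict[str, Callable[[pd.Series, Any], pd.Series]] = {
    '=': lambda s, v: s == v,
    '!=': lambda s, v: s != v,
    '>': lambda s, v: s > v,
    '<': lambda s, v: s < v,
    '>=': lambda s, v: s >= v,
    '<=': lambda s, v: s <= v,
    # regex=False treats the value as a literal substring
    'contains': lambda s, v: s.astype(str).str.contains(str(v), na=False, regex=False),
    'not_contains': lambda s, v: ~s.astype(str).str.contains(str(v), na=False, regex=False),
    'starts_with': lambda s, v: s.astype(str).str.startswith(str(v), na=False),
    'ends_with': lambda s, v: s.astype(str).str.endswith(str(v), na=False),
    'is_null': lambda s, v: s.isna(),
    'not_null': lambda s, v: s.notna(),
}


def build_mask(df: pd.DataFrame, conditions: List[Dict[str, Any]],
               logic: str = 'and') -> pd.Series:
    """Hypothetical helper: return the combined boolean mask for `conditions`."""
    if not conditions:
        raise ValueError('conditions must not be empty')
    if logic not in ('and', 'or'):
        raise ValueError(f'Unsupported logic operator: {logic}')

    masks = []
    for cond in conditions:
        column, op = cond['column'], cond['operator']
        if column not in df.columns:
            raise KeyError(f"Column '{column}' does not exist")
        if op not in OPERATORS:
            raise ValueError(f'Unsupported operator: {op}')
        # 'value' may be absent for is_null / not_null; those lambdas ignore it
        masks.append(OPERATORS[op](df[column], cond.get('value')))

    # Fold the masks pairwise with element-wise & (AND) or | (OR)
    combine = operator.and_ if logic == 'and' else operator.or_
    return reduce(combine, masks)


if __name__ == '__main__':
    df = pd.DataFrame({'age': [25, 35, None], 'sex': ['M', 'F', 'M']})
    mask = build_mask(df, [
        {'column': 'age', 'operator': '<', 'value': 30},
        {'column': 'age', 'operator': 'is_null'},
    ], logic='or')
    print(df[mask])  # keeps row 0 (age 25) and row 2 (age missing)
```

For boolean masks that share the DataFrame's index, folding with `&` or `|` gives the same result as the `pd.concat` approach without building an intermediate DataFrame, and the dispatch table turns a new operator into a single dictionary entry.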