AIclinicalresearch/python-microservice/operations/recode.py
HaHafeng b64896a307 feat(deploy): Complete PostgreSQL migration and Docker image build
Summary:
- PostgreSQL database migration to RDS completed (90MB SQL, 11 schemas)
- Frontend Nginx Docker image built and pushed to ACR (v1.0, ~50MB)
- Python microservice Docker image built and pushed to ACR (v1.0, 1.12GB)
- Created 3 deployment documentation files

Docker Configuration Files:
- frontend-v2/Dockerfile: Multi-stage build with nginx:alpine
- frontend-v2/.dockerignore: Optimize build context
- frontend-v2/nginx.conf: SPA routing and API proxy
- frontend-v2/docker-entrypoint.sh: Dynamic env injection
- extraction_service/Dockerfile: Multi-stage build with Aliyun Debian mirror
- extraction_service/.dockerignore: Optimize build context
- extraction_service/requirements-prod.txt: Production dependencies (removed Nougat)

Deployment Documentation:
- docs/05-部署文档/00-部署进度总览.md: One-stop deployment status overview
- docs/05-部署文档/07-前端Nginx-SAE部署操作手册.md: Frontend deployment guide
- docs/05-部署文档/08-PostgreSQL数据库部署操作手册.md: Database deployment guide
- docs/00-系统总体设计/00-系统当前状态与开发指南.md: Updated with deployment status

Database Migration:
- RDS instance: pgm-2zex1m2y3r23hdn5 (2C4G, PostgreSQL 15.0)
- Database: ai_clinical_research
- Schemas: 11 business schemas migrated successfully
- Data: 3 users, 2 projects, 1204 literature records verified
- Backup: rds_init_20251224_154529.sql (90MB)

Docker Images:
- Frontend: crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/ai-clinical_frontend-nginx:v1.0
- Python: crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.0

Key Achievements:
- Resolved Docker Hub network issues (using generic tags)
- Fixed 30 TypeScript compilation errors
- Removed Nougat OCR to reduce image size by 1.5GB
- Used Aliyun Debian mirror to resolve apt-get network issues
- Implemented multi-stage builds for optimization

Next Steps:
- Deploy Python microservice to SAE
- Build Node.js backend Docker image
- Deploy Node.js backend to SAE
- Deploy frontend Nginx to SAE
- End-to-end verification testing

Status: Docker images ready, SAE deployment pending
2025-12-24 18:21:55 +08:00

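"""
Value mapping (recoding) operation.

Maps the original values of a categorical variable to new values
(e.g. 男 (male) → 1, 女 (female) → 2).
"""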
import pandas as pd
from typing import Dict, Any, Optional


def apply_recode(
    df: pd.DataFrame,
    column: str,
    mapping: Dict[Any, Any],
    create_new_column: bool = True,
    new_column_name: Optional[str] = None
) -> pd.DataFrame:
    """
    Apply a value mapping (recode) to one column.

    Args:
        df: Input DataFrame.
        column: Name of the column to recode.
        mapping: Mapping dictionary, e.g. {'男': 1, '女': 2}.
        create_new_column: Create a new column (True) or overwrite the
            original column (False).
        new_column_name: Name of the new column; used only when
            create_new_column=True. Defaults to f'{column}_编码'
            ("_编码" means "_encoded").

    Returns:
        The recoded DataFrame.

    Examples:
        >>> df = pd.DataFrame({'性别': ['男', '女', '男', '女']})
        >>> mapping = {'男': 1, '女': 2}
        >>> result = apply_recode(df, '性别', mapping, True, '性别_编码')
        >>> result['性别_编码'].tolist()
        [1, 2, 1, 2]
    """
    if df.empty:
        return df

    # Validate that the column exists
    if column not in df.columns:
        raise KeyError(f"Column '{column}' does not exist")

    if not mapping:
        raise ValueError('The mapping dictionary must not be empty')

    # Determine the target column name
    if create_new_column:
        target_column = new_column_name or f'{column}_编码'
    else:
        target_column = column

    # Work on a copy so the caller's DataFrame is not modified
    result = df.copy()

    # Apply the mapping; values without an entry in `mapping` become NaN
    result[target_column] = df[column].map(mapping)

    # Summarize the result
    mapped_count = result[target_column].notna().sum()
    unmapped_count = result[target_column].isna().sum()
    total_count = len(result)

    print(f'Mapping complete: {mapped_count} values mapped successfully')
    if unmapped_count > 0:
        print(f'Warning: {unmapped_count} values have no matching mapping')

        # Collect the distinct unmapped values. Read them from the original
        # df, because result[column] has already been overwritten when
        # create_new_column=False.
        unmapped_mask = result[target_column].isna()
        unmapped_values = df.loc[unmapped_mask, column].unique()
        print(f'Unmapped values: {list(unmapped_values)[:10]}')  # show at most 10

    # Mapping success rate
    success_rate = (mapped_count / total_count * 100) if total_count > 0 else 0
    print(f'Mapping success rate: {success_rate:.1f}%')

    return result
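
The recode step above relies on `Series.map` with a dictionary, which leaves any value missing from the dictionary as NaN. The sketch below illustrates that behavior on its own, without importing `apply_recode`. The sample data, the extra category `'未知'` ("unknown"), and the variable names are assumptions made for illustration and are not part of the repository.

```python
import pandas as pd

# Hypothetical sample data: one value ('未知', "unknown") has no mapping entry.
df = pd.DataFrame({"性别": ["男", "女", "男", "未知"]})
mapping = {"男": 1, "女": 2}

# Series.map with a dict returns NaN for keys that are not in the dict,
# so the encoded column becomes float64 as soon as one value is unmapped.
encoded = df["性别"].map(mapping)

# Distinct source values that were present but had no mapping entry.
unmapped_mask = encoded.isna() & df["性别"].notna()
unmapped = df.loc[unmapped_mask, "性别"].unique().tolist()

# Optional: the nullable integer dtype keeps the codes as integers
# while still representing the missing entry as <NA>.
encoded_int = encoded.astype("Int64")

print(unmapped)
print(encoded_int.dtype)
```

Reading the unmapped values from the source column rather than the encoded one matters when the original column is overwritten, since the source values are no longer available in the result afterwards.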