AIclinicalresearch/docs/03-业务模块/DC-数据清洗整理/04-开发计划/工具C_Pivot列顺序优化总结.md
HaHafeng b64896a307 feat(deploy): Complete PostgreSQL migration and Docker image build
2025-12-24 18:21:55 +08:00


Tool C - Pivot Column Order Optimization Summary

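📋 Problem Description

User requirement: after a long-to-wide conversion, the columns should appear in the same order as in the originally uploaded file.

Current problem: the system sorts the converted columns alphabetically, so their order no longer matches the original file.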


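🎯 Solution: Option A - Sorting on the Python Side

Core idea

  1. The Node.js backend reads the original column order from the session.
  2. The Node.js backend extracts the original order of the pivot column's values from the data (in order of first appearance).
  3. Both orders are passed to Python.
  4. After pivoting, Python reorders the columns to follow the original order.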

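🛠️ Implementation Details

1. Python side: pivot.py

New parameters

  • original_column_order: List[str]: the original column order (e.g. ['Record ID', 'Event Name', 'FMA', '体重', '收缩压', ...], where 体重 is body weight and 收缩压 is systolic blood pressure)
  • pivot_value_order: List[str]: the original order of the pivot column's values (e.g. ['基线', '1个月', '2个月', ...], i.e. baseline, month 1, month 2)

Sorting logic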

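```python
if original_column_order:
    # 1. The index column always comes first
    final_cols = [index_column]

    # Rank pivot values by first appearance (built once, outside the loop)
    pivot_order_map = {val: idx for idx, val in enumerate(pivot_value_order or [])}

    # 2. Add the pivoted columns following the original column order
    for orig_col in original_column_order:
        if orig_col in value_columns:
            # Collect every new column derived from this original column
            prefix = f'{orig_col}___'
            related_cols = [c for c in df_pivot.columns if c.startswith(prefix)]

            # ✨ Sort by the original order of the pivot values
            if pivot_order_map:
                related_cols_sorted = sorted(
                    related_cols,
                    # Strip the prefix rather than split('___')[1], so a pivot value
                    # containing '___' is not truncated; unknown values go last
                    key=lambda c: pivot_order_map.get(c[len(prefix):], len(pivot_order_map))
                )
            else:
                related_cols_sorted = sorted(related_cols)

            final_cols.extend(related_cols_sorted)

    # 3. Append the unselected columns (keeping their original order)
    if keep_unused_columns:
        for orig_col in original_column_order:
            if orig_col in df_pivot.columns and orig_col not in final_cols:
                final_cols.append(orig_col)

    # 4. Reorder the columns (any column not listed in final_cols is dropped here)
    df_pivot = df_pivot[final_cols]
```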
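The following is a minimal sketch that applies the same ordering rules to a plain list of column names, so they can be unit-tested without pandas. It assumes that pivoted columns are flat strings named <original column>___<pivot value>, as in the code above. The function name reorder_pivot_columns and its signature are hypothetical and are not part of pivot.py.

```python
from typing import List, Optional

SEPARATOR = "___"  # assumed naming convention: "<original column>___<pivot value>"


def reorder_pivot_columns(
    columns: List[str],
    index_column: str,
    value_columns: List[str],
    original_column_order: List[str],
    pivot_value_order: Optional[List[str]] = None,
    keep_unused_columns: bool = True,
) -> List[str]:
    """Hypothetical helper: return pivoted column names in the original upload order."""
    if not original_column_order:
        # No order supplied: keep whatever order the pivot produced
        return list(columns)

    # Rank each pivot value by first appearance; values missing from the list sort last
    rank = {value: i for i, value in enumerate(pivot_value_order or [])}
    missing_rank = len(rank)

    # 1. The index column always comes first
    final_cols = [index_column]

    # 2. Group the pivoted columns by original column, in upload order
    for orig_col in original_column_order:
        if orig_col not in value_columns:
            continue
        prefix = orig_col + SEPARATOR
        related = [c for c in columns if c.startswith(prefix)]
        if rank:
            # Slice off the prefix instead of using split(), so values containing "___" survive;
            # the key is evaluated immediately, so capturing prefix in the lambda is safe
            related.sort(key=lambda c: rank.get(c[len(prefix):], missing_rank))
        else:
            # No value order supplied: fall back to plain string sorting
            related.sort()
        final_cols.extend(related)

    # 3. Optionally append unselected columns, still in upload order
    if keep_unused_columns:
        for orig_col in original_column_order:
            if orig_col in columns and orig_col not in final_cols:
                final_cols.append(orig_col)

    return final_cols
```

With pandas, you would apply the result as df_pivot = df_pivot[reorder_pivot_columns(list(df_pivot.columns), ...)]. Suppose the inputs are the pivoted FMA, 体重 and 收缩压 columns, original_column_order ['Record ID', 'Event Name', 'FMA', '体重', '收缩压'] and pivot_value_order ['基线', '1个月']. The function then returns Record ID followed by the FMA, 体重 and 收缩压 groups, with the baseline column first in each group.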

2. Python side: main.py

PivotRequest model

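```python
from typing import List

from pydantic import BaseModel


class PivotRequest(BaseModel):
    # ... existing fields ...
    original_column_order: List[str] = []  # ✨ new: column order of the uploaded file
    pivot_value_order: List[str] = []  # ✨ new: pivot values in order of first appearance
```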

Calling pivot_long_to_wide

```python
result_df = pivot_long_to_wide(
    df,
    request.index_column,
    request.pivot_column,
    request.value_columns,
    request.aggfunc,
    request.column_mapping,
    request.keep_unused_columns,
    request.unused_agg_method,
    request.original_column_order,  # ✨ new
    request.pivot_value_order  # ✨ new
)
```

3. Node.js backend: QuickActionController.ts

Getting the original column order

```typescript
const originalColumnOrder = session.columns || [];
```

Getting the original order of the pivot values

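```typescript
const pivotColumn = params.pivotColumn;
const seenPivotValues = new Set<string>();
const pivotValueOrder: string[] = [];

for (const row of fullData) {
  const pivotValue = row[pivotColumn];
  // Skip empty pivot values
  if (pivotValue === null || pivotValue === undefined) continue;
  // Deduplicate on the string form, which is what gets sent to Python
  // (otherwise 1 and "1" would both be pushed as "1")
  const key = String(pivotValue);
  if (!seenPivotValues.has(key)) {
    seenPivotValues.add(key);
    pivotValueOrder.push(key);
  }
}
```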
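For comparison, here is the same first-appearance rule written in Python. The helper first_appearance_order is hypothetical and does not exist in the codebase. Like the controller loop, it skips missing values and deduplicates on the string form of each value. One caveat: Python's str() and JavaScript's String() format some numbers differently (str(1.0) is '1.0', while String(1.0) is '1'). This is therefore an illustration rather than an exact port.

```python
from typing import Any, Dict, Iterable, List


def first_appearance_order(rows: Iterable[Dict[str, Any]], pivot_column: str) -> List[str]:
    """Hypothetical helper: distinct pivot values, as strings, in order of first appearance."""
    seen = set()
    order: List[str] = []
    for row in rows:
        value = row.get(pivot_column)  # a missing key plays the role of JS undefined
        if value is None:
            continue  # skip empty pivot values, as the controller does
        key = str(value)  # deduplicate on the string form sent to the pivot service
        if key not in seen:
            seen.add(key)
            order.append(key)
    return order
```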

Passing the orders to QuickActionService

```typescript
executeResult = await quickActionService.executePivot(
  fullData,
  params,
  session.columnMapping,
  originalColumnOrder,  // ✨ new
  pivotValueOrder  // ✨ new
);
```

4. Node.js backend: QuickActionService.ts

Method signature

```typescript
async executePivot(
  data: any[],
  params: PivotParams,
  columnMapping?: any[],
  originalColumnOrder?: string[],  // ✨ new
  pivotValueOrder?: string[]  // ✨ new
): Promise<OperationResult>
```

Passing the orders to Python

```typescript
const response = await axios.post(`${PYTHON_SERVICE_URL}/api/operations/pivot`, {
  // ... existing parameters ...
  original_column_order: originalColumnOrder || [],  // ✨ new
  pivot_value_order: pivotValueOrder || [],  // ✨ new
});
```

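📊 Before and After

Before (alphabetical order)

| Record ID | FMA___基线 | FMA___1个月 | 收缩压___基线 | 收缩压___1个月 | 体重___基线 | 体重___1个月 |
|---|---|---|---|---|---|---|
| index column | starts with F | starts with F | starts with S (pinyin) | starts with S | starts with T | starts with T |

After (original order)

| Record ID | FMA___基线 | FMA___1个月 | 体重___基线 | 体重___1个月 | 收缩压___基线 | 收缩压___1个月 |
|---|---|---|---|---|---|---|
| index column | column 3 of the original file | column 3 of the original file | column 4 of the original file | column 4 of the original file | column 5 of the original file | column 5 of the original file |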

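Order of pivot values within each group (by order of first appearance)

| FMA___基线 | FMA___1个月 | FMA___2个月 |
|---|---|---|
| appears first | appears second | appears third |

(rather than the alphabetical order "1个月", "2个月", "基线")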
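The parenthetical above can be checked with plain Python. Default string sorting compares Unicode code points, and ASCII digits come before CJK characters, so 基线 (baseline) ends up last. Ranking by first appearance restores the visit order. The labels are the document's own examples. The snippet is only an illustration and is not code from pivot.py.

```python
# Visit labels in the order they first appear in the uploaded file
visits = ["基线", "1个月", "2个月"]

# Default string sorting compares code points: ASCII digits come before CJK characters
alphabetical = sorted(visits)  # → ['1个月', '2个月', '基线']

# Ranking by first appearance restores the visit timeline
rank = {label: i for i, label in enumerate(visits)}
pivoted = ["FMA___2个月", "FMA___基线", "FMA___1个月"]
by_first_appearance = sorted(pivoted, key=lambda c: rank[c.split("___", 1)[1]])
# → ['FMA___基线', 'FMA___1个月', 'FMA___2个月']
```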

Development Complete

Modified files

  1. extraction_service/operations/pivot.py
  2. extraction_service/main.py
  3. backend/src/modules/dc/tool-c/controllers/QuickActionController.ts
  4. backend/src/modules/dc/tool-c/services/QuickActionService.ts

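Benefits

  • Column order matches the original file, which users already know
  • Pivot values follow their order of first appearance in the file (for visit data, typically the timeline: baseline → month 1 → month 2)
  • Unselected columns also keep their original order
  • Exported Excel files use the same, correct column order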

Development date: 2025-12-09
Status: completed, awaiting testing