HaHafeng 2481b786d8 deploy: Complete 0126-27 deployment - database upgrade, services update, code recovery
Major Changes:
- Database: Install pg_bigm/pgvector plugins, create test database
- Python service: v1.0 -> v1.1, add pymupdf4llm/openpyxl/pypandoc
- Node.js backend: v1.3 -> v1.7, fix pino-pretty and ES Module imports
- Frontend: v1.2 -> v1.3, skip TypeScript check for deployment
- Code recovery: Restore empty files from local backup

Technical Fixes:
- Fix pino-pretty error in production (conditional loading)
- Fix ES Module import paths (add .js extensions)
- Fix OSSAdapter TypeScript errors
- Update Prisma Schema (63 models, 16 schemas)
- Update environment variables (DATABASE_URL, EXTRACTION_SERVICE_URL, OSS)
- Remove deprecated variables (REDIS_URL, DIFY_API_URL, DIFY_API_KEY)

Documentation:
- Create 0126 deployment folder with 8 documents
- Update database development standards v2.0
- Update SAE deployment status records

Deployment Status:
- PostgreSQL: ai_clinical_research_test with plugins
- Python: v1.1 @ 172.17.173.84:8000
- Backend: v1.7 @ 172.17.173.89:3001
- Frontend: v1.3 @ 172.17.173.90:80

Tested: All services running successfully on SAE
2026-01-27 08:13:27 +08:00

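# 🐍 Python Microservice Update Plan
> **Document version**: v1.0
> **Created**: 2026-01-26
> **Scope**: the extraction_service Python microservice
> **Change type**: dependency update + image rebuild
---
## 📋 1. Change Overview
### 1.1 Changes
| Change | Description | Priority |
|--------|------|--------|
| **Add pypdf** | Additional PDF processing library | 🟡 Medium |
| **Add pypandoc** | Document format conversion library | 🟡 Medium |
| **Remove Nougat** | Removal confirmed (completed in v1.0) | ✅ Done |
| **Rebuild image** | Build the v1.1 image | 🔴 High |
### 1.2 Current State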
```yaml
service_name: python-extraction-test
current_version: v1.0
image_size: 1.12GB
internal_address: http://172.17.173.66:8000
main_dependencies:
- PyMuPDF: 1.24.0+
- pdfplumber: 0.10.3
- mammoth: 1.6.0
- pandas: 2.0+
- polars: 0.19+
```
### 1.3 Target State
```yaml
target_version: v1.1
expected_image_size: ~1.2GB
new_dependencies:
- pypdf: latest
- pypandoc: latest
```
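---
## 📦 2. Dependency Updates
### 2.1 New Dependencies
#### pypdf (additional PDF processing)
```bash
# Main features:
# - PDF text extraction
# - PDF merging and splitting
# - PDF metadata reading
# - Complements PyMuPDF
# Install
pip install pypdf
```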
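As a rough illustration of the features listed above, the sketch below reads basic information and page text with pypdf and merges several files. The helper names (`read_pdf_info`, `extract_page_texts`, `merge_pdfs`) are hypothetical and not part of extraction_service, and the sketch assumes a pypdf release that satisfies `pypdf>=4.0.0`.

```python
from io import BytesIO
from typing import BinaryIO, Iterable, List, Union

# A PDF can be given as a file path or as an open binary stream
PdfSource = Union[str, BinaryIO]


def read_pdf_info(source: PdfSource) -> dict:
    """Return the page count and document title of a PDF (hypothetical helper)."""
    from pypdf import PdfReader  # imported lazily so this module loads without pypdf

    reader = PdfReader(source)
    meta = reader.metadata  # may be None when the file carries no document info
    return {"pages": len(reader.pages), "title": meta.title if meta else None}


def extract_page_texts(source: PdfSource) -> List[str]:
    """Return the extracted text of every page, in page order."""
    from pypdf import PdfReader

    reader = PdfReader(source)
    return [page.extract_text() or "" for page in reader.pages]


def merge_pdfs(sources: Iterable[PdfSource]) -> bytes:
    """Concatenate several PDFs and return the merged file as bytes."""
    from pypdf import PdfWriter

    writer = PdfWriter()
    for src in sources:
        writer.append(src)  # appends all pages of src
    buffer = BytesIO()
    writer.write(buffer)
    return buffer.getvalue()
```

Whether pypdf or PyMuPDF should handle a given document is left open here; the sketch only shows the pypdf side.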
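#### pypandoc (document format conversion)
```bash
# Main features:
# - Conversion between Markdown and Word/HTML/PDF
# - Support for many document formats
# - High-quality document conversion
# Install
pip install pypandoc
# Note: pypandoc requires the Pandoc system dependency
# In Docker: apt-get install -y pandoc
```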
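Because pypandoc only wraps the Pandoc executable, a conversion call fails when Pandoc is missing from the image. The sketch below checks for the executable before converting. `pandoc_available` and `markdown_to_html` are hypothetical helper names, and the check assumes Pandoc is installed on `PATH` (as with the apt package) rather than bundled in some other way.

```python
import shutil


def pandoc_available(binary: str = "pandoc") -> bool:
    """Return True when the Pandoc executable is found on PATH."""
    return shutil.which(binary) is not None


def markdown_to_html(markdown_text: str) -> str:
    """Convert Markdown text to HTML through pypandoc (hypothetical helper)."""
    if not pandoc_available():
        raise RuntimeError("Pandoc not found; install it, e.g. apt-get install -y pandoc")
    import pypandoc  # imported lazily so this module loads without pypandoc

    # format="md" names the input format, to="html" the output format
    return pypandoc.convert_text(markdown_text, to="html", format="md")
```

Calling `pandoc_available()` at service startup would surface a missing system dependency before the first conversion request arrives.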
### 2.2 Update requirements-prod.txt
Current contents:
```txt
# ========================================
# Production dependencies (Nougat and heavyweight dependencies removed)
# ========================================
# Web framework
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
# Data processing (required by the DC tools)
pandas>=2.0.0
numpy>=1.24.0
polars>=0.19.0
# PDF processing (core lightweight libraries)
PyMuPDF>=1.24.0
pdfplumber==0.10.3
# Docx processing
mammoth==1.6.0
python-docx==1.1.0
# Language detection
langdetect==1.0.9
# Encoding detection
chardet==5.2.0
# Utilities
python-dotenv==1.0.0
pydantic>=2.10.0
# Logging
loguru==0.7.2
# Testing tools
requests==2.31.0
```
**To be added**:
```txt
# Additional PDF processing
pypdf>=4.0.0
# Document format conversion
pypandoc>=1.13
```
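Before rebuilding the image it can help to confirm that the file really declares the new packages. The following is a minimal sketch using only the standard library; `declared_packages` and `missing_packages` are hypothetical helpers, and the parsing covers only simple `name<operator>version` lines like the ones shown above, not every form pip accepts.

```python
import re
from typing import Iterable, List, Set


def _normalize(name: str) -> str:
    """Lower-case a package name and treat '_' and '-' as equivalent."""
    return name.lower().replace("_", "-")


def declared_packages(requirements_text: str) -> Set[str]:
    """Extract normalized package names from requirements-file text."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # The name ends at the first extras bracket, version operator, marker or space
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        names.add(_normalize(name))
    return names


def missing_packages(requirements_text: str, required: Iterable[str]) -> List[str]:
    """Return the required packages that the requirements text does not declare."""
    declared = declared_packages(requirements_text)
    return [pkg for pkg in required if _normalize(pkg) not in declared]
```

For example, `missing_packages(Path("requirements-prod.txt").read_text(encoding="utf-8"), ["pypdf", "pypandoc"])` should return an empty list once the two lines have been added.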
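---
## 🔧 3. Dockerfile Update
### 3.1 Check the Current Dockerfile
First check whether the Pandoc system dependency needs to be added.
### 3.2 Update the Dockerfile (if needed)
If pypandoc is used, add the Pandoc installation to the Dockerfile:
```dockerfile
# Add to the apt-get install stage
RUN apt-get update && apt-get install -y \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
```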
Complete Dockerfile example:
```dockerfile
# Base image
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Install system dependencies (including Pandoc)
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
# Use the Alibaba Cloud PyPI mirror for faster downloads
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Copy the dependency file
COPY requirements-prod.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements-prod.txt
# Copy the application code
COPY . .
# Expose the port
EXPOSE 8000
# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```
---
## 🚀 4. Build and Deployment Steps
### Step 1: Update requirements-prod.txt
```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service
# Edit requirements-prod.txt and add the new dependencies
# pypdf>=4.0.0
# pypandoc>=1.13
```
### Step 2: Local Test (optional)
```powershell
# Create a virtual environment for testing
python -m venv test_venv
test_venv\Scripts\activate
pip install -r requirements-prod.txt
# Test the imports
python -c "import pypdf; print(pypdf.__version__)"
python -c "import pypandoc; print(pypandoc.get_pandoc_version())"
```
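The two one-line import tests above can be extended to the other dependencies. Below is a small sketch that reports which modules are importable without actually importing them; `check_modules` and `format_report` are hypothetical names, and the module list in the usage note is an assumption based on requirements-prod.txt.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def format_report(results: Dict[str, bool]) -> str:
    """Render one 'module: status' line per checked module."""
    return "\n".join(
        f"{name}: {'ok' if found else 'MISSING'}" for name, found in results.items()
    )
```

A typical call inside the test virtual environment would be `print(format_report(check_modules(["pypdf", "pypandoc", "fitz", "pdfplumber", "mammoth"])))`. Note that import names can differ from package names: PyMuPDF, for instance, is imported as `fitz`.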
### Step 3: Build the Docker Image
```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service
# Build the image
docker build -t python-extraction:v1.1 .
# Estimated time: 15 minutes
# Estimated size: about 1.2GB
```
### Step 4: Verify the Image Locally
```powershell
# Run the container for testing
docker run --rm -p 8000:8000 python-extraction:v1.1
# Test from a new terminal
curl http://localhost:8000/health
curl http://localhost:8000/docs # View the API docs
```
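### Step 5: Log in to ACR
```powershell
# Credentials come from environment variables (ACR_USERNAME and ACR_PASSWORD are assumed names) instead of being typed inline
$env:ACR_PASSWORD | docker login --username $env:ACR_USERNAME --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
```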
### Step 6: Tag the Image
```powershell
docker tag python-extraction:v1.1 `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1
```
### Step 7: Push to ACR
```powershell
docker push `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1
# Estimated time: 10 minutes (image is about 1.1GB)
# Success indicator: the output shows "digest: sha256:..."
```
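### Step 8: Deploy on SAE
1. Log in to SAE: https://sae.console.aliyun.com/
2. Open the application: `python-extraction-test`
3. Click [Deploy Application]
4. Configure:
   - **Image address**: select `python-extraction`
   - **Image version**: select `v1.1`
5. Click [Confirm]
6. Wait for the deployment to finish (about 5-8 minutes)
### Step 9: Verify the Deployment
```bash
# Health check
curl http://172.17.173.66:8000/health
# Or test through the backend proxy
curl http://8.140.53.236/api/v1/health
```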
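Since an SAE deployment takes several minutes, the health check may need to be repeated until the new instance is up. The following is a minimal polling sketch using only the standard library. It assumes the `/health` endpoint answers with HTTP 200 when the service is ready, which the curl commands above do not state explicitly; `fetch_status` and `wait_until_healthy` are hypothetical names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 3.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return False
```

For example, `wait_until_healthy("http://172.17.173.66:8000/health", attempts=20, delay=30)` would poll for roughly ten minutes. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a network.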
---
## 📋 5. One-Step Deployment Script
### PowerShell Script
Create `extraction_service/update-and-deploy.ps1`:
```powershell
# One-step update script for the Python microservice
# Usage: .\update-and-deploy.ps1 v1.1
param(
    [Parameter(Mandatory=$true)]
    [string]$Version
)
$ErrorActionPreference = "Stop"
Write-Host "========================================" -ForegroundColor Green
Write-Host "Updating the Python microservice to version: $Version" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
# 1. Build the Docker image
Write-Host "`n[1/4] Building the Docker image..." -ForegroundColor Cyan
docker build -t python-extraction:$Version .
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Build failed!" -ForegroundColor Red
    exit 1
}
Write-Host "✅ Image built successfully!" -ForegroundColor Green
# 2. Log in to ACR
Write-Host "`n[2/4] Logging in to ACR..." -ForegroundColor Cyan
# Credentials come from environment variables (ACR_USERNAME and ACR_PASSWORD are assumed names) instead of being hard-coded
$env:ACR_PASSWORD | docker login --username $env:ACR_USERNAME --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Login failed!" -ForegroundColor Red
    exit 1
}
# 3. Tag the image
Write-Host "`n[3/4] Tagging the image..." -ForegroundColor Cyan
$ImageUrl = "crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version"
docker tag python-extraction:$Version $ImageUrl
Write-Host "✅ Image tagged!" -ForegroundColor Green
# 4. Push to ACR
Write-Host "`n[4/4] Pushing to ACR..." -ForegroundColor Cyan
Write-Host "Push target: $ImageUrl" -ForegroundColor Yellow
docker push $ImageUrl
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Push failed!" -ForegroundColor Red
    exit 1
}
Write-Host "`n========================================" -ForegroundColor Green
Write-Host "✅ Python microservice image pushed successfully" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host "`nNext steps:" -ForegroundColor Yellow
Write-Host "1. Log in to the SAE console: https://sae.console.aliyun.com/" -ForegroundColor Yellow
Write-Host "2. Open the application: python-extraction-test" -ForegroundColor Yellow
Write-Host "3. Click [Deploy Application]" -ForegroundColor Yellow
Write-Host "4. Select image version: $Version" -ForegroundColor Yellow
Write-Host "5. Confirm the deployment" -ForegroundColor Yellow
Write-Host "`nImage address (VPC):" -ForegroundColor Cyan
Write-Host "crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version" -ForegroundColor Cyan
```
### Usage
```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service
.\update-and-deploy.ps1 v1.1
```
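The script uses two forms of the image address: the public registry host for pushing and a `-vpc` host for SAE. The relationship between the two can be sketched as below. `image_reference` is a hypothetical helper, and the rule that the VPC host is the public host with `-vpc` appended to its first label is inferred only from the two addresses in the script.

```python
def image_reference(
    registry: str, namespace: str, repository: str, version: str, vpc: bool = False
) -> str:
    """Build the full image reference; vpc=True switches to the -vpc registry host."""
    if vpc:
        # Append "-vpc" to the first DNS label of the registry host
        instance, _, rest = registry.partition(".")
        registry = f"{instance}-vpc.{rest}"
    return f"{registry}/{namespace}/{repository}:{version}"


REGISTRY = "crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com"
public_ref = image_reference(REGISTRY, "ai-clinical", "python-extraction", "v1.1")
vpc_ref = image_reference(REGISTRY, "ai-clinical", "python-extraction", "v1.1", vpc=True)
```

Here `public_ref` matches the tag used in Steps 6 and 7, and `vpc_ref` matches the VPC address printed at the end of the script.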
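---
## ⚠️ 6. Notes
### 6.1 Pandoc Dependency
- pypandoc requires Pandoc to be installed on the system
- Make sure the Dockerfile includes `apt-get install pandoc`
- If document conversion is not needed, pypandoc can be left out
### 6.2 Image Size
- Current v1.0: 1.12GB
- Expected v1.1: ~1.2GB
- Main increase: pypandoc + Pandoc
### 6.3 Compatibility
- pypdf and PyMuPDF may overlap in functionality
- Test thoroughly in the development environment first
---
## 🔄 7. Rollback Plan
If v1.1 runs into problems, roll back to v1.0:
1. Log in to the SAE console
2. Open the application: `python-extraction-test`
3. Click [Deploy Application]
4. Select image version: `v1.0`
5. Confirm the deployment
---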
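## ✅ 8. Verification Checklist
### Before Deployment
- [ ] requirements-prod.txt updated
- [ ] Dockerfile updated (if needed)
- [ ] Local build succeeds
- [ ] Local run tests pass
- [ ] ACR login succeeds
### After Deployment
- [ ] SAE deployment succeeds
- [ ] Health check passes
- [ ] PDF extraction works
- [ ] Docx extraction works
- [ ] New features pass testing
---
## 📊 9. Time Estimates
| Step | Estimated time |
|------|---------|
| Update dependency files | 5 minutes |
| Build the image locally | 15 minutes |
| Push to ACR | 10 minutes |
| SAE deployment | 10 minutes |
| Verification testing | 10 minutes |
| **Total** | **50 minutes** |
---
> **Last updated**: 2026-01-26
> **Maintainer**: development team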