deploy: Complete 0126-27 deployment - database upgrade, services update, code recovery

Major Changes:
- Database: Install pg_bigm/pgvector plugins, create test database
- Python service: v1.0 -> v1.1, add pymupdf4llm/openpyxl/pypandoc
- Node.js backend: v1.3 -> v1.7, fix pino-pretty and ES Module imports
- Frontend: v1.2 -> v1.3, skip TypeScript check for deployment
- Code recovery: Restore empty files from local backup

Technical Fixes:
- Fix pino-pretty error in production (conditional loading)
- Fix ES Module import paths (add .js extensions)
- Fix OSSAdapter TypeScript errors
- Update Prisma Schema (63 models, 16 schemas)
- Update environment variables (DATABASE_URL, EXTRACTION_SERVICE_URL, OSS)
- Remove deprecated variables (REDIS_URL, DIFY_API_URL, DIFY_API_KEY)

Documentation:
- Create 0126 deployment folder with 8 documents
- Update database development standards v2.0
- Update SAE deployment status records

Deployment Status:
- PostgreSQL: ai_clinical_research_test with plugins
- Python: v1.1 @ 172.17.173.84:8000
- Backend: v1.7 @ 172.17.173.89:3001
- Frontend: v1.3 @ 172.17.173.90:80

Tested: All services running successfully on SAE
432
docs/05-部署文档/0126部署/03-Python服务更新方案.md
Normal file
@@ -0,0 +1,432 @@
# 🐍 Python Microservice Update Plan

> **Document version**: v1.0
> **Created**: 2026-01-26
> **Scope**: extraction_service Python microservice
> **Change type**: Dependency update + image rebuild

---

## 📋 1. Change Overview

### 1.1 Changes

| Item | Description | Priority |
|------|-------------|----------|
| **Add pypdf** | PDF processing enhancement library | 🟡 Medium |
| **Add pypandoc** | Document format conversion library | 🟡 Medium |
| **Remove Nougat** | Removal confirmed (completed in v1.0) | ✅ Done |
| **Rebuild image** | Build the v1.1 image | 🔴 High |

### 1.2 Current State

```yaml
Service name: python-extraction-test
Current version: v1.0
Image size: 1.12GB
Intranet address: http://172.17.173.66:8000
Main dependencies:
  - PyMuPDF: 1.24.0+
  - pdfplumber: 0.10.3
  - mammoth: 1.6.0
  - pandas: 2.0+
  - polars: 0.19+
```

### 1.3 Target State

```yaml
Target version: v1.1
Estimated image size: ~1.2GB
New dependencies:
  - pypdf: latest
  - pypandoc: latest
```

---

## 📦 2. Dependency Updates

### 2.1 New Dependencies

#### pypdf (PDF processing enhancement)

```bash
# Main features:
# - PDF text extraction
# - Merging and splitting PDFs
# - Reading PDF metadata
# - Complements PyMuPDF

# Install
pip install pypdf
```
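
As a rough illustration of the features listed above, the sketch below reads the page count, title metadata, and per-page text of a PDF, and splits off its first page into a new document. The function names `pdf_summary` and `first_page_only` are hypothetical and are not part of extraction_service; the sketch assumes pypdf 4.x or later is installed.

```python
from io import BytesIO
from typing import BinaryIO


def pdf_summary(stream: BinaryIO) -> dict:
    """Return page count, title metadata, and per-page text for a PDF."""
    # Imported lazily so this module still loads where pypdf is absent.
    from pypdf import PdfReader

    reader = PdfReader(stream)
    meta = reader.metadata  # may be None when the file has no info dictionary
    return {
        "pages": len(reader.pages),
        "title": meta.title if meta else None,
        # extract_text() can yield an empty string for image-only pages
        "text": [page.extract_text() or "" for page in reader.pages],
    }


def first_page_only(stream: BinaryIO) -> bytes:
    """Split off the first page and return it as a new PDF (as bytes)."""
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(stream)
    writer = PdfWriter()
    writer.add_page(reader.pages[0])
    out = BytesIO()
    writer.write(out)
    return out.getvalue()
```

A caller would typically open the file in binary mode, for example `with open("paper.pdf", "rb") as f: print(pdf_summary(f)["pages"])`, where `paper.pdf` is a placeholder file name.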

#### pypandoc (document format conversion)

```bash
# Main features:
# - Markdown ↔ Word/HTML/PDF conversion
# - Support for many document formats
# - High-quality document conversion

# Install
pip install pypandoc

# Note: pypandoc requires the Pandoc system binary.
# In Docker: apt-get install -y pandoc
```
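
Because pypandoc is only a wrapper around the Pandoc binary, a conversion call fails when that binary is missing. The sketch below shows one way to guard a Markdown-to-HTML conversion. The helper name `markdown_to_html` is hypothetical, and checking `PATH` with `shutil.which` is an assumption about how the container is set up (pypandoc can also locate Pandoc by other means).

```python
import shutil
from typing import Optional


def markdown_to_html(markdown: str) -> Optional[str]:
    """Convert Markdown to HTML, or return None if Pandoc is unavailable."""
    if shutil.which("pandoc") is None:
        return None  # the Pandoc binary is not on PATH
    try:
        import pypandoc  # requires `pip install pypandoc`
    except ImportError:
        return None  # the Python wrapper is not installed
    # convert_text(source, to, format) returns the converted document as a string
    return pypandoc.convert_text(markdown, "html", format="markdown")


if __name__ == "__main__":
    print(markdown_to_html("# Extraction report"))
```

Returning `None` instead of raising lets a service report "conversion not available" cleanly, which fits the note that pypandoc is optional when document conversion is not needed.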

### 2.2 Update requirements-prod.txt

Current contents:

```txt
# ========================================
# Production dependencies (Nougat and heavyweight dependencies removed)
# ========================================

# Web framework
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6

# Data processing (required by the DC tools)
pandas>=2.0.0
numpy>=1.24.0
polars>=0.19.0

# PDF processing (core lightweight libraries)
PyMuPDF>=1.24.0
pdfplumber==0.10.3

# Docx processing
mammoth==1.6.0
python-docx==1.1.0

# Language detection
langdetect==1.0.9

# Encoding detection
chardet==5.2.0

# Utilities
python-dotenv==1.0.0
pydantic>=2.10.0

# Logging
loguru==0.7.2

# Testing tools
requests==2.31.0
```

**To be added**:

```txt
# PDF processing enhancement
pypdf>=4.0.0

# Document format conversion
pypandoc>=1.13
```

---

## 🔧 3. Dockerfile Update

### 3.1 Check the Current Dockerfile

First, check whether the Pandoc system dependency needs to be added.
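
One way to automate this check is to scan the Dockerfile for packages installed through `apt-get install`. The following is a minimal sketch under simple assumptions: the helper name `apt_packages` is hypothetical, and it only understands `RUN` instructions with backslash line continuations, not comments placed inside a continued command.

```python
import re


def apt_packages(dockerfile_text: str) -> set:
    """Collect package names passed to `apt-get install` in RUN instructions."""
    # Join backslash-continued lines so each RUN instruction is one line.
    joined = re.sub(r"\\[ \t]*\n", " ", dockerfile_text)
    packages = set()
    for line in joined.splitlines():
        line = line.strip()
        if not line.upper().startswith("RUN "):
            continue
        # A RUN line may chain several shell commands with && or ;
        for command in re.split(r"&&|;", line[4:]):
            tokens = command.split()
            if "apt-get" in tokens and "install" in tokens:
                after_install = tokens[tokens.index("install") + 1:]
                # Skip option flags such as -y
                packages.update(t for t in after_install if not t.startswith("-"))
    return packages


if __name__ == "__main__":
    with open("Dockerfile", encoding="utf-8") as handle:
        print("pandoc installed:", "pandoc" in apt_packages(handle.read()))
```

If `pandoc` is absent from the result, the Dockerfile needs the change described in the next subsection.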

### 3.2 Update the Dockerfile (if needed)

If pypandoc is used, add the Pandoc installation to the Dockerfile:

```dockerfile
# Add to the apt-get install stage
RUN apt-get update && apt-get install -y \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
```

Complete Dockerfile example:

```dockerfile
# Base image
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install system dependencies (including Pandoc)
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

# Use the Alibaba Cloud PyPI mirror for faster installs
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Copy the dependency file
COPY requirements-prod.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements-prod.txt

# Copy the application code
COPY . .

# Expose the port
EXPOSE 8000

# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```

---

## 🚀 4. Build and Deployment Steps

### Step 1: Update requirements-prod.txt

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service

# Edit requirements-prod.txt and add the new dependencies:
# pypdf>=4.0.0
# pypandoc>=1.13
```

### Step 2: Local test (optional)

```powershell
# Create a virtual environment for testing
python -m venv test_venv
test_venv\Scripts\activate
pip install -r requirements-prod.txt

# Test the imports
python -c "import pypdf; print(pypdf.__version__)"
python -c "import pypandoc; print(pypandoc.get_pandoc_version())"
```
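
The import test can be extended into a version check against the minimums declared for the new dependencies (`pypdf>=4.0.0`, `pypandoc>=1.13`). This is a sketch only: the helper names are hypothetical, and the version parsing is deliberately naive (numeric components only), so it is not a replacement for pip's own resolver.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirements-prod.txt additions.
MINIMUMS = {"pypdf": (4, 0, 0), "pypandoc": (1, 13)}


def parse_version(text: str) -> tuple:
    """Turn '4.0.0' into (4, 0, 0); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed: tuple, minimum: tuple) -> bool:
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check_minimums(minimums: dict) -> dict:
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = parse_version(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_minimums(MINIMUMS).items():
        print(f"{package}: {status}")
```

Run inside the activated virtual environment, this prints one status line per package, which is easier to scan than separate import commands when more dependencies are added later.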

### Step 3: Build the Docker image

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service

# Build the image
docker build -t python-extraction:v1.1 .

# Estimated time: 15 minutes
# Estimated size: about 1.2GB
```

### Step 4: Verify the image locally

```powershell
# Run the container for testing
docker run --rm -p 8000:8000 python-extraction:v1.1

# In a new terminal
curl http://localhost:8000/health
curl http://localhost:8000/docs  # View the API docs
```

### Step 5: Log in to ACR

```powershell
# Read the password from an environment variable instead of hardcoding it.
# ACR_PASSWORD is an assumed variable name; set it before running this step.
$env:ACR_PASSWORD | docker login --username=gofeng117@163.com `
    --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
```

### Step 6: Tag the image

```powershell
docker tag python-extraction:v1.1 `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1
```
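
The full image reference is long and easy to mistype, and the deployment script in this document also prints a VPC variant of it (the registry ID followed by `-vpc`). A small helper can build both forms from the version tag. This is an illustrative sketch: the function name `image_ref` is hypothetical, and the `vMAJOR.MINOR` tag check is an assumption based on the tags used here (`v1.0`, `v1.1`).

```python
import re

REGISTRY_ID = "crpi-cd5ij4pjt65mweeo"
REGISTRY_DOMAIN = "cn-beijing.personal.cr.aliyuncs.com"
REPOSITORY = "ai-clinical/python-extraction"


def image_ref(version: str, vpc: bool = False) -> str:
    """Build the ACR image reference for a version tag such as 'v1.1'."""
    # Assumed tag convention: 'v' followed by MAJOR.MINOR.
    if not re.fullmatch(r"v\d+\.\d+", version):
        raise ValueError(f"unexpected version tag: {version!r}")
    host_id = f"{REGISTRY_ID}-vpc" if vpc else REGISTRY_ID
    return f"{host_id}.{REGISTRY_DOMAIN}/{REPOSITORY}:{version}"


if __name__ == "__main__":
    print(image_ref("v1.1"))            # public endpoint, used for docker push
    print(image_ref("v1.1", vpc=True))  # VPC endpoint
```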

### Step 7: Push to ACR

```powershell
docker push `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1

# Estimated time: 10 minutes (the image is about 1.2GB)
# Success indicator: the output includes "digest: sha256:..."
```

### Step 8: Deploy on SAE

1. Log in to SAE: https://sae.console.aliyun.com/
2. Open the application: `python-extraction-test`
3. Click [Deploy Application]
4. Configure:
   - **Image address**: select `python-extraction`
   - **Image version**: select `v1.1`
5. Click [Confirm]
6. Wait for the deployment to finish (about 5-8 minutes)

### Step 9: Verify the deployment

```bash
# Health check
curl http://172.17.173.66:8000/health

# Or test through the backend proxy
curl http://8.140.53.236/api/v1/health
```
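
Since the SAE rollout takes several minutes, a single `curl` may run before the new instance is ready. The sketch below polls the health endpoint until it returns HTTP 200 or a retry budget is used up. The function name `wait_for_health` and the default retry values are assumptions; only the standard library is used.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll `url` until it answers HTTP 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or a non-2xx status; retry after the delay
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_for_health("http://172.17.173.66:8000/health")
    print("healthy" if ok else "health check failed")
```

The intranet address is only reachable from inside the VPC, so this would be run from a machine with that network access.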

---

## 📋 5. One-Step Deployment Script

### PowerShell script

Create `extraction_service/update-and-deploy.ps1`:

```powershell
# One-step update script for the Python microservice
# Usage: .\update-and-deploy.ps1 v1.1

param(
    [Parameter(Mandatory=$true)]
    [string]$Version
)

$ErrorActionPreference = "Stop"

Write-Host "========================================" -ForegroundColor Green
Write-Host "Updating the Python microservice to version: $Version" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green

# 1. Build the Docker image
Write-Host "`n[1/4] Building the Docker image..." -ForegroundColor Cyan
docker build -t python-extraction:$Version .
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Build failed!" -ForegroundColor Red
    exit 1
}
Write-Host "✅ Image built successfully!" -ForegroundColor Green

# 2. Log in to ACR
Write-Host "`n[2/4] Logging in to ACR..." -ForegroundColor Cyan
# The password is read from an environment variable instead of being hardcoded.
# ACR_PASSWORD is an assumed variable name; set it before running the script.
$env:ACR_PASSWORD | docker login --username=gofeng117@163.com `
    --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Login failed!" -ForegroundColor Red
    exit 1
}

# 3. Tag the image
Write-Host "`n[3/4] Tagging the image..." -ForegroundColor Cyan
$ImageUrl = "crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version"
docker tag python-extraction:$Version $ImageUrl
Write-Host "✅ Image tagged!" -ForegroundColor Green

# 4. Push to ACR
Write-Host "`n[4/4] Pushing to ACR..." -ForegroundColor Cyan
Write-Host "Push target: $ImageUrl" -ForegroundColor Yellow
docker push $ImageUrl
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Push failed!" -ForegroundColor Red
    exit 1
}

Write-Host "`n========================================" -ForegroundColor Green
Write-Host "✅ Python microservice image pushed successfully!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host "`nNext steps:" -ForegroundColor Yellow
Write-Host "1. Log in to the SAE console: https://sae.console.aliyun.com/" -ForegroundColor Yellow
Write-Host "2. Open the application: python-extraction-test" -ForegroundColor Yellow
Write-Host "3. Click [Deploy Application]" -ForegroundColor Yellow
Write-Host "4. Select image version: $Version" -ForegroundColor Yellow
Write-Host "5. Confirm the deployment" -ForegroundColor Yellow
Write-Host "`nImage address (VPC):" -ForegroundColor Cyan
Write-Host "crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version" -ForegroundColor Cyan
```

### Usage

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service
.\update-and-deploy.ps1 v1.1
```

---

## ⚠️ 6. Notes

### 6.1 Pandoc Dependency

- pypandoc requires Pandoc to be installed on the system
- Make sure the Dockerfile includes `apt-get install pandoc`
- If document conversion is not needed, pypandoc can be left out

### 6.2 Image Size

- Current v1.0: 1.12GB
- Expected v1.1: ~1.2GB
- Main increase: pypandoc + Pandoc

### 6.3 Compatibility

- pypdf and PyMuPDF may overlap in functionality
- Test thoroughly in the development environment first

---

## 🔄 7. Rollback Plan

If v1.1 runs into problems, roll back to v1.0:

1. Log in to the SAE console
2. Open the application: `python-extraction-test`
3. Click [Deploy Application]
4. Select image version: `v1.0`
5. Confirm the deployment

---

## ✅ 8. Verification Checklist

### Before deployment

- [ ] requirements-prod.txt updated
- [ ] Dockerfile updated (if needed)
- [ ] Local build succeeds
- [ ] Local run test passes
- [ ] ACR login succeeds

### After deployment

- [ ] SAE deployment succeeds
- [ ] Health check passes
- [ ] PDF extraction works
- [ ] Docx extraction works
- [ ] New features pass testing

---

## 📊 9. Time Estimates

| Step | Estimated time |
|------|----------------|
| Update dependency files | 5 minutes |
| Build the image locally | 15 minutes |
| Push to ACR | 10 minutes |
| Deploy on SAE | 10 minutes |
| Verification testing | 10 minutes |
| **Total** | **50 minutes** |
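
The total can be re-derived from the per-step figures, which is handy if a step estimate changes later. A tiny sketch follows; the dictionary keys are just labels chosen for this example.

```python
# Per-step estimates in minutes, copied from the time estimate table.
STEP_MINUTES = {
    "update dependency files": 5,
    "build image locally": 15,
    "push to ACR": 10,
    "deploy on SAE": 10,
    "verification testing": 10,
}

total_minutes = sum(STEP_MINUTES.values())
print(f"Total: {total_minutes} minutes")  # prints "Total: 50 minutes"
```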

---

> **Last updated**: 2026-01-26
> **Maintainer**: Development team