Major Changes:
- Database: Install pg_bigm/pgvector plugins, create test database
- Python service: v1.0 -> v1.1, add pymupdf4llm/openpyxl/pypandoc
- Node.js backend: v1.3 -> v1.7, fix pino-pretty and ES Module imports
- Frontend: v1.2 -> v1.3, skip TypeScript check for deployment
- Code recovery: Restore empty files from local backup

Technical Fixes:
- Fix pino-pretty error in production (conditional loading)
- Fix ES Module import paths (add .js extensions)
- Fix OSSAdapter TypeScript errors
- Update Prisma Schema (63 models, 16 schemas)
- Update environment variables (DATABASE_URL, EXTRACTION_SERVICE_URL, OSS)
- Remove deprecated variables (REDIS_URL, DIFY_API_URL, DIFY_API_KEY)

Documentation:
- Create 0126 deployment folder with 8 documents
- Update database development standards v2.0
- Update SAE deployment status records

Deployment Status:
- PostgreSQL: ai_clinical_research_test with plugins
- Python: v1.1 @ 172.17.173.84:8000
- Backend: v1.7 @ 172.17.173.89:3001
- Frontend: v1.3 @ 172.17.173.90:80

Tested: All services running successfully on SAE

# 🐍 Python Microservice Update Plan

> **Document version**: v1.0
> **Created**: 2026-01-26
> **Scope**: extraction_service Python microservice
> **Change type**: Dependency update + image rebuild

---

## 📋 1. Change Overview

### 1.1 Changes

| Change | Description | Priority |
|--------|-------------|----------|
| **Add pypdf** | Additional PDF processing library | 🟡 Medium |
| **Add pypandoc** | Document format conversion library | 🟡 Medium |
| **Remove Nougat** | Removal confirmed (already done in v1.0) | ✅ Done |
| **Rebuild image** | Build the v1.1 image | 🔴 High |

### 1.2 Current State

```yaml
Service name: python-extraction-test
Current version: v1.0
Image size: 1.12GB
Internal address: http://172.17.173.66:8000
Main dependencies:
  - PyMuPDF: 1.24.0+
  - pdfplumber: 0.10.3
  - mammoth: 1.6.0
  - pandas: 2.0+
  - polars: 0.19+
```

### 1.3 Target State

```yaml
Target version: v1.1
Estimated image size: ~1.2GB
New dependencies:
  - pypdf: latest
  - pypandoc: latest
```

---

## 📦 2. Dependency Updates

### 2.1 New Dependencies

#### pypdf (PDF processing)

Main features:

- PDF text extraction
- Merging and splitting PDFs
- Reading PDF metadata
- Complements PyMuPDF

Install:

```bash
pip install pypdf
```

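As a rough illustration of the split capability listed for pypdf, the sketch below cuts one PDF into smaller files. The names `page_chunks` and `split_pdf`, the output file naming, and the default chunk size are assumptions made for this example; they are not part of extraction_service.

```python
from pathlib import Path
from typing import List


def page_chunks(total_pages: int, chunk_size: int) -> List[range]:
    """Split page indices 0..total_pages-1 into consecutive ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        range(start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(src: str, out_dir: str, chunk_size: int = 10) -> List[Path]:
    """Write src as several PDFs of at most chunk_size pages each."""
    # Imported lazily so the helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    written = []
    chunks = page_chunks(len(reader.pages), chunk_size)
    for part, pages in enumerate(chunks, start=1):
        writer = PdfWriter()
        for index in pages:
            writer.add_page(reader.pages[index])
        # Hypothetical naming scheme: <source stem>_part<N>.pdf
        target = target_dir / f"{Path(src).stem}_part{part}.pdf"
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Calling `split_pdf("report.pdf", "out", chunk_size=20)` would write `out/report_part1.pdf`, `out/report_part2.pdf`, and so on; `report.pdf` is a placeholder file name.
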
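#### pypandoc (document format conversion)

Main features:

- Conversion between Markdown, Word, and HTML, plus export to PDF (Pandoc cannot read PDF input, and PDF output needs a separate PDF engine such as a LaTeX distribution)
- Support for many document formats
- High-quality document conversion

Install:

```bash
pip install pypandoc

# Note: pypandoc requires the Pandoc system package
# In Docker: apt-get install -y pandoc
```
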
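Because pypandoc only wraps the Pandoc executable, it can help to confirm that the binary is on `PATH` before any conversion is attempted. The following is a minimal sketch using only the standard library; the function name `pandoc_version` is an assumption for illustration.

```python
import shutil
import subprocess
from typing import Optional


def pandoc_version(executable: str = "pandoc") -> Optional[str]:
    """Return the first line of `pandoc --version`, or None if Pandoc is missing."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = pandoc_version()
    print(version or "Pandoc not found: pypandoc conversions will fail")
```

Running this inside the built container is a quick way to confirm that the `apt-get install -y pandoc` step took effect.
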
### 2.2 Update requirements-prod.txt

Current contents:

```txt
# ========================================
# Production dependencies (Nougat and other heavyweight dependencies removed)
# ========================================

# Web framework
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6

# Data processing (required by the DC tools)
pandas>=2.0.0
numpy>=1.24.0
polars>=0.19.0

# PDF processing (core lightweight libraries)
PyMuPDF>=1.24.0
pdfplumber==0.10.3

# Docx processing
mammoth==1.6.0
python-docx==1.1.0

# Language detection
langdetect==1.0.9

# Encoding detection
chardet==5.2.0

# Utilities
python-dotenv==1.0.0
pydantic>=2.10.0

# Logging
loguru==0.7.2

# Testing tools
requests==2.31.0
```

**To add**:

```txt
# Additional PDF processing
pypdf>=4.0.0

# Document format conversion
pypandoc>=1.13
```

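Before rebuilding the image, a small check can confirm that both new packages were actually added to the requirements file. This is a sketch under simple assumptions: it handles plain `name<specifier>` lines and `#` comments only, and the helper names `declared_packages` and `missing_packages` are made up for this example.

```python
import re
from typing import Iterable, List, Set

REQUIRED = ("pypdf", "pypandoc")


def declared_packages(requirements_text: str) -> Set[str]:
    """Return the lower-cased package names declared in a requirements file."""
    names = set()
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # The name ends at the first extras bracket or version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def missing_packages(requirements_text: str,
                     required: Iterable[str] = REQUIRED) -> List[str]:
    """List the required packages that the file does not declare."""
    declared = declared_packages(requirements_text)
    return sorted(name for name in required if name.lower() not in declared)
```

For example, `missing_packages(open("requirements-prod.txt", encoding="utf-8").read())` returns an empty list once both entries are present.
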
---

## 🔧 3. Dockerfile Updates

### 3.1 Check the Current Dockerfile

First check whether the Pandoc system dependency needs to be added.

### 3.2 Update the Dockerfile (if needed)

If pypandoc is used, add the Pandoc installation to the Dockerfile:

```dockerfile
# Add to the apt-get install stage
RUN apt-get update && apt-get install -y \
    pandoc \
    && rm -rf /var/lib/apt/lists/*
```

Full Dockerfile example:

```dockerfile
# Base image
FROM python:3.11-slim

# Set the working directory
WORKDIR /app

# Install system dependencies (including Pandoc)
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

# Use the Alibaba Cloud PyPI mirror for faster downloads
RUN pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Copy the dependency file
COPY requirements-prod.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements-prod.txt

# Copy the application code
COPY . .

# Expose the port
EXPOSE 8000

# Start command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```

---

## 🚀 4. Build and Deployment Steps

### Step 1: Update requirements-prod.txt

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service

# Edit requirements-prod.txt and add the new dependencies:
# pypdf>=4.0.0
# pypandoc>=1.13
```

### Step 2: Local Test (optional)

```powershell
# Create a virtual environment for testing
python -m venv test_venv
test_venv\Scripts\activate
pip install -r requirements-prod.txt

# Test the imports
python -c "import pypdf; print(pypdf.__version__)"
# get_pandoc_version() raises OSError if Pandoc itself is not installed locally
python -c "import pypandoc; print(pypandoc.get_pandoc_version())"
```

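The two one-liners above cover only the new packages. A slightly broader sketch, assuming the usual import names for the pinned distributions (`fitz` for PyMuPDF, `docx` for python-docx), reports every module that cannot be found in the active environment; `missing_modules` and `SERVICE_MODULES` are hypothetical names.

```python
import importlib.util
from typing import Iterable, List

# Import names, which differ from the PyPI names for some packages.
SERVICE_MODULES = (
    "fitz",        # PyMuPDF
    "pdfplumber",
    "mammoth",
    "docx",        # python-docx
    "pandas",
    "polars",
    "pypdf",
    "pypandoc",
)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SERVICE_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules are importable")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages such as pandas.
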
### Step 3: Build the Docker Image

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service

# Build the image
docker build -t python-extraction:v1.1 .

# Estimated time: 15 minutes
# Estimated size: about 1.2GB
```

### Step 4: Verify the Image Locally

```powershell
# Run the container for testing
docker run --rm -p 8000:8000 python-extraction:v1.1

# In a new terminal, test the endpoints
curl http://localhost:8000/health
curl http://localhost:8000/docs  # View the API docs
```

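### Step 5: Log In to ACR

```powershell
# Read the password from an environment variable instead of hard-coding it.
# ACR_PASSWORD is an assumed variable name; set it in the shell beforehand.
$env:ACR_PASSWORD | docker login --username=gofeng117@163.com `
    --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
```
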
### Step 6: Tag the Image

```powershell
docker tag python-extraction:v1.1 `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1
```

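This document uses two forms of the same image address: the public registry endpoint for pushing and a `-vpc` endpoint for SAE. The sketch below shows how both are assembled from the same parts; the function name `image_reference` and the constant names are assumptions, while the registry ID, region, and namespace are taken from the commands in this document.

```python
REGISTRY_ID = "crpi-cd5ij4pjt65mweeo"
REGION = "cn-beijing"
NAMESPACE = "ai-clinical"


def image_reference(repo: str, version: str, vpc: bool = False) -> str:
    """Build the full ACR image reference for a repository and tag."""
    suffix = "-vpc" if vpc else ""
    host = f"{REGISTRY_ID}{suffix}.{REGION}.personal.cr.aliyuncs.com"
    return f"{host}/{NAMESPACE}/{repo}:{version}"


if __name__ == "__main__":
    print(image_reference("python-extraction", "v1.1"))
    print(image_reference("python-extraction", "v1.1", vpc=True))
```

Only the host name differs between the two references, so one version string can drive both the push target and the address shown for SAE.
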
### Step 7: Push to ACR

```powershell
docker push `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.1

# Estimated time: 10 minutes (image is about 1.2GB)
# Success indicator: the output contains "digest: sha256:..."
```

### Step 8: Deploy on SAE

1. Log in to SAE: https://sae.console.aliyun.com/
2. Open the application `python-extraction-test`
3. Click **Deploy Application** (【部署应用】 in the console)
4. Configure:
   - **Image address**: select `python-extraction`
   - **Image version**: select `v1.1`
5. Click **Confirm** (【确认】)
6. Wait for the deployment to finish (about 5-8 minutes)

### Step 9: Verify the Deployment

```bash
# Health check
curl http://172.17.173.66:8000/health

# Or test through the backend proxy
curl http://8.140.53.236/api/v1/health
```

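A freshly deployed instance may need some time before it starts answering, so a single `curl` call can fail spuriously. The sketch below polls the health endpoint with the standard library until it returns HTTP 200; the function name `wait_until_healthy` and the retry defaults are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, attempts: int = 10,
                       delay: float = 3.0, timeout: float = 5.0) -> bool:
    """Poll url until it answers HTTP 200; return False if it never does."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or an HTTP error status was returned
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For this deployment the call would be `wait_until_healthy("http://172.17.173.66:8000/health")`, run from a machine that can reach the internal address.
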
---

## 📋 5. One-Step Deployment Script

### PowerShell Script

Create `extraction_service/update-and-deploy.ps1`:

```powershell
# One-step update script for the Python microservice
# Usage: .\update-and-deploy.ps1 v1.1

param(
    [Parameter(Mandatory=$true)]
    [string]$Version
)

$ErrorActionPreference = "Stop"

Write-Host "========================================" -ForegroundColor Green
Write-Host "Updating the Python microservice to version: $Version" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green

# 1. Build the Docker image
Write-Host "`n[1/4] Building the Docker image..." -ForegroundColor Cyan
docker build -t python-extraction:$Version .
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Build failed!" -ForegroundColor Red
    exit 1
}
Write-Host "✅ Image built successfully!" -ForegroundColor Green

# 2. Log in to ACR
# The password is read from the ACR_PASSWORD environment variable
# (an assumed name) instead of being hard-coded in the script.
Write-Host "`n[2/4] Logging in to ACR..." -ForegroundColor Cyan
$env:ACR_PASSWORD | docker login --username=gofeng117@163.com `
    --password-stdin `
    crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Login failed!" -ForegroundColor Red
    exit 1
}

# 3. Tag the image
Write-Host "`n[3/4] Tagging the image..." -ForegroundColor Cyan
$ImageUrl = "crpi-cd5ij4pjt65mweeo.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version"
docker tag python-extraction:$Version $ImageUrl
Write-Host "✅ Image tagged!" -ForegroundColor Green

# 4. Push to ACR
Write-Host "`n[4/4] Pushing to ACR..." -ForegroundColor Cyan
Write-Host "Push target: $ImageUrl" -ForegroundColor Yellow
docker push $ImageUrl
if ($LASTEXITCODE -ne 0) {
    Write-Host "❌ Push failed!" -ForegroundColor Red
    exit 1
}

Write-Host "`n========================================" -ForegroundColor Green
Write-Host "✅ Python microservice image pushed successfully!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host "`nNext steps:" -ForegroundColor Yellow
Write-Host "1. Log in to the SAE console: https://sae.console.aliyun.com/" -ForegroundColor Yellow
Write-Host "2. Open the application: python-extraction-test" -ForegroundColor Yellow
Write-Host "3. Click Deploy Application" -ForegroundColor Yellow
Write-Host "4. Select image version: $Version" -ForegroundColor Yellow
Write-Host "5. Confirm the deployment" -ForegroundColor Yellow
Write-Host "`nImage address (VPC):" -ForegroundColor Cyan
Write-Host "crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:$Version" -ForegroundColor Cyan
```

### Usage

```powershell
cd D:\MyCursor\AIclinicalresearch\extraction_service
.\update-and-deploy.ps1 v1.1
```

---

## ⚠️ 6. Notes

### 6.1 Pandoc Dependency

- pypandoc requires Pandoc to be installed on the system
- Make sure the Dockerfile includes `apt-get install pandoc`
- If document conversion is not needed, pypandoc can be left out

### 6.2 Image Size

- Current v1.0: 1.12GB
- Estimated v1.1: ~1.2GB
- Main increase: pypandoc + Pandoc

### 6.3 Compatibility

- pypdf and PyMuPDF may overlap in functionality
- Test thoroughly in the development environment first

---

## 🔄 7. Rollback Plan

If v1.1 has problems, roll back to v1.0:

1. Log in to the SAE console
2. Open the application `python-extraction-test`
3. Click **Deploy Application** (【部署应用】 in the console)
4. Select image version `v1.0`
5. Confirm the deployment

---

## ✅ 8. Verification Checklist

### Before Deployment

- [ ] requirements-prod.txt updated
- [ ] Dockerfile updated (if needed)
- [ ] Local build succeeds
- [ ] Local run tests pass
- [ ] ACR login succeeds

### After Deployment

- [ ] SAE deployment succeeds
- [ ] Health check passes
- [ ] PDF extraction works
- [ ] Docx extraction works
- [ ] New feature tests pass

---

## 📊 9. Time Estimate

| Step | Estimated time |
|------|----------------|
| Update the dependency file | 5 minutes |
| Build the image locally | 15 minutes |
| Push to ACR | 10 minutes |
| SAE deployment | 10 minutes |
| Verification testing | 10 minutes |
| **Total** | **50 minutes** |

---

> **Last updated**: 2026-01-26
> **Maintainer**: Development team