docs(platform): Add database documentation system and restructure deployment docs

Completed:
- Add 6 core database documents (docs/01-平台基础层/07-数据库/):
  architecture overview, migration history, environment comparison, tech debt tracking, seed data management, PostgreSQL extensions
- Restructure deployment docs: archive 20 legacy files to _archive-2025/
- Create unified daily operations manual (01-日常更新操作手册.md)
- Add pending deployment change tracker (03-待部署变更清单.md)
- Update database development standard to v3.0 (three iron rules)
- Fix Prisma schema type drift: align @db.* annotations with actual DB
  (IIT: UUID/Timestamptz(6); SSA: Timestamp(6)/VarChar(20/50/100))
- Add migration: 20260227_align_schema_with_db_types (idempotent ALTER)
- Add Cursor Rule for auto-reminding deployment change documentation
- Update system status guide v6.4 with deployment and DB doc references
- Add architecture consultation docs (Prisma guide, SAE deployment guide)

Technical details:
- Manual migration due to shadow DB limitation (TD-001 in tech debt)
- Deployment docs reduced from 20+ scattered files to 3 core documents
- Cursor Rule triggers on schema.prisma, package.json, Dockerfile changes

Made-with: Cursor
docs/05-部署文档/_archive-2025首次部署/09-Python微服务-SAE部署操作手册.md
@@ -0,0 +1,844 @@
# Python Microservice SAE Deployment Operations Manual

**Document version**: v1.0
**Created**: 2024-12-24
**Scope**: AI Clinical Research Platform, Python microservice (extraction_service)
**Environment**: Test environment (SAE Lite)
**Audience**: Operations engineers, developers

---

## 📋 Table of Contents

1. [Pre-deployment Checklist](#pre-deployment-checklist)
2. [Creating the SAE Application (Web Console)](#creating-the-sae-application-web-console)
3. [Post-deployment Verification](#post-deployment-verification)
4. [Integration Configuration](#integration-configuration)
5. [Troubleshooting](#troubleshooting)

---

## Pre-deployment Checklist

### ✅ Required Resources

Before creating the SAE application, confirm that the following resources are ready:

| Resource | What to confirm | Where to find it |
|---------|-------|---------|
| **Docker image** | ✅ Pushed to ACR | [Deployment Progress Overview, 2.1 ACR container registry](./00-部署进度总览.md#21-acr容器镜像仓库) |
| **VPC network** | ✅ VPC ID, vSwitch ID | [Deployment Progress Overview, 2.2 VPC network](./00-部署进度总览.md#22-vpc网络与nat网关) |
| **Security group** | ✅ Security group ID | [Deployment Progress Overview, 2.2 VPC network](./00-部署进度总览.md#22-vpc网络与nat网关) |
| **OSS storage** | ✅ AccessKey, bucket name | [Deployment Progress Overview, 2.5 OSS object storage](./00-部署进度总览.md#25-oss对象存储) |
| **SAE namespace** | ✅ Namespace ID | [Deployment Progress Overview, 2.4 SAE application](./00-部署进度总览.md#24-sae-serverless应用) |

### 📦 Image Information

```
Image address (VPC internal):
crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.0

Image version: v1.0
Image size: 1.12 GB
Function: PDF/Docx extraction + data cleaning (pandas/polars)
```

### 🌐 Network Configuration

```
VPC ID: vpc-2ze055cptkew9c38w4r06
vSwitch ID: vsw-2zevacop039bxrmj6yc0c (zone F)
Security group ID: sg-2zedk6fi8sgmmcwdu7tu
Namespace: cn-beijing:test-airesearch
```

### 🗄️ OSS Configuration

```
OSS_ACCESS_KEY_ID: <your-AccessKey-ID>
OSS_ACCESS_KEY_SECRET: <your-AccessKey-Secret>
OSS_BUCKET: ai-clinical-research
OSS_ENDPOINT: oss-cn-beijing-internal.aliyuncs.com
```

⚠️ **Security warning**: The AccessKey is sensitive. Configure it only in SAE environment variables; never commit it to Git or print it to logs. Take the actual values from the Deployment Progress Overview.

---

## Creating the SAE Application (Web Console)

### Step 1: Open the SAE Console

1. Log in to the [Alibaba Cloud console](https://homenew.console.aliyun.com/)
2. Search for and open **Serverless App Engine (SAE)**
3. Confirm that the region is **China North 2 (Beijing)**
4. Select the namespace **test-airesearch**

---

### Step 2: Create the Application

#### 2.1 Basic Information

Click **Create Application** and fill in the following:

| Setting | Value | Notes |
|--------|---|------|
| **Application name** | `python-extraction-test` | A `-test` suffix is recommended to mark the test environment |
| **Application type** | **Lite application** | The test environment uses the Lite edition to save cost |
| **Deployment method** | **Image** | Deploy from a container image |

Click **Next**.

---

#### 2.2 Deployment Configuration

##### Image Configuration

| Setting | Value | Notes |
|--------|---|------|
| **Image source** | Container Registry (ACR), or choose "Custom image" | |
| **Image address** | `crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com/ai-clinical/python-extraction:v1.0` | ⚠️ Must be the VPC-internal address<br>⚠️ Must include the version tag `:v1.0` |
| **Image version** | `v1.0` | Pin a fixed version; do not use `:latest`<br>⚠️ If no tag is given, SAE defaults to `:latest` and the pull fails |
| **Image registry authentication** | **Required** | ⚠️ **Key step**: configure ACR access credentials (see below) |

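
The two image-address rules in the table (explicit version tag, VPC-internal registry) are easy to check before pasting the address into the console. The following is a minimal sketch, not part of the service code: `check_image_ref` is a hypothetical helper name, and the `-vpc.` test assumes the naming convention used by the address in this manual.

```python
def check_image_ref(image: str) -> list[str]:
    """Return a list of problems found in an image reference (empty list = OK)."""
    problems = []
    # Split "registry/namespace/repo:tag" into the registry host and the rest
    registry, _, repo_and_tag = image.partition("/")
    # The tag is whatever follows the last ':' in the repository part
    _repo, sep, tag = repo_and_tag.rpartition(":")
    if not sep:
        problems.append("no tag given; SAE would fall back to :latest")
    elif tag == "latest":
        problems.append("tag is :latest; pin a fixed version such as :v1.0")
    # Assumption: VPC-internal ACR hosts carry a '-vpc' suffix, as in this manual
    if "-vpc." not in registry:
        problems.append("registry host is not a VPC-internal address")
    return problems


if __name__ == "__main__":
    ref = ("crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com"
           "/ai-clinical/python-extraction:v1.0")
    print(check_image_ref(ref) or "image reference looks fine")
```

An empty result only means the address follows the two conventions above; it does not prove that the image exists or that SAE is authorized to pull it.
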
##### 🔑 Image Registry Authentication (Key Step)

**⚠️ This must be configured if you see the `insufficient_scope: authorization failed` error**

Find the **"Image registry authentication"** or **"Private image registry"** setting:

| Setting | Value | Notes |
|--------|---|------|
| **Registry address** | `crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com` | Registry domain only, without the namespace or repository name |
| **Username** | `<ACR-username>` | ACR login username |
| **Password** | `<ACR-password>` | ACR login password |

💡 **Notes**:
- SAE uses these credentials to pull the private image
- SAE stores the password encrypted, so it is not exposed
- Credential source: [Deployment Progress Overview, 2.1 ACR container registry](./00-部署进度总览.md#21-acr容器镜像仓库)

##### Instance Configuration

| Setting | Value | Notes |
|--------|---|------|
| **CPU** | 1 core | `1000 millicores` |
| **Memory** | 2 GB | `2048 MB` |
| **Instances** | 1 | ⚠️ At least 1 instance is required; 0 instances means the service is stopped |

##### Access Settings

| Setting | Value | Notes |
|--------|---|------|
| **Container port** | `8000` | Port of the Python FastAPI service |
| **Protocol** | HTTP | |
| **Enable public access** | **No** | Internal access only; called by the Node.js backend |

Click **Next**.

---

#### 2.3 Environment Configuration

##### Environment Variables

Click **Add environment variable** and add the following entries one by one:

| Name | Value | Notes |
|--------|--------|------|
| `LOG_LEVEL` | `INFO` | Log level |
| `TEMP_DIR` | `/tmp/extraction_service` | Temporary file directory |
| `TZ` | `Asia/Shanghai` | Time zone |
| `SERVICE_NAME` | `python-extraction` | Service identifier |
| `SERVICE_VERSION` | `v1.0` | Version identifier |
| `OSS_ACCESS_KEY_ID` | `<your-AccessKey-ID>` | OSS AccessKey ID |
| `OSS_ACCESS_KEY_SECRET` | `<your-AccessKey-Secret>` | OSS AccessKey Secret |
| `OSS_BUCKET` | `ai-clinical-research` | OSS bucket name |
| `OSS_ENDPOINT` | `oss-cn-beijing-internal.aliyuncs.com` | OSS internal endpoint |

⚠️ **Note**:
- `OSS_ACCESS_KEY_SECRET` is sensitive; SAE encrypts it automatically
- All environment variables can be changed after the application is deployed

##### Health Check

| Setting | Value | Notes |
|--------|---|------|
| **Check method** | HTTP | |
| **Check path** | `/api/health` | FastAPI health check endpoint |
| **Check port** | `8000` | |
| **Initial delay** | `40` s | Leaves time for the image pull and service startup |
| **Check interval** | `30` s | |
| **Check timeout** | `10` s | |
| **Healthy threshold** | `2` | 2 consecutive successes mark the instance healthy |
| **Unhealthy threshold** | `3` | 3 consecutive failures mark the instance unhealthy |

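
To get a feel for what these values mean in wall-clock time, the arithmetic can be sketched as below. This is an illustration under a simplified probe model, not SAE's documented behavior: it assumes the first probe fires exactly when the initial delay ends, probes then run at a fixed interval, and a failed probe is only recorded once its timeout has elapsed. `ProbeConfig` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeConfig:
    # Defaults mirror the health check table above
    initial_delay_s: int = 40
    interval_s: int = 30
    timeout_s: int = 10
    healthy_threshold: int = 2
    unhealthy_threshold: int = 3


def earliest_healthy_s(cfg: ProbeConfig) -> int:
    """Seconds after container start until the instance can first be marked healthy."""
    # First probe at the end of the delay, then (threshold - 1) further probes
    return cfg.initial_delay_s + (cfg.healthy_threshold - 1) * cfg.interval_s


def worst_case_unhealthy_s(cfg: ProbeConfig) -> int:
    """Seconds from a failure (just after a passing probe) until marked unhealthy."""
    # Each of the next `unhealthy_threshold` probes fails; the last one times out
    return cfg.unhealthy_threshold * cfg.interval_s + cfg.timeout_s


if __name__ == "__main__":
    cfg = ProbeConfig()
    print(earliest_healthy_s(cfg), worst_case_unhealthy_s(cfg))
```

Under this model the instance needs at least 70 seconds after container start before it can report healthy, which is in line with the 1 to 2 minutes this manual allows for the health check phase.
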
Click **Next**.

---

#### 2.4 Network Configuration

| Setting | Value | Notes |
|--------|---|------|
| **VPC** | `vpc-2ze055cptkew9c38w4r06` | ai-clinical-vpc |
| **vSwitch** | `vsw-2zevacop039bxrmj6yc0c` | Zone F |
| **Security group** | `sg-2zedk6fi8sgmmcwdu7tu` | |
| **Public SLB access** | **Not configured** | Internal access only |

Click **Next**.

---

#### 2.5 Application Lifecycle (Optional; Defaults Are Fine)

| Setting | Default | Notes |
|--------|--------|------|
| **Startup timeout** | 300 s | The image is large and needs a longer startup window |
| **Graceful shutdown timeout** | 30 s | Time for the application to finish in-flight requests |

Click **Next**.

---

#### 2.6 Confirm the Configuration

1. Check every setting carefully
2. In particular, confirm:
   - ✅ The image address is the VPC-internal address
   - ✅ Instances = 1 (not 0)
   - ✅ The OSS environment variables are set
   - ✅ The health check path is `/api/health`
3. Click **Create Application**

---

### Step 3: Wait for the Deployment to Finish

Deployment takes about **3-5 minutes**. SAE runs the following steps automatically:

```
1. Pull the Docker image (about 2-3 minutes; the image is 1.12 GB)
   └─ Uses the VPC internal network, so it is fairly fast
2. Start the container (about 30 seconds)
   └─ Runs the CMD from the Dockerfile
3. Health check (about 1-2 minutes)
   └─ Starts probing /api/health after a 40-second delay
4. Application running (deployment succeeded)
   └─ Instance status changes to "Running"
```

**Monitor deployment progress in real time**:
- SAE console → Application details → Change records → View details

**View deployment logs**:
- SAE console → Application details → Log query → Real-time logs

**Expected log output**:
```log
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

---

## Post-deployment Verification

### Step 1: Get the Internal Address

**⚠️ Key step: the real internal IP must be taken from the SAE console**

#### How to get it:

1. SAE console → Application list → click `python-extraction-test`
2. Open the application details page
3. Find **Instance list** or **Basic information**
4. Read the **internal IP address**

**Expected format**:
```
Internal IP: 172.17.x.x
Port: 8000
Full address: http://172.17.x.x:8000
```

**⚠️ Important**:
- ❌ Do not guess a domain name (such as `extraction-service.internal`)
- ❌ Do not use `localhost:8000`
- ✅ Use the real IP address shown in the SAE console

**Record the internal address**:
```
# ✅ Internal address obtained (2024-12-24):
PYTHON_SERVICE_INTERNAL_IP=172.17.173.66:8000
PYTHON_SERVICE_URL=http://172.17.173.66:8000
```

⚠️ **Reminder**:
- This address is reachable only inside the VPC
- The Node.js backend needs this address as an environment variable
- If the instance restarts, the IP address may change and must be looked up again

---

### Step 2: Health Check Test

#### Method 1: Test from the SAE Console (Recommended)

1. SAE console → Application details → Instance list
2. Click the instance's **Webshell** button (if available)
3. Run the following command (Python is used because the container has no curl):
   ```bash
   python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8000/api/health').read().decode())"
   ```

⚠️ **Note**: `curl: command not found` means the container does not ship curl (slim image); use the Python command above instead.

#### Method 2: Test from a Local Machine (Requires Temporary Configuration)

⚠️ **Note**: The Python service lives only inside the VPC, so it cannot be reached directly from a local machine. Use one of the following options:

**Option A: Go through the Node.js backend (recommended)**
- Once the Node.js backend is deployed, test indirectly through it

**Option B: Temporarily bind a public SLB (remove it after testing)**
1. SAE console → Application details → Access settings
2. Click **Bind SLB**
3. Create or select a public SLB
4. Remove the SLB as soon as testing is done

**Expected response**:
```json
{
  "status": "healthy",
  "checks": {
    "pymupdf": {
      "available": true,
      "version": "1.26.7"
    },
    "nougat": {
      "available": false,
      "error": "Nougat未安装(已移除以减小镜像)"
    },
    "temp_dir": {
      "path": "/tmp/extraction_service",
      "writable": true
    }
  },
  "timestamp": "2024-12-24T10:30:00Z"
}
```

The `nougat` entry reports `available: false` by design: Nougat was removed to shrink the image, and the overall status is still `healthy`.

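
When the check is scripted rather than read by eye, the response can be reduced to a pass/fail flag plus the list of sub-checks that report a problem. The sketch below assumes only the response shape shown in this manual; `fetch_health` and `summarize_health` are hypothetical helper names, and the only library calls used are `urllib.request.urlopen` and `json.loads` from the standard library.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout_s: float = 10.0) -> str:
    """GET <base_url>/api/health and return the raw response body."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout_s) as resp:
        return resp.read().decode()


def summarize_health(body: str) -> tuple[bool, list[str]]:
    """Return (overall_healthy, names of sub-checks that report a problem)."""
    payload = json.loads(body)
    failing = []
    for name, check in payload.get("checks", {}).items():
        # A sub-check counts as failing if it says available=false or writable=false
        if check.get("available") is False or check.get("writable") is False:
            failing.append(name)
    return payload.get("status") == "healthy", failing


if __name__ == "__main__":
    # Run inside the container or anywhere else in the VPC
    healthy, failing = summarize_health(fetch_health("http://localhost:8000"))
    print("healthy:", healthy, "| sub-checks reporting a problem:", failing)
```

With the expected response from this manual, the overall flag is true and `nougat` is the only sub-check listed, which matches the intended state of the slim image.
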
---

### Step 3: Check the Application Logs

1. SAE console → Application details → Log query
2. Select **Real-time logs**
3. Confirm that the logs contain the following:

```log
✅ Signs of a normal startup:
INFO: Started server process [1]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000

✅ Health check entries (one every 30 seconds):
INFO: 172.17.x.x:xxxx - "GET /api/health HTTP/1.1" 200 OK

❌ If an error like this appears:
ERROR: ImportError: libXXX.so: cannot open shared object file
→ A system dependency is missing; check the Dockerfile
```

---

### Step 4: Monitor the Application Status

SAE console → Application details → Basic information

**Key metrics**:
| Metric | Normal value | Notes |
|------|--------|------|
| **Application status** | Running | Green |
| **Instances** | 1/1 | 1 instance running |
| **Healthy instances** | 1 | Health check passed |
| **CPU usage** | < 20% | Idle |
| **Memory usage** | < 50% | About 1 GB (Python runtime plus dependencies) |

---

## Integration Configuration

### Step 1: Update the Node.js Backend Environment Variables

In the SAE application of the Node.js backend, add the following environment variable:

```bash
# Internal address of the Python microservice
EXTRACTION_SERVICE_URL=http://172.17.x.x:8000

# Notes:
# 1. Replace with the internal IP you actually obtained
# 2. Do not add a trailing slash /
```

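
The two rules in the comments above (a real address with a port, no trailing slash) can be checked mechanically before the value is saved. The sketch below is written in Python for consistency with the other examples in this manual, even though the variable is consumed by the Node.js backend; `normalize_service_url` is a hypothetical helper, not existing project code.

```python
from urllib.parse import urlsplit


def normalize_service_url(raw: str) -> str:
    """Strip whitespace and trailing slashes, then validate scheme, host, and port."""
    url = raw.strip().rstrip("/")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a valid http(s) URL: {raw!r}")
    if parts.port is None:
        # The extraction service listens on 8000, so the port must be explicit
        raise ValueError(f"missing port in {raw!r}")
    return url


if __name__ == "__main__":
    print(normalize_service_url("http://172.17.173.66:8000/"))
```

The check is purely syntactic. It cannot tell whether the IP is the one currently shown in the SAE console, so the address still has to be re-read there after an instance restart.
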
**Where to configure**:
- SAE console → Node.js backend application → Application configuration → Environment variables → Add

**After configuring**:
- Restart the Node.js backend application (SAE restarts it automatically)

---

### Step 2: Backend Code Verification (Optional)

Add a test endpoint to the Node.js backend:

```typescript
// backend/src/routes/test.ts

import { Router } from 'express';
import axios from 'axios';

const router = Router();

router.get('/test-python-service', async (req, res) => {
  try {
    // Falls back to localhost for local development only
    const extractionServiceUrl = process.env.EXTRACTION_SERVICE_URL || 'http://localhost:8000';

    // 1. Call the health check endpoint; the timeout avoids hanging on an unreachable IP
    const healthRes = await axios.get(`${extractionServiceUrl}/api/health`, { timeout: 5000 });

    res.json({
      success: true,
      message: 'Python service is healthy',
      data: healthRes.data
    });
  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    res.status(500).json({
      success: false,
      message: 'Failed to connect to Python service',
      error: message
    });
  }
});

export default router;
```

**How to test**:
```bash
# Call from the frontend or Postman
GET https://your-backend-domain.com/api/test-python-service

# Expected response:
{
  "success": true,
  "message": "Python service is healthy",
  "data": { "status": "healthy", ... }
}
```

---

### Step 3: End-to-end Functional Tests

Test the complete business flows:

#### Scenario 1: PDF Text Extraction

**Flow**:
```
Frontend uploads a PDF
→ Node.js backend receives it
→ HTTP POST forwarded to the Python service (EXTRACTION_SERVICE_URL)
→ Python service extracts the text
→ JSON result returned
→ Backend processes it and responds to the frontend
```

**Test steps**:
1. Upload a small PDF file (< 5 MB) from the frontend
2. Check the Node.js backend logs:
   ```log
   INFO: Calling Python service: http://172.17.x.x:8000/api/extract/pdf
   INFO: Python service responded in 2.3s
   ```
3. Check the Python service logs:
   ```log
   INFO: Request: POST /api/extract/pdf
   INFO: File size: 1.2MB, filename: test.pdf
   INFO: Using PyMuPDF extraction
   INFO: Response: 200 (took 2.10s)
   ```

#### Scenario 2: Data Cleaning (DC Tool)

**Flow**:
```
Frontend uploads an Excel file
→ Backend calls the Python service at /api/operations/fillna
→ Python processes the data with pandas/polars
→ Cleaned data returned
```

**Test steps**:
1. Upload an Excel file in the DC module
2. Run a data cleaning operation (such as fillna)
3. Verify that the returned result is correct

---

## Troubleshooting

### Problem 1: Image Pull Fails (insufficient_scope: authorization failed)

**Symptoms**:
```
Error: ImagePullBackOff
Failed to pull image: insufficient_scope: authorization failed
pull access denied, repository does not exist or may require authorization
```

**Root cause**: SAE has no permission to access the private ACR image repository

**Resolution**:

**Method 1: Configure image registry authentication (recommended)**

1. SAE console → Application details → click "Deploy application" or "Edit application"
2. In the **"Image configuration"** section, find **"Image registry authentication"** or **"Private image registry"**
3. Enter the following:
   ```
   Registry address: crpi-cd5ij4pjt65mweeo-vpc.cn-beijing.personal.cr.aliyuncs.com
   Username: <ACR-username>
   Password: <ACR-password>
   ```
4. Save the configuration and redeploy

**Method 2: Authorize through a RAM role (recommended for production)**

1. RAM console → Create role → select "Alibaba Cloud service" → choose "SAE" as the trusted service
2. Attach the policy `AliyunContainerRegistryReadOnlyAccess` to the role
3. SAE application configuration → Advanced settings → Bind RAM role

**Method 3: Make the ACR repository public (test environments only)**

⚠️ Not recommended for production (security risk)

1. ACR console → Personal instance → Repository list
2. Find `ai-clinical/python-extraction`
3. Repository settings → Access control → change to "Public"

---

### Problem 2: Application Fails to Start (Other Causes)

**Symptoms**:
```
SAE console shows: application failed to start
Instance status: abnormal
```

**Diagnosis**:

**1. Check the deployment logs**
```
SAE console → Application details → Change records → View details
```

**2. Common errors and fixes**:

| Error message | Cause | Fix |
|---------|------|---------|
| `ImagePullBackOff` + `failed to resolve reference "...:latest"` | **The image address has no version tag** | **Append `:v1.0` to the image address**<br>Full address: `...python-extraction:v1.0` |
| `ImagePullBackOff` + `insufficient_scope: authorization failed` | **Insufficient ACR access permission (most common)** | **Configure image registry authentication**<br>1. SAE application configuration → Image configuration<br>2. Configure image registry authentication<br>3. Username: `<ACR-username>`<br>4. Password: `<ACR-password>` |
| `ImagePullBackOff` + `pull access denied` | Registry authentication failed | Check that the username and password are correct |
| `ImagePullBackOff` | Wrong image address | Confirm the VPC-internal address is used (with the `-vpc` suffix) |
| `ImportError: libXXX.so` | Missing system dependency | Check the Dockerfile and make sure all runtime dependencies are installed |
| `OOMKilled` | Out of memory | Increase the memory allocation (2 GB → 4 GB) |
| `Health check failed` | Health check did not pass | Check that the `/api/health` endpoint works |

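
The table can also be read as an ordered lookup: test the most specific message first and fall back to the generic `ImagePullBackOff` row last. The following sketch encodes that reading; `RULES` and `triage` are hypothetical names, and the matching is plain substring search on whatever log text you paste in.

```python
# (substring to look for, likely cause, suggested fix), checked top to bottom
RULES = [
    ("failed to resolve reference", "image address has no version tag",
     "append :v1.0 to the image address"),
    ("insufficient_scope", "insufficient ACR access permission",
     "configure image registry authentication"),
    ("pull access denied", "registry authentication failed",
     "check the username and password"),
    ("ImagePullBackOff", "wrong image address",
     "use the VPC-internal address (with the -vpc suffix)"),
    ("ImportError", "missing system dependency",
     "check the Dockerfile for runtime dependencies"),
    ("OOMKilled", "out of memory",
     "increase the memory allocation (2 GB -> 4 GB)"),
    ("Health check failed", "health check did not pass",
     "check the /api/health endpoint"),
]


def triage(log_text: str):
    """Return (cause, fix) for the first matching rule, or None if nothing matches."""
    for needle, cause, fix in RULES:
        if needle in log_text:
            return cause, fix
    return None


if __name__ == "__main__":
    sample = ("Error: ImagePullBackOff\n"
              "Failed to pull image: insufficient_scope: authorization failed")
    print(triage(sample))
```

Rule order matters here: a log that contains both `ImagePullBackOff` and `insufficient_scope` should be attributed to the permission problem, not to a wrong address.
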
**3. Check the container logs**
```
SAE console → Application details → Log query → Real-time logs
```

---

### Problem 3: Health Check Fails

**Symptoms**:
```
Instance list shows: health check failed
The instance restarts repeatedly
```

**Diagnosis**:

**1. Confirm the service started normally**
```bash
# Look for this line in the logs:
INFO: Uvicorn running on http://0.0.0.0:8000
```

**2. Confirm the ports are correct**
```bash
# Container port setting: 8000
# Health check port setting: 8000
```

**3. Test the health check endpoint manually**
```bash
# Run in the SAE Webshell (the slim image has no curl, so use Python):
python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8000/api/health').read().decode())"
```

**4. Adjust the health check parameters**
```
Initial delay: 40 s → 60 s (if the image pull is slow)
Check timeout: 10 s → 20 s
```

---

### Problem 4: Node.js Backend Cannot Connect to the Python Service

**Symptoms**:
```
Backend log: Connection refused
or
ECONNREFUSED: connect ECONNREFUSED 172.17.x.x:8000
```

**Diagnosis**:

**1. Confirm the internal address is correct**
```bash
# ❌ Wrong (a guessed domain name)
EXTRACTION_SERVICE_URL=http://python-extraction.internal:8000

# ✅ Correct (the real IP shown in the SAE console)
EXTRACTION_SERVICE_URL=http://172.17.10.5:8000
```

**2. Confirm the Python service is running**
```
SAE console → Python application → Instance list
Status: Running ✅
```

**3. Confirm the security group rules**
```
SAE console → Python application → Network configuration → Security group
Inbound rule: allow access to port 8000 from inside the VPC
```

**4. Test internal connectivity**
```bash
# Run inside the Node.js backend container (through the SAE Webshell):
curl http://172.17.x.x:8000/api/health
```

---

### Problem 5: PDF Extraction Times Out

**Symptoms**:
```
Backend log: Request timeout after 300s
Python log: Processing large PDF...
```

**Causes**:
- The file is too large (> 50 MB)
- The PDF contains many images

**Fixes**:

**1. Increase the timeout**
```bash
# Node.js backend environment variable
EXTRACTION_TIMEOUT=600 # 10 minutes
```

**2. Limit the file size**
```python
# Python service: main.py (`app` is the existing FastAPI instance)
from fastapi import HTTPException, UploadFile

MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB

@app.post("/api/extract/pdf")
async def extract_pdf(file: UploadFile):
    # UploadFile.size may be None when the size is unknown, so guard before comparing
    if file.size is not None and file.size > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="File too large")
    # ...continue with the extraction
```

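
Because the declared size may be missing, a complementary approach is to enforce the limit while reading: consume the upload in chunks and stop as soon as the running total exceeds the cap. The sketch below shows the idea on any binary file-like object. `read_with_limit` and `FileTooLarge` are hypothetical names, and wiring this into the FastAPI handler (for example by mapping the exception to a 413 response) is left as an assumption.

```python
import io
from typing import BinaryIO


class FileTooLarge(Exception):
    """Raised when the stream holds more bytes than the allowed maximum."""


def read_with_limit(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, aborting as soon as more than max_bytes were seen."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise FileTooLarge(f"more than {max_bytes} bytes received")


if __name__ == "__main__":
    data = read_with_limit(io.BytesIO(b"%PDF-1.7 ..."), max_bytes=50 * 1024 * 1024)
    print(len(data), "bytes accepted")
```

At most one chunk beyond the limit is ever buffered, so an oversized upload is rejected early instead of being held fully in memory.
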
**3. Optimize the extraction logic**
```python
# Skip image-only pages, compress images, and so on
```

---

### Problem 6: Out of Memory (OOM)

**Symptoms**:
```
The container restarts by itself
Log shows: Killed (signal 9)
Instance monitoring: memory usage > 95%
```

**Fixes**:

**1. Increase the memory allocation**
```
SAE console → Application configuration → Specification
Memory: 2 GB → 4 GB
```

**2. Optimize the code (stream the data)**
```python
# Do not load the whole file into memory at once
def read_in_chunks(f, chunk_size=1024 * 1024):
    # Yield the file in 1 MB pieces until EOF
    while chunk := f.read(chunk_size):
        yield chunk

with open(pdf_path, 'rb') as f:
    for chunk in read_in_chunks(f):
        process(chunk)  # placeholder for your own per-chunk handling
```

**3. Limit concurrent requests**
```python
# main.py
import asyncio
from fastapi import FastAPI
from starlette.middleware.base import BaseHTTPMiddleware

class ConnectionLimitMiddleware(BaseHTTPMiddleware):
    # Custom middleware (not shipped with FastAPI or Starlette): caps in-flight requests
    def __init__(self, app, max_connections: int = 10):
        super().__init__(app)
        self._slots = asyncio.Semaphore(max_connections)

    async def dispatch(self, request, call_next):
        async with self._slots:  # extra requests wait here for a free slot
            return await call_next(request)

app = FastAPI()
# Limit the number of requests handled at the same time
app.add_middleware(ConnectionLimitMiddleware, max_connections=10)
```

---

## Appendix

### A. Quick Command Reference

**View application information**:
```bash
# Alibaba Cloud CLI
aliyun sae DescribeApplicationStatus --AppId <app-id>
```

**View the instance list**:
```bash
# Alibaba Cloud CLI
aliyun sae DescribeApplicationInstances --AppId <app-id>
```

**Restart the application**:
```
SAE console → Application details → Restart application
```

**View real-time logs**:
```
SAE console → Application details → Log query → Real-time logs
```

---

### B. Environment Variable Reference

| Name | Required | Default | Notes |
|--------|-----|--------|------|
| `LOG_LEVEL` | No | `INFO` | Log level (DEBUG/INFO/WARNING/ERROR) |
| `TEMP_DIR` | No | `/tmp/extraction_service` | Temporary file directory |
| `TZ` | No | `UTC` | Time zone (`Asia/Shanghai` recommended) |
| `SERVICE_NAME` | No | - | Service name (used to tag log entries) |
| `SERVICE_VERSION` | No | - | Service version (used to tag log entries) |
| `OSS_ACCESS_KEY_ID` | Yes | - | OSS AccessKey ID |
| `OSS_ACCESS_KEY_SECRET` | Yes | - | OSS AccessKey Secret |
| `OSS_BUCKET` | Yes | - | OSS bucket name |
| `OSS_ENDPOINT` | Yes | - | OSS endpoint (internal endpoint recommended) |

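
A service can apply this table at startup by failing fast on missing required variables and filling in the documented defaults. The sketch below illustrates the pattern and is not the actual extraction_service code; `load_settings` and the three constant names are hypothetical.

```python
import os
from typing import Mapping

# Defaults and required names are taken from the table above
DEFAULTS = {"LOG_LEVEL": "INFO", "TEMP_DIR": "/tmp/extraction_service", "TZ": "UTC"}
OPTIONAL_NO_DEFAULT = ("SERVICE_NAME", "SERVICE_VERSION")
REQUIRED = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET", "OSS_BUCKET", "OSS_ENDPOINT")


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build the settings dict, raising if a required variable is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report names only; never echo secret values into logs or error messages
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env.get(name) for name in OPTIONAL_NO_DEFAULT})
    settings.update({name: env[name] for name in REQUIRED})
    return settings


if __name__ == "__main__":
    try:
        print(sorted(load_settings()))  # print the keys only, not the values
    except RuntimeError as exc:
        print(exc)
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary, and listing only the variable names in the error keeps the AccessKey out of the logs.
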
---

### C. Deployment Checklist

**Before deployment**:
- [ ] Docker image has been pushed to ACR
- [ ] VPC, vSwitch, and security group IDs are confirmed
- [ ] OSS AccessKey is valid
- [ ] SAE namespace has been created

**During deployment**:
- [ ] Image address is the VPC-internal address
- [ ] Instances = 1 (not 0)
- [ ] Container port = 8000
- [ ] Health check path = `/api/health`
- [ ] Environment variables are complete

**After deployment**:
- [ ] Application status = Running
- [ ] Health check passes
- [ ] Logs show a normal service startup
- [ ] Internal IP address is recorded
- [ ] Node.js backend environment variable is updated

---

### D. Cost Estimate

**Test environment (SAE Lite)**:
```
Specification: 1 core, 2 GB × 1 instance
Cost: about ¥60/month
```

**Cost-saving suggestions**:
- During testing, the application can be stopped manually (no charge while stopped)
- Stop the application at night or on weekends to save cost
- For production, consider a subscription (yearly or monthly) discount

---

### E. Related Documents

- [Deployment Progress Overview](./00-部署进度总览.md): quick reference for all resources
- [Python Microservice SAE Container Deployment Guide](./04-Python微服务-SAE容器部署指南.md): technical architecture in detail
- [Quick Deployment SOP](./01-快速部署SOP-零基础版.md): the complete deployment flow

---

**Document maintenance**:
- Created: 2024-12-24
- Last updated: 2024-12-24
- Next review: 2025-01-24

---

**After deployment, record the following information**:

```
Deployment time: 2024-12-24 19:43
Internal address: http://172.17.173.66:8000
First health check passed: 2024-12-24 19:44
SAE application name: python-extraction-test
Application type: Lite application
Specification: 1 core, 2 GB × 1 instance
Deployment status: ✅ Success
Notes:
- Resolved the ACR image pull permission problem (configured image registry authentication)
- Resolved the image tag problem (pinned the :v1.0 version)
- Application runs normally with 2 uvicorn worker processes
- The OpenBLAS warning can be ignored (it does not affect functionality)
```

---

> **Tip**: After deployment, promptly update the internal address in the [Deployment Progress Overview](./00-部署进度总览.md)!
