# 数据库迁移注意事项 > **文档版本**:v1.1 > **创建日期**:2026-01-27 > **最后更新**:2026-01-27 > **基于故障**:2026-01-27 RVW 任务无限循环故障(根因:pg-boss 表约束丢失) --- ## 📋 背景 ### 故障复盘 **现象**:RVW 审稿模块任务完成后继续无限循环执行,同一 taskId 被处理 7 次 **根因**:数据库从本地迁移到 RDS 时,`platform_schema` 中 **pg-boss 表的约束丢失** | 丢失的约束 | 类型 | 影响 | |-----------|------|------| | `queue_pkey` | 主键 | 🔴 **导致队列定义重复** | | `queue_dead_letter_fkey` | 外键 | 死信队列引用完整性丢失 | | `schedule_name_fkey` | 外键 | 定时任务引用完整性丢失 | | `subscription_name_fkey` | 外键 | 订阅引用完整性丢失 | **影响**:每次 SAE 重启都会创建新的队列定义,导致同一任务被入队多次 --- ## 🔍 为什么只有 pg-boss 相关约束丢失? ### pg-boss 表的"双重管理者"问题 | 类别 | 表示例 | 管理方式 | 约束创建 | |------|--------|---------|---------| | **业务表** | `users`, `tenants`, `ReviewTask` | Prisma Migration | ✅ 迁移文件明确包含所有约束 | | **pg-boss 表** | `queue`, `job`, `schedule` | pg-boss 运行时自动创建 | ⚠️ 仅在首次启动时创建 | **关键点**: 1. **业务表由 Prisma 管理**:通过 `prisma migrate deploy` 创建,约束完整 2. **pg-boss 表在 Prisma Schema 中定义**:但只是通过 `db pull` 拉取的,Prisma 不负责创建 3. **pg-boss 表实际由 `boss.start()` 创建**:如果表已存在,pg-boss 跳过创建步骤 ### 迁移时发生了什么? ``` 本地数据库 pg_dump RDS ┌─────────────────┐ ────────► ┌─────────────────┐ │ queue 表 │ │ queue 表 │ │ ✅ queue_pkey │ │ ❌ queue_pkey │ ← 丢失! │ ✅ 3个外键约束 │ │ ❌ 3个外键约束 │ ← 丢失! └─────────────────┘ └─────────────────┘ 原因: 1. pg_dump 导出时外键延迟处理 2. queue 表有自引用外键 (dead_letter → name) 3. 导入时约束创建顺序问题导致失败 4. pg-boss 检测到表已存在,跳过约束创建 ``` ### 自引用外键是罪魁祸首 ```sql -- queue 表的自引用外键(容易在迁移时丢失) FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name) ``` --- ## 🔴 迁移后必须检查项 ### 1. pg-boss 4 个关键约束检查(最重要!) ```sql -- 🔴 一键检查 pg-boss 关键约束(必须返回 4 行) SELECT conname, contype, CASE contype WHEN 'p' THEN '主键' WHEN 'f' THEN '外键' END as type_desc FROM pg_constraint WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema') AND conname IN ( 'queue_pkey', -- 主键(防止重复队列) 'queue_dead_letter_fkey', -- 自引用外键 'schedule_name_fkey', -- schedule → queue 外键 'subscription_name_fkey' -- subscription → queue 外键 ) ORDER BY conname; -- 预期结果(必须 4 行): -- conname | contype | type_desc -- -------------------------+---------+----------- -- queue_dead_letter_fkey | f | 外键 -- queue_pkey | p | 主键 -- schedule_name_fkey | f | 外键 -- subscription_name_fkey | f | 外键 ``` **如果少于 4 行,立即修复**: ```sql -- 修复 1:添加 queue_pkey(如果缺失) -- 先清理重复数据 DELETE FROM platform_schema.queue a USING platform_schema.queue b WHERE a.name = b.name AND a.created_on < b.created_on; -- 添加主键 ALTER TABLE platform_schema.queue ADD PRIMARY KEY (name); -- 修复 2:添加 queue_dead_letter_fkey(如果缺失) ALTER TABLE platform_schema.queue ADD CONSTRAINT queue_dead_letter_fkey FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name); -- 修复 3:添加 schedule_name_fkey(如果缺失) ALTER TABLE platform_schema.schedule ADD CONSTRAINT schedule_name_fkey FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE; -- 修复 4:添加 subscription_name_fkey(如果缺失) ALTER TABLE platform_schema.subscription ADD CONSTRAINT subscription_name_fkey FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE; ``` --- ### 2. 检查重复队列定义 ```sql -- 检查是否有重复队列定义 SELECT name, COUNT(*) as cnt FROM platform_schema.queue GROUP BY name HAVING COUNT(*) > 1; -- 预期结果:无返回(0 行) ``` --- ### 3. 全局约束数量对比 ```sql -- 检查各 schema 约束数量(与本地对比) SELECT n.nspname as schema, COUNT(*) as constraint_count FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema') GROUP BY n.nspname ORDER BY n.nspname; -- 🔴 platform_schema 约束数量必须是 33 -- 如果少于 33,说明有约束丢失 ``` --- ### 4. 检查所有 pg-boss 表索引 ```sql -- 列出 platform_schema 中的所有表和索引 SELECT t.tablename, COUNT(i.indexname) as index_count FROM pg_tables t LEFT JOIN pg_indexes i ON t.tablename = i.tablename AND t.schemaname = i.schemaname WHERE t.schemaname = 'platform_schema' GROUP BY t.tablename ORDER BY t.tablename; -- 关键表应该有索引: -- queue: 1 (queue_pkey) -- job_common: 多个索引 -- schedule: 索引 ``` --- ## 📋 数据库迁移完整检查清单 ### 迁移前 - [ ] 备份源数据库 - [ ] 记录源数据库的表结构和索引 - [ ] 确认 pg_dump 参数包含约束和索引 ### 迁移中 ```bash # 推荐的 pg_dump 参数(包含完整结构) pg_dump -h localhost -U postgres -d ai_clinical_research \ --no-owner \ --no-acl \ --format=plain \ --encoding=UTF8 \ > backup.sql # 不要使用 --data-only(会丢失约束) ``` ### 迁移后(🔴 关键检查) | 检查项 | SQL | 预期 | |--------|-----|------| | queue 主键 | `SELECT indexname FROM pg_indexes WHERE tablename='queue';` | `queue_pkey` | | 重复队列 | `SELECT name, COUNT(*) FROM platform_schema.queue GROUP BY name HAVING COUNT(*) > 1;` | 无返回 | | 表数量 | `SELECT COUNT(*) FROM pg_tables WHERE schemaname='platform_schema';` | 与源一致 | | 索引数量 | `SELECT COUNT(*) FROM pg_indexes WHERE schemaname='platform_schema';` | 与源一致 | --- ## 🛠️ 常见问题修复 ### 问题 1:pg-boss 表约束丢失(2026-01-27 故障) **症状**: - 每次应用启动都创建新的队列定义 - 同一任务被重复处理多次 - RVW 审稿任务无限循环 **丢失的约束(共 4 个)**: | 约束 | 类型 | 影响 | |------|------|------| | `queue_pkey` | 主键 | 🔴 导致队列定义重复 | | `queue_dead_letter_fkey` | 外键 | 死信队列引用失效 | | `schedule_name_fkey` | 外键 | 定时任务引用失效 | | `subscription_name_fkey` | 外键 | 订阅引用失效 | **一键修复脚本**: ```sql -- 🔴 执行前先检查哪些约束缺失 SELECT conname FROM pg_constraint WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema') AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey'); -- 修复步骤: -- 1. 清理重复队列定义(如果有) DELETE FROM platform_schema.queue a USING platform_schema.queue b WHERE a.name = b.name AND a.created_on < b.created_on; -- 2. 添加主键(如果缺失) ALTER TABLE platform_schema.queue ADD PRIMARY KEY (name); -- 3. 添加外键约束(如果缺失) ALTER TABLE platform_schema.queue ADD CONSTRAINT queue_dead_letter_fkey FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name); ALTER TABLE platform_schema.schedule ADD CONSTRAINT schedule_name_fkey FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE; ALTER TABLE platform_schema.subscription ADD CONSTRAINT subscription_name_fkey FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE; -- 4. 验证(必须返回 4 行) SELECT conname FROM pg_constraint WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema') AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey'); ``` --- ### 问题 2:迁移后中文乱码 **症状**:用户名、租户名显示为 `????` **原因**:PowerShell 默认编码不是 UTF-8 **修复**:在 Docker 容器内执行导入导出 ```bash # 在容器内导出(绕过 PowerShell 编码问题) docker exec -it postgres-container bash -c "pg_dump ... > /tmp/backup.sql" # 在容器内导入 docker exec -i postgres-container psql ... < backup.sql ``` 详见:[05-部署文档/0126部署/08-部署完成总结.md](../05-部署文档/0126部署/08-部署完成总结.md) --- ### 问题 3:迁移后外键约束失败 **症状**:`ERROR: insert or update violates foreign key constraint` **原因**:表导入顺序错误 **修复**:使用事务或正确的依赖顺序 ```sql -- 临时禁用外键检查(谨慎使用) SET session_replication_role = replica; -- 导入数据... -- 恢复外键检查 SET session_replication_role = DEFAULT; ``` --- ## 📊 迁移对比脚本 迁移后执行此脚本,对比源和目标数据库: ```sql -- 🔴 一键健康检查(迁移后必须执行) -- 1. 全局统计(与本地对比) SELECT (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables, (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes, (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints; -- 预期值(2026-01-27):tables=64, indexes=236, constraints=108 -- 2. platform_schema 约束数量 SELECT COUNT(*) as platform_constraints FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname = 'platform_schema'; -- 预期值:33 -- 3. 🔴 pg-boss 4 个关键约束 SELECT conname FROM pg_constraint WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema') AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey'); -- 预期:4 行 -- 4. 重复队列检查 SELECT name, COUNT(*) FROM platform_schema.queue GROUP BY name HAVING COUNT(*) > 1; -- 预期:0 行 ``` ### 本地 vs RDS 快速对比命令 ```bash # 本地 docker exec ai-clinical-postgres psql "postgresql://postgres:postgres123@localhost:5432/ai_clinical_research" -c " SELECT (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables, (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes, (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints; " # RDS docker exec ai-clinical-postgres psql "postgresql://airesearch:Xibahe%40fengzhibo117@pgm-2zex1m2y3r23hdn5xo.pg.rds.aliyuncs.com:5432/ai_clinical_research_test" -c " SELECT (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables, (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes, (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints; " ``` --- ## 📚 相关文档 - [01-PgBoss队列监控与维护](./01-PgBoss队列监控与维护.md) - 队列日常监控 - [02-故障预防检查清单](./02-故障预防检查清单.md) - 部署检查清单 - [故障分析报告](../06-测试文档/故障分析报告%20(1).md) - 原始故障分析 --- ## 📝 变更记录 | 日期 | 版本 | 变更内容 | |------|------|---------| | 2026-01-27 | v1.1 | 补充 3 个外键约束丢失分析,添加"双重管理者"问题说明 | | 2026-01-27 | v1.0 | 初始版本,基于 RVW 任务无限循环故障 |