Files
AIclinicalresearch/docs/07-运维文档/03-数据库迁移注意事项.md
HaHafeng bbf98c4d5c fix(backend): Resolve PgBoss infinite loop issue and cleanup unused files
Backend fixes:
- Fix PgBoss task infinite loop on SAE (root cause: missing queue table constraints)
- Add singletonKey to prevent duplicate job enqueueing
- Add idempotency check in reviewWorker (skip completed tasks)
- Add optimistic locking in reviewService (atomic status update)

Frontend fixes:
- Add isSubmitting state to prevent duplicate submissions in RVW Dashboard
- Fix API baseURL in knowledgeBaseApi (relative path)

Cleanup (removed):
- Old frontend/ directory (migrated to frontend-v2)
- python-microservice/ (unused, replaced by extraction_service)
- Root package.json and node_modules (accidentally created)
- redcap-docker-dev/ (external dependency)
- Various temporary files and outdated docs in root

New documentation:
- docs/07-运维文档/01-PgBoss队列监控与维护.md
- docs/07-运维文档/02-故障预防检查清单.md
- docs/07-运维文档/03-数据库迁移注意事项.md

Database fix applied to RDS:
- Added PRIMARY KEY to platform_schema.queue
- Added 3 missing foreign key constraints

Tested: Local build passed, RDS constraints verified
2026-01-27 18:16:22 +08:00

12 KiB
Raw Permalink Blame History

数据库迁移注意事项

文档版本v1.1
创建日期2026-01-27
最后更新2026-01-27
基于故障2026-01-27 RVW 任务无限循环故障根因pg-boss 表约束丢失)


📋 背景

故障复盘

现象RVW 审稿模块任务完成后继续无限循环执行,同一 taskId 被处理 7 次

根因:数据库从本地迁移到 RDS 时,platform_schemapg-boss 表的约束丢失

丢失的约束 类型 影响
queue_pkey 主键 🔴 导致队列定义重复
queue_dead_letter_fkey 外键 死信队列引用完整性丢失
schedule_name_fkey 外键 定时任务引用完整性丢失
subscription_name_fkey 外键 订阅引用完整性丢失

影响:每次 SAE 重启都会创建新的队列定义,导致同一任务被入队多次


🔍 为什么只有 pg-boss 相关约束丢失?

pg-boss 表的"双重管理者"问题

类别 表示例 管理方式 约束创建
业务表 users, tenants, ReviewTask Prisma Migration 迁移文件明确包含所有约束
pg-boss 表 queue, job, schedule pg-boss 运行时自动创建 ⚠️ 仅在首次启动时创建

关键点

  1. 业务表由 Prisma 管理:通过 prisma migrate deploy 创建,约束完整
  2. pg-boss 表在 Prisma Schema 中定义:但只是通过 db pull 拉取的Prisma 不负责创建
  3. pg-boss 表实际由 boss.start() 创建如果表已存在pg-boss 跳过创建步骤

迁移时发生了什么?

本地数据库                    pg_dump                      RDS
┌─────────────────┐          ────────►          ┌─────────────────┐
│ queue 表        │                              │ queue 表        │
│ ✅ queue_pkey   │                              │ ❌ queue_pkey   │  ← 丢失!
│ ✅ 3个外键约束  │                              │ ❌ 3个外键约束  │  ← 丢失!
└─────────────────┘                              └─────────────────┘

原因:
1. pg_dump 导出时外键延迟处理
2. queue 表有自引用外键 (dead_letter → name)
3. 导入时约束创建顺序问题导致失败
4. pg-boss 检测到表已存在,跳过约束创建

自引用外键是罪魁祸首

-- queue 表的自引用外键(容易在迁移时丢失)
FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name)

🔴 迁移后必须检查项

1. pg-boss 4 个关键约束检查(最重要!)

-- 🔴 一键检查 pg-boss 关键约束(必须返回 4 行)
SELECT conname, contype, 
  CASE contype WHEN 'p' THEN '主键' WHEN 'f' THEN '外键' END as type_desc
FROM pg_constraint 
WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema')
  AND conname IN (
    'queue_pkey',              -- 主键(防止重复队列)
    'queue_dead_letter_fkey',  -- 自引用外键
    'schedule_name_fkey',      -- schedule → queue 外键
    'subscription_name_fkey'   -- subscription → queue 外键
  )
ORDER BY conname;

-- 预期结果(必须 4 行):
--         conname          | contype | type_desc 
-- -------------------------+---------+-----------
--  queue_dead_letter_fkey  | f       | 外键
--  queue_pkey              | p       | 主键
--  schedule_name_fkey      | f       | 外键
--  subscription_name_fkey  | f       | 外键

如果少于 4 行,立即修复

-- 修复 1添加 queue_pkey如果缺失
-- 先清理重复数据
DELETE FROM platform_schema.queue a
USING platform_schema.queue b
WHERE a.name = b.name 
  AND a.created_on < b.created_on;
-- 添加主键
ALTER TABLE platform_schema.queue ADD PRIMARY KEY (name);

-- 修复 2添加 queue_dead_letter_fkey如果缺失
ALTER TABLE platform_schema.queue 
ADD CONSTRAINT queue_dead_letter_fkey 
FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name);

-- 修复 3添加 schedule_name_fkey如果缺失
ALTER TABLE platform_schema.schedule 
ADD CONSTRAINT schedule_name_fkey 
FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE;

-- 修复 4添加 subscription_name_fkey如果缺失
ALTER TABLE platform_schema.subscription 
ADD CONSTRAINT subscription_name_fkey 
FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE;

2. 检查重复队列定义

-- 检查是否有重复队列定义
SELECT name, COUNT(*) as cnt 
FROM platform_schema.queue 
GROUP BY name 
HAVING COUNT(*) > 1;

-- 预期结果无返回0 行)

3. 全局约束数量对比

-- 检查各 schema 约束数量(与本地对比)
SELECT n.nspname as schema, COUNT(*) as constraint_count
FROM pg_constraint c
JOIN pg_namespace n ON c.connamespace = n.oid
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
GROUP BY n.nspname
ORDER BY n.nspname;

-- 🔴 platform_schema 约束数量必须是 33
-- 如果少于 33说明有约束丢失

4. 检查所有 pg-boss 表索引

-- 列出 platform_schema 中的所有表和索引
SELECT 
  t.tablename,
  COUNT(i.indexname) as index_count
FROM pg_tables t
LEFT JOIN pg_indexes i ON t.tablename = i.tablename AND t.schemaname = i.schemaname
WHERE t.schemaname = 'platform_schema'
GROUP BY t.tablename
ORDER BY t.tablename;

-- 关键表应该有索引:
-- queue: 1 (queue_pkey)
-- job_common: 多个索引
-- schedule: 索引

📋 数据库迁移完整检查清单

迁移前

  • 备份源数据库
  • 记录源数据库的表结构和索引
  • 确认 pg_dump 参数包含约束和索引

迁移中

# 推荐的 pg_dump 参数(包含完整结构)
pg_dump -h localhost -U postgres -d ai_clinical_research \
  --no-owner \
  --no-acl \
  --format=plain \
  --encoding=UTF8 \
  > backup.sql

# 不要使用 --data-only会丢失约束

迁移后(🔴 关键检查)

检查项 SQL 预期
queue 主键 SELECT indexname FROM pg_indexes WHERE tablename='queue'; queue_pkey
重复队列 SELECT name, COUNT(*) FROM platform_schema.queue GROUP BY name HAVING COUNT(*) > 1; 无返回
表数量 SELECT COUNT(*) FROM pg_tables WHERE schemaname='platform_schema'; 与源一致
索引数量 SELECT COUNT(*) FROM pg_indexes WHERE schemaname='platform_schema'; 与源一致

🛠️ 常见问题修复

问题 1pg-boss 表约束丢失2026-01-27 故障)

症状

  • 每次应用启动都创建新的队列定义
  • 同一任务被重复处理多次
  • RVW 审稿任务无限循环

丢失的约束(共 4 个)

约束 类型 影响
queue_pkey 主键 🔴 导致队列定义重复
queue_dead_letter_fkey 外键 死信队列引用失效
schedule_name_fkey 外键 定时任务引用失效
subscription_name_fkey 外键 订阅引用失效

一键修复脚本

-- 🔴 执行前先检查哪些约束缺失
SELECT conname FROM pg_constraint 
WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema')
  AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey');

-- 修复步骤:

-- 1. 清理重复队列定义(如果有)
DELETE FROM platform_schema.queue a
USING platform_schema.queue b
WHERE a.name = b.name AND a.created_on < b.created_on;

-- 2. 添加主键(如果缺失)
ALTER TABLE platform_schema.queue ADD PRIMARY KEY (name);

-- 3. 添加外键约束(如果缺失)
ALTER TABLE platform_schema.queue 
ADD CONSTRAINT queue_dead_letter_fkey 
FOREIGN KEY (dead_letter) REFERENCES platform_schema.queue(name);

ALTER TABLE platform_schema.schedule 
ADD CONSTRAINT schedule_name_fkey 
FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE;

ALTER TABLE platform_schema.subscription 
ADD CONSTRAINT subscription_name_fkey 
FOREIGN KEY (name) REFERENCES platform_schema.queue(name) ON DELETE CASCADE;

-- 4. 验证(必须返回 4 行)
SELECT conname FROM pg_constraint 
WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema')
  AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey');

问题 2迁移后中文乱码

症状:用户名、租户名显示为 ????

原因PowerShell 默认编码不是 UTF-8

修复:在 Docker 容器内执行导入导出

# 在容器内导出(绕过 PowerShell 编码问题)
docker exec -it postgres-container bash -c "pg_dump ... > /tmp/backup.sql"

# 在容器内导入
docker exec -i postgres-container psql ... < backup.sql

详见:05-部署文档/0126部署/08-部署完成总结.md


问题 3迁移后外键约束失败

症状ERROR: insert or update violates foreign key constraint

原因:表导入顺序错误

修复:使用事务或正确的依赖顺序

-- 临时禁用外键检查(谨慎使用)
SET session_replication_role = replica;

-- 导入数据...

-- 恢复外键检查
SET session_replication_role = DEFAULT;

📊 迁移对比脚本

迁移后执行此脚本,对比源和目标数据库:

-- 🔴 一键健康检查(迁移后必须执行)

-- 1. 全局统计(与本地对比)
SELECT 
  (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables,
  (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes,
  (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints;

-- 预期值2026-01-27tables=64, indexes=236, constraints=108

-- 2. platform_schema 约束数量
SELECT COUNT(*) as platform_constraints
FROM pg_constraint c
JOIN pg_namespace n ON c.connamespace = n.oid
WHERE n.nspname = 'platform_schema';

-- 预期值33

-- 3. 🔴 pg-boss 4 个关键约束
SELECT conname FROM pg_constraint 
WHERE connamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'platform_schema')
  AND conname IN ('queue_pkey', 'queue_dead_letter_fkey', 'schedule_name_fkey', 'subscription_name_fkey');

-- 预期4 行

-- 4. 重复队列检查
SELECT name, COUNT(*) FROM platform_schema.queue GROUP BY name HAVING COUNT(*) > 1;

-- 预期0 行

本地 vs RDS 快速对比命令

# 本地
docker exec ai-clinical-postgres psql "postgresql://postgres:postgres123@localhost:5432/ai_clinical_research" -c "
SELECT 
  (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables,
  (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes,
  (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints;
"

# RDS
docker exec ai-clinical-postgres psql "postgresql://airesearch:Xibahe%40fengzhibo117@pgm-2zex1m2y3r23hdn5xo.pg.rds.aliyuncs.com:5432/ai_clinical_research_test" -c "
SELECT 
  (SELECT COUNT(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as tables,
  (SELECT COUNT(*) FROM pg_indexes WHERE schemaname NOT IN ('pg_catalog', 'information_schema')) as indexes,
  (SELECT COUNT(*) FROM pg_constraint c JOIN pg_namespace n ON c.connamespace = n.oid WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')) as constraints;
"

📚 相关文档


📝 变更记录

日期 版本 变更内容
2026-01-27 v1.1 补充 3 个外键约束丢失分析,添加"双重管理者"问题说明
2026-01-27 v1.0 初始版本,基于 RVW 任务无限循环故障