Backend fixes:
- Fix PgBoss task infinite loop on SAE (root cause: missing queue table constraints)
- Add singletonKey to prevent duplicate job enqueueing
- Add idempotency check in reviewWorker (skip completed tasks)
- Add optimistic locking in reviewService (atomic status update)

Frontend fixes:
- Add isSubmitting state to prevent duplicate submissions in RVW Dashboard
- Fix API baseURL in knowledgeBaseApi (relative path)

Cleanup (removed):
- Old frontend/ directory (migrated to frontend-v2)
- python-microservice/ (unused, replaced by extraction_service)
- Root package.json and node_modules (accidentally created)
- redcap-docker-dev/ (external dependency)
- Various temporary files and outdated docs in root

New documentation:
- docs/07-运维文档/01-PgBoss队列监控与维护.md
- docs/07-运维文档/02-故障预防检查清单.md
- docs/07-运维文档/03-数据库迁移注意事项.md

Database fix applied to RDS:
- Added PRIMARY KEY to platform_schema.queue
- Added 3 missing foreign key constraints

Tested: Local build passed, RDS constraints verified
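Operations Documentation
Purpose: record the monitoring, troubleshooting, and preventive measures related to system operations.
Created: 2026-01-27
Maintainer: Operations team
📚 Document Index
| Document | Description | Priority |
|---|---|---|
| 01-PgBoss队列监控与维护 (PgBoss queue monitoring and maintenance) | Monitoring, cleanup, and troubleshooting for the pg-boss job queue | 🔴 High |
| 02-故障预防检查清单 (Failure prevention checklist) | Pre- and post-deployment checklists that prevent common failures | 🔴 High |
| 03-数据库迁移注意事项 (Database migration notes) | Items to check during database migrations to avoid losing constraints | 🔴 High |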
🔧 Quick Reference
Routine check SQL
-- Check for duplicate queue definitions
SELECT name, COUNT(*) as cnt
FROM platform_schema.queue
GROUP BY name
HAVING COUNT(*) > 1;
-- Check the distribution of job states
SELECT name, state, COUNT(*)
FROM platform_schema.job_common
GROUP BY name, state
ORDER BY name, state;
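To make clear what the two queries above report, here is a minimal Python sketch that reproduces the same grouping on in-memory rows. The function names and the sample rows are hypothetical and only illustrate the shape of the results; they are not taken from the project.

```python
from collections import Counter


def duplicate_queues(queue_names):
    """Mirror of the first query: queue names that occur more than once."""
    counts = Counter(queue_names)
    return {name: cnt for name, cnt in counts.items() if cnt > 1}


def state_distribution(jobs):
    """Mirror of the second query: (name, state, count) rows.

    Rows are sorted by name, then by state in alphabetical order, which may
    differ from the order the database applies to its own state type.
    """
    counts = Counter(jobs)  # jobs is an iterable of (name, state) pairs
    return sorted((name, state, cnt) for (name, state), cnt in counts.items())


# Hypothetical sample data, for illustration only.
queue_rows = ["review", "review", "extract"]
job_rows = [
    ("review", "active"),
    ("review", "completed"),
    ("review", "active"),
    ("extract", "created"),
]

duplicates = duplicate_queues(queue_rows)
distribution = state_distribution(job_rows)
```

An empty result from the first check is the healthy case: any name it returns means the queue table holds more than one definition for that queue.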
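Emergency troubleshooting
- Task stuck in an infinite loop → see 01-PgBoss队列监控与维护
- Database connections exhausted → see 03-数据库运维手册 (database operations manual)
- Service unavailable → restart the SAE application and check the logs
📈 Monitoring and Alerts
| Metric | Threshold | Action |
|---|---|---|
| Duplicate queue definitions | > 1 | Remove the duplicate entries |
| Active job count | > 100 | Check whether any jobs are stuck |
| Database connections | > 80% | Check for connection leaks |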
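The thresholds in the table can be expressed as a small check, sketched below in Python. This is an illustration under assumptions: the `Metrics` fields and the `evaluate_alerts` function are hypothetical names, "active job count" is assumed to mean jobs in the `active` state, and the 80% limit is assumed to be relative to the database's maximum connection count.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    max_definitions_per_queue: int  # highest COUNT(*) for any queue name
    active_jobs: int                # assumed: jobs currently in the 'active' state
    db_connections: int             # connections currently open
    db_max_connections: int         # assumed: configured connection limit


def evaluate_alerts(m):
    """Return the actions from the table whose thresholds are exceeded."""
    alerts = []
    if m.max_definitions_per_queue > 1:
        alerts.append("Remove the duplicate entries")
    if m.active_jobs > 100:
        alerts.append("Check whether any jobs are stuck")
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    if m.db_connections * 100 > 80 * m.db_max_connections:
        alerts.append("Check for connection leaks")
    return alerts


healthy = evaluate_alerts(Metrics(1, 100, 80, 100))
unhealthy = evaluate_alerts(Metrics(2, 101, 81, 100))
```

All three comparisons are strict, matching the `>` in the table, so values exactly at a threshold do not raise an alert.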