Backend fixes:
- Fix PgBoss task infinite loop on SAE (root cause: missing queue table constraints)
- Add singletonKey to prevent duplicate job enqueueing
- Add idempotency check in reviewWorker (skip completed tasks)
- Add optimistic locking in reviewService (atomic status update)

Frontend fixes:
- Add isSubmitting state to prevent duplicate submissions in RVW Dashboard
- Fix API baseURL in knowledgeBaseApi (relative path)

Cleanup (removed):
- Old frontend/ directory (migrated to frontend-v2)
- python-microservice/ (unused, replaced by extraction_service)
- Root package.json and node_modules (accidentally created)
- redcap-docker-dev/ (external dependency)
- Various temporary files and outdated docs in root

New documentation:
- docs/07-运维文档/01-PgBoss队列监控与维护.md
- docs/07-运维文档/02-故障预防检查清单.md
- docs/07-运维文档/03-数据库迁移注意事项.md

Database fix applied to RDS:
- Added PRIMARY KEY to platform_schema.queue
- Added 3 missing foreign key constraints

Tested: Local build passed, RDS constraints verified
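
The idempotency check and optimistic locking listed under the backend fixes can be illustrated with a small sketch. This is not the project's code: the `review_task` table, its status values, and the `claim_task` / `handle_job` helpers are hypothetical, and SQLite stands in for the real database so the example runs on its own.

```python
import sqlite3


def claim_task(conn, task_id):
    """Optimistic lock: move a task from 'pending' to 'processing'.

    The WHERE clause makes the status change conditional, so only the
    caller whose UPDATE actually modified the row gets True.
    """
    cur = conn.execute(
        "UPDATE review_task SET status = 'processing' "
        "WHERE id = ? AND status = 'pending'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def handle_job(conn, task_id):
    """Process one job delivery; safe to call again for the same task."""
    row = conn.execute(
        "SELECT status FROM review_task WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None:
        return "missing"
    if row[0] == "completed":
        return "skipped"  # idempotency check: the work is already done
    if not claim_task(conn, task_id):
        return "lost-race"  # another worker holds the task
    # ... the actual review work would run here ...
    conn.execute(
        "UPDATE review_task SET status = 'completed' WHERE id = ?", (task_id,)
    )
    conn.commit()
    return "processed"


# Hypothetical stand-in table with a single pending task.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_task (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)
conn.execute("INSERT INTO review_task (id, status) VALUES (1, 'pending')")
conn.commit()

first = handle_job(conn, 1)   # first delivery does the work
second = handle_job(conn, 1)  # redelivery of the same job is skipped
print(first, second)
```

Here `first` is `"processed"` and `second` is `"skipped"`, so a job that is delivered twice does its work only once.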
# Operations Documentation

> **Purpose**: Record monitoring, troubleshooting, and preventive measures for system operations
> **Created**: 2026-01-27
> **Maintainer**: Operations team

---

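## 📚 Document Index

| Document | Description | Priority |
|------|------|--------|
| [01 - PgBoss Queue Monitoring and Maintenance](./01-PgBoss队列监控与维护.md) | Monitoring, cleanup, and troubleshooting for the pg-boss job queue | 🔴 High |
| [02 - Failure Prevention Checklist](./02-故障预防检查清单.md) | Pre- and post-deployment checklists to prevent common failures | 🔴 High |
| [03 - Database Migration Notes](./03-数据库迁移注意事项.md) | Items to check during database migrations so that constraints are not lost | 🔴 High |

---
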
## 🔧 Quick Reference

### Daily Check SQL

```sql
-- Check for duplicate queue definitions
SELECT name, COUNT(*) AS cnt
FROM platform_schema.queue
GROUP BY name
HAVING COUNT(*) > 1;

-- Check the distribution of job states
SELECT name, state, COUNT(*)
FROM platform_schema.job_common
GROUP BY name, state
ORDER BY name, state;
```

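
To show what the duplicate-queue check reports, the sketch below runs the same query against an in-memory SQLite stand-in. It is only an illustration under stated assumptions: the real table lives in PostgreSQL, the single-column table layout and the queue names `review` and `extract` are hypothetical, and the stand-in table deliberately has no PRIMARY KEY so that a duplicate row can exist.

```python
import sqlite3

# Same query as the daily check for duplicate queue definitions.
DUPLICATE_QUEUES_SQL = """
SELECT name, COUNT(*) AS cnt
FROM platform_schema.queue
GROUP BY name
HAVING COUNT(*) > 1
"""


def find_duplicate_queues(conn):
    """Return (name, count) rows for queue names defined more than once."""
    return conn.execute(DUPLICATE_QUEUES_SQL).fetchall()


def make_demo_connection():
    """Build a stand-in database; ATTACH lets the schema-qualified name resolve."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS platform_schema")
    # No PRIMARY KEY on purpose: this mimics the missing constraint.
    conn.execute("CREATE TABLE platform_schema.queue (name TEXT)")
    conn.executemany(
        "INSERT INTO platform_schema.queue (name) VALUES (?)",
        [("review",), ("review",), ("extract",)],  # hypothetical queue names
    )
    conn.commit()
    return conn


duplicates = find_duplicate_queues(make_demo_connection())
print(duplicates)  # [('review', 2)]
```

An empty result is the healthy case; any returned row names a queue that is defined more than once.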
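### Emergency Troubleshooting

1. **Job stuck in an infinite loop** → see [01 - PgBoss Queue Monitoring and Maintenance](./01-PgBoss队列监控与维护.md)
2. **Database connections exhausted** → see [03 - Database Operations Manual](./03-数据库运维手册.md)
3. **Service unavailable** → restart the SAE application and check the logs

---
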
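## 📈 Monitoring and Alerts

| Metric | Threshold | Action |
|--------|------|---------|
| Duplicate queue definitions | > 1 row per queue name | Remove the duplicate entries |
| Active jobs | > 100 | Check whether any jobs are stuck |
| Database connection usage | > 80% | Check for connection leaks |
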
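
The thresholds above can be expressed as a small check. This is a sketch only: the `Snapshot` fields and the `evaluate` helper are hypothetical names, and how the numbers are collected (for example from the daily check SQL) is left out. An empty result means no threshold is exceeded.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical container for one round of collected metrics."""
    max_rows_per_queue_name: int  # highest row count for any single queue name
    active_jobs: int              # number of jobs currently active
    db_connections_used: int
    db_connections_max: int


def evaluate(snapshot):
    """Return the actions whose thresholds are exceeded."""
    alerts = []
    if snapshot.max_rows_per_queue_name > 1:
        alerts.append("duplicate queue definitions: remove the duplicate entries")
    if snapshot.active_jobs > 100:
        alerts.append("active jobs: check whether any jobs are stuck")
    # Integer arithmetic avoids float rounding at exactly 80%.
    if snapshot.db_connections_used * 100 > 80 * snapshot.db_connections_max:
        alerts.append("database connections: check for connection leaks")
    return alerts


healthy = evaluate(Snapshot(1, 100, 80, 100))   # every metric at its limit
unhealthy = evaluate(Snapshot(2, 101, 81, 100)) # every metric over its limit
print(len(healthy), len(unhealthy))
```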
---
## 📝 Related Documents

- [Deployment documentation](../05-部署文档/README.md)
- [Testing documentation](../06-测试文档/README.md)
- [Failure analysis report](../06-测试文档/故障分析报告%20(1).md)