fix(backend): Resolve PgBoss infinite loop issue and cleanup unused files
Backend fixes:
- Fix infinite loop of PgBoss jobs on SAE (root cause: missing constraints on the queue table)
- Add singletonKey to prevent duplicate job enqueueing
- Add idempotency check in reviewWorker (skip completed tasks)
- Add optimistic locking in reviewService (atomic status update)

Frontend fixes:
- Add isSubmitting state to prevent duplicate submissions in the RVW Dashboard
- Fix API baseURL in knowledgeBaseApi (switch to a relative path)

Cleanup (removed):
- Old frontend/ directory (migrated to frontend-v2)
- python-microservice/ (unused, replaced by extraction_service)
- Root package.json and node_modules (created by accident)
- redcap-docker-dev/ (external dependency)
- Various temporary files and outdated docs in the repository root

New documentation:
- docs/07-运维文档/01-PgBoss队列监控与维护.md (PgBoss queue monitoring and maintenance)
- docs/07-运维文档/02-故障预防检查清单.md (failure prevention checklist)
- docs/07-运维文档/03-数据库迁移注意事项.md (database migration notes)

Database fix applied to RDS:
- Added PRIMARY KEY to platform_schema.queue
- Added 3 missing foreign key constraints

Tested: local build passed, RDS constraints verified
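The commit blames the loop on missing constraints on the queue table and adds a PRIMARY KEY to `platform_schema.queue`. The sketch below shows only the constraint part of that fix.

- Without a primary key, the same queue name can be registered twice. The routine duplicate-queue query in the diff is designed to catch exactly this.
- With a primary key, the second insert is rejected.

Some caveats:
- It uses Python's built-in `sqlite3` so it runs without a PostgreSQL server.
- The one-column `queue` table and the helpers `make_queue_table`, `register_queue` and `duplicate_queues` are hypothetical simplifications. They are not pg-boss's real schema or the project's code.
- It does not reproduce the loop itself.

```python
import sqlite3

# Routine duplicate check from the diff, pointed at a simplified local table.
# Hypothetical: the real platform_schema.queue has more columns than this.
DUPLICATE_CHECK = """
SELECT name, COUNT(*) AS cnt
FROM queue
GROUP BY name
HAVING COUNT(*) > 1
"""


def make_queue_table(conn: sqlite3.Connection, with_primary_key: bool) -> None:
    """Create a one-column stand-in for the queue table, with or without a PK."""
    pk = " PRIMARY KEY" if with_primary_key else ""
    conn.execute(f"CREATE TABLE queue (name TEXT NOT NULL{pk})")


def register_queue(conn: sqlite3.Connection, name: str) -> bool:
    """Insert a queue definition; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO queue (name) VALUES (?)", (name,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def duplicate_queues(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (name, row_count) for every queue name defined more than once."""
    return conn.execute(DUPLICATE_CHECK).fetchall()


if __name__ == "__main__":
    for with_pk in (False, True):
        conn = sqlite3.connect(":memory:")
        make_queue_table(conn, with_primary_key=with_pk)
        results = [register_queue(conn, "review") for _ in range(2)]
        print(f"primary key={with_pk}: inserts={results}, "
              f"duplicates={duplicate_queues(conn)}")
```

Without the key, both inserts succeed and the check reports the name twice. With the key, the database refuses the duplicate before it is ever stored.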
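The idempotency check in `reviewWorker` and the atomic status update in `reviewService` fit a common pattern:

- Skip a job whose task is already finished.
- Claim a task with a single conditional `UPDATE ... WHERE status = 'pending'`, so only one caller can win.

Below is a minimal sketch, again on `sqlite3`. The commit does not show the actual schema or service code. The `review_task` table, its status values, and the functions `claim_task` and `handle_job` are assumptions for illustration.

```python
import sqlite3

# Hypothetical stand-in for the review-task table; the real schema used by
# reviewService/reviewWorker is not shown in this commit.
SCHEMA = "CREATE TABLE review_task (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"


def claim_task(conn: sqlite3.Connection, task_id: int) -> bool:
    """Optimistic lock: move 'pending' -> 'processing' in one statement.

    The status check sits in the WHERE clause, so if another caller has
    already claimed the task, this UPDATE matches zero rows and returns False.
    """
    cur = conn.execute(
        "UPDATE review_task SET status = 'processing' "
        "WHERE id = ? AND status = 'pending'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def handle_job(conn: sqlite3.Connection, task_id: int) -> str:
    """Worker entry point that tolerates duplicate or re-delivered jobs."""
    row = conn.execute(
        "SELECT status FROM review_task WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None:
        return "missing"
    if row[0] == "completed":
        return "skipped"  # idempotency check: already done, do nothing
    if not claim_task(conn, task_id):
        return "conflict"  # another caller holds the task
    # ... the actual review work would run here ...
    conn.execute(
        "UPDATE review_task SET status = 'completed' WHERE id = ?", (task_id,)
    )
    conn.commit()
    return "done"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO review_task (id, status) VALUES (1, 'pending')")
    conn.commit()
    print(handle_job(conn, 1))  # prints "done"
    print(handle_job(conn, 1))  # prints "skipped"
```

The status check lives in the WHERE clause, so the database evaluates it together with the write. A separate SELECT-then-UPDATE would leave a window in which two workers both see `pending`.

One caveat applies to this simplified version: a task left in `processing` by a crashed worker is never reclaimed. A real implementation would need a timeout or retry rule for that state.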
@@ -1,50 +1,59 @@
 # Operations Documentation

-> **Document scope:** System operations, monitoring, troubleshooting
-> **Intended audience:** Operations team, SRE team
+> **Purpose**: Records the system's operations-related monitoring, troubleshooting, and preventive measures
+> **Created**: 2026-01-27
+> **Maintainer**: Operations team

 ---

-## 📋 Operations Document List
+## 📚 Document Index

-| Document | Description | Status |
-|------|------|------|
-| **01-环境配置指南.md** | Environment variables, database connections, API key configuration | ✅ Done |
-| **02-环境变量配置模板.md** | .env configuration template, including CloseAI configuration ⭐ | ✅ Done |
-| **03-监控告警.md** | Monitoring metrics, alerting rules | ⏳ To be created |
-| **04-故障排查.md** | Troubleshooting manual for common issues | ⏳ To be created |
-| **05-备份恢复.md** | Data backup and recovery strategy | ⏳ To be created |
+| Document | Description | Priority |
+|------|------|--------|
+| [01 - PgBoss Queue Monitoring and Maintenance](./01-PgBoss队列监控与维护.md) | Monitoring, cleanup, and troubleshooting of the pg-boss job queue | 🔴 High |
+| [02 - Failure Prevention Checklist](./02-故障预防检查清单.md) | Pre- and post-deployment checklists for preventing common failures | 🔴 High |
+| [03 - Database Migration Notes](./03-数据库迁移注意事项.md) | Items to check during database migrations so that constraints are not lost | 🔴 High |

 ---

-## 🎯 Core Operations Tasks
+## 🔧 Quick Reference

-### 1. Monitoring
-- System health checks
-- Performance monitoring
-- Alert notifications
+### Routine Check SQL

-### 2. Logging
-- Log collection
-- Log analysis
-- Log archiving
+```sql
+-- Check for duplicate queue definitions
+SELECT name, COUNT(*) AS cnt
+FROM platform_schema.queue
+GROUP BY name
+HAVING COUNT(*) > 1;

-### 3. Backup
-- Database backup
-- File backup
-- Recovery drills
+-- Check the job state distribution
+SELECT name, state, COUNT(*) AS cnt
+FROM platform_schema.job_common
+GROUP BY name, state
+ORDER BY name, state;
+```

-### 4. Incident Handling
-- Fault diagnosis
-- Contingency plans
-- Post-incident review
+### Emergency Incident Handling
+
+1. **Jobs stuck in an infinite loop** → see [01 - PgBoss Queue Monitoring and Maintenance](./01-PgBoss队列监控与维护.md)
+2. **Database connections exhausted** → see [03 - Database Operations Manual](./03-数据库运维手册.md)
+3. **Service unavailable** → restart the SAE application and check the logs

 ---

-**Last updated:** 2025-11-06
-**Maintainer:** Technical Architect

+## 📈 Monitoring and Alerting
+
+| Metric | Threshold | Action |
+|--------|------|---------|
+| Duplicate queue definitions (rows per queue name) | > 1 | Clean up the duplicate entries |
+| Active jobs | > 100 | Check whether any jobs are stuck |
+| Database connections (% of max) | > 80% | Check for connection leaks |
+
+---
+
+## 📝 Related Documents
+
+- [Deployment Documentation](../05-部署文档/README.md)
+- [Testing Documentation](../06-测试文档/README.md)
+- [Incident Analysis Report](../06-测试文档/故障分析报告%20(1).md)
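The monitoring table added in this diff maps each metric to a threshold and an action. A small evaluator makes the comparison rules explicit.

Only the thresholds come from the table. The following are assumptions:

- the `Snapshot` fields;
- how the values are collected (for example, counting jobs in pg-boss's `active` state via the job-state distribution query);
- the alert wording.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snapshot:
    """Readings from the routine checks (how they are collected is assumed)."""

    queue_rows_per_name: dict[str, int]  # e.g. from the duplicate-queue query
    active_jobs: int  # e.g. jobs in the 'active' state, summed over queues
    db_connections: int  # connections currently open
    db_max_connections: int  # server connection limit


def evaluate(s: Snapshot) -> list[str]:
    """Compare a snapshot against the thresholds in the monitoring table."""
    alerts: list[str] = []
    # Duplicate queue definitions: more than 1 row for the same queue name.
    dupes = sorted(name for name, cnt in s.queue_rows_per_name.items() if cnt > 1)
    if dupes:
        alerts.append(
            f"duplicate queue definitions: {', '.join(dupes)} -> clean up duplicate entries"
        )
    # Active jobs: strictly more than 100.
    if s.active_jobs > 100:
        alerts.append(f"{s.active_jobs} active jobs (> 100) -> check for stuck jobs")
    # Database connections: strictly more than 80% of the limit.
    usage = s.db_connections / s.db_max_connections
    if usage > 0.8:
        alerts.append(f"DB connections at {usage:.0%} (> 80%) -> check for connection leaks")
    return alerts


if __name__ == "__main__":
    snap = Snapshot({"review": 2, "extract": 1}, active_jobs=150,
                    db_connections=85, db_max_connections=100)
    for alert in evaluate(snap):
        print(alert)
```

All three comparisons are strict, matching the `>` in the table. A queue with exactly 100 active jobs, or connections at exactly 80%, does not raise an alert.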