AIclinicalresearch/docs/07-运维文档
HaHafeng 19f9c5ea93 docs(deployment): Fix 8 critical deployment issues and enhance documentation
Summary of fixes:
- Fix service discovery address (change .sae domain to internal IP)
- Unify timezone configuration (Asia/Shanghai for all services)
- Enhance ECS security group configuration (Redis/Weaviate port binding)
- Add image pull strategy best practices
- Add Python service memory management guidelines
- Update Dify API Key deployment strategy (avoid deadlock)
- Add SSH tunnel for RDS database access
- Add NAT gateway cost optimization explanation

Modified files (7 docs):
- 00-部署架构总览.md (enhanced with 7 sections)
- 03-Dify-ECS部署完全指南.md (security hardening)
- 04-Python微服务-SAE容器部署指南.md (timezone + service discovery)
- 05-Node.js后端-SAE容器部署指南.md (timezone configuration)
- PostgreSQL部署策略-摸底报告.md (timezone best practice)
- 07-关键配置补充说明.md (3 new sections)
- 08-部署检查清单.md (service address fix)

New files:
- 文档修正报告-20251214.md (comprehensive fix report)
- Review documents from technical team

Impact:
- Fixed 3 P0/P1 critical issues (100% connection failure risk)
- Fixed 3 P2 important issues (stability and maintainability)
- Added 2 P3 best practices (developer convenience)

Status: All deployment documents reviewed and corrected, ready for production deployment
2025-12-14 13:25:28 +08:00

Operations Documentation

Purpose: system operations, monitoring, and troubleshooting
Audience: operations team, SRE team


📋 Operations Document Index

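| Document | Description | Status |
|----------|-------------|--------|
| 01-环境配置指南.md | Environment variables, database connections, API key configuration | Done |
| 02-环境变量配置模板.md | .env configuration template, including CloseAI configuration | Done |
| 03-监控告警.md | Monitoring metrics and alerting rules | To be created |
| 04-故障排查.md | Troubleshooting handbook for common problems | To be created |
| 05-备份恢复.md | Data backup and recovery strategy | To be created |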

🎯 Core Operations Tasks

1. Monitoring

  • System health checks
  • Performance monitoring
  • Alert notifications
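
As an illustration of the health-check and alerting items above, here is a minimal sketch in Python. The probe names and the `run_checks` / `failing` helpers are hypothetical and do not come from this repository; a real setup would replace the probes with actual connectivity checks and forward the failing names to whichever alerting channel the team uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str          # hypothetical probe name, e.g. "database"
    healthy: bool
    detail: str = ""   # error text when the probe raised


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every probe; a probe that raises is recorded as unhealthy."""
    results = []
    for name, probe in checks.items():
        try:
            results.append(CheckResult(name, bool(probe())))
        except Exception as exc:
            results.append(CheckResult(name, False, str(exc)))
    return results


def failing(results: List[CheckResult]) -> List[str]:
    """Names of the probes that should trigger an alert."""
    return [r.name for r in results if not r.healthy]
```

Catching exceptions per probe means one unreachable dependency does not prevent the remaining checks from running.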

2. Logging

  • Log collection
  • Log analysis
  • Log archiving

3. Backup

  • Database backups
  • File backups
  • Recovery drills
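
One part of a backup routine that is easy to get wrong is retention, so a small sketch may help. It assumes backups are tracked as a mapping from file name to the date they were taken, and that anything older than `keep_days` can be pruned. The 7-day default, the file names, and the `expired_backups` helper are illustrative assumptions, not a policy stated in these documents.

```python
from datetime import date, timedelta
from typing import Dict, List


def expired_backups(
    backups: Dict[str, date], today: date, keep_days: int = 7
) -> List[str]:
    """Return backup names taken strictly before the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)  # oldest date still kept
    return sorted(name for name, taken in backups.items() if taken < cutoff)
```

Usage, with hypothetical file names:

```python
inventory = {
    "db-2025-11-05.dump": date(2025, 11, 5),
    "db-2025-10-01.dump": date(2025, 10, 1),
}
to_delete = expired_backups(inventory, today=date(2025, 11, 6))
```

Returning the list instead of deleting files directly keeps the function easy to exercise during a recovery drill before any data is removed.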

4. Incident Handling

  • Fault diagnosis
  • Emergency response plans
  • Post-incident review

Last updated: 2025-11-06
Maintainer: Technical Architect