feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

Major Changes:
- Implement Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job queue management (platform_schema.job)
- Implement CheckpointService using job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add task splitting mechanism (auto chunk size recommendation)
- Refactor ASL screening service with smart mode selection
- Refactor DC extraction service with smart mode selection
- Register workers for ASL and DC modules

Technical Highlights:
- All task management data stored in platform_schema.job.data (JSONB)
- Business tables remain clean (no task management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows 3-layer architecture principle
- Zero additional cost (no Redis needed, save 8400 CNY/year)

Code Statistics:
- New code: ~1750 lines
- Modified code: ~500 lines
- Test code: ~1800 lines
- Documentation: ~3000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: Updated 4 key documents

Status: Phase 1-7 completed, Phase 8-9 pending
@@ -1,9 +1,10 @@
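# AI Intelligent Literature Module - Current Status and Development Guide

> **Document version:** v1.3
> **Document version:** v1.4
> **Created:** 2025-11-21
> **Maintainer:** AI Intelligent Literature development team
> **Last updated:** 2025-11-23 (after completing Day 5)
> **Last updated:** 2025-12-13 🏆 **Postgres-Only architecture refactoring completed**
> **Major progress:** Platform-Only architecture refactoring - intelligent dual-mode processing, task splitting, checkpoint-based resume
> **Purpose:** Reflect the module's actual state and help new developers get up to speed quickly

---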
@@ -35,6 +36,50 @@ The AI Intelligent Literature module is a literature screening system based on large language models (LLMs)
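- **Model support**: DeepSeek-V3 + Qwen-Max dual-model screening
- **Deployment status**: ✅ Running normally in the local development environment

### 🏆 Postgres-Only Architecture Refactoring (completed 2025-12-13)

**Goals:**
- Support long-running tasks of 2-24 hours (screening 1000 papers)
- Recover tasks after an instance restart (checkpoint-based resume)
- Zero additional cost (uses Postgres; no Redis required)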
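
**Core implementation:**

1. **Intelligent dual-mode processing** 🎯
   - Threshold: 50 papers
   - Small tasks (<50 papers): processed directly for a fast response (<1 minute)
   - Large tasks (≥50 papers): processed through the queue for high reliability (supports checkpoint-based resume)

2. **Task splitting** 📦
   - 100 papers → 2 batches (50 papers each)
   - 1000 papers → 20 batches (50 papers each)
   - Batch size is recommended automatically

3. **Checkpoint-based resume** 🔄
   - A checkpoint is saved every 10 papers
   - Checkpoint data is stored in `platform_schema.job.data` (pg-boss)
   - After an instance restart, processing continues automatically from the last saved position

4. **Unified management in the Platform layer** 🏗️
   - Task management information is not stored in `asl_schema.screening_tasks`
   - It is stored in one place, `platform_schema.job.data` (JSONB)
   - `CheckpointService` is used to read and write job.data (shared by all modules)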
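
The threshold check and batch splitting described in items 1 and 2 can be sketched as follows. This is a minimal illustration, not the code in `screeningService.ts`: the names `QUEUE_THRESHOLD`, `DEFAULT_BATCH_SIZE`, `selectMode`, and `splitIntoBatches` are hypothetical, and only the values (threshold of 50, batches of 50) come from the document. The automatic batch-size recommendation is left out because the document does not say how it is computed.

```typescript
// Hypothetical names; only the numeric values come from the document.
const QUEUE_THRESHOLD = 50;    // tasks below this size are processed directly
const DEFAULT_BATCH_SIZE = 50; // papers per queued batch

type ProcessingMode = "direct" | "queue";

interface Batch {
  batchIndex: number;
  startIndex: number; // inclusive
  endIndex: number;   // exclusive
}

// Small tasks (<50 papers) run directly; larger ones go through the queue.
function selectMode(totalPapers: number): ProcessingMode {
  return totalPapers < QUEUE_THRESHOLD ? "direct" : "queue";
}

// Split a task of `totalPapers` items into consecutive index ranges.
// The last batch may be shorter than `batchSize`.
function splitIntoBatches(
  totalPapers: number,
  batchSize: number = DEFAULT_BATCH_SIZE,
): Batch[] {
  if (!Number.isInteger(totalPapers) || totalPapers < 0) {
    throw new RangeError("totalPapers must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: Batch[] = [];
  for (let start = 0; start < totalPapers; start += batchSize) {
    batches.push({
      batchIndex: batches.length,
      startIndex: start,
      endIndex: Math.min(start + batchSize, totalPapers),
    });
  }
  return batches;
}
```

With these definitions, a 7-paper task selects `"direct"`, a 100-paper task selects `"queue"` and splits into 2 batches, and a 1000-paper task splits into 20 batches, matching the figures listed above.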
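
**Files changed:**
- `screeningService.ts`: adds the intelligent threshold check and pushes batch jobs to pg-boss
- `screeningWorker.ts`: batch processing logic and the checkpoint-resume implementation
- `CheckpointService.ts`: reads and writes job.data, with no dependency on business tables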
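
The checkpoint-resume behaviour of the worker can be illustrated with the sketch below. It is written under stated assumptions rather than taken from `screeningWorker.ts` or `CheckpointService.ts`: the `CheckpointStore` interface, the `processedCount` field, and `processBatch` are hypothetical, and an in-memory `Map` stands in for `platform_schema.job.data` so the example runs without a database. Only the interval of 10 papers comes from the document.

```typescript
// Hypothetical checkpoint shape; the real job.data layout is not shown in the document.
interface Checkpoint {
  processedCount: number; // number of items fully processed so far
}

// Minimal storage interface. A real implementation would read and write
// the job's JSONB payload; here an in-memory Map is used as a stand-in.
interface CheckpointStore {
  load(jobId: string): Promise<Checkpoint | undefined>;
  save(jobId: string, checkpoint: Checkpoint): Promise<void>;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private readonly data = new Map<string, Checkpoint>();

  async load(jobId: string): Promise<Checkpoint | undefined> {
    return this.data.get(jobId);
  }

  async save(jobId: string, checkpoint: Checkpoint): Promise<void> {
    this.data.set(jobId, { ...checkpoint });
  }
}

const CHECKPOINT_INTERVAL = 10; // save a checkpoint every 10 papers

// Process one batch, resuming from the last saved checkpoint if there is one.
async function processBatch<T>(
  jobId: string,
  items: readonly T[],
  handle: (item: T) => Promise<void>,
  store: CheckpointStore,
): Promise<void> {
  const start = (await store.load(jobId))?.processedCount ?? 0;
  for (let i = start; i < items.length; i++) {
    await handle(items[i] as T);
    const done = i + 1;
    // Persist progress at each interval and once more at the end of the batch.
    if (done % CHECKPOINT_INTERVAL === 0 || done === items.length) {
      await store.save(jobId, { processedCount: done });
    }
  }
}
```

In this sketch, items handled after the last saved checkpoint are handled again when the batch resumes (at-least-once processing), so `handle` should be safe to repeat for the same item.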

**Test verification:**
- ✅ Small task (7 papers) - direct mode test passed
- ✅ Large task (100 papers) - queue mode test passed
- ✅ Task splitting logic verified
- ✅ Platform-Only architecture verified

**Technical debt:**
- ⚠️ Phase 8 comprehensive testing (checkpoint-resume stress test, full workflow with 1000 papers)
- ⚠️ Phase 9 SAE deployment verification

### Key Milestones

**Title and abstract preliminary screening (completed)**: