feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

Major Changes:
- Implement Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job queue management (platform_schema.job)
- Implement CheckpointService using job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add task splitting mechanism (auto chunk size recommendation)
- Refactor ASL screening service with smart mode selection
- Refactor DC extraction service with smart mode selection
- Register workers for ASL and DC modules
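A minimal sketch of the threshold-based mode selection and task splitting listed above. It assumes the threshold of 50 and a fixed batch size of 50 stated in this commit. The names `planScreening`, `chunk`, and `ScreeningPlan` are hypothetical and do not come from `screeningService.ts`:

```typescript
// Hypothetical sketch: THRESHOLD and BATCH_SIZE mirror the values in this commit.
const THRESHOLD = 50;
const BATCH_SIZE = 50;

type ScreeningPlan =
  | { mode: "direct"; articleIds: string[] }
  | { mode: "queue"; batches: string[][] };

// Split a list into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Small tasks are processed inline. Tasks at or above the threshold are split
// into batches, and each batch would become one queue job.
function planScreening(articleIds: string[]): ScreeningPlan {
  if (articleIds.length < THRESHOLD) {
    return { mode: "direct", articleIds };
  }
  return { mode: "queue", batches: chunk(articleIds, BATCH_SIZE) };
}

// Usage: 7 articles -> direct, 100 -> 2 batches, 1000 -> 20 batches.
const ids = (n: number) => Array.from({ length: n }, (_, i) => `article-${i + 1}`);
for (const n of [7, 100, 1000]) {
  const plan = planScreening(ids(n));
  // Prints "7 direct 0", "100 queue 2", "1000 queue 20"
  console.log(n, plan.mode, plan.mode === "queue" ? plan.batches.length : 0);
}
```

Using a discriminated union makes the caller handle both modes explicitly. A real service would dispatch each batch to the job queue rather than returning it.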

Technical Highlights:
- All task management data stored in platform_schema.job.data (JSONB)
- Business tables remain clean (no task management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows the 3-layer architecture principle
- Zero additional cost (no Redis needed; saves 8400 CNY/year)
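The approach of keeping checkpoint state in the job's own JSONB `data` column, rather than in business tables, can be sketched with an in-memory model. This sketch assumes the checkpoint lives under a `checkpoint` key inside `job.data`. The key name, the `JobDataStore` interface, and the `save`/`load` method names are hypothetical and are not the actual `CheckpointService` API:

```typescript
// Minimal model of a platform_schema.job row: only `id` and the JSONB `data` column.
interface JobRow {
  id: string;
  data: Record<string, unknown>;
}

// Hypothetical storage interface. A real one would run SQL against platform_schema.job.
interface JobDataStore {
  get(jobId: string): JobRow | undefined;
  update(jobId: string, data: Record<string, unknown>): void;
}

class InMemoryJobStore implements JobDataStore {
  private rows = new Map<string, JobRow>();

  insert(row: JobRow): void {
    this.rows.set(row.id, { id: row.id, data: { ...row.data } });
  }

  get(jobId: string): JobRow | undefined {
    return this.rows.get(jobId);
  }

  update(jobId: string, data: Record<string, unknown>): void {
    const row = this.rows.get(jobId);
    if (!row) throw new Error(`job ${jobId} not found`);
    row.data = data;
  }
}

// Module-agnostic: each module supplies its own checkpoint type T.
class CheckpointService<T> {
  constructor(
    private readonly store: JobDataStore,
    private readonly key: string = "checkpoint",
  ) {}

  save(jobId: string, checkpoint: T): void {
    const row = this.store.get(jobId);
    if (!row) throw new Error(`job ${jobId} not found`);
    // Merge rather than overwrite, so the job's original payload survives.
    this.store.update(jobId, { ...row.data, [this.key]: checkpoint });
  }

  load(jobId: string): T | undefined {
    return this.store.get(jobId)?.data[this.key] as T | undefined;
  }
}

// Usage: a screening checkpoint stored next to the job payload.
type ScreeningCheckpoint = { processedCount: number; lastArticleId: string };

const store = new InMemoryJobStore();
store.insert({ id: "job-1", data: { taskId: "task-42", articleIds: ["a1", "a2"] } });

const checkpoints = new CheckpointService<ScreeningCheckpoint>(store);
checkpoints.save("job-1", { processedCount: 10, lastArticleId: "a10" });

console.log(checkpoints.load("job-1")); // { processedCount: 10, lastArticleId: 'a10' }
console.log(store.get("job-1")?.data.taskId); // task-42
```

In Postgres, the same merge could be expressed as a single `UPDATE` using the `jsonb` concatenation operator (`data || jsonb_build_object('checkpoint', ...)`). This avoids a separate read-modify-write round trip.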

Code Statistics:
- New code: ~1750 lines
- Modified code: ~500 lines
- Test code: ~1800 lines
- Documentation: ~3000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: Updated 4 key documents

Status: Phases 1-7 completed; Phases 8-9 pending
2025-12-13 16:10:04 +08:00
parent a3586cdf30
commit fa72beea6c
135 changed files with 17508 additions and 91 deletions


@@ -1,9 +1,10 @@
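# AI Smart Literature Module - Current Status and Development Guide
> **Document version:** v1.3
> **Document version:** v1.4
> **Created:** 2025-11-21
> **Maintainer:** AI Smart Literature development team
> **Last updated:** 2025-11-23 (after Day 5 was completed)
> **Last updated:** 2025-12-13 🏆 **Postgres-Only architecture refactoring completed**
> **Major progress:** Platform-Only architecture refactoring - smart dual-mode processing, task splitting, checkpoint resume
> **Purpose:** Reflect the module's actual status and help new developers get up to speed quickly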
---
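@@ -35,6 +36,50 @@ The AI Smart Literature module is a literature screening system based on large language models (LLM)
- **Model support**: DeepSeek-V3 + Qwen-Max dual-model screening
- **Deployment status**: ✅ Running normally in the local development environment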
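### 🏆 Postgres-Only Architecture Refactoring (completed 2025-12-13)
**Goals:**
- Support long-running tasks lasting 2-24 hours (e.g., screening 1000 articles)
- Tasks can be recovered after an instance restart (checkpoint resume)
- Zero additional cost (uses Postgres; no Redis required)
**Core implementation:**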
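1. **Smart dual-mode processing** 🎯
   - Threshold: 50 articles
   - Small tasks (<50 articles): processed directly for a fast response (<1 minute)
   - Large tasks (≥50 articles): processed through the queue for high reliability, with checkpoint resume
2. **Task splitting** 📦
   - 100 articles → 2 batches (50 per batch)
   - 1000 articles → 20 batches (50 per batch)
   - Batch size is recommended automatically
3. **Checkpoint resume** 🔄
   - A checkpoint is saved every 10 articles
   - Checkpoint data is stored in `platform_schema.job.data` (pg-boss)
   - After an instance restart, processing automatically continues from the last saved position
4. **Unified management in the Platform layer** 🏗️
   - Task management data is not stored in `asl_schema.screening_tasks`
   - It is stored centrally in `platform_schema.job.data` (JSONB)
   - `CheckpointService` reads and writes `job.data` (shared by all modules)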
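The checkpoint-resume mechanism described above (a checkpoint every 10 articles, with resume after a restart) can be illustrated with a small worker loop. This is a synchronous, in-memory sketch that assumes a checkpoint shaped as `{ nextIndex }`. The names `runBatch` and `crashAt`, and the `Map` standing in for `platform_schema.job.data`, are hypothetical. The real `screeningWorker.ts` would await an LLM call per article and may be structured differently:

```typescript
const CHECKPOINT_EVERY = 10;

interface Checkpoint {
  nextIndex: number; // index of the first article not yet screened
}

// Hypothetical stand-in for checkpoints persisted in platform_schema.job.data.
const checkpoints = new Map<string, Checkpoint>();

// Screen one batch, saving progress every CHECKPOINT_EVERY articles.
// `screen` is a placeholder for the per-article work; `crashAt` simulates a restart.
function runBatch(
  jobId: string,
  articleIds: string[],
  screen: (articleId: string) => void,
  crashAt?: number,
): void {
  // Resume from the last checkpoint, or start at 0 for a fresh job.
  const start = checkpoints.get(jobId)?.nextIndex ?? 0;
  for (let i = start; i < articleIds.length; i++) {
    if (i === crashAt) {
      throw new Error(`simulated restart before article ${i}`);
    }
    screen(articleIds[i]);
    const done = i + 1;
    // Persist progress every CHECKPOINT_EVERY articles and once at the end.
    if (done % CHECKPOINT_EVERY === 0 || done === articleIds.length) {
      checkpoints.set(jobId, { nextIndex: done });
    }
  }
}

// Usage: a 50-article batch interrupted at index 25, then resumed.
const batch = Array.from({ length: 50 }, (_, i) => `article-${i}`);
const screened: string[] = [];
try {
  runBatch("job-1", batch, (id) => screened.push(id), 25);
} catch (e) {
  console.log((e as Error).message);
}
console.log(checkpoints.get("job-1")); // { nextIndex: 20 }
runBatch("job-1", batch, (id) => screened.push(id));
console.log(screened.length); // 55: articles 20-24 were screened twice
```

Articles processed after the last checkpoint (at most 10) are re-screened on resume. Writing screening results idempotently, for example with an upsert keyed by task and article, keeps these repeats harmless.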
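**Changed files:**
- `screeningService.ts`: adds threshold-based mode selection and pushes batch jobs to pg-boss
- `screeningWorker.ts`: batch processing logic and checkpoint-resume implementation
- `CheckpointService.ts`: operates on `job.data` (no dependency on business tables)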
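**Test verification:**
- ✅ Small task (7 articles): direct-mode test passed
- ✅ Large task (100 articles): queue-mode test passed
- ✅ Task-splitting logic verified
- ✅ Platform-Only architecture verified
**Technical debt:**
- ⚠️ Phase 8: comprehensive testing (checkpoint-resume stress test, full 1000-article pipeline)
- ⚠️ Phase 9: SAE deployment verification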
### Key Milestones
**Title/abstract initial screening (completed)**: