feat(platform): Complete Postgres-Only architecture refactoring (Phases 1-7)

Major changes:
- Implement the Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job-queue management (platform_schema.job)
- Implement CheckpointService on top of job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add a task-splitting mechanism (automatic chunk-size recommendation)
- Refactor the ASL screening service with smart mode selection
- Refactor the DC extraction service with smart mode selection
- Register workers for the ASL and DC modules

Technical highlights:
- All task-management data stored in platform_schema.job.data (JSONB)
- Business tables stay clean (no task-management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows the 3-layer architecture principle
- Zero additional cost (no Redis needed; saves 8,400 CNY/year)

Code statistics:
- New code: ~1,750 lines
- Modified code: ~500 lines
- Test code: ~1,800 lines
- Documentation: ~3,000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: 4 key documents updated

Status: Phases 1-7 completed, Phases 8-9 pending
@@ -1,7 +1,8 @@
 # Cloud-Native Development Standards

-> **Document version:** V1.0
+> **Document version:** V1.1
 > **Created:** 2025-11-16
+> **Last updated:** 2025-12-13 🏆 **Postgres-Only architecture standards added**
 > **Audience:** all developers
 > **Mandatory:** ✅ must be followed
 > **Maintainer:** architecture team
@@ -32,6 +33,7 @@
 | **Logging** | `import { logger } from '@/common/logging'` | Standardized logging | ✅ Platform-level |
 | **Async jobs** | `import { jobQueue } from '@/common/jobs'` | Long-running tasks | ✅ Platform-level |
 | **Cache service** | `import { cache } from '@/common/cache'` | Distributed cache | ✅ Platform-level |
+| **🏆 Checkpoint/resume** | `import { CheckpointService } from '@/common/jobs'` | Task checkpoint management | ✅ Platform-level (new) |
 | **Database** | `import { prisma } from '@/config/database'` | Database operations | ✅ Platform-level |
 | **LLM capability** | `import { LLMFactory } from '@/common/llm'` | LLM calls | ✅ Platform-level |

@@ -320,6 +322,144 @@ export async function extractPdfText(ossKey: string): Promise<string> {
---

## 🏆 Postgres-Only Architecture Standards (added 2025-12-13)

### Core Idea

**Platform-Only pattern**: all task-management state lives in `platform_schema.job.data`; business tables store business data only.

### The Right Way to Manage Tasks

#### ✅ DO: store task-management state in job.data

```typescript
// ✅ Correct: task-splitting and checkpoint state live in job.data
import { jobQueue } from '@/common/jobs';
import { CheckpointService } from '@/common/jobs/CheckpointService';

// When pushing a job, include the full task context
await jobQueue.push('asl:screening:batch', {
  // Business data
  taskId: 'xxx',
  projectId: 'yyy',
  literatureIds: [...],

  // ✅ Task-splitting state (stored in job.data)
  batchIndex: 3,
  totalBatches: 20,
  startIndex: 150,
  endIndex: 200,

  // ✅ Progress tracking
  processedCount: 0,
  successCount: 0,
  failedCount: 0,
});

// Inside the worker, use CheckpointService
const checkpointService = new CheckpointService(prisma);

// Save a checkpoint into job.data
await checkpointService.saveCheckpoint(job.id, {
  currentBatchIndex: 5,
  currentIndex: 250,
  processedBatches: 5,
  totalBatches: 20,
});

// Load the checkpoint back from job.data
const checkpoint = await checkpointService.loadCheckpoint(job.id);
if (checkpoint) {
  resumeFrom = checkpoint.currentIndex;
}
```
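
This document shows only how `CheckpointService` is used, not its internals. A minimal sketch of how such a generic service could persist checkpoints into the job table's JSONB `data` column — the `JobStore` interface and the in-memory store here are illustrative stand-ins, not the project's actual Prisma-backed implementation:

```typescript
// Hypothetical storage abstraction standing in for platform_schema.job access.
// The real service would back this with Prisma; this is a sketch only.
interface JobStore {
  getData(jobId: string): Promise<Record<string, unknown> | null>;
  mergeData(jobId: string, patch: Record<string, unknown>): Promise<void>;
}

interface Checkpoint {
  currentBatchIndex: number;
  currentIndex: number;
  processedBatches: number;
  totalBatches: number;
}

class CheckpointService {
  constructor(private readonly store: JobStore) {}

  // Persist the checkpoint under a dedicated key inside job.data (JSONB)
  async saveCheckpoint(jobId: string, checkpoint: Checkpoint): Promise<void> {
    await this.store.mergeData(jobId, { checkpoint });
  }

  // Read the checkpoint back; null means "start from the beginning"
  async loadCheckpoint(jobId: string): Promise<Checkpoint | null> {
    const data = await this.store.getData(jobId);
    return (data?.checkpoint as Checkpoint | undefined) ?? null;
  }
}

// In-memory fake, useful for unit-testing any module that saves checkpoints
class InMemoryJobStore implements JobStore {
  private jobs = new Map<string, Record<string, unknown>>();
  async getData(jobId: string) {
    return this.jobs.get(jobId) ?? null;
  }
  async mergeData(jobId: string, patch: Record<string, unknown>) {
    this.jobs.set(jobId, { ...(this.jobs.get(jobId) ?? {}), ...patch });
  }
}
```

Because the service only touches `job.data` through a narrow store interface, every module shares the same checkpoint logic — which is what makes the service generic.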

#### ❌ DON'T: store task-management state in business tables

```prisma
// ❌ Wrong: adding task-management fields to a business table's schema
model AslScreeningTask {
  id        String @id
  projectId String

  // ❌ Do not add these fields!
  totalBatches     Int   // ← belongs in job.data
  processedBatches Int   // ← belongs in job.data
  currentIndex     Int   // ← belongs in job.data
  checkpointData   Json? // ← belongs in job.data
}
```

```typescript
// ❌ Wrong: rolling your own checkpoint service
class MyCheckpointService {
  async save(taskId: string) {
    await prisma.aslScreeningTask.update({
      where: { id: taskId },
      data: { checkpointData: {...} }, // ❌ Don't do this!
    });
  }
}
```

**Why is this wrong?**

- ❌ Every module has to add the same fields (code duplication)
- ❌ Violates the DRY principle
- ❌ Violates the 3-layer architecture principle
- ❌ Hard to maintain (one logic change must be made in many places)

### Intelligent Threshold Selection

#### ✅ DO: implement smart dual-mode processing

```typescript
const QUEUE_THRESHOLD = 50; // recommended threshold

export async function startTask(items: any[]) {
  const useQueue = items.length >= QUEUE_THRESHOLD;

  if (useQueue) {
    // Queue mode: large tasks (≥ 50 items)
    const chunks = splitIntoChunks(items, 50);
    for (const chunk of chunks) {
      await jobQueue.push('task:batch', {...});
    }
  } else {
    // Direct mode: small tasks (< 50 items)
    await processDirectly(items); // fast response
  }
}
```

**Why do it this way?**

- ✅ Small tasks respond quickly (no queue latency)
- ✅ Large tasks are highly reliable (checkpoint/resume supported)
- ✅ Balances performance and reliability

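The DO example above calls `splitIntoChunks` without defining it. A minimal generic sketch matching that usage — the name and signature are assumed from the call site, not taken from the platform's actual utils:

```typescript
// Split an array into consecutive chunks of at most `size` items,
// matching the call `splitIntoChunks(items, 50)` in the example above.
function splitIntoChunks<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('chunk size must be positive');
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

The last chunk may be smaller than `size`; each chunk becomes one `task:batch` job, which is what gives queue mode its per-batch checkpoint granularity.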
#### ❌ DON'T: route every task through the queue

```typescript
// ❌ Wrong: even a single record goes through the queue
export async function startTask(items: any[]) {
  // Everything is pushed to the queue, regardless of size
  await jobQueue.push('task:batch', items); // ❌ small tasks incur latency
}
```

**Why is this wrong?**

- ❌ Small tasks respond slowly (the queue has a polling interval)
- ❌ Wastes queue resources
- ❌ Poor user experience

### Recommended Thresholds

| Task type | Recommended threshold | Rationale |
|---|---|---|
| Literature screening | 50 papers | ~7 s per paper, ~5 min for 50 |
| Data extraction | 50 records | ~5-10 s per record, ~5 min for 50 |
| Statistical models | 30 models | ~10 s per model, ~5 min for 30 |
| Default | 50 records | General-purpose recommendation |

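The thresholds above all target roughly five minutes of work per batch. A hypothetical helper deriving a threshold from a per-item time estimate — the function name, the 300-second budget, and the cap of 50 are assumptions for illustration, not part of the platform API, and the table's values are rounded rather than computed exactly:

```typescript
// Derive a batch-size threshold from an estimated per-item duration,
// targeting a fixed time budget per batch (~5 minutes, as in the table above).
// NOTE: hypothetical helper; the budget and cap are assumed defaults.
function recommendThreshold(
  secondsPerItem: number,
  budgetSeconds: number = 300,
  maxSize: number = 50,
): number {
  if (secondsPerItem <= 0) return maxSize;
  const size = Math.floor(budgetSeconds / secondsPerItem);
  return Math.max(1, Math.min(maxSize, size));
}
```

For ~10 s items this yields 30, matching the statistical-models row; faster items saturate at the cap of 50, matching the default.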
---

## ❌ Forbidden Practices (DON'T)

### 1. Local File Storage ❌