docs(asl): Upgrade Tool 3 architecture from Fan-out to Scatter+Aggregator (v2.0)

Architecture transformation:
- Replace Fan-out (Manager->Child->Last Child Wins) with Scatter+Aggregator pattern
- API layer directly dispatches N independent jobs (no Manager)
- Worker only writes its own Result row, never touches Task table (zero row-lock)
- Aggregator polls groupBy for completion + zombie cleanup (replaces Sweeper)
- Reduce red lines from 13 to 9, eliminate distributed complexity

Documents updated (10 files):
- 08-Tool3 main architecture doc: v2.0 rewrite (schema, Task 2.3/2.4, red lines, risks)
- 08d-Code patterns: rewrite sections 4.1-4.6 (API dispatch, SingleWorker, Aggregator)
- 08a-M1 sprint: rewrite M1-3 core (Worker+Aggregator), red lines, acceptance criteria
- 08b-M2 sprint: simplify SSE (NOTIFY/LISTEN downgraded to P2 optional)
- 08c-M3 sprint: milestone table wording update
- New: Scatter+Polling Aggregator pattern guide v1.1 (Level 2 cookbook)
- New: V2.0 architecture deep review and gap-fix report
- Updated: ASL module status, system status, capability layer index

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
2026-02-24 22:11:09 +08:00
parent 85fda830c2
commit 371fa53956
13 changed files with 1163 additions and 597 deletions

View File

@@ -148,12 +148,15 @@ if (pkbExtractedText) {
---
## 4. Fan-out Worker 模式(核心)
## 4. 散装派发 + Aggregator 轮询收口(核心)
> 🚀 **v2.0 架构转型:** 本节完全重写,对齐 `散装派发与轮询收口任务模式指南.md`。
> 废弃 Fan-out 模式Manager/Child/Last Child Wins/原子递增/独立 Sweeper
### 4.1 ExtractionService 接口
```typescript
// ⚠️ v1.4 终极修正:废弃 P-Queue并发控制完全交给 pg-boss teamConcurrency
// 🚀 v2.0并发控制交给 pg-boss teamConcurrencyWorker 只写自己的 Result
class ExtractionService {
constructor(
private promptBuilder: DynamicPromptBuilder,
@@ -163,295 +166,258 @@ class ExtractionService {
private pkbBridge: PkbBridgeService,
) {}
// 单篇文献提取(Child Job 调用)
async extractOne(resultId: string, taskId: string): Promise<void>;
// 单篇文献提取(Worker 调用)
async extractOne(resultId: string, taskId: string): Promise<ExtractionOutput>;
// 内部流程(单篇粒度):
// 1. 加载项目模板 → 组装 Schema
// 2. 从 PKB 读取 extractedText零成本用 snapshotStorageKey 访问 OSS防 PKB 删除v1.5
// 3. ⚠️ v1.4: 通过 snapshotStorageKey → OSS 缓存检查 → MinerU 子队列teamConcurrency 全局限流
// 2. 从 PKB 读取 extractedText零成本用 snapshotStorageKey 访问 OSS防 PKB 删除)
// 3. MinerU 表格提取(含 OSS 缓存 Cache-Aside
// 4. 组装 PromptXML 隔离区 + 防注入护栏)→ LLM 调用
// 5. 解析 JSON → fuzzyQuoteMatch 验证
// 6. ⚠️ 事务内 upsert Result + 原子递增父任务计数(防 Race Condition
// 7. SSE 推送进度日志
// 6. 返回 ExtractionOutputWorker 负责写 Result本方法不写 DB
}
```
### 4.2 ExtractionManagerWorkerFire-and-forget
### 4.2 API 层散装派发ExtractionController
> 🚀 v2.0**删除 ExtractionManagerWorker**。API 层直接创建 Result + 快照 PKB 元数据 + 散装派发 Job。
```typescript
// Manager Worker — Fire-and-forget派发后立即退出
// ⚠️ v1.5:派发前一次性快照 PKB 元数据,防止提取中 PKB 侧删改导致崩溃
class ExtractionManagerWorker {
async handle(job: { data: { taskId: string } }) {
const task = await prisma.aslExtractionTask.findUnique({ where: { id: job.data.taskId } });
const results = await prisma.aslExtractionResult.findMany({ where: { taskId: task.id } });
// ═══════════════════════════════════════════════════════════
// 🚨 v1.6 空集合边界守卫
// 如果文献被全部删除或过滤后 results 为空,无 Child 被派发,
// Last Child Wins 永远不触发Task 永远卡在 processing。
// Manager 必须自己充当"收口人"直接完成任务。
// ═══════════════════════════════════════════════════════════
if (results.length === 0) {
await prisma.aslExtractionTask.update({
where: { id: task.id },
data: { status: 'completed', completedAt: new Date() },
});
await broadcastLog(task.id, { source: 'system', message: '⚠️ No documents to extract, task auto-completed.' });
return;
// ExtractionController.ts — POST /api/v1/asl/extraction/tasks
async function createTask(req, reply) {
const { projectId, templateId, documentIds, idempotencyKey } = req.body;
if (documentIds.length === 0) throw new Error('No documents selected');
// ═══════════════════════════════════════════════════════
// DB 级幂等:@unique 索引 + P2002 冲突捕获
// ═══════════════════════════════════════════════════════
let task;
try {
task = await prisma.aslExtractionTask.create({
data: { projectId, templateId, totalCount: documentIds.length, status: 'processing', idempotencyKey },
});
} catch (error) {
if (error.code === 'P2002' && idempotencyKey) {
const existing = await prisma.aslExtractionTask.findFirst({ where: { idempotencyKey } });
return reply.send({ success: true, taskId: existing.id, note: 'Idempotent return' });
}
// ═══════════════════════════════════════════════════════════
// ⚠️ v1.5 PKB 数据一致性快照
// 提取任务可能持续 50 分钟,期间用户可能在 PKB 删除/修改文档。
// 一次性批量读取 PKB 元数据并冻结到 AslExtractionResult
// Child Worker 从自身记录读取 snapshotStorageKey/snapshotFilename
// 不再运行时回查 PKB即使 PKB 删了记录OSS 文件通常仍在。
// ═══════════════════════════════════════════════════════════
const pkbDocIds = results.map(r => r.pkbDocumentId).filter(Boolean);
const pkbDocs = await Promise.all(
pkbDocIds.map(id => this.pkbBridge.getDocumentDetail(id))
);
const pkbDocMap = new Map(pkbDocs.map(d => [d.documentId, d]));
// 批量快照写入
await prisma.$transaction(
results.map(result => {
const doc = pkbDocMap.get(result.pkbDocumentId);
return prisma.aslExtractionResult.update({
where: { id: result.id },
data: {
snapshotStorageKey: doc?.storageKey ?? null,
snapshotFilename: doc?.filename ?? null,
}
});
})
);
// Fan-out为每篇文献派发 Child Job
for (const result of results) {
await pgBoss.send('asl_extraction_child', {
taskId: task.id,
resultId: result.id,
pkbDocumentId: result.pkbDocumentId,
}, {
retryLimit: 3,
retryDelay: 10, // 10 秒后重试
retryBackoff: true, // 指数退避
expireInMinutes: 30,
singletonKey: `extract-${result.id}`, // 幂等键,防止重复派发
});
}
// Manager 派发完毕后直接退出,不等待 Child 完成
// 任务状态翻转由 "Last Child Wins" 机制在 Child Worker 中完成
throw error;
}
// ═══════════════════════════════════════════════════════
// PKB 快照冻结(从旧 Manager 移至 API 层)
// 提取可能持续 50 分钟,期间用户可能在 PKB 删除/修改文档
// ═══════════════════════════════════════════════════════
const pkbDocs = await Promise.all(
documentIds.map(id => pkbBridge.getDocumentDetail(id))
);
const resultsData = pkbDocs.map(doc => ({
taskId: task.id,
projectId,
pkbDocumentId: doc.documentId,
snapshotStorageKey: doc.storageKey,
snapshotFilename: doc.filename,
status: 'pending',
}));
await prisma.aslExtractionResult.createMany({ data: resultsData });
const createdResults = await prisma.aslExtractionResult.findMany({ where: { taskId: task.id } });
// ═══════════════════════════════════════════════════════
// 散装派发N 个独立 Job 一次入队(无 Manager 中转)
// ═══════════════════════════════════════════════════════
const jobs = createdResults.map(result => ({
name: 'asl_extract_single',
data: { resultId: result.id, taskId: task.id, pkbDocumentId: result.pkbDocumentId },
options: {
retryLimit: 3,
retryBackoff: true,
expireInMinutes: 30, // 执行超时,非排队等待超时
singletonKey: `extract-${result.id}`,
},
}));
await jobQueue.insert(jobs);
return reply.send({ success: true, taskId: task.id });
}
```
### 4.3 ExtractionChildWorker乐观锁 + Last Child Wins + 错误分级)
**要点:**
- `idempotencyKey @unique` + P2002 = DB 物理级幂等
- PKB 快照冻结在 API 层完成(无需 Manager
- `singletonKey: extract-${result.id}` 防重复派发
- `jobQueue.insert(jobs)` 批量入队100 篇 < 0.1 秒
### 4.3 ExtractionSingleWorker幽灵守卫 + 错误分级)
> 🚀 v2.0**删除 $transaction、删除 increment、删除 Last Child Wins**。
> Worker 只管更新自己的 Result 行,绝不碰 Task 表。收口由 Aggregator§4.6)负责。
```typescript
// Child Worker — ⚠️ v1.4.2 终极修正:乐观锁 + 原子递增 + Last Child Wins + 错误分级路由
class ExtractionChildWorker {
async handle(job: { data: { taskId: string; resultId: string; pkbDocumentId: string } }) {
const { taskId, resultId, pkbDocumentId } = job.data;
// 🚀 v2.0 散装 Worker — 只管自己的 Result绝不碰 Task 表
class ExtractionSingleWorker {
async handle(job: { data: { resultId: string; taskId: string; pkbDocumentId: string } }) {
const { resultId, taskId, pkbDocumentId } = job.data;
// ═══════════════════════════════════════════════════════
// 幽灵重试守卫:只允许 pending → extracting
// OOM 崩溃 → status 卡在 extracting → 重试被跳过
// → Aggregator 30 分钟后将 extracting 标为 error 兜底
// ═══════════════════════════════════════════════════════
const lock = await prisma.aslExtractionResult.updateMany({
where: { id: resultId, status: 'pending' },
data: { status: 'extracting' },
});
if (lock.count === 0) {
return { success: true, note: 'Phantom retry skipped' };
}
try {
// ═══════════════════════════════════════════════════════════
// ⚠️ v1.4.2 补丁 2乐观锁抢占替代 Read-then-Write 反模式)
// 利用 updateMany 的 WHERE 条件充当原子锁:
// 只有 status='pending' 的行才允许被更新为 'extracting'
// 并发重试时第二个 Worker 会得到 count=0直接退出
// ═══════════════════════════════════════════════════════════
const lock = await prisma.aslExtractionResult.updateMany({
where: { id: resultId, status: 'pending' },
data: { status: 'extracting' },
const data = await this.extractionService.extractOne(resultId, taskId);
// ✅ 只更新自己的 Result 行,绝不碰 Task 表!
await prisma.aslExtractionResult.update({
where: { id: resultId },
data: { status: 'completed', extractedData: data, processedAt: new Date() },
});
if (lock.count === 0) {
// 已被其他 Worker 抢占或已完成,幂等跳过
return { success: true, note: 'Idempotent skip: already processing or completed' };
}
// 执行提取(此时该行已被本 Worker 独占为 'extracting'
const extractResult = await this.extractionService.extractOne(resultId, taskId);
// ═══════════════════════════════════════════════════════════
// ⚠️ v1.4.2 补丁 1 + v1.4 原子递增:
// 事务内更新 Result 状态 + 原子递增父任务计数
// 返回更新后的 Task用于 "Last Child Wins" 判断
// ═══════════════════════════════════════════════════════════
const [_resultUpdate, taskAfterUpdate] = await prisma.$transaction([
prisma.aslExtractionResult.update({
where: { id: resultId },
data: { status: 'completed', extractedData: extractResult.data, processedAt: new Date() }
}),
prisma.aslExtractionTask.update({
where: { id: taskId },
data: {
successCount: { increment: 1 },
totalTokens: { increment: extractResult.tokens },
totalCost: { increment: extractResult.cost },
}
}),
]);
// 🚨 v1.6SSE 推送日志(跨 Pod 广播,替代原 sseEmitter.emit
await broadcastLog(taskId, { source: 'system', message: `${extractResult.filename} extracted` });
// ═══════════════════════════════════════════════════════════
// ⚠️ v1.4.2 补丁 1"Last Child Wins" 终止器
// 最后一个完成(成功或失败)的 Child 负责将父任务翻转为 completed
// 这是 Fan-out 模式的关键收口逻辑——没有它Task 永远卡在 processing
// ═══════════════════════════════════════════════════════════
if (taskAfterUpdate.successCount + taskAfterUpdate.failedCount >= taskAfterUpdate.totalCount) {
await prisma.aslExtractionTask.update({
where: { id: taskId },
data: { status: 'completed', completedAt: new Date() },
});
await broadcastLog(taskId, { source: 'system', type: 'complete', message: '🎉 All documents extracted.' });
}
} catch (error) {
// ⚠️ v1.4 错误分级路由:区分"致命错误"和"临时错误"
if (error instanceof PkbDocumentNotFoundError || error.name === 'PdfCorruptedError') {
// 致命错误:标记业务状态为 error + 原子递增 failedCount
const taskAfterFail = await prisma.$transaction(async (tx) => {
await tx.aslExtractionResult.update({
where: { id: resultId },
data: { status: 'error', errorMessage: error.message }
});
return tx.aslExtractionTask.update({
where: { id: taskId },
data: { failedCount: { increment: 1 } }
});
if (isPermanentError(error)) {
// 致命错误:标记 error不重试
await prisma.aslExtractionResult.update({
where: { id: resultId },
data: { status: 'error', errorMessage: error.message },
});
// ⚠️ v1.4.2 "Last Child Wins":失败的 Child 也要检查是否是最后一个
if (taskAfterFail.successCount + taskAfterFail.failedCount >= taskAfterFail.totalCount) {
await prisma.aslExtractionTask.update({
where: { id: taskId },
data: { status: 'completed', completedAt: new Date() },
});
await broadcastLog(taskId, { source: 'system', type: 'complete', message: '🎉 All documents extracted.' });
}
return { success: false, reason: 'Permanent failure, aborted retry.' };
return { success: false, note: 'Permanent error' };
}
// ═══════════════════════════════════════════════════════════
// 🚨 v1.6 补丁:临时错误 throw 前必须释放乐观锁!
// 原因:上方 updateMany 已将 status 改为 'extracting'。
// 如果裸 throwpg-boss 重试时乐观锁 where: { status: 'pending' }
// 返回 count=0 → 误判"幂等跳过" → 计数永远少一票 → Last Child Wins 永远不触发。
// ═══════════════════════════════════════════════════════════
// 临时错误:回退 status → pending让下次重试能通过幽灵守卫
await prisma.aslExtractionResult.update({
where: { id: resultId },
data: { status: 'pending' },
});
throw error; // pg-boss 指数退避重试
}
}
}
// 临时错误 (429/网络抖动)throw → pg-boss 自动指数退避重试
throw error;
function isPermanentError(error: any): boolean {
return error instanceof PkbDocumentNotFoundError
|| error.name === 'PdfCorruptedError'
|| (error.status && error.status >= 400 && error.status < 500 && error.status !== 429);
}
```
**与旧版 Fan-out Child Worker 的关键区别:**
-~~`prisma.$transaction([...Result, ...Task])`~~ → ✅ 只 `prisma.aslExtractionResult.update()`
-~~`successCount: { increment: 1 }`~~ → ✅ Worker 不碰 Task 表
-~~Last Child Wins 终止器~~ → ✅ Aggregator 定时收口§4.6
- ✅ 幽灵守卫 + 临时错误回退 pending 保留不变
### 4.4 Worker + Aggregator 注册(扁平单队列)
> 🚀 v2.0:废弃三级嵌套队列,改为**单一扁平队列** `asl_extract_single`。
> MinerU / LLM 调用由 Worker 内部 `await` 串行完成,`teamConcurrency` 即为总并发上限。
```typescript
// ═══════════════════════════════════════════════════════
// 单一 Worker 队列:全局最多 10 篇文献并行提取
// ═══════════════════════════════════════════════════════
jobQueue.work('asl_extract_single', { teamConcurrency: 10 }, async (job) => {
await extractionSingleWorker.handle(job);
});
// ═══════════════════════════════════════════════════════
// Aggregator 定时收口:每 2 分钟扫描(代码见 §4.6
// ═══════════════════════════════════════════════════════
await jobQueue.schedule('asl_extraction_aggregator', '*/2 * * * *');
await jobQueue.work('asl_extraction_aggregator', aggregatorHandler);
```
> **扁平架构对比:**
> ```
> 旧 Fan-out三级嵌套已废弃
> asl_extraction_manager → asl_extraction_child(10) → asl_mineru_extract(2) + asl_llm_extract(5)
>
> 新散装(扁平一级):
> asl_extract_single (teamConcurrency: 10) ← Worker 内部串行调 MinerU + LLM
> asl_extraction_aggregator (schedule: */2 * * * *) ← 定时收口 + 僵尸清理
> ```
> `teamConcurrency: 10` 即为 MinerU + LLM 的隐式总并发上限,无需额外子队列。
### 4.5 Postgres-Only 安全规范速查
| 规范 | 要求 | v2.0 散装模式实现 |
|------|------|-----------|
| **API 幂等** | 防止重复创建任务 | `idempotencyKey @unique` + P2002 冲突捕获 |
| **Worker 幂等** | 容忍 pg-boss 重投 | `updateMany({ where: { status: 'pending' } })` 幽灵守卫 |
| **Worker 独立** | Worker 绝不碰 Task 表 | 只写自己的 Result 行Aggregator 负责收口 |
| **Payload 轻量** | Job data 不超过数 KB | 仅传 `{ resultId, taskId, pkbDocumentId }`< 200 bytes |
| **过期时间** | 必须设置 `expireInMinutes` | `expireInMinutes: 30`(执行超时,非排队等待超时) |
| **错误分级** | 区分"可重试"和"永久失败" | 临时错误 → 回退 pending + throw致命错误 → 标 error + return |
| **死信处理** | 超过 retryLimit 的 Job | Aggregator 兼职 Sweeper`extracting` 超 30min → error |
| **进度查询** | 不在 Task 存冗余计数 | 前端 `groupBy` 实时聚合 Result 状态 |
### 4.6 ExtractionAggregator — 轮询收口 + 僵尸清理
> 🚀 v2.0**Aggregator 兼职 Sweeper一个组件两个职责。**
> 取代旧版独立的 `asl_extraction_sweeper` + Fan-out `Last Child Wins` 收口逻辑。
```typescript
// ExtractionAggregator.ts — pg-boss schedule 每 2 分钟执行一次
async function aggregatorHandler() {
const tasks = await prisma.aslExtractionTask.findMany({
where: { status: 'processing' },
});
for (const task of tasks) {
// ═══════════════════════════════════════════════════════
// 职责 1僵尸清理Sweeper 兼职)
// extracting 超过 30 分钟 → 大概率 Worker 进程崩溃OOM/SIGKILL
// ═══════════════════════════════════════════════════════
await prisma.aslExtractionResult.updateMany({
where: {
taskId: task.id,
status: 'extracting',
updatedAt: { lt: new Date(Date.now() - 30 * 60 * 1000) },
},
data: { status: 'error', errorMessage: '[Aggregator] Timeout, likely worker crash.' },
});
// ═══════════════════════════════════════════════════════
// 职责 2聚合统计 — groupBy 一次查询
// ═══════════════════════════════════════════════════════
const stats = await prisma.aslExtractionResult.groupBy({
by: ['status'],
where: { taskId: task.id },
_count: true,
});
const pending = stats.find(s => s.status === 'pending')?._count ?? 0;
const extracting = stats.find(s => s.status === 'extracting')?._count ?? 0;
// ═══════════════════════════════════════════════════════
// 职责 3收口 — pending 和 extracting 都清零即完成
// ═══════════════════════════════════════════════════════
if (pending === 0 && extracting === 0) {
await prisma.aslExtractionTask.update({
where: { id: task.id },
data: { status: 'completed', completedAt: new Date() },
});
logger.info(`[Aggregator] Task ${task.id} completed`);
}
}
}
```
### 4.4 Worker 注册(三级限流 + 队列命名合规)
**Aggregator 与旧 Sweeper / Last Child Wins 的对比:**
```typescript
// ⚠️ v1.4.2 补丁 3队列名称全部使用下划线遵守《Postgres-Only 指南》§4.1 红线)
// 点号(.)在 pg-boss 底层解析中可能被识别为 Schema 分隔符,导致路由截断异常
jobQueue.work('asl_extraction_child', { teamConcurrency: 10 }, async (job) => {
// 全局最多 10 个文献同时在 Node.js 内存中处理
// 其余在 PostgreSQL 中排队(零内存占用)
await extractionChildWorker.handle(job);
});
// MinerU 子队列:全局仅允许 2 个并行(跨所有 Pod
jobQueue.work('asl_mineru_extract', { teamConcurrency: 2 }, async (job) => {
const { storageKey, kbId, docId } = job.data;
return await pdfPipeline.extractTables(storageKey, kbId, docId); // 含 OSS 缓存
});
// LLM 子队列:全局仅允许 5 个并行
jobQueue.work('asl_llm_extract', { teamConcurrency: 5 }, async (job) => {
const { resultId, taskId, prompt } = job.data;
return await llmGateway.call(prompt);
});
// Child Worker 内部调用方式(不再使用 P-Queue
class ExtractionChildWorker {
async extractWithMinerU(storageKey: string, kbId: string, docId: string) {
const jobId = await pgBoss.send('asl_mineru_extract', { storageKey, kbId, docId });
return await pgBoss.getJobResult(jobId);
}
}
```
> **三级限流架构:**
> ```
> asl_extraction_child (teamConcurrency: 10) ← 背压阀门,防 OOM
> └─ asl_mineru_extract (teamConcurrency: 2) ← 昂贵 API 保护
> └─ asl_llm_extract (teamConcurrency: 5) ← LLM 并发保护
> ```
> 全部基于 PostgreSQL 行锁实现全局并发控制,跨所有 Node.js 实例生效。
### 4.5 Postgres-Only 安全规范速查
| 规范 | 要求 | 本模块实现 |
|------|------|-----------|
| **幂等性** | Worker 必须容忍 pg-boss 重投at-least-once | ⚠️ v1.4.2 `updateMany({ where: { status: 'pending' } })` 乐观锁原子抢占 |
| **Payload 轻量** | Job data 不超过数 KB禁止塞 PDF 正文 | 仅传 `{ taskId, resultId, pkbDocumentId }`,不超过 200 bytes |
| **过期时间** | 必须设置 `expireInMinutes`,防止僵尸 Job | Manager: 60minChild: 30min |
| **错误分级** | 区分"可重试"和"永久失败" | 429/5xx → retrypg-boss 指数退避4xx/解析错误 → 标记 error不 retry |
| **死信处理** | 超过 retryLimit 的 Job 进入 DLQ | pg-boss 内置 `onFail` handler 标记该篇为 `error` |
| **进度追踪** | 不在 Job data 中存大量进度 | 进度统一走 `CheckpointService`Job data 仅含 ID 引用 |
### 🆕 4.6 Sweeper 清道夫 — 进程硬崩溃兜底v1.6
> **Fan-out 指南 v1.2 强制要求:** 单兵 Worker 无法处理自身猝死OOM/SIGKILL
> 必须有系统级外部定时任务兜底。否则父任务可能永远卡在 `processing`。
```typescript
// ===== 工具 3 专属清道夫(模块启动时注册) =====
async function aslExtractionSweeper() {
const stuckTasks = await prisma.aslExtractionTask.findMany({
where: {
status: 'processing',
// 🚨 使用 updatedAt最后活跃时间而非 startedAt
// 500 篇文献正常排队可能需要 3+ 小时,用 startedAt 会误杀健康任务。
// 只要 Child 还在完成并递增计数updatedAt 就会持续刷新。
updatedAt: { lt: new Date(Date.now() - 2 * 60 * 60 * 1000) },
},
});
for (const task of stuckTasks) {
await prisma.aslExtractionTask.update({
where: { id: task.id },
data: {
status: 'failed',
errorMessage: '[Sweeper] No progress for 2h — likely Child Worker OOM/SIGKILL. Force-closed.',
completedAt: new Date(),
},
});
// 广播失败事件,确保前端 SSE 能感知
await broadcastLog(task.id, {
source: 'system',
type: 'complete',
message: '❌ [Sweeper] Task force-closed after 2h inactivity.',
});
logger.warn(`[Sweeper] Force-closed stuck task ${task.id} (no progress for 2h)`);
}
}
// 注册为 pg-boss 定时任务(每 10 分钟扫描一次)
await jobQueue.schedule('asl_extraction_sweeper', '*/10 * * * *');
await jobQueue.work('asl_extraction_sweeper', aslExtractionSweeper);
```
> **关键:** Sweeper 判断"卡死"基于 `updatedAt` 而非 `startedAt`,避免误杀正在排队的超大批量任务。
| 旧方案(已废弃) | 新方案 Aggregator |
|---|---|
| Last Child Wins 收口 + 独立 Sweeper 清道夫 | Aggregator 一人兼两职 |
| Worker 必须 `$transaction` 递增 Task 计数 | Worker 只写 Result零行锁争用 |
| Sweeper 用 `updatedAt > 2h` 判断卡死 | Aggregator 用 `extracting > 30min` 清僵尸 |
| 两个独立组件需分别注册 | 一个 `pg-boss schedule` 搞定 |
---
@@ -658,34 +624,38 @@ function useExtractionLogs(taskId: string) {
```typescript
// Step 2 页面组件:双轨制组合
function ExtractionProgress({ taskId }: { taskId: string }) {
const { data: task } = useTaskStatus(taskId); // 主驱动:轮询
const { logs } = useExtractionLogs(taskId); // 增强SSE 日志
// 进度条由 React Query 驱动(稳健)
const percent = task ? Math.round((task.successCount + task.failedCount) / task.totalCount * 100) : 0;
// 完成检测由 React Query 驱动(不依赖 SSE complete 事件)
// 🚀 v2.0:进度 API 返回 groupBy 聚合结果(无 successCount/failedCount 冗余字段)
const { data: progress } = useTaskStatus(taskId); // 主驱动:轮询
const { logs } = useExtractionLogs(taskId); // 增强SSE 日志
const processed = (progress?.completedCount ?? 0) + (progress?.errorCount ?? 0);
const percent = progress ? Math.round(processed / progress.totalCount * 100) : 0;
useEffect(() => {
if (task?.status === 'completed' || task?.status === 'failed') {
if (progress?.status === 'completed' || progress?.status === 'failed') {
navigate(`/asl/extraction/workbench/${taskId}`);
}
}, [task?.status]);
}, [progress?.status]);
return (
<>
<Progress percent={percent} />
<ProcessingTerminal logs={logs} /> {/* SSE 驱动,纯视觉 */}
<ProcessingTerminal logs={logs} />
</>
);
}
```
> **双轨制分工:** React Query 轮询驱动进度条和步骤跳转稳健可靠SSE 仅灌日志流给 ProcessingTerminal视觉增强断开无影响
> **v2.0 变化:** 进度 API 返回 `groupBy` 聚合的 `completedCount` / `errorCount` / `extractingCount` / `pendingCount`,不再依赖 Task 表冗余计数字段。
### 7.6 SSE 跨 Pod 广播 — PostgreSQL NOTIFY/LISTENv1.5M2 实施)
### 7.6 [P2 可选] SSE 跨 Pod 广播 — PostgreSQL NOTIFY/LISTEN
> ⚠️ **v2.0 降级说明:** 散装架构下,进度条和步骤跳转完全由 React Query 轮询驱动§7.5
> SSE 日志流仅为视觉增强。跨 Pod 广播为 **P2 可选增强**M1/M2 前期可用本 Pod 内存事件替代。
>
> **物理限制:** `sseEmitter.emit()` 基于内存 EventEmitter用户连 Pod A、Worker 跑 Pod B → Pod A 零日志。
> 使用 PostgreSQL `NOTIFY/LISTEN` 实现 Postgres-Only 合规的跨实例广播(不引入 Redis
> 如需实现跨 Pod 日志广播,可使用以下 PostgreSQL `NOTIFY/LISTEN` 方案
```typescript
// ===== Worker 发送端ExtractionChildWorker 内部) =====
@@ -757,8 +727,8 @@ class SseNotifyBridge {
- NOTIFY payload 物理上限 **~8000 bytes** → 发送前必须截断至 **7000 bytes**v1.6 强制规范)
- **禁止 `$executeRawUnsafe` + 字符串拼接!** 必须使用 `$executeRaw` Tagged Template + `pg_notify()`v1.6 强制规范)
- LISTEN 连接必须**独立于 Prisma 连接池**PgClient 单独创建)
- NOTIFY 是 fire-and-forget无持久化完美匹配 v1.4 双轨制定位
- `complete` 事件仍走 NOTIFY 广播,确保"Last Child Wins"翻转状态后所有 Pod 的 SSE 客户端都能收到
- NOTIFY 是 fire-and-forget无持久化完美匹配双轨制定位SSE 断开不影响业务)
- v2.0 下 `complete` 事件由前端轮询到 `status === 'completed'` 触发,不再强依赖 SSE 广播
---