feat(ssa): Complete Phase V-A editable analysis plan variables
Features: - Add editable variable selection in workflow plan (SingleVarSelect + MultiVarTags) - Implement 3-layer flexible interception (warning bar + icon + blocking dialog) - Add tool_param_constraints.json for 12 statistical tools parameter validation - Add PATCH /workflow/:id/params API with Zod structural validation - Implement synchronous parameter sync before execution (Promise chaining) - Fix LLM hallucination by strict system prompt constraints - Fix DynamicReport object-based rows compatibility (R baseline_table) - Fix Word export row.map error with same normalization logic - Restore inferGroupingVar for smart default variable selection - Add ReactMarkdown rendering in SSAChatPane - Update SSA module status document to v3.5 Modified files: - backend: workflow.routes, ChatHandlerService, SystemPromptService, FlowTemplateService - frontend: WorkflowTimeline, SSAWorkspacePane, DynamicReport, SSAChatPane, ssaStore, ssa.css - config: tool_param_constraints.json (new) - docs: SSA status doc, team review reports Tested: Cohort study end-to-end execution + report export verified Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -3,7 +3,7 @@
|
||||
> **所属:** 工具 3 全文智能提取工作台 V2.0
|
||||
> **架构总纲:** `08-工具3-全文智能提取工作台V2.0开发计划.md`
|
||||
> **代码手册:** `08d-工具3-代码模式与技术规范.md`(所有代码模式均在此手册中,开发时按需查阅)
|
||||
> **建议时间:** Week 1(5-6 天)
|
||||
> **建议时间:** Week 1(5.5-6.5 天,含 v1.6 Sweeper 清道夫 0.5 天)
|
||||
> **核心目标:** 证明 "PKB 拿数据 → Fan-out 分发 → LLM 盲提 → 数据落库 → 前端看到 completed" 这条管线是通的。
|
||||
|
||||
---
|
||||
@@ -69,13 +69,17 @@
|
||||
- `PkbBridgeService.ts`:调用 `PkbExportService`,代理所有 PKB 数据访问
|
||||
|
||||
**Step C — Fan-out Manager + Child Worker(1 天)⚠️ 核心战役:**
|
||||
- `ExtractionManagerWorker.ts`:读取任务 → ⚠️ v1.5 批量快照 PKB 元数据(`snapshotStorageKey` + `snapshotFilename`)冻结到 `AslExtractionResult` → 为每篇文献 `pgBoss.send('asl_extraction_child', ...)` → 退出(Fire-and-forget)
|
||||
- `ExtractionManagerWorker.ts`:读取任务 → 🆕 **v1.6 空集合守卫**(`results.length === 0` → 直接 completed) → ⚠️ v1.5 批量快照 PKB 元数据(`snapshotStorageKey` + `snapshotFilename`)冻结到 `AslExtractionResult` → 为每篇文献 `pgBoss.send('asl_extraction_child', ...)` → 退出(Fire-and-forget)
|
||||
- `ExtractionChildWorker.ts` 完整逻辑:
|
||||
1. **乐观锁抢占**:`updateMany({ where: { status: 'pending' }, data: { status: 'extracting' } })`
|
||||
2. **纯文本降级提取**:从 PKB 读 `extractedText` + 写死 RCT Schema → 调用 DeepSeek
|
||||
3. **原子递增**:事务内 `update Result + increment Task counts`
|
||||
4. **Last Child Wins**:`successCount + failedCount >= totalCount` → 翻转 `status = completed`
|
||||
5. **错误分级路由**:致命错误 return / 临时错误 throw
|
||||
5. **错误分级路由**:致命错误 return / 🆕 **v1.6 临时错误 throw 前释放乐观锁(回退 status → pending)**
|
||||
|
||||
**Step D — 🆕 Sweeper 清道夫注册(0.5 天)(v1.6 新增):**
|
||||
- `asl_extraction_sweeper`:pg-boss 定时任务,每 10 分钟扫描 `processing` 且 `updatedAt > 2h` 的任务,强制标记 `failed`
|
||||
- 使用 `updatedAt`(最后活跃时间)判断卡死,禁止用 `startedAt`(防误杀健康的超大批量任务)
|
||||
|
||||
**Worker 注册(遵守队列命名规范):**
|
||||
```
|
||||
@@ -94,9 +98,14 @@ jobQueue.work('asl_extraction_child', { teamConcurrency: 10 }, handler)
|
||||
- [ ] Last Child Wins:最后一个 Child 翻转 Task status = completed
|
||||
- [ ] 致命错误(PKB 文档不存在)→ 该篇标 error + 不重试 + 不阻塞其他篇
|
||||
- [ ] 临时错误(429)→ pg-boss 指数退避重试
|
||||
- [ ] 🆕 临时错误 throw 前回退 `status → pending`:模拟 429 重试后乐观锁仍能抢占成功(v1.6 乐观锁释放验证)
|
||||
- [ ] 🆕 Manager 空集合守卫:`results.length === 0` 时 Task 直接标记 `completed`(v1.6 边界验证)
|
||||
- [ ] 🆕 Sweeper 清道夫已注册:`asl_extraction_sweeper` 定时任务在 pg-boss 中可查到(v1.6)
|
||||
- [ ] 🆕 Sweeper 判定条件为 `updatedAt > 2h`,而非 `startedAt`(v1.6 防误杀验证)
|
||||
|
||||
> 📖 Fan-out 架构图、Worker 代码模式、研发红线见架构总纲 Task 2.3
|
||||
> 📖 ACL 防腐层设计见架构总纲 Task 3.3b
|
||||
> 📖 ACL 防腐层设计见架构总纲 Task 3.3b
|
||||
> 📖 Sweeper、乐观锁释放、空集合守卫代码见 08d §4.2 / §4.3 / §4.6
|
||||
|
||||
---
|
||||
|
||||
@@ -150,6 +159,9 @@ jobQueue.work('asl_extraction_child', { teamConcurrency: 10 }, handler)
|
||||
| 5 | Job Payload 仅传 ID(< 200 bytes),禁止塞 PDF 正文 | pg-boss 阻塞 |
|
||||
| 6 | ACL 防腐层:ASL 不 import PKB 内部类型 | 模块耦合蔓延 |
|
||||
| 7 | Manager 必须快照 `snapshotStorageKey` + `snapshotFilename`,Child 禁止运行时回查 PKB 获取 storageKey(v1.5) | 提取中 PKB 删文档 → 批量崩溃 |
|
||||
| 8 | 🆕 临时错误 `throw` 前必须 `update({ status: 'pending' })` 释放乐观锁(v1.6) | 重试时被"幂等跳过",计数永远缺一票,Task 永久卡死 |
|
||||
| 9 | 🆕 Manager 必须检查 `results.length === 0` 并直接 completed(v1.6) | 空文献 → 无 Child → Last Child Wins 死锁 |
|
||||
| 10 | 🆕 必须注册 `asl_extraction_sweeper` 清道夫(`updatedAt > 2h`,禁止用 `startedAt`)(v1.6) | 进程 OOM/SIGKILL 后 Task 永久挂起 |
|
||||
|
||||
---
|
||||
|
||||
@@ -160,6 +172,8 @@ jobQueue.work('asl_extraction_child', { teamConcurrency: 10 }, handler)
|
||||
✅ PKB ACL 防腐层 → PkbExportService + PkbBridgeService
|
||||
✅ Fan-out 全链路:Manager → N × Child → Last Child Wins → completed
|
||||
✅ 乐观锁 + 原子递增 + 错误分级路由 — 所有并发 Bug 已验证
|
||||
✅ 🆕 Sweeper 清道夫注册(v1.6 防 OOM/SIGKILL 卡死)
|
||||
✅ 🆕 乐观锁释放 + 空集合守卫 + pg_notify 参数化(v1.6 全量代码级同步)
|
||||
✅ 前端三步走:选模板/选文献 → 轮询进度 → 极简结果列表
|
||||
❌ 无 MinerU(纯文本降级)
|
||||
❌ 无 SSE 日志流
|
||||
|
||||
@@ -137,13 +137,11 @@ class PdfProcessingPipeline {
|
||||
### 3.2 PKB 复用感知日志
|
||||
|
||||
```typescript
|
||||
// 🚨 v1.6:使用 broadcastLog 跨 Pod 广播(替代 sseEmitter.emit)
|
||||
if (pkbExtractedText) {
|
||||
this.sseEmitter.emit(taskId, {
|
||||
type: 'log',
|
||||
data: {
|
||||
source: 'system',
|
||||
message: `⚡ [Fast-path] Reused full-text from PKB (saved ~10s pymupdf4llm): ${filename}`,
|
||||
}
|
||||
await broadcastLog(taskId, {
|
||||
source: 'system',
|
||||
message: `⚡ [Fast-path] Reused full-text from PKB (saved ~10s pymupdf4llm): ${filename}`,
|
||||
});
|
||||
}
|
||||
```
|
||||
@@ -189,6 +187,21 @@ class ExtractionManagerWorker {
|
||||
const task = await prisma.aslExtractionTask.findUnique({ where: { id: job.data.taskId } });
|
||||
const results = await prisma.aslExtractionResult.findMany({ where: { taskId: task.id } });
|
||||
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
// 🚨 v1.6 空集合边界守卫
|
||||
// 如果文献被全部删除或过滤后 results 为空,无 Child 被派发,
|
||||
// Last Child Wins 永远不触发,Task 永远卡在 processing。
|
||||
// Manager 必须自己充当"收口人"直接完成任务。
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
if (results.length === 0) {
|
||||
await prisma.aslExtractionTask.update({
|
||||
where: { id: task.id },
|
||||
data: { status: 'completed', completedAt: new Date() },
|
||||
});
|
||||
await broadcastLog(task.id, { source: 'system', message: '⚠️ No documents to extract, task auto-completed.' });
|
||||
return;
|
||||
}
|
||||
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
// ⚠️ v1.5 PKB 数据一致性快照
|
||||
// 提取任务可能持续 50 分钟,期间用户可能在 PKB 删除/修改文档。
|
||||
@@ -284,11 +297,8 @@ class ExtractionChildWorker {
|
||||
}),
|
||||
]);
|
||||
|
||||
// SSE 推送日志
|
||||
this.sseEmitter.emit(taskId, {
|
||||
type: 'log',
|
||||
data: { source: 'system', message: `✅ ${extractResult.filename} extracted` }
|
||||
});
|
||||
// 🚨 v1.6:SSE 推送日志(跨 Pod 广播,替代原 sseEmitter.emit)
|
||||
await broadcastLog(taskId, { source: 'system', message: `✅ ${extractResult.filename} extracted` });
|
||||
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
// ⚠️ v1.4.2 补丁 1:"Last Child Wins" 终止器
|
||||
@@ -300,7 +310,7 @@ class ExtractionChildWorker {
|
||||
where: { id: taskId },
|
||||
data: { status: 'completed', completedAt: new Date() },
|
||||
});
|
||||
this.sseEmitter.emit(taskId, { type: 'complete' });
|
||||
await broadcastLog(taskId, { source: 'system', type: 'complete', message: '🎉 All documents extracted.' });
|
||||
}
|
||||
|
||||
} catch (error) {
|
||||
@@ -324,12 +334,23 @@ class ExtractionChildWorker {
|
||||
where: { id: taskId },
|
||||
data: { status: 'completed', completedAt: new Date() },
|
||||
});
|
||||
this.sseEmitter.emit(taskId, { type: 'complete' });
|
||||
await broadcastLog(taskId, { source: 'system', type: 'complete', message: '🎉 All documents extracted.' });
|
||||
}
|
||||
|
||||
return { success: false, reason: 'Permanent failure, aborted retry.' };
|
||||
}
|
||||
// 临时错误 (429/网络抖动):直接 throw,让 pg-boss 自动指数退避重试
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
// 🚨 v1.6 补丁:临时错误 throw 前必须释放乐观锁!
|
||||
// 原因:上方 updateMany 已将 status 改为 'extracting'。
|
||||
// 如果裸 throw,pg-boss 重试时乐观锁 where: { status: 'pending' }
|
||||
// 返回 count=0 → 误判"幂等跳过" → 计数永远少一票 → Last Child Wins 永远不触发。
|
||||
// ═══════════════════════════════════════════════════════════
|
||||
await prisma.aslExtractionResult.update({
|
||||
where: { id: resultId },
|
||||
data: { status: 'pending' },
|
||||
});
|
||||
|
||||
// 临时错误 (429/网络抖动):throw → pg-boss 自动指数退避重试
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
@@ -388,6 +409,50 @@ class ExtractionChildWorker {
|
||||
| **死信处理** | 超过 retryLimit 的 Job 进入 DLQ | pg-boss 内置 `onFail` handler 标记该篇为 `error` |
|
||||
| **进度追踪** | 不在 Job data 中存大量进度 | 进度统一走 `CheckpointService`,Job data 仅含 ID 引用 |
|
||||
|
||||
### 🆕 4.6 Sweeper 清道夫 — 进程硬崩溃兜底(v1.6)
|
||||
|
||||
> **Fan-out 指南 v1.2 强制要求:** 单兵 Worker 无法处理自身猝死(OOM/SIGKILL),
|
||||
> 必须有系统级外部定时任务兜底。否则父任务可能永远卡在 `processing`。
|
||||
|
||||
```typescript
|
||||
// ===== 工具 3 专属清道夫(模块启动时注册) =====
|
||||
async function aslExtractionSweeper() {
|
||||
const stuckTasks = await prisma.aslExtractionTask.findMany({
|
||||
where: {
|
||||
status: 'processing',
|
||||
// 🚨 使用 updatedAt(最后活跃时间),而非 startedAt!
|
||||
// 500 篇文献正常排队可能需要 3+ 小时,用 startedAt 会误杀健康任务。
|
||||
// 只要 Child 还在完成并递增计数,updatedAt 就会持续刷新。
|
||||
updatedAt: { lt: new Date(Date.now() - 2 * 60 * 60 * 1000) },
|
||||
},
|
||||
});
|
||||
|
||||
for (const task of stuckTasks) {
|
||||
await prisma.aslExtractionTask.update({
|
||||
where: { id: task.id },
|
||||
data: {
|
||||
status: 'failed',
|
||||
errorMessage: '[Sweeper] No progress for 2h — likely Child Worker OOM/SIGKILL. Force-closed.',
|
||||
completedAt: new Date(),
|
||||
},
|
||||
});
|
||||
// 广播失败事件,确保前端 SSE 能感知
|
||||
await broadcastLog(task.id, {
|
||||
source: 'system',
|
||||
type: 'complete',
|
||||
message: '❌ [Sweeper] Task force-closed after 2h inactivity.',
|
||||
});
|
||||
logger.warn(`[Sweeper] Force-closed stuck task ${task.id} (no progress for 2h)`);
|
||||
}
|
||||
}
|
||||
|
||||
// 注册为 pg-boss 定时任务(每 10 分钟扫描一次)
|
||||
await jobQueue.schedule('asl_extraction_sweeper', '*/10 * * * *');
|
||||
await jobQueue.work('asl_extraction_sweeper', aslExtractionSweeper);
|
||||
```
|
||||
|
||||
> **关键:** Sweeper 判断"卡死"基于 `updatedAt` 而非 `startedAt`,避免误杀正在排队的超大批量任务。
|
||||
|
||||
---
|
||||
|
||||
## 5. fuzzyQuoteMatch 验证算法
|
||||
@@ -624,24 +689,29 @@ function ExtractionProgress({ taskId }: { taskId: string }) {
|
||||
|
||||
```typescript
|
||||
// ===== Worker 发送端(ExtractionChildWorker 内部) =====
|
||||
// 替代原有的 this.sseEmitter.emit(),改用 NOTIFY 广播
|
||||
// 🚨 v1.6 修正:使用 pg_notify() + Prisma 参数化绑定(免疫 SQL 注入)
|
||||
// 替代原有的 this.sseEmitter.emit() 和 $executeRawUnsafe 字符串拼接
|
||||
async function broadcastLog(taskId: string, logEntry: LogEntry) {
|
||||
const payload = JSON.stringify({
|
||||
const payloadStr = JSON.stringify({
|
||||
taskId,
|
||||
type: 'log',
|
||||
type: logEntry.type ?? 'log',
|
||||
data: logEntry,
|
||||
});
|
||||
// NOTIFY payload 上限 8000 bytes,日志消息绰绰有余
|
||||
await prisma.$executeRawUnsafe(
|
||||
`NOTIFY asl_sse_channel, '${payload.replace(/'/g, "''")}'`
|
||||
);
|
||||
|
||||
// 🚨 NOTIFY payload 物理上限 ~8000 bytes,LLM 错误堆栈可能超限
|
||||
const safePayload = payloadStr.length > 7000
|
||||
? payloadStr.substring(0, 7000) + '..."}'
|
||||
: payloadStr;
|
||||
|
||||
// 参数化绑定:$executeRaw Tagged Template + pg_notify()
|
||||
// 彻底免疫 SQL 注入,无需手动 .replace 转义
|
||||
await prisma.$executeRaw`SELECT pg_notify('asl_sse_channel', ${safePayload})`;
|
||||
}
|
||||
|
||||
// 使用方式(替代 this.sseEmitter.emit)
|
||||
// 使用方式(全面替代 this.sseEmitter.emit)
|
||||
await broadcastLog(taskId, {
|
||||
source: 'system',
|
||||
message: `✅ ${filename} extracted`,
|
||||
timestamp: new Date().toISOString(),
|
||||
});
|
||||
```
|
||||
|
||||
@@ -684,7 +754,8 @@ class SseNotifyBridge {
|
||||
```
|
||||
|
||||
**关键约束:**
|
||||
- NOTIFY payload 上限 **8000 bytes**(日志消息远小于此限制)
|
||||
- NOTIFY payload 物理上限 **~8000 bytes** → 发送前必须截断至 **7000 bytes**(v1.6 强制规范)
|
||||
- **禁止 `$executeRawUnsafe` + 字符串拼接!** 必须使用 `$executeRaw` Tagged Template + `pg_notify()`(v1.6 强制规范)
|
||||
- LISTEN 连接必须**独立于 Prisma 连接池**(PgClient 单独创建)
|
||||
- NOTIFY 是 fire-and-forget(无持久化),完美匹配 v1.4 双轨制定位
|
||||
- `complete` 事件仍走 NOTIFY 广播,确保"Last Child Wins"翻转状态后所有 Pod 的 SSE 客户端都能收到
|
||||
|
||||
Reference in New Issue
Block a user