feat(ssa): Complete Phase V-A editable analysis plan variables

Features:
- Add editable variable selection in workflow plan (SingleVarSelect + MultiVarTags)
- Implement 3-layer flexible interception (warning bar + icon + blocking dialog)
- Add tool_param_constraints.json for 12 statistical tools parameter validation
- Add PATCH /workflow/:id/params API with Zod structural validation
- Implement synchronous parameter sync before execution (Promise chaining)
- Fix LLM hallucination by strict system prompt constraints
- Fix DynamicReport object-based rows compatibility (R baseline_table)
- Fix Word export row.map error with same normalization logic
- Restore inferGroupingVar for smart default variable selection
- Add ReactMarkdown rendering in SSAChatPane
- Update SSA module status document to v3.5

Modified files:
- backend: workflow.routes, ChatHandlerService, SystemPromptService, FlowTemplateService
- frontend: WorkflowTimeline, SSAWorkspacePane, DynamicReport, SSAChatPane, ssaStore, ssa.css
- config: tool_param_constraints.json (new)
- docs: SSA status doc, team review reports

Tested: Cohort study end-to-end execution + report export verified

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-24 13:08:29 +08:00
parent dc6b292308
commit 85fda830c2
27 changed files with 2732 additions and 154 deletions
--- a/docs/03-业务模块/ASL-AI智能文献/00-系统设计/证据整合V2.0/工具3批量提取技术架构设计(散装与轮询版).md
+++ b/docs/03-业务模块/ASL-AI智能文献/00-系统设计/证据整合V2.0/工具3批量提取技术架构设计(散装与轮询版).md
@@ -0,0 +1,250 @@
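+# **Tool 3 Batch Extraction Architecture: Scatter Dispatch with Polling Aggregation**
+
+**Document version:** V2.0 (Startup Agile Edition)
+
+**Core architecture:** Scatter dispatch + independent single-job Workers + scheduled polling aggregation (Polling Aggregator)
+
+**Business goal:** Support Tool 3 (concurrent extraction of about 100 papers per task), eliminate row-lock contention and deadlocks, and maximize both development speed and multi-node concurrency.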
+
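+## **💡 1. Why This Architecture? (The Philosophy)**
+
+When one task contains 100 paper extractions, we drop the traditional, strongly consistent "parent/child task fan-out" model. Instead we use a loosely coupled, **eventually consistent** architecture:
+
+1. **Minimal write logic (no concurrent conflicts):** Each of the 100 Workers picks up its job and works independently, **updating only its own Result row**. No Worker ever touches the parent task (the Task table). This removes row-lock contention between processes on the same row.
+2. **Read/write-separated progress tracking:** When the frontend polls progress, the API runs a live COUNT over the Result rows. These reads are fast and do not block the writers.
+3. **Single-threaded settlement (no deadlocks):** A global scheduled job (the Aggregator) runs every 10 seconds and acts as the "foreman". It scans all open tasks and marks a task Completed once every one of its child items has finished.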
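A toy in-memory model makes the three rules concrete. This is an illustration only: it uses no database or queue, and the names `createToyStore`, `runWorker`, `progress`, and `aggregate` are hypothetical. Each simulated Worker writes only its own Result entry. Progress is derived by counting. Only the aggregator pass changes the Task status.

```typescript
// Toy in-memory model of the three rules (illustration only: no DB, no queue).
type ResultStatus = 'pending' | 'extracting' | 'completed' | 'error';
type TaskStatus = 'processing' | 'completed';

interface ToyStore {
  task: { id: string; status: TaskStatus; totalCount: number };
  // Rule 1: one entry per Result; a Worker only ever writes its own key.
  results: Map<string, ResultStatus>;
}

function createToyStore(taskId: string, resultIds: string[]): ToyStore {
  return {
    task: { id: taskId, status: 'processing', totalCount: resultIds.length },
    results: new Map(resultIds.map((id): [string, ResultStatus] => [id, 'pending'])),
  };
}

// A simulated Worker: touches only its own Result entry, never store.task.
function runWorker(store: ToyStore, resultId: string, succeed: boolean): void {
  store.results.set(resultId, 'extracting');
  store.results.set(resultId, succeed ? 'completed' : 'error');
}

// Rule 2: progress is derived on read by counting; nothing is stored on the Task.
function progress(store: ToyStore): { done: number; total: number } {
  const done = Array.from(store.results.values()).filter(
    (s) => s === 'completed' || s === 'error',
  ).length;
  return { done, total: store.task.totalCount };
}

// Rule 3: a single aggregator pass is the only writer of the Task status.
function aggregate(store: ToyStore): void {
  const stillOpen = Array.from(store.results.values()).some(
    (s) => s === 'pending' || s === 'extracting',
  );
  if (store.task.status === 'processing' && !stillOpen) {
    store.task.status = 'completed';
  }
}
```

Run one Worker, then `aggregate`, and the task stays `processing`. Finish the remaining Results and the next `aggregate` pass closes it, whether the items succeeded or failed.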
+
+## **🏗️ 2. Core Data Flow**
+
+```text
+[ Frontend Client ]
+       │ 1. POST /tasks (100 papers selected)
+       ▼
+[ Node.js API (Controller) ]
+       │ 2. Create 1 Task record
+       │ 3. Batch-create 100 Result records (status: pending)
+       │ 4. 🚀 Scatter: enqueue 100 'asl_extract_single' jobs in one batch insert
+       └─> Return TaskID to the frontend (takes < 0.1 s)
+
+======================== Async processing zone (multiple SAE instances in parallel) ========================
+
+[ pg-boss queue (Postgres) ] <── holds 100 single-paper extraction jobs
+
+[ Pod A ]             [ Pod B ]             [ Pod C ]
+ Worker claims job     Worker claims job     Worker claims job
+  │                     │                     │
+  ├─ Extract paper 1    ├─ Extract paper 2    ├─ Extract paper 3
+  │                     │                     │
+  └─ UPDATE Result 1    └─ UPDATE Result 2    └─ UPDATE Result 3
+     (status: completed)  (status: error)       (status: completed)
+     ※ Each Worker works alone, never interferes with the others, and never touches the Task table!
+
+======================== Global settlement zone (single scheduled job) ========================
+
+[ pg-boss scheduler (fires every 10 s) ]
+       │
+[ Task Aggregator (the single global foreman) ]
+       │ 1. Load all Tasks with status='processing'
+       │ 2. GROUP BY the status of their Result rows
+       │ 3. If pending=0 and extracting=0
+       └─> UPDATE Task SET status='completed' (final settlement!)
+```
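The flow implies a small lifecycle for each Result row. The sketch below encodes the transitions as read from this document:

- `extracting` → `pending` when a transient error hands the row back for retry.
- `extracting` → `extracting` when a retried job re-claims a row left behind by a crashed process.

`ALLOWED_TRANSITIONS`, `canTransition`, and `isTerminal` are hypothetical names, not part of the existing code.

```typescript
// Result lifecycle inferred from the data flow and Worker logic (sketch).
type ResultStatus = 'pending' | 'extracting' | 'completed' | 'error';

const ALLOWED_TRANSITIONS: Record<ResultStatus, readonly ResultStatus[]> = {
  pending: ['extracting'], // a Worker claims the job
  // re-claim after a crash, finish, fail permanently, or hand back for retry
  extracting: ['extracting', 'completed', 'error', 'pending'],
  completed: [], // terminal
  error: [], // terminal
};

function canTransition(from: ResultStatus, to: ResultStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// The Aggregator may settle a Task only when every Result is terminal.
function isTerminal(status: ResultStatus): boolean {
  return ALLOWED_TRANSITIONS[status].length === 0;
}
```

pg-boss delivers jobs at least once, so a retried Worker can find its row already terminal. This happens, for example, if a process crashed after writing `completed` but before the job was acknowledged. Checking `canTransition` before writing is one way to skip such duplicates.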
+
+## **🗄️ 3. Database Schema Adjustments (Prisma Schema)**
+
+With this pattern, the AslExtractionTask table is rarely updated and becomes a stable metadata table.
+
+```prisma
+model AslExtractionTask {
+  id          String    @id @default(uuid())
+  projectId   String
+  templateId  String
+  totalCount  Int       // Total number of papers (sent by the frontend, never changed after creation)
+
+  // Core status: 'processing' (in progress) or 'completed' (finished)
+  // Set to 'processing' only by the API at creation; changed to 'completed' only by the Aggregator
+  status      String    @default("processing")
+
+  // Removed: successCount / failedCount are no longer stored; they are derived with live COUNT queries
+
+  createdAt   DateTime  @default(now())
+  completedAt DateTime?
+
+  results     AslExtractionResult[]
+
+  // Lets the Aggregator find open tasks without a full table scan
+  @@index([status])
+  @@schema("asl_schema")
+}
+
+model AslExtractionResult {
+  id            String   @id @default(uuid())
+  taskId        String
+  pkbDocumentId String
+
+  // Child status: 'pending' (queued), 'extracting' (in progress), 'completed' (success), 'error' (failed)
+  status        String   @default("pending")
+  extractedData Json?    // Final extracted JSON result
+  errorMessage  String?
+
+  task          AslExtractionTask @relation(fields: [taskId], references: [id])
+
+  // Composite index: greatly speeds up the Aggregator's per-task status counts
+  @@index([taskId, status])
+  @@schema("asl_schema")
+}
+```
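Both `status` columns are plain `String`, so the database will accept a typo such as `'complete'`. The Aggregator would treat that row as finished, but the status API would count it as neither success nor failure.

Below is a minimal sketch of TypeScript guards for values read from these columns. The constant and function names are hypothetical. Declaring Prisma `enum` types for the two columns is an alternative that moves this check into the schema.

```typescript
// Typed views of the two String status columns (hypothetical helpers).
const TASK_STATUSES = ['processing', 'completed'] as const;
const RESULT_STATUSES = ['pending', 'extracting', 'completed', 'error'] as const;

type TaskStatus = (typeof TASK_STATUSES)[number];
type ResultStatus = (typeof RESULT_STATUSES)[number];

function isTaskStatus(value: string): value is TaskStatus {
  return (TASK_STATUSES as readonly string[]).includes(value);
}

function isResultStatus(value: string): value is ResultStatus {
  return (RESULT_STATUSES as readonly string[]).includes(value);
}

// Narrow a raw DB value, failing loudly on anything unexpected.
function parseResultStatus(value: string): ResultStatus {
  if (!isResultStatus(value)) {
    throw new Error(`Unknown AslExtractionResult.status: ${value}`);
  }
  return value;
}
```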
+
+## **💻 4. Core Implementation Guide (Show me the code)**
+
+### **1. API Layer: Fast Scatter Dispatch**
+
+**File:** ExtractionController.ts
+
+No Manager Worker is needed: the API handler enqueues one job per paper directly.
+
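+```typescript
+import type { FastifyReply, FastifyRequest } from 'fastify';
+// Assumes the project-level singletons `prisma` (PrismaClient) and `jobQueue`
+// (the team's pg-boss wrapper) are in scope.
+
+interface CreateTaskBody {
+  projectId: string;
+  templateId: string;
+  documentIds: string[];
+}
+
+async function createTask(
+  req: FastifyRequest<{ Body: CreateTaskBody }>,
+  reply: FastifyReply,
+) {
+  const { projectId, templateId, documentIds } = req.body;
+
+  if (!documentIds?.length) {
+    return reply.code(400).send({ success: false, error: 'No papers selected' });
+  }
+
+  // 1. Create the Task and all of its Result rows in ONE nested write (a single transaction).
+  //    With two separate writes, the Aggregator could see a 'processing' Task with zero
+  //    Results (pending = extracting = 0) and close it before the Results exist.
+  const task = await prisma.aslExtractionTask.create({
+    data: {
+      projectId,
+      templateId,
+      totalCount: documentIds.length,
+      status: 'processing',
+      results: {
+        createMany: {
+          data: documentIds.map((docId) => ({ pkbDocumentId: docId, status: 'pending' })),
+        },
+      },
+    },
+    include: { results: true }, // returns the created Result IDs; no extra findMany needed
+  });
+
+  // 2. 🚀 Scatter: one single-paper job per Result, enqueued in a single batch insert.
+  //    Retry settings sit directly on each job object, the shape pg-boss's insert()
+  //    expects (v9); adjust if your wrapper takes them elsewhere.
+  const jobs = task.results.map((result) => ({
+    name: 'asl_extract_single',
+    data: { resultId: result.id, pkbDocumentId: result.pkbDocumentId },
+    retryLimit: 3, // per-paper retries
+    retryBackoff: true,
+    expireInSeconds: 30 * 60, // per-paper timeout: 30 minutes
+  }));
+  await jobQueue.insert(jobs); // assumes the wrapper exposes a batch insert
+
+  return reply.send({ success: true, taskId: task.id });
+}
+```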
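The job-building step can be pulled into a pure function so it can be unit-tested without a database or queue. This is a sketch. The flat option fields (`retryLimit`, `retryBackoff`, `expireInSeconds`) follow pg-boss job options, while `CreatedResult`, `ExtractJob`, and `buildExtractJobs` are hypothetical names.

```typescript
// Pure helper: turn created Result rows into single-paper queue jobs (sketch).
interface CreatedResult {
  id: string;
  pkbDocumentId: string;
}

interface ExtractJob {
  name: 'asl_extract_single';
  data: { resultId: string; pkbDocumentId: string };
  retryLimit: number;
  retryBackoff: boolean;
  expireInSeconds: number;
}

function buildExtractJobs(
  results: readonly CreatedResult[],
  opts: { retryLimit?: number; timeoutMinutes?: number } = {},
): ExtractJob[] {
  const retryLimit = opts.retryLimit ?? 3; // default: 3 retries per paper
  const expireInSeconds = (opts.timeoutMinutes ?? 30) * 60; // default: 30-minute timeout
  return results.map(
    (r): ExtractJob => ({
      name: 'asl_extract_single',
      data: { resultId: r.id, pkbDocumentId: r.pkbDocumentId },
      retryLimit,
      retryBackoff: true,
      expireInSeconds,
    }),
  );
}
```

In the controller, the dispatch step would then read `await jobQueue.insert(buildExtractJobs(task.results));`.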
+
+### **2. Worker Layer: Simple Single-Job Workers**
+
+**File:** ExtractionSingleWorker.ts
+
+This is where MinerU and the LLM are actually called. **There are no concurrency locks and no parent-task updates.**
+
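+```typescript
+// Cap per-instance concurrency to prevent OOM and upstream API rate-limit trips.
+// In pg-boss v9, running several handlers at once needs teamSize as well as
+// teamConcurrency; pg-boss v10 replaced these options, so check your version.
+jobQueue.work('asl_extract_single', { teamSize: 10, teamConcurrency: 10 }, async (job) => {
+  const { resultId, pkbDocumentId } = job.data;
+
+  // 1. Mark only this Result as 'extracting' (never touch the parent Task!)
+  await prisma.aslExtractionResult.update({
+    where: { id: resultId },
+    data: { status: 'extracting' },
+  });
+
+  try {
+    // 2. Long-running, fragile business logic (MinerU + DeepSeek-V3)
+    const data = await extractLogic(pkbDocumentId);
+
+    // 3. Success: update only this row (always safe)
+    await prisma.aslExtractionResult.update({
+      where: { id: resultId },
+      data: { status: 'completed', extractedData: data },
+    });
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+
+    if (isPermanentError(error)) {
+      // Fatal error: mark this row 'error' and return normally so pg-boss stops retrying
+      await prisma.aslExtractionResult.update({
+        where: { id: resultId },
+        data: { status: 'error', errorMessage: message },
+      });
+      return { success: false, note: 'Permanent Error' };
+    }
+
+    // Transient error (e.g. a network glitch): hand the row back to 'pending' and rethrow
+    // so pg-boss retries. Caveat: after the LAST retry this row would stay 'pending'
+    // forever and the Aggregator would never close the Task, so the final attempt
+    // should write 'error' instead.
+    await prisma.aslExtractionResult.update({
+      where: { id: resultId },
+      data: { status: 'pending' },
+    });
+    throw error;
+  }
+});
+```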
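The Worker's `catch` block makes two decisions: whether an error is permanent, and what to write to the Result row. The sketch below expresses both as pure functions.

- **`isPermanentError`:** this is referenced but not defined in the original. The HTTP-status rule used here is an assumption: 4xx is permanent except 408 and 429.
- **Retry counters:** the sketch assumes `retryCount` counts retries already consumed and `retryLimit` is the job's configured maximum. pg-boss exposes both as job metadata, but the field names differ between versions.

The last branch is the important one. On the final attempt, a transient error must also produce `'error'`.

```typescript
// Sketch: decide what a failed attempt writes to its Result row.
type FailureAction =
  | { status: 'error'; rethrow: false } // give up: terminal row, no more retries
  | { status: 'pending'; rethrow: true }; // hand back: let pg-boss retry

// Hypothetical error shape: many HTTP clients attach a numeric status code.
interface MaybeHttpError {
  status?: number;
}

// Assumption: 4xx responses are permanent, except 408 (timeout) and 429 (rate limit).
function isPermanentError(error: unknown): boolean {
  const status = (error as MaybeHttpError | null | undefined)?.status;
  if (typeof status !== 'number') return false; // unknown errors are treated as transient
  return status >= 400 && status < 500 && status !== 408 && status !== 429;
}

function decideFailure(
  error: unknown,
  retryCount: number, // retries already used before this attempt
  retryLimit: number, // maximum retries configured on the job
): FailureAction {
  if (isPermanentError(error)) return { status: 'error', rethrow: false };
  // Final attempt: no retry will follow, so the row must become terminal now.
  if (retryCount >= retryLimit) return { status: 'error', rethrow: false };
  return { status: 'pending', rethrow: true };
}
```

This covers failures that reach the `catch` block. A process that crashes outright never runs it, which is a separate case.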
+
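+### **3. Aggregator Layer: Global Foreman Polling for Settlement**
+
+**File:** ExtractionAggregator.ts
+
+**Trigger mechanism:** Use a pg-boss scheduled job so that, even with multiple Pods, only one instance runs this check per interval. Note that pg-boss sends scheduled jobs at most about once per minute, so a seconds-level cron expression will not by itself produce a 10-second cadence. If 10 seconds matters, one option is a timer on each Pod that calls `send()` with a `singletonKey` and `singletonSeconds: 10`; pg-boss then accepts at most one aggregator job per 10-second window. Otherwise, accept a one-minute cadence.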
+
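+```typescript
+// Register at backend startup: intended to run every 10 seconds.
+// Caveat: pg-boss throttles scheduled sends to roughly one per minute,
+// so this seconds-level cron alone will not fire every 10 seconds.
+await jobQueue.schedule('asl_extraction_aggregator', '*/10 * * * * *');
+
+jobQueue.work('asl_extraction_aggregator', async () => {
+  // 1. Find all parent Tasks that have not finished yet
+  const activeTasks = await prisma.aslExtractionTask.findMany({
+    where: { status: 'processing' },
+    select: { id: true },
+  });
+
+  for (const task of activeTasks) {
+    // 2. Count this Task's Result rows per status (fast thanks to the [taskId, status] index)
+    const stats = await prisma.aslExtractionResult.groupBy({
+      by: ['status'],
+      where: { taskId: task.id },
+      _count: { _all: true },
+    });
+
+    const countOf = (status: string) =>
+      stats.find((s) => s.status === status)?._count._all ?? 0;
+    const pendingCount = countOf('pending');
+    const extractingCount = countOf('extracting');
+
+    // 3. Settlement: nobody is queued or working any more, so this batch is done
+    //    (regardless of how many items succeeded or failed)
+    if (pendingCount === 0 && extractingCount === 0) {
+      // updateMany with a status guard keeps this idempotent if two runs ever overlap
+      const { count } = await prisma.aslExtractionTask.updateMany({
+        where: { id: task.id, status: 'processing' },
+        data: { status: 'completed', completedAt: new Date() },
+      });
+
+      if (count > 0) {
+        // Optional: trigger "all done" business actions here (e.g. a WeCom notification)
+        logger.info(`Task ${task.id} completely finished via Aggregator!`);
+      }
+    }
+  }
+});
+```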
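The loop issues one `groupBy` per open Task. An alternative is a single `groupBy` with `by: ['taskId', 'status']` and `_count: { _all: true }`, filtered to Results whose Task is still `processing`, followed by the decision in memory.

Below is a sketch of the in-memory step. It assumes rows shaped like Prisma's `groupBy` output for that query; `StatusCountRow` and `findFinishedTasks` are hypothetical names.

```typescript
// Shape of one row from a groupBy over ['taskId', 'status'] with _count: { _all: true }.
interface StatusCountRow {
  taskId: string;
  status: string;
  _count: { _all: number };
}

const OPEN_STATUSES = new Set(['pending', 'extracting']);

// Returns the IDs of Tasks whose Results are all terminal ('completed' or 'error').
function findFinishedTasks(rows: readonly StatusCountRow[]): string[] {
  const seen = new Set<string>();
  const stillOpen = new Set<string>();
  for (const row of rows) {
    seen.add(row.taskId);
    if (OPEN_STATUSES.has(row.status) && row._count._all > 0) {
      stillOpen.add(row.taskId);
    }
  }
  return Array.from(seen)
    .filter((id) => !stillOpen.has(id))
    .sort();
}
```

The finished IDs could then be closed in one statement with `updateMany({ where: { id: { in: ids }, status: 'processing' }, ... })`. A Task with zero Results never appears in the grouped rows, so it is never reported as finished.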
+
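+### **4. Frontend Status API: Read/Write-Separated Progress Tracking**
+
+**File:** TaskStatusController.ts
+
+The parent Task table has no successCount. So when the frontend polls /tasks/:taskId/status, we **compute progress from live reads**.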
+
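+```typescript
+import type { FastifyReply, FastifyRequest } from 'fastify';
+
+async function getTaskStatus(
+  req: FastifyRequest<{ Params: { taskId: string } }>,
+  reply: FastifyReply,
+) {
+  const { taskId } = req.params;
+
+  const task = await prisma.aslExtractionTask.findUnique({ where: { id: taskId } });
+  if (!task) {
+    return reply.code(404).send({ success: false, error: 'Task not found' });
+  }
+
+  // Live COUNTs replace redundant counter columns. With the [taskId, status] index,
+  // counting ~100 rows per task is cheap enough to run on every poll.
+  const [successCount, failedCount] = await Promise.all([
+    prisma.aslExtractionResult.count({ where: { taskId, status: 'completed' } }),
+    prisma.aslExtractionResult.count({ where: { taskId, status: 'error' } }),
+  ]);
+
+  return reply.send({
+    status: task.status, // 'processing' or 'completed'
+    progress: {
+      total: task.totalCount,
+      success: successCount,
+      failed: failedCount,
+      percent: task.totalCount > 0
+        ? Math.round(((successCount + failedCount) / task.totalCount) * 100)
+        : 0,
+    },
+  });
+}
+```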
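The percentage arithmetic can be isolated into a pure function. This is a sketch; `Progress` and `computeProgress` are hypothetical names.

With this formula, `percent` reaches 100 only when every Result is terminal, which is the Aggregator's completion condition. The `status` field may still read `processing` for up to one Aggregator interval after that.

```typescript
// Sketch: derive the progress payload from live counts.
interface Progress {
  total: number;
  success: number;
  failed: number;
  percent: number;
}

function computeProgress(total: number, success: number, failed: number): Progress {
  // Guard against division by zero, then round and clamp to [0, 100].
  const raw = total > 0 ? ((success + failed) / total) * 100 : 0;
  const percent = Math.min(100, Math.max(0, Math.round(raw)));
  return { total, success, failed, percent };
}
```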
+
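+## **🛡️ 5. Summary: Why This Approach Wins**
+
+With the "scatter dispatch + polling aggregation" pattern, your team gains the following strategic advantages:
+
+1. **No deadlocks:** There is no more cumbersome optimistic locking and no race conditions. Developers only write plain business logic: "parse the PDF, call the LLM, update one row."
+2. **No sweeper needed (Sweeper-free):** Suppose a Node.js process dies mid-extraction (e.g. OOM). That paper's Result stays at extracting. Once the job expires, pg-boss retries it, and the retried Worker sets the row's status again. As long as any Result is pending or extracting, the Aggregator will not close the parent Task. This avoids the earlier critical bug where a hard crash left a task stuck forever. Caveat: if a job runs out of retries (for example, a PDF that triggers OOM every time), no Worker will update that row again. The Task then stays processing unless something marks such rows as error, such as a dead-letter handler or a timeout rule in the Aggregator.
+3. **Faster development (estimated 200% speed-up):** The architecture is easy to understand. A new hire gets it immediately: "scatter the work, handle each piece independently, settle on a timer."
+4. **Full scale-out:** With multiple SAE instances, the 100 jobs spread evenly across all machines. Workers never contend for the same database row. Throughput therefore scales close to linearly with the number of instances, until the database connection pool or the MinerU/LLM rate limits become the bottleneck.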
+
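+**Congratulations to the team on a highly available architecture decision that fits a startup's current stage! This design document can go straight to the backend developers for Sprint 1.**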