feat(asl/extraction): Complete Tool 3 M1+M2 - skeleton pipeline and HITL workbench

M1 Skeleton Pipeline:
- Scatter-dispatch + Aggregator polling pattern (PgBoss)
- PKB ACL bridge (PkbBridgeService -> PkbExportService DTOs)
- ExtractionSingleWorker with DeepSeek-V3 LLM extraction
- PermanentExtractionError for non-retryable failures
- Phantom Retry Guard (idempotent worker)
- 3-step minimal frontend (Setup -> Progress -> Workbench)
- 4 new DB tables (extraction_templates, project_templates, tasks, results)
- 3 system templates seed (RCT, Cohort, QC)
- M1 integration test suite
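
The non-retryable failure path and the retry guard listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: only the class name `PermanentExtractionError` comes from the message; `handleJob`, `TaskRecord`, the in-memory `tasks` map, and the synchronous `extract` callback are hypothetical stand-ins for the real worker, tables, and LLM call.

```typescript
// Hypothetical sketch: only the name PermanentExtractionError comes from the commit message.
class PermanentExtractionError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'PermanentExtractionError'
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, PermanentExtractionError.prototype)
  }
}

type TaskStatus = 'pending' | 'done' | 'failed'
interface TaskRecord { id: string; status: TaskStatus; result?: string }

// In-memory stand-in for the tasks/results tables (assumption).
const tasks = new Map<string, TaskRecord>()

function handleJob(
  taskId: string,
  extract: (id: string) => string, // stubbed synchronously; a real worker would be async
): 'skipped' | 'done' | 'failed' {
  const task = tasks.get(taskId)
  // Retry guard: a redelivered job whose task is missing or already settled is a no-op.
  if (!task || task.status !== 'pending') return 'skipped'
  try {
    task.result = extract(taskId)
    task.status = 'done'
    return 'done'
  } catch (err) {
    if (err instanceof PermanentExtractionError) {
      // Non-retryable: record the failure and return normally so the queue does not retry.
      task.status = 'failed'
      return 'failed'
    }
    // Transient: rethrow so the queue's retry policy applies.
    throw err
  }
}
```

Because the guard checks task state before doing any work, delivering the same job twice changes nothing after the first successful run.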

M2 HITL Workbench:
- MinerU VLM integration for high-fidelity table extraction
- XML-isolated DynamicPromptBuilder with flat JSON output template
- fuzzyQuoteMatch validator (3-tier confidence scoring)
- SSE real-time logging via ExtractionEventBus
- Schema-driven ExtractionDrawer (dynamic field rendering from template)
- Excel wide-table export with flattenModuleData normalization
- M2 integration test suite
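
One way a three-tier quote check like `fuzzyQuoteMatch` could work is sketched below. The tiers, the 0.8 token-overlap threshold, and the names `fuzzyQuoteMatchSketch` and `normalize` are assumptions made for illustration; the commit message does not describe the actual scoring rules.

```typescript
type Confidence = 'high' | 'medium' | 'low'

// Lowercase and collapse whitespace so line breaks from PDF extraction do not defeat matching.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, ' ').trim()
}

// Hypothetical tiers: exact substring -> high, >= 80% token overlap -> medium, else low.
function fuzzyQuoteMatchSketch(quote: string, source: string): Confidence {
  const q = normalize(quote)
  const s = normalize(source)
  if (q.length === 0) return 'low'
  if (s.includes(q)) return 'high'
  const sourceTokens = new Set(s.split(' '))
  const quoteTokens = q.split(' ')
  const hits = quoteTokens.filter((t) => sourceTokens.has(t)).length
  return hits / quoteTokens.length >= 0.8 ? 'medium' : 'low'
}
```

A quote that the model reproduced verbatim lands in the top tier, a lightly paraphrased one in the middle tier, and anything else is flagged for human review in the workbench.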

Critical Fixes (data normalization):
- DynamicPromptBuilder: explicit flat key-value output format with example
- ExtractionExcelExporter: handle both array and flat data formats
- ExtractionDrawer: schema-driven rendering instead of hardcoded fields
- ExtractionValidator: array-format quote verification support
- SSE route: Fastify register encapsulation to bypass auth for EventSource
- LLM JSON sanitizer: strip illegal control chars before JSON.parse
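
The sanitizer fix can be illustrated with a short sketch. JSON forbids raw control characters (U+0000 to U+001F) inside string literals, and LLM output often contains literal newlines there. The function name `parseLlmJson` is hypothetical, and replacing each control character with a space instead of deleting it is an assumption chosen so adjacent words are not fused together.

```typescript
// Hypothetical helper: make LLM output parseable by neutralizing raw control characters.
function parseLlmJson(raw: string): unknown {
  // Between tokens a space is ordinary JSON whitespace; inside a string it
  // replaces a character that JSON.parse would otherwise reject.
  const cleaned = raw.replace(/[\u0000-\u001F]/g, ' ')
  return JSON.parse(cleaned)
}
```

Properly escaped sequences such as `\\n` in the raw text are untouched, since they are a backslash followed by a letter and not a control character.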

Also includes: RVW stats verification spec, SSA expert config guide

Tested: M1 pipeline test + M2 HITL test + manual frontend verification
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-25 18:29:20 +08:00
parent 371fa53956
commit f0736dbca1
40 changed files with 6138 additions and 48 deletions

@@ -43,6 +43,14 @@ import { logger } from '../logging/index.js'
export class PgBossQueue implements JobQueue {
private boss: PgBoss
private jobs: Map<string, Job> = new Map() // job metadata cache
+ /**
+ * Exposes the native pg-boss instance for direct use by the Level 2 scatter-dispatch pattern.
+ * Level 1 single jobs continue to use push/process.
+ * Level 2 batch jobs (e.g. ASL Tool 3) use this method to reach the native API:
+ * boss.insert(jobs) / boss.work(name, { teamConcurrency }) / boss.schedule(name, cron)
+ */
+ getNativeBoss(): PgBoss { return this.boss }
private handlers: Map<string, JobHandler> = new Map()
private started: boolean = false
@@ -58,7 +66,7 @@ export class PgBossQueue implements JobQueue {
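// Maintenance configuration
supervise: true, // enable monitoring
- maintenanceIntervalSeconds: 300, // run maintenance every 5 minutes
+ maintenanceIntervalSeconds: 30, // run maintenance every 30 seconds (so schedule cron fires on time)
})
// 🛡️ Global error listener: prevents uncaught errors from crashing the process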