feat(ssa): Complete QPER architecture - Query, Planner, Execute, Reflection layers

Implement the full QPER intelligent analysis pipeline:

- Phase E+: Block-based standardization for all 7 R tools, DynamicReport renderer, Word export enhancement

- Phase Q: LLM intent parsing with dynamic Zod validation against real column names, ClarificationCard component, DataProfile is_id_like tagging

- Phase P: ConfigLoader with Zod schema validation and hot-reload API, DecisionTableService (4-dimension matching), FlowTemplateService with EPV protection, PlannedTrace audit output

- Phase R: ReflectionService with statistical slot injection, sensitivity analysis conflict rules, ConclusionReport with section reveal animation, conclusion caching API, graceful R error classification
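The Phase Q bullet says the LLM's parsed intent is validated with a Zod schema generated from the dataset's real column names. That schema is not part of this diff. The sketch below reproduces the idea in plain TypeScript as a stand-in for Zod; it has no dependencies and is not the project's code. Field names follow the JSON object requested by the `SSA_QUERY_INTENT` prompt seeded in this commit, and the 0.5 clarification threshold comes from that prompt's confidence rubric. `validateParsedQuery` and `needsClarification` are hypothetical names.

```typescript
// Shape of the JSON object requested by the SSA_QUERY_INTENT prompt
// (only the fields this sketch validates).
interface ParsedQuery {
  goal: string;
  outcome_var: string | null;
  predictor_vars: string[];
  grouping_var: string | null;
  confidence: number;
}

type ValidationResult =
  | { ok: true; value: ParsedQuery; needsClarification: boolean }
  | { ok: false; errors: string[] };

const GOALS = new Set<unknown>(["comparison", "correlation", "regression", "descriptive", "cohort_study"]);

// Hypothetical stand-in for a Zod schema generated per dataset: every variable
// the model names must be an exact, case-sensitive column name.
function validateParsedQuery(raw: unknown, columns: string[]): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["model output is not a JSON object"] };
  }
  const q = raw as Record<string, unknown>;
  const known = new Set(columns);
  const errors: string[] = [];
  const isColumnOrNull = (v: unknown): boolean => v === null || (typeof v === "string" && known.has(v));

  if (!GOALS.has(q.goal)) errors.push(`goal: unsupported value ${JSON.stringify(q.goal)}`);
  if (!isColumnOrNull(q.outcome_var)) errors.push(`outcome_var: ${JSON.stringify(q.outcome_var)} is not a column`);
  if (!isColumnOrNull(q.grouping_var)) errors.push(`grouping_var: ${JSON.stringify(q.grouping_var)} is not a column`);

  const preds = q.predictor_vars;
  if (!Array.isArray(preds)) {
    errors.push("predictor_vars: expected an array");
  } else {
    for (const v of preds) {
      if (typeof v !== "string" || !known.has(v)) errors.push(`predictor_vars: ${JSON.stringify(v)} is not a column`);
    }
  }

  const c = q.confidence;
  if (typeof c !== "number" || c < 0 || c > 1) errors.push("confidence: expected a number between 0 and 1");

  if (errors.length > 0) return { ok: false, errors };
  const value = q as unknown as ParsedQuery;
  // The prompt's rubric says a score below 0.5 requires a follow-up question.
  return { ok: true, value, needsClarification: value.confidence < 0.5 };
}
```

Rejecting `"BP"` when the profile only contains `Blood_Pressure` enforces the prompt's first rule in code instead of trusting the model to follow it; a failed result could then feed a retry or a clarification step.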

End-to-end test: 40/40 passed across two complete analysis scenarios.
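Phase P matches the parsed intent against a decision table on four dimensions and protects flow templates with an EPV (events per variable) check. The commit does not name the four dimensions or the EPV threshold, so this sketch assumes goal, outcome type, predictor type and design (fields the intent prompt emits) and the common rule of thumb of at least 10 outcome events per predictor. The rows, tool IDs and function names are illustrative, not the project's `DecisionTableService` or `FlowTemplateService`.

```typescript
// Assumed matching dimensions: goal, outcome type, predictor type, design.
// "*" in a row means "any value" for that dimension.
interface DecisionRow {
  goal: string;
  outcomeType: string;
  predictorType: string;
  design: string;
  tool: string; // hypothetical tool ID
}

type Query = Omit<DecisionRow, "tool">;

const DIMENSIONS = ["goal", "outcomeType", "predictorType", "design"] as const;

// Illustrative rows only, not the project's decision table.
const TABLE: DecisionRow[] = [
  { goal: "comparison", outcomeType: "continuous", predictorType: "binary", design: "independent", tool: "independent_t_test" },
  { goal: "comparison", outcomeType: "continuous", predictorType: "binary", design: "paired", tool: "paired_t_test" },
  { goal: "comparison", outcomeType: "binary", predictorType: "binary", design: "independent", tool: "chi_square" },
  { goal: "regression", outcomeType: "binary", predictorType: "*", design: "*", tool: "logistic_regression" },
];

// Return the tool of the matching row with the fewest wildcards (the most specific rule).
function matchTool(q: Query, table: DecisionRow[] = TABLE): string | null {
  let best: { tool: string; wildcards: number } | null = null;
  for (const row of table) {
    let wildcards = 0;
    let matches = true;
    for (const dim of DIMENSIONS) {
      if (row[dim] === "*") {
        wildcards++;
      } else if (row[dim] !== q[dim]) {
        matches = false;
        break;
      }
    }
    if (matches && (best === null || wildcards < best.wildcards)) {
      best = { tool: row.tool, wildcards };
    }
  }
  return best === null ? null : best.tool;
}

// EPV guard: `events` is the count of the rarer outcome class. Keep at most
// floor(events / minEpv) predictors, in the caller's priority order
// (e.g. the exposure first); the rest are reported as dropped.
function applyEpvGuard(predictors: string[], events: number, minEpv = 10): { kept: string[]; dropped: string[] } {
  const maxPredictors = Math.max(0, Math.floor(events / minEpv));
  return { kept: predictors.slice(0, maxPredictors), dropped: predictors.slice(maxPredictors) };
}
```

Preferring the match with the fewest wildcards lets broad fallback rows coexist with precise ones without shadowing them. With 34 events, for example, the guard keeps only the first three predictors.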

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 18:15:53 +08:00
parent 428a22adf2
commit 371e1c069c
73 changed files with 9242 additions and 706 deletions
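The Phase R bullet in the commit message mentions statistical slot injection and sensitivity-analysis conflict rules in `ReflectionService`. Neither is shown in this diff. As a sketch under stated assumptions: numbers computed by R are injected into `{{slot}}` placeholders (the same placeholder syntax the intent prompt uses), an unfilled slot raises an error, and a conflict is flagged when the primary and sensitivity analyses fall on opposite sides of alpha = 0.05. Slot names, number formatting and the conflict rule are hypothetical.

```typescript
type SlotValues = Record<string, number | string>;

// Format a numeric slot. Treating "p_value" specially is an assumption of this sketch.
function formatSlot(name: string, value: number): string {
  if (name === "p_value") return value < 0.001 ? "P < 0.001" : `P = ${value.toFixed(3)}`;
  return value.toFixed(2);
}

// Replace each {{name}} with a value computed by R rather than generated by the LLM.
// Throws on an unfilled slot so a conclusion never ships with a raw placeholder.
function injectSlots(template: string, values: SlotValues): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, name) ? values[name] : undefined;
    if (value === undefined) throw new Error(`No value for slot "${name}"`);
    return typeof value === "number" ? formatSlot(name, value) : value;
  });
}

interface AnalysisResult {
  label: string;
  pValue: number;
}

// Illustrative conflict rule: flag when the primary and sensitivity analyses
// land on opposite sides of alpha; return null when they agree.
function sensitivityConflict(primary: AnalysisResult, sensitivity: AnalysisResult, alpha = 0.05): string | null {
  const describe = (r: AnalysisResult): string =>
    `${r.label} is ${r.pValue < alpha ? "significant" : "not significant"}`;
  if ((primary.pValue < alpha) === (sensitivity.pValue < alpha)) return null;
  return `${describe(primary)} but ${describe(sensitivity)} at alpha = ${alpha}; the conclusion should be hedged.`;
}
```

Under this design the model may phrase the conclusion, but every number in it comes from the R output.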


@@ -0,0 +1,172 @@
/**
 * SSA Intent Prompt seed script
 *
 * Writes the SSA_QUERY_INTENT prompt to capability_schema.prompt_templates.
 * Run: npx tsx scripts/seed-ssa-intent-prompt.ts
 */
import { PrismaClient, PromptStatus } from '@prisma/client';

const prisma = new PrismaClient();

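const SSA_INTENT_PROMPT = `You are an intent-understanding engine for clinical statistical analysis. Your task is to parse the user's natural-language request, together with the data profile, into a structured analysis intent.
## Input
### User request
{{userQuery}}
### Data profile
{{dataProfile}}
### Available statistical tools
{{availableTools}}
## Your task
Analyze the user's request and output a single JSON object (output nothing else, only the JSON):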
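\`\`\`json
{
  "goal": "comparison | correlation | regression | descriptive | cohort_study",
  "outcome_var": "Outcome variable name (Y); must be a column name present in the data profile, or null if it cannot be determined",
  "outcome_type": "continuous | binary | categorical | ordinal | datetime | null",
  "predictor_vars": ["Predictor variable names (X); each must be a column name present in the data profile"],
  "predictor_types": ["Type of each corresponding predictor"],
  "grouping_var": "Grouping variable name; must be a column name present in the data profile, or null if it cannot be determined",
  "design": "independent | paired | longitudinal | cross_sectional",
  "confidence": a number between 0.0 and 1.0,
  "reasoning": "Your reasoning in 1-2 sentences: why you parsed the request this way"
}
\`\`\`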
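## Key rules
1. **Variable names must exactly match the column names in the data profile.** Do not translate, abbreviate, or rewrite them. If the data contains "Blood_Pressure", output "Blood_Pressure", not "BP".
2. If the user does not name the variables explicitly, infer them sensibly from the variable types in the data profile, but lower the confidence accordingly.
3. When goal is "descriptive", outcome_var and predictor_vars are not required (use null and [], as in Example 4).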
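## Confidence scoring rubric (apply strictly)
- **0.9 - 1.0**: The user's own words explicitly name the outcome variable (Y) and at least one predictor (X), and these variables exist in the data profile.
- **0.7 - 0.89**: The user names Y, but X must be inferred from the data types; or the intent is clear but slightly ambiguous.
- **0.5 - 0.69**: The intent is broadly clear (e.g. "compare these for me"), but no specific variable is named.
- **< 0.5**: The request is vague, such as "analyze this for me", with neither Y nor X specified; a follow-up question is required.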
## Few-shot examples
### Example 1: explicit group comparison
User: "Compare the Treatment and Control groups and tell me whether their SBP differs"
Data profile contains: Group [categorical, 2 levels: Treatment/Control], SBP [numeric]
Output:
\`\`\`json
{"goal":"comparison","outcome_var":"SBP","outcome_type":"continuous","predictor_vars":["Group"],"predictor_types":["binary"],"grouping_var":"Group","design":"independent","confidence":0.95,"reasoning":"The user explicitly specified the grouping variable Group and the outcome variable SBP and asked whether the two groups differ."}
\`\`\`
### Example 2: correlation analysis
User: "Is age related to blood pressure?"
Data profile contains: Age [numeric], Blood_Pressure [numeric], Gender [categorical]
Output:
\`\`\`json
{"goal":"correlation","outcome_var":"Blood_Pressure","outcome_type":"continuous","predictor_vars":["Age"],"predictor_types":["continuous"],"grouping_var":null,"design":"independent","confidence":0.85,"reasoning":"The user wants to know how Age relates to Blood_Pressure; both are continuous variables, so a correlation analysis is appropriate."}
\`\`\`
### Example 3: multivariable regression
User: "What factors affect patient mortality?"
Data profile contains: Death [categorical, 2 levels: 0/1], Age [numeric], BMI [numeric], Smoking [categorical, 2 levels: Yes/No], Stage [categorical, 4 levels]
Output:
\`\`\`json
{"goal":"regression","outcome_var":"Death","outcome_type":"binary","predictor_vars":["Age","BMI","Smoking","Stage"],"predictor_types":["continuous","continuous","binary","categorical"],"grouping_var":null,"design":"independent","confidence":0.8,"reasoning":"The user wants the factors that affect mortality. Death is a binary outcome, so the remaining variables enter a logistic regression as predictors."}
\`\`\`
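### Example 4: vague request (follow-up required)
User: "Analyze this data for me"
Data profile contains: 10 variables
Output:
\`\`\`json
{"goal":"descriptive","outcome_var":null,"outcome_type":null,"predictor_vars":[],"predictor_types":[],"grouping_var":null,"design":"independent","confidence":0.35,"reasoning":"The user specified no analysis goal and no variables, so only descriptive statistics are possible; ask a follow-up question about the specific analysis goal."}
\`\`\`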
### Example 5: cohort study
User: "I want to run a complete cohort study analysis to see how the new drug affects prognosis"
Data profile contains: Drug [categorical, 2 levels], Outcome [categorical, 2 levels: 0/1], Age [numeric], Gender [categorical], BMI [numeric], Comorbidity [categorical]
Output:
\`\`\`json
{"goal":"cohort_study","outcome_var":"Outcome","outcome_type":"binary","predictor_vars":["Drug","Age","Gender","BMI","Comorbidity"],"predictor_types":["binary","continuous","binary","continuous","categorical"],"grouping_var":"Drug","design":"independent","confidence":0.85,"reasoning":"The user explicitly wants a cohort study analysis: Drug is the exposure/grouping variable, Outcome is the outcome, and the remaining variables are covariates."}
\`\`\`
Output only the JSON and nothing else.`;

async function main() {
  console.log('🚀 Writing SSA Intent Prompt...\n');

  const existing = await prisma.prompt_templates.findUnique({
    where: { code: 'SSA_QUERY_INTENT' },
  });

  if (existing) {
    console.log('⚠️ SSA_QUERY_INTENT already exists (id=%d); creating a new version...', existing.id);

    const latestVersion = await prisma.prompt_versions.findFirst({
      where: { template_id: existing.id },
      orderBy: { version: 'desc' },
    });
    const newVersion = (latestVersion?.version ?? 0) + 1;

    // Archive the current ACTIVE version and create the new one in a single
    // transaction, so a failed insert cannot leave the template with no ACTIVE version.
    await prisma.$transaction([
      prisma.prompt_versions.updateMany({
        where: { template_id: existing.id, status: PromptStatus.ACTIVE },
        data: { status: PromptStatus.ARCHIVED },
      }),
      prisma.prompt_versions.create({
        data: {
          template_id: existing.id,
          version: newVersion,
          content: SSA_INTENT_PROMPT,
          model_config: { model: 'deepseek-v3', temperature: 0.3, maxTokens: 2048 },
          status: PromptStatus.ACTIVE,
          changelog: 'Phase Q v1.0: 5 few-shot examples + objective confidence rubric',
          created_by: 'system-seed',
        },
      }),
    ]);
    console.log(' ✅ New version v%d created and set to ACTIVE', newVersion);
  } else {
    console.log('📝 Creating the SSA_QUERY_INTENT template...');

    const template = await prisma.prompt_templates.create({
      data: {
        code: 'SSA_QUERY_INTENT',
        name: 'SSA intent-understanding prompt',
        module: 'SSA',
        description: 'Phase Q: turns a natural-language user request into a structured statistical analysis intent (ParsedQuery)',
        variables: ['userQuery', 'dataProfile', 'availableTools'],
      },
    });

    await prisma.prompt_versions.create({
      data: {
        template_id: template.id,
        version: 1,
        content: SSA_INTENT_PROMPT,
        model_config: { model: 'deepseek-v3', temperature: 0.3, maxTokens: 2048 },
        status: PromptStatus.ACTIVE,
        changelog: 'Phase Q v1.0: initial version, 5 few-shot examples + confidence rubric',
        created_by: 'system-seed',
      },
    });
    console.log(' ✅ Template id=%d and version v1 created', template.id);
  }
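
  console.log('\n✅ SSA Intent Prompt written successfully!');
}

// process.exit() inside .catch() ends the process before a chained .finally()
// can run, so disconnect explicitly on both the success and failure paths.
main()
  .then(() => prisma.$disconnect())
  .catch(async (e) => {
    console.error('❌ Write failed:', e);
    await prisma.$disconnect();
    process.exit(1);
  });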
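The prompt above asks the model to return only JSON, yet it formats its own examples inside Markdown code fences, so a reply may come back fenced. How the SSA service parses replies is not part of this diff. A tolerant parser could look like the sketch below, which accepts both bare and fenced replies before any schema validation runs. `extractJsonObject` is a hypothetical helper.

```typescript
// Hypothetical helper: recover the JSON object from a model reply.
// The reply may be bare JSON or wrapped in a Markdown code fence.
function extractJsonObject(reply: string): unknown {
  // `{3} matches three backticks; the lazy group captures the fence body.
  const fenced = reply.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const candidate = (fenced?.[1] ?? reply).trim();

  // Take the outermost {...} span to tolerate stray text around the object.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in the model reply");
  }
  // JSON.parse still throws on malformed JSON; callers should treat that as a failed parse.
  return JSON.parse(candidate.slice(start, end + 1));
}
```

The result is still `unknown`: it only becomes a trusted intent after validation against the data profile's column names.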