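ASL Quality Assurance and Traceability Strategy

Document version: V1.0
Created: 2025-11-15
Applicable module: AI Smart Literature (ASL)
Goal: improve, stage by stage, the accuracy, quality control, and traceability of literature screening and data extraction

📋 Document Overview

This document defines how the ASL module progressively improves the following across three stages, MVP → V1.0 → V2.0:

Extraction accuracy: from basically usable → high quality → medical-grade standard
Quality control: from manual spot checks → automated validation → intelligent arbitration
Traceability: from basic records → complete evidence chain → audit-grade logs

Core Design Principles

Principle	Description
Controllable cost	The MVP stage is cost-sensitive and prioritizes DeepSeek + Qwen3
Upgradable quality	Can switch to the high-end GPT-5-Pro + Claude-4.5 combination
Incremental delivery	Avoid over-engineering; every stage delivers usable functionality
Optimized for medical use	Strategies are tuned to the characteristics of English-language medical literature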

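🎯 Three-Stage Roadmap

MVP (4 weeks)                    V1.0 (6 weeks)                    V2.0 (8 weeks)
├─ Basic dual-model validation   ├─ Intelligent quality control    ├─ Medical-grade quality assurance
├─ JSON Schema constraints       ├─ Segmented extraction           ├─ Multi-model consensus arbitration
├─ Confidence scoring            ├─ Full evidence-chain tracing    ├─ Automated quality audit
├─ Manual review workflow        ├─ Rule-engine validation         ├─ Prompt version management
└─ Basic traceability log        └─ Few-shot example library       └─ Intelligent HITL triage
   ↓                                ↓                                 ↓
  Usable                           High quality                      Medical grade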

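🚀 MVP Stage (4 Weeks)

Goals

Accuracy target: ≥ 85%
Cost budget: ≤ ¥50 to screen 1,000 papers
Delivery standard: basic functionality is usable and dual-model comparison is supported

1. Model Selection Strategy

1.1 Primary Model Combination (Cost First)

Role	Model	Model ID	Purpose	Cost
Model A	DeepSeek-V3	`deepseek-chat`	Fast initial screening	¥0.001/1K tokens
Model B	Qwen3-72B	`qwen-max`	Cross-validation	¥0.004/1K tokens

Switch option (quality first):

High-end combination: GPT-5-Pro (gpt-5-pro) + Claude-4.5-Sonnet (claude-sonnet-4-5-20250929)
Cost increase: roughly 3-5×
Accuracy gain: 85% → 92%+

1.2 Model Invocation Strategy

// Dual-model parallel call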
async function dualModelScreening(
  literature: Literature,
  protocol: Protocol
) {
  // Call the two models in parallel
  const [resultA, resultB] = await Promise.all([
    llmService.chat('deepseek', buildPrompt(literature, protocol)),
    llmService.chat('qwen', buildPrompt(literature, protocol))
  ]);

  // Parse the JSON results
  const decisionA = parseJSON(resultA.content);
  const decisionB = parseJSON(resultB.content);

  // Consensus check
  if (decisionA.decision === decisionB.decision) {
    return {
      finalDecision: decisionA.decision,
      consensus: 'high',
      needReview: false,
      models: [decisionA, decisionB]
    };
  }

  // Conflict → manual review
  return {
    finalDecision: 'uncertain',
    consensus: 'conflict',
    needReview: true,
    models: [decisionA, decisionB]
  };
}
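
The function above depends on a parseJSON helper that the document does not define. The following is a minimal sketch of what such a helper might look like, assuming model replies sometimes wrap the JSON in Markdown fences or add surrounding prose. The name parseModelJSON is hypothetical, and the approach (slice from the first `{` to the last `}`) is an illustration rather than the project's actual implementation.

```typescript
// Hypothetical helper: extract and parse the first JSON object in a model reply.
// Assumption: replies may wrap the JSON in Markdown fences or add extra prose.
function parseModelJSON(raw: string): unknown {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model output');
  }
  // JSON.parse still throws on malformed JSON, which callers should treat
  // as a failed screening attempt rather than a decision.
  return JSON.parse(raw.slice(start, end + 1));
}
```

A reply that fails to parse is probably best routed to manual review instead of being retried silently, so that the failure stays visible in the traceability log.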

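2. Core Technical Strategies

2.1 ✅ Dual-Model Cross-Validation

Implementation:

Every screening task calls both models at the same time
Results are compared automatically and differences are flagged
The agreement rate serves as a quality metric (target ≥ 80%)

Code example: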

interface DualModelResult {
  consensus: 'high' | 'conflict';
  finalDecision: 'include' | 'exclude' | 'uncertain';
  needReview: boolean;
  models: ModelDecision[];
}
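
The agreement rate used as the quality metric can be computed directly from stored results. This is a small sketch under the assumption that each result carries the `consensus` field shown above; the type ConsensusOnly and the function agreementRate are illustrative names, not part of the original design.

```typescript
// Minimal shape needed here; mirrors the `consensus` field of DualModelResult.
type ConsensusOnly = { consensus: 'high' | 'conflict' };

// Share of tasks on which both models reached the same decision.
// Returns 0 for an empty batch to avoid dividing by zero.
function agreementRate(results: ConsensusOnly[]): number {
  if (results.length === 0) return 0;
  const agreed = results.filter(r => r.consensus === 'high').length;
  return agreed / results.length;
}
```

Comparing the returned value against 0.8 gives the pass/fail signal for the ≥ 80% target.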

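2.2 ✅ JSON Schema Constraints

Implementation:

Define a strict output format
Use enums to restrict allowed values
Distinguish required from optional fields

Schema definition: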

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["decision", "reason", "confidence", "pico"],
  "properties": {
    "decision": {
      "type": "string",
      "enum": ["include", "exclude", "uncertain"]
    },
    "reason": {
      "type": "string",
      "minLength": 10,
      "maxLength": 500
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "pico": {
      "type": "object",
      "required": ["population", "intervention", "comparison", "outcome"],
      "properties": {
        "population": {
          "type": "string",
          "enum": ["match", "partial", "mismatch"]
        },
        "intervention": {
          "type": "string",
          "enum": ["match", "partial", "mismatch"]
        },
        "comparison": {
          "type": "string",
          "enum": ["match", "partial", "mismatch", "not_applicable"]
        },
        "outcome": {
          "type": "string",
          "enum": ["match", "partial", "mismatch"]
        }
      }
    },
    "studyDesign": {
      "type": "string",
      "enum": ["RCT", "cohort", "case-control", "cross-sectional", "other"]
    }
  }
}
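
Each model reply needs to be checked against this schema before it is used. In practice a JSON Schema validator library would be the natural choice; the sketch below is a dependency-free, hand-written check that mirrors only the required fields of the schema above. The function name checkDecision and its error labels are hypothetical.

```typescript
const DECISIONS = ['include', 'exclude', 'uncertain'];
const MATCH_LEVELS = ['match', 'partial', 'mismatch'];

// Returns the list of violated fields; an empty list means the required
// parts of the schema are satisfied. Optional fields are not checked.
function checkDecision(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) return ['not an object'];
  const { decision, reason, confidence, pico } = value as Record<string, unknown>;
  const errors: string[] = [];

  if (DECISIONS.indexOf(decision as string) === -1) errors.push('decision');
  if (typeof reason !== 'string' || reason.length < 10 || reason.length > 500) {
    errors.push('reason');
  }
  // The negated range test also rejects NaN.
  if (typeof confidence !== 'number' || !(confidence >= 0 && confidence <= 1)) {
    errors.push('confidence');
  }
  if (typeof pico !== 'object' || pico === null) {
    errors.push('pico');
    return errors;
  }
  const p = pico as Record<string, unknown>;
  for (const key of ['population', 'intervention', 'outcome']) {
    if (MATCH_LEVELS.indexOf(p[key] as string) === -1) errors.push('pico.' + key);
  }
  // Only `comparison` may additionally be "not_applicable".
  if (MATCH_LEVELS.concat('not_applicable').indexOf(p.comparison as string) === -1) {
    errors.push('pico.comparison');
  }
  return errors;
}
```

The share of replies for which this returns an empty list is one way to measure the schema validation pass rate.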

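Prompt template:

const prompt = `
You are an expert in medical literature screening. Using the PICO criteria below, decide whether this paper should be included in the systematic review.

# PICO Criteria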
- Population: ${protocol.population}
- Intervention: ${protocol.intervention}
- Comparison: ${protocol.comparison}
- Outcome: ${protocol.outcome}

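# Paper Information
Title: ${literature.title}
Abstract: ${literature.abstract}

# Output Requirements
Output the result strictly according to the following JSON Schema:

${JSON.stringify(schema, null, 2)}

Notes:
1. decision must be "include", "exclude", or "uncertain"
2. reason must state the specific basis for the decision (10-500 characters)
3. confidence is a number between 0 and 1 indicating how sure you are of the decision
4. In the pico field, assess the degree of match item by item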
`;

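2.3 ✅ Confidence Scoring

Implementation:

Require the model to give a confidence score (0-1) for every decision
Confidence < 0.7 is automatically flagged for manual review
Record the confidence distribution to tune the threshold

Automatic triage rules: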

function autoTriage(result: DualModelResult) {
  const avgConfidence = (
    result.models[0].confidence + 
    result.models[1].confidence
  ) / 2;

  // Rule 1: conflict → review is mandatory
  if (result.consensus === 'conflict') {
    return { needReview: true, priority: 'high' };
  }

  // Rule 2: low confidence → needs review
  if (avgConfidence < 0.7) {
    return { needReview: true, priority: 'medium' };
  }

  // Rule 3: high confidence + agreement → auto-pass
  return { needReview: false, priority: 'low' };
}
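
Tuning the 0.7 threshold requires seeing how a candidate value trades review workload against the accuracy of what passes automatically. The sketch below assumes a set of past tasks for which a reviewer's verdict is available; the record shape and the function evaluateThreshold are hypothetical.

```typescript
interface ConfidenceRecord {
  confidence: number; // average confidence of the two models
  correct: boolean;   // whether the automated decision matched the reviewer's
}

// For one candidate threshold: the share of tasks sent to review and the
// accuracy of the tasks that would be auto-passed (null if none pass).
function evaluateThreshold(records: ConfidenceRecord[], threshold: number) {
  const passed = records.filter(r => r.confidence >= threshold);
  const reviewShare =
    records.length === 0 ? 0 : 1 - passed.length / records.length;
  const autoAccuracy =
    passed.length === 0
      ? null
      : passed.filter(r => r.correct).length / passed.length;
  return { threshold, reviewShare, autoAccuracy };
}
```

Running this over a grid such as 0.5, 0.6, ..., 0.9 yields a small table from which a threshold can be chosen that keeps the review queue within budget.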

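2.4 ✅ Basic Traceability

Implementation:

Save the original prompts and model outputs
Record the model version and timestamp
Link manual review records

Database design: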

model ScreeningResult {
  id              String   @id @default(uuid())
  literatureId    String
  protocolId      String
  
  // Model A result
  modelAName      String   // "deepseek-chat"
  modelAOutput    Json     // raw JSON output
  modelAConfidence Float
  
  // Model B result
  modelBName      String   // "qwen-max"
  modelBOutput    Json
  modelBConfidence Float
  
  // Final decision
  finalDecision   String   // "include"/"exclude"/"uncertain"
  consensus       String   // "high"/"conflict"
  needReview      Boolean
  
  // Manual review
  reviewedBy      String?
  reviewedAt      DateTime?
  reviewDecision  String?
  reviewNotes     String?
  
  // Traceability information
  promptTemplate  String   @db.Text  // prompt template used
  createdAt       DateTime @default(now())
  
  @@map("asl_screening_results")
}

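3. MVP Cost Budget

Scenario: screening 1,000 papers

Item	DeepSeek	Qwen3	Total
Input tokens (average)	800	800	-
Output tokens (average)	200	200	-
Cost per call	¥0.001	¥0.004	¥0.005
Total for 1,000 papers	¥1	¥4	¥5

With a 20% conflict rate sent to manual review:

Auto-passed: 800 papers × ¥0.005 = ¥4
Manual review: 200 papers × 2 minutes ≈ 6.7 hours (their model calls, 200 × ¥0.005 = ¥1, have already been made)
Total cost: ¥5 in model calls + labor cost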
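
The arithmetic in this section can be reproduced with a small estimator. This is a sketch that assumes, as the table does, a single blended price per 1K tokens for input and output; the names ModelPrice and screeningCost are illustrative.

```typescript
interface ModelPrice {
  name: string;
  yuanPer1kTokens: number; // assumed blended price for input and output tokens
}

// Cost of sending `papers` documents through every listed model once.
function screeningCost(
  papers: number,
  tokensPerPaper: number,
  models: ModelPrice[]
): number {
  const perPaper = models.reduce(
    (sum, m) => sum + (tokensPerPaper / 1000) * m.yuanPer1kTokens,
    0
  );
  return papers * perPaper;
}

// 800 input + 200 output tokens per paper, both models on all 1,000 papers.
const mvpCost = screeningCost(1000, 800 + 200, [
  { name: 'deepseek-chat', yuanPer1kTokens: 0.001 },
  { name: 'qwen-max', yuanPer1kTokens: 0.004 },
]);

// Reviewer time for the conflicting 20%, at 2 minutes per paper.
const reviewHours = (1000 * 0.2 * 2) / 60;
```

Because conflicts are only detected after both models have run, the model cost covers all 1,000 papers regardless of how many end up in the review queue.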

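4. MVP Acceptance Criteria

Metric	Target	Verification method
Dual-model agreement rate	≥ 80%	Statistical report
JSON Schema validation pass rate	≥ 95%	Automated check
Share of tasks in the manual review queue	≤ 20%	System statistics
Traceability of extraction results	100%	Audit check
Cost control	≤ ¥50 per 1,000 papers	Billing monitoring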

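📈 V1.0 Stage (6 Weeks)

Goals

Accuracy target: ≥ 90%
Cost budget: ≤ ¥80 to screen 1,000 papers
Delivery standard: high-quality output with intelligent quality control

1. Model Strategy Optimization

1.1 Cost Optimization Strategy

Core idea: about 80% of tasks are settled by the low-cost model alone, and the remaining 20% of high-value tasks go to a top-tier model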

async function smartScreening(literature: Literature, protocol: Protocol) {
  const prompt = buildPrompt(literature, protocol);

  // Stage 1: fast initial screening (DeepSeek)
  const quickResult = await llmService.chat('deepseek', prompt);
  const quickDecision = parseJSON(quickResult.content);

  // High confidence + a definite conclusion → accept directly
  if (
    quickDecision.confidence > 0.85 && 
    quickDecision.decision !== 'uncertain'
  ) {
    return {
      finalDecision: quickDecision.decision,
      strategy: 'cost-optimized',
      models: [quickDecision]
    };
  }

  // Otherwise → escalate to the high-end model for review
  const detailedResult = await llmService.chat('gpt5', prompt);
  // Parse the raw reply before reading its fields, as for the quick result
  const detailedDecision = parseJSON(detailedResult.content);
  return {
    finalDecision: detailedDecision.decision,
    strategy: 'quality-assured',
    models: [quickDecision, detailedDecision]
  };
}

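Expected cost savings:

DeepSeek first pass on all papers: 1,000 × ¥0.001 = ¥1
GPT-5 on the roughly 20% that are escalated: 200 × ¥0.10 = ¥20
Total cost: ¥21 (about 79% less than the ¥100 of running GPT-5 on everything)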
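
The savings estimate depends on the escalation rate, so it helps to make that a parameter. The sketch below assumes every paper receives the cheap first pass, as in smartScreening, and that only the escalated fraction incurs the premium price. The function tieredCost and its field names are hypothetical.

```typescript
// Expected model cost when every paper gets a cheap first pass and a
// fraction of them (escalationRate, 0-1) is re-run on a premium model.
function tieredCost(
  papers: number,
  cheapPerPaper: number,
  premiumPerPaper: number,
  escalationRate: number
) {
  const firstPass = papers * cheapPerPaper;
  const escalated = papers * escalationRate * premiumPerPaper;
  const total = firstPass + escalated;
  const allPremium = papers * premiumPerPaper;
  return {
    firstPass,
    escalated,
    total,
    // Fraction saved relative to running the premium model on everything.
    savingVsAllPremium: 1 - total / allPremium,
  };
}

// Per-paper prices from this section: ¥0.001 (DeepSeek) and ¥0.10 (GPT-5).
const v1Estimate = tieredCost(1000, 0.001, 0.1, 0.2);
```

If the observed escalation rate drifts above 20%, for example because the 0.85 confidence cut-off is too strict, the same function shows how quickly the saving erodes.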

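2. Core Technical Enhancements

2.1 ✅ Few-Shot Example Library

Implementation:

Manually annotate 20-30 high-quality examples
Categorize them by study type (RCT, cohort, case-control)
Dynamically select similar examples to embed in the prompt

Example format: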

{
  "examples": [
    {
      "title": "Effect of aspirin on cardiovascular events in patients with diabetes",
      "abstract": "...",
      "goldStandard": {
        "decision": "include",
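        "reason": "RCT; the population is patients with diabetes (matches P), the intervention is aspirin (matches I), the comparator is placebo (matches C), and the outcome is cardiovascular events (matches O)",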
        "pico": {
          "population": "match",
          "intervention": "match",
          "comparison": "match",
          "outcome": "match"
        },
        "studyDesign": "RCT"
      }
    }
  ]
}

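Prompt enhancement:

const promptWithExamples = `
# Reference Examples

Below are ${examples.length} labeled examples to help you understand the decision criteria:

${examples.map((ex, i) => `
## Example ${i + 1}
Title: ${ex.title}
Abstract: ${ex.abstract}
Decision: ${ex.goldStandard.decision}
Reason: ${ex.goldStandard.reason}
`).join('\n')}

# Paper to Screen
Title: ${literature.title}
Abstract: ${literature.abstract}

With reference to the examples above, output your decision (in JSON format).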
`;

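2.2 ✅ Segmented Extraction

Implementation:

For full-text data extraction, process the text section by section
Extract from each section independently to reduce context confusion
Merge the results at the end and cross-check them for consistency

Segmentation strategy:

async function segmentedExtraction(fullText: string, protocol: Protocol) {
  // Split into sections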
  const sections = {
    methods: extractSection(fullText, 'methods'),
    results: extractSection(fullText, 'results'),
    tables: extractTables(fullText),
  };

  // Extract in parallel
  const [methodsData, resultsData, tablesData] = await Promise.all([
    extractFromMethods(sections.methods, protocol),
    extractFromResults(sections.results, protocol),
    extractFromTables(sections.tables, protocol),
  ]);

  // Merge the results
  return mergeExtractionResults([methodsData, resultsData, tablesData]);
}
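
The extractSection helper called above is not defined in this document. One simple way to approximate it, assuming the full text is plain text in which section headings sit on their own lines, is heading-based splitting. The heading lists below are illustrative guesses; real papers use many more variants, and PDF-derived text may need a more tolerant approach.

```typescript
// Hypothetical heading patterns for the two sections used above.
const SECTION_HEADINGS: Record<'methods' | 'results', RegExp> = {
  methods: /^(materials and methods|methods|methodology)$/i,
  results: /^results$/i,
};

// Any line matching this is treated as the start of a new top-level section.
const ANY_HEADING =
  /^(abstract|introduction|background|materials and methods|methods|methodology|results|discussion|conclusions?|references)$/i;

// Returns the text between the requested heading and the next heading,
// or an empty string when the heading is not found.
function extractSection(fullText: string, section: 'methods' | 'results'): string {
  const lines = fullText.split(/\r?\n/);
  let start = -1;
  for (let i = 0; i < lines.length; i++) {
    if (SECTION_HEADINGS[section].test(lines[i].trim())) {
      start = i;
      break;
    }
  }
  if (start === -1) return '';

  const body: string[] = [];
  for (let i = start + 1; i < lines.length; i++) {
    if (ANY_HEADING.test(lines[i].trim())) break;
    body.push(lines[i]);
  }
  return body.join('\n').trim();
}
```

An empty return value is a useful signal in its own right: the caller can fall back to whole-document extraction or flag the paper for manual handling.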

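Extraction example (methods section):

const methodsPrompt = `
Extract the study design information from the following methods section:

# Methods Text
${methodsSection}

# Fields to Extract
- Study design type (RCT/cohort/case-control, etc.)
- Sample size (intervention group/control group)
- Inclusion criteria
- Exclusion criteria
- Randomization method (if applicable)
- Blinding (if applicable)

# Output Format (JSON)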
${methodsSchema}
`;

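2.3 ✅ Rule-Engine Validation

Implementation:

Define business rules that automatically check for logical errors
Numeric range validation
Required-field completeness check

Validation rules: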

const validationRules = [
  {
    name: 'Sample size plausibility',
    check: (data) => {
      // Optional chaining keeps a missing sampleSize from throwing;
      // the sum is then NaN and fails the range check
      const total = data.sampleSize?.intervention + data.sampleSize?.control;
      return total >= 10 && total <= 100000;
    },
    errorMessage: 'Sample size is outside the plausible range (10-100000)'
  },
  {
    name: 'P-value range',
    check: (data) => {
      return data.pValue >= 0 && data.pValue <= 1;
    },
    errorMessage: 'P-value must be between 0 and 1'
  },
  {
    name: 'Required-field completeness',
    check: (data) => {
      const required = ['studyDesign', 'sampleSize', 'primaryOutcome'];
      return required.every(field => data[field] != null);
    },
    errorMessage: 'A required field is missing'
  }
];

// Run every rule and collect the messages of those that fail
function validateExtraction(data: ExtractionResult): ValidationReport {
  const errors: string[] = [];
  for (const rule of validationRules) {
    if (!rule.check(data)) {
      errors.push(rule.errorMessage);
    }
  }
  return {
    isValid: errors.length === 0,
    errors
  };
}

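2.4 ✅ Complete Evidence Chain

Implementation:

Record where in the source text each value comes from (page, paragraph, sentence)
Save the model's complete output (including intermediate reasoning)
Link all manual revision records

Database enhancements: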

model ExtractionResult {
  id              String   @id @default(uuid())
  
  // Extracted content
  extractedData   Json
  
  // Evidence chain (new)
  evidenceChain   Json     // {
                           //   "sampleSize": {
                           //     "value": 150,
                           //     "source": {
                           //       "page": 3,
                           //       "paragraph": 2,
                           //       "text": "A total of 150 patients were enrolled..."
                           //     }
                           //   }
                           // }
  
  // Model information
  modelName       String
  modelVersion    String
  promptVersion   String   // "v1.2.0"
  rawOutput       String   @db.Text  // raw output (including CoT reasoning)
  
  // Revision history
  revisions       ExtractionRevision[]
  
  createdAt       DateTime @default(now())
  @@map("asl_extraction_results")
}

model ExtractionRevision {
  id              String   @id @default(uuid())
  extractionId    String
  
  fieldName       String   // field that was modified
  oldValue        Json
  newValue        Json
  reason          String   // reason for the change
  
  revisedBy       String
  revisedAt       DateTime @default(now())
  
  extraction      ExtractionResult @relation(fields: [extractionId], references: [id])
  @@map("asl_extraction_revisions")
}
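
An evidence chain is only useful if the quoted text really occurs in the source. The sketch below shows one way to check that automatically, assuming the evidenceChain JSON has the shape shown in the comment above and that the paper's full text is available as a string. The function findUnsupportedFields is a hypothetical name, and the whitespace and case normalization is an assumption about how PDF-derived text differs from model quotes.

```typescript
interface EvidenceSource { page: number; paragraph: number; text: string }
interface EvidenceEntry { value: unknown; source: EvidenceSource }

// Collapse whitespace and case so line breaks in extracted text do not
// cause false misses.
const normalizeText = (s: string): string =>
  s.replace(/\s+/g, ' ').trim().toLowerCase();

// Returns the field names whose quoted evidence cannot be found in the source.
function findUnsupportedFields(
  chain: Record<string, EvidenceEntry>,
  sourceText: string
): string[] {
  const haystack = normalizeText(sourceText);
  return Object.keys(chain).filter(field => {
    // Quotes are often truncated with a trailing ellipsis; drop it before matching.
    const quote = normalizeText(chain[field].source.text)
      .replace(/(\.{3}|…)$/, '')
      .trim();
    return quote.length === 0 || haystack.indexOf(quote) === -1;
  });
}
```

Fields returned by this check are natural candidates for the manual review queue, since the model may have paraphrased or invented the quote.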

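3. V1.0 Cost Budget

Scenario: screening 1,000 papers + extracting data from 200 full texts

Task	Strategy	Cost
Title/abstract screening	80% DeepSeek + 20% GPT-5	¥21
Full-text data extraction	Segmented extraction (GPT-5)	¥60
Total	-	¥81

4. V1.0 Acceptance Criteria

Metric	Target	Verification method
Extraction accuracy	≥ 90%	Manual spot check of 50 papers
Few-shot example library	≥ 20 examples	Manual annotation
Rule-engine coverage	≥ 80%	Code review
Evidence-chain completeness	100%	Audit check
Cost control	≤ ¥80 per project	Billing monitoring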

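🏆 V2.0 Stage (8 Weeks)

Goals

Accuracy target: ≥ 95% (medical grade)
Cost budget: configured as needed
Delivery standard: automated quality audit, compliant with clinical research standards

1. Medical-Grade Quality Assurance

1.1 ✅ Three-Model Consensus Arbitration

Implementation:

When the two models conflict, automatically bring in a third model as arbitrator
Decide by a three-model vote
Record the arbitration process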

async function threeModelArbitration(
  literature: Literature,
  protocol: Protocol
) {
  const prompt = buildPrompt(literature, protocol);

  // Round 1: two models in parallel
  const [rawA, rawB] = await Promise.all([
    llmService.chat('deepseek', prompt),
    llmService.chat('qwen', prompt)
  ]);
  // Parse the raw replies before reading their fields
  const resultA = parseJSON(rawA.content);
  const resultB = parseJSON(rawB.content);

  // If they agree, return directly
  if (resultA.decision === resultB.decision) {
    return { finalDecision: resultA.decision, arbitration: false };
  }

  // Conflict → bring in Claude as arbitrator
  console.log('Conflict detected, invoking Claude-4.5 arbitration...');
  const rawC = await llmService.chat('claude', prompt);
  const resultC = parseJSON(rawC.content);

  // Three-model vote
  const votes = [resultA.decision, resultB.decision, resultC.decision];
  const voteCount = {
    include: votes.filter(v => v === 'include').length,
    exclude: votes.filter(v => v === 'exclude').length,
    uncertain: votes.filter(v => v === 'uncertain').length,
  };

  // Majority rule: take the option with the most votes
  const [winner, winnerVotes] = Object.entries(voteCount)
    .sort((a, b) => b[1] - a[1])[0];

  return {
    finalDecision: winner,
    arbitration: true,
    votes: { resultA, resultB, resultC },
    consensus: winnerVotes >= 2 ? 'strong' : 'weak'
  };
}
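
When all three models give different answers, the sort-based majority rule above still names a winner that holds only one vote. A possible refinement, shown here as an assumption about the desired behavior rather than the documented design, is to return 'uncertain' explicitly in that case so that the task is always routed to a human. The function majorityVote is a hypothetical name.

```typescript
type Decision = 'include' | 'exclude' | 'uncertain';

// Majority vote over exactly three decisions. A three-way split has no
// majority and is reported as 'uncertain' with weak consensus.
function majorityVote(
  votes: [Decision, Decision, Decision]
): { winner: Decision; consensus: 'strong' | 'weak' } {
  const counts: Record<Decision, number> = { include: 0, exclude: 0, uncertain: 0 };
  for (const v of votes) counts[v] += 1;

  for (const d of ['include', 'exclude', 'uncertain'] as Decision[]) {
    if (counts[d] >= 2) return { winner: d, consensus: 'strong' };
  }
  // All three models disagree: leave the decision to a human reviewer.
  return { winner: 'uncertain', consensus: 'weak' };
}
```

With this variant, a weak consensus always coincides with an 'uncertain' final decision, which keeps the downstream triage rule for unresolved arbitration simple.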

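Cost control:

Arbitration is invoked only on conflict (expected in 10-15% of cases)
Additional cost per arbitration: ¥0.021 (Claude-4.5)

1.2 ✅ Intelligent HITL Triage

Implementation:

Rule-based intelligent priority ranking
High-value and high-risk tasks get manual review first
Low-risk tasks are processed automatically

Triage rules: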

function intelligentTriage(result: ScreeningResult): TriageDecision {
  let priority = 0;
  let needReview = false;

  // Rule 1: the three models still disagree → highest priority
  if (result.arbitration && result.consensus === 'weak') {
    priority = 100;
    needReview = true;
  }
  // Rule 2: key outcome measure → high priority
  // (checked before the RCT rule so an RCT reporting mortality gets the higher priority)
  else if (result.outcome.includes('mortality')) {
    priority = 80;
    needReview = result.confidence < 0.85;
  }
  // Rule 3: RCT study → medium priority
  else if (result.studyDesign === 'RCT') {
    priority = 70;
    needReview = result.confidence < 0.9;
  }
  // Rule 4: high confidence + agreement → auto-pass
  else if (result.confidence > 0.95 && result.consensus === 'high') {
    priority = 10;
    needReview = false;
  }

  return { priority, needReview };
}

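1.3 ✅ Prompt Version Management

Implementation:

Manage prompt templates in Git
Tag versions with semantic version numbers
A/B test the effect of different versions

Directory structure: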

backend/prompts/asl/
├── screening/
│   ├── v1.0.0-basic.txt
│   ├── v1.1.0-with-examples.txt
│   └── v1.2.0-cot.txt
├── extraction/
│   ├── v1.0.0-methods.txt
│   └── v1.1.0-methods-segmented.txt
└── changelog.md

Version record:

model PromptVersion {
  id              String   @id @default(uuid())
  
  name            String   // "screening-v1.2.0"
  content         String   @db.Text
  version         String   // "1.2.0"
  changelog       String   // "Added few-shot examples"
  
  // Performance metrics
  accuracy        Float?   // 0.92
  usageCount      Int      @default(0)
  
  isActive        Boolean  @default(false)
  createdAt       DateTime @default(now())
  
  @@map("asl_prompt_versions")
}
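
Because versions are stored as strings, ordering them needs a numeric comparison: a plain string sort would place "1.10.0" before "1.2.0". The sketch below assumes plain MAJOR.MINOR.PATCH strings without pre-release tags; compareSemver, latestActive, and the PromptVersionRow shape are illustrative names that mirror a subset of the model above.

```typescript
// Compare two "MAJOR.MINOR.PATCH" strings numerically.
// Negative if a < b, positive if a > b, 0 if equal. Pre-release tags are not handled.
function compareSemver(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

interface PromptVersionRow {
  name: string;
  version: string;
  isActive: boolean;
}

// Pick the highest-versioned active prompt, or undefined if none is active.
function latestActive(rows: PromptVersionRow[]): PromptVersionRow | undefined {
  return rows
    .filter(r => r.isActive)
    .sort((x, y) => compareSemver(y.version, x.version))[0];
}
```

For an A/B test, several rows can be active at once; in that case the caller would choose among the active rows by experiment assignment rather than by taking the latest.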

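1.4 ✅ Automated Quality Audit

Implementation:

Regular batch spot checks (10%)
Automatically generated quality reports
Anomaly detection and alerting

Audit report: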

interface QualityAuditReport {
  period: { start: Date; end: Date };
  totalTasks: number;
  sampledTasks: number;
  
  metrics: {
    accuracy: number;            // overall accuracy
    interRaterAgreement: number; // human-machine agreement
    falsePositiveRate: number;   // false positive rate
    falseNegativeRate: number;   // false negative rate
  };
  
  modelPerformance: {
    deepseek: { accuracy: number; avgConfidence: number };
    qwen: { accuracy: number; avgConfidence: number };
    gpt5: { accuracy: number; avgConfidence: number };
  };
  
  issues: {
    type: string;
    count: number;
    examples: string[];
  }[];
  
  recommendations: string[];
}
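
The accuracy and error-rate fields of the report can be derived from the audited sample by counting a confusion matrix. This sketch assumes that "positive" means include, that the reviewer's decision is the ground truth, and that only include/exclude decisions are audited. The names AuditedTask and auditMetrics are hypothetical.

```typescript
type AuditLabel = 'include' | 'exclude';

interface AuditedTask {
  predicted: AuditLabel; // automated decision
  actual: AuditLabel;    // reviewer's decision, treated as ground truth
}

// Confusion-matrix metrics with "include" as the positive class.
// Each rate is null when its denominator is zero.
function auditMetrics(tasks: AuditedTask[]) {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const t of tasks) {
    if (t.predicted === 'include') {
      if (t.actual === 'include') tp++; else fp++;
    } else {
      if (t.actual === 'exclude') tn++; else fn++;
    }
  }
  const ratio = (num: number, den: number) => (den === 0 ? null : num / den);
  return {
    accuracy: ratio(tp + tn, tasks.length),
    falsePositiveRate: ratio(fp, fp + tn), // wrongly included among true excludes
    falseNegativeRate: ratio(fn, fn + tp), // wrongly excluded among true includes
  };
}
```

In screening, a false negative drops a relevant paper from the review entirely, which is presumably why its target is tighter than the false positive target.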

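2. Advanced Prompt Engineering

2.1 ✅ Chain of Thought (CoT)

Implementation:

Require the model to output its reasoning process
Assess the PICO match step by step
Give an overall conclusion at the end

Prompt example:

Follow the steps below to decide whether this paper should be included:

# Step 1: Study design
- Identify the study type (RCT, cohort, case-control, etc.)
- Decide whether it meets the inclusion criteria

# Step 2: Item-by-item PICO assessment
- Population: analyze in detail whether the population matches
- Intervention: analyze in detail whether the intervention matches
- Comparison: analyze in detail whether the comparator matches
- Outcome: analyze in detail whether the outcome measures match

# Step 3: Overall judgment
- Summarize the analysis above
- Give the final decision (include/exclude/uncertain)
- Assess the confidence (0-1)

# Output format
{
  "reasoning": {
    "studyDesign": "This is a ...",
    "population": "Analysis of the population match...",
    "intervention": "Analysis of the intervention...",
    "comparison": "Analysis of the comparator...",
    "outcome": "Analysis of the outcome measures..."
  },
  "decision": "include",
  "confidence": 0.95,
  "reason": "Based on the analysis above..."
}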

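2.2 ✅ Dynamic Example Selection

Implementation:

Compute the semantic similarity between the paper being screened and the example library
Dynamically select the 3-5 most similar examples
Embed them in the prompt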

async function selectSimilarExamples(
  literature: Literature,
  examplePool: Example[]
): Promise<Example[]> {
  // Compute similarity with an embedding model
  const literatureEmbedding = await getEmbedding(
    `${literature.title} ${literature.abstract}`
  );

  const similarities = examplePool.map(ex => ({
    example: ex,
    similarity: cosineSimilarity(literatureEmbedding, ex.embedding)
  }));

  // Return the 5 most similar examples
  return similarities
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5)
    .map(s => s.example);
}
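
The cosineSimilarity function used above is not defined in this document. A standard implementation is sketched below, assuming embeddings are plain number arrays of equal length; returning 0 for an all-zero vector is a design choice made here to avoid dividing by zero.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

If the example embeddings are normalized to unit length once when the library is built, the per-query work reduces to a dot product.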

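3. V2.0 Cost Budget

Scenario: a high-quality systematic review project (screening 5,000 papers + extracting data from 300)

Task	Strategy	Cost
Title/abstract screening	Cost optimization + 15% arbitration	¥120
Full-text data extraction	GPT-5 + Claude dual-model	¥350
Quality audit	10% spot check	¥30
Total	-	¥500

4. V2.0 Acceptance Criteria

Metric	Target	Verification method
Extraction accuracy	≥ 95%	Manual spot check of 100 papers
Human-machine agreement	≥ 90%	Cohen's Kappa
False positive rate	≤ 5%	Statistical analysis
False negative rate	≤ 3%	Statistical analysis
Prompt version management	100%	Git history
Automated audit	Once per week	System report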
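
Cohen's Kappa, named above as the verification method for human-machine agreement, corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two label sequences over the same items. The function name cohensKappa is illustrative. Note that κ is a coefficient with a maximum of 1 rather than a percentage; how the ≥ 90% target maps onto it is not stated in this document.

```typescript
// Cohen's kappa for two raters labelling the same items.
// po = observed agreement, pe = agreement expected by chance from each
// rater's label frequencies.
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error('Both raters must label the same non-empty set of items');
  }
  const n = raterA.length;
  const countA: Record<string, number> = {};
  const countB: Record<string, number> = {};
  let agree = 0;
  for (let i = 0; i < n; i++) {
    if (raterA[i] === raterB[i]) agree++;
    countA[raterA[i]] = (countA[raterA[i]] || 0) + 1;
    countB[raterB[i]] = (countB[raterB[i]] || 0) + 1;
  }
  const po = agree / n;
  let pe = 0;
  for (const label of Object.keys(countA)) {
    pe += (countA[label] / n) * ((countB[label] || 0) / n);
  }
  // If both raters always use the same single label, chance agreement is 1
  // and kappa is undefined; report 1 by convention here.
  return pe === 1 ? 1 : (po - pe) / (1 - pe);
}
```

Here one "rater" would be the automated pipeline and the other the human reviewer, evaluated on the audited sample.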

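📊 Three-Stage Comparison Summary

Dimension	MVP	V1.0	V2.0
Accuracy	85%	90%	95%
Model combination	DeepSeek + Qwen3	Cost-optimized strategy	Three-model arbitration
Quality control	Dual-model validation	Rule engine + few-shot	HITL + automated audit
Traceability	Basic logs	Complete evidence chain	Audit-grade records
Cost per 1,000 papers	¥5	¥21	¥24 + arbitration
Development time	4 weeks	6 weeks	8 weeks
Use case	Rapid validation	Routine projects	High-quality publication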

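🔄 Implementation Path

Stage 1: MVP Development (Weeks 1-4)

Week 1: Foundation

LLM service wrapper (DeepSeek + Qwen3)
JSON Schema definition
Database table design

Week 2: Core features

Dual-model parallel calls
Consensus-check logic
Manual review queue

Week 3: Frontend development

Screening workbench
Conflict comparison view
Manual review interface

Week 4: Testing and acceptance

Functional testing
Accuracy evaluation
Cost monitoring

Stage 2: V1.0 Enhancements (Weeks 5-10)

Weeks 5-6: Intelligent optimization

Cost optimization strategy
Few-shot example library
Dynamic example selection

Weeks 7-8: Quality control

Segmented extraction
Rule engine
Complete evidence chain

Weeks 9-10: Testing and tuning

A/B testing
Accuracy improvement
Documentation

Stage 3: V2.0 Refinement (Weeks 11-18)

Weeks 11-13: Advanced features

Three-model arbitration
Intelligent HITL triage
Prompt version management

Weeks 14-16: Quality audit

Automated audit system
Quality reports
Anomaly detection

Weeks 17-18: Release preparation

Full test pass
Validation by medical experts
Documentation and training

📚 Related Documents

Changelog:

2025-11-15: Document created; defined the three-stage MVP/V1.0/V2.0 strategy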
