[AI Integration] LLM Gateway Quick Context

Reading time: 5 minutes | Token cost: ~2,000 tokens
Level: L2 | Priority: P0
Prerequisite reading: 02-通用能力层/[AI对接] 通用能力快速上下文.md


📋 Capability Positioning

The LLM gateway is the AI-call hub for the entire platform and the technical foundation of the business model.

Why is this P0 priority?

  • 5 business modules (71% of all business modules) depend on it: AIA, ASL, PKB, DC, RVW
  • It is a prerequisite for ASL module development
  • It underpins the business model (Feature Flags + cost control)

Status: not yet implemented
Suggested timing: develop in parallel with ASL Week 1 (Day 1-3)


🎯 Core Features

1. Model Selection by Subscription Tier

Business value:

Professional (¥99/month) → DeepSeek-V3 (¥1 per million tokens)
Premium (¥299/month) → DeepSeek + Qwen3-72B (¥5 per million tokens)
Enterprise (¥999/month) → all models (including Claude/GPT)

Implementation:

// Look up the user's Feature Flags
const userFlags = await featureFlagService.getUserFlags(userId);

// Restrict the model choice based on the Feature Flags
if (requestModel === 'claude-3.5' && !userFlags.includes('claude_access')) {
  throw new Error('Your plan does not include the Claude model; please upgrade to Enterprise');
}

// Or downgrade automatically
if (!userFlags.includes('claude_access')) {
  model = 'deepseek-v3'; // automatic fallback to DeepSeek
}
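The strict check and the automatic fallback above can be folded into one pure helper. A minimal sketch, assuming illustrative names (resolveModel, MODEL_FLAGS, and 'qwen_access' are not part of the actual gateway API):

```typescript
type ModelType = 'deepseek-v3' | 'qwen3-72b' | 'claude-3.5';

// Map each gated model to the Feature Flag that unlocks it (illustrative names).
const MODEL_FLAGS: Partial<Record<ModelType, string>> = {
  'qwen3-72b': 'qwen_access',
  'claude-3.5': 'claude_access',
};

// Returns the requested model if the user's flags allow it; otherwise either
// throws (strict mode) or falls back to the cheapest model (downgrade mode).
function resolveModel(
  requested: ModelType,
  userFlags: string[],
  mode: 'strict' | 'downgrade' = 'downgrade'
): ModelType {
  const requiredFlag = MODEL_FLAGS[requested];
  if (!requiredFlag || userFlags.includes(requiredFlag)) return requested;
  if (mode === 'strict') {
    throw new Error(`Plan does not include ${requested}; please upgrade`);
  }
  return 'deepseek-v3'; // automatic fallback
}
```

Keeping this logic in one place means the controller and the gateway service cannot disagree on what a tier is allowed to call.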

2. Unified Call Interface

Problem: each LLM vendor uses a different API format:

  • OpenAI format
  • Anthropic format
  • Domestic model formats (DeepSeek, Qwen)

Solution: one unified interface plus the Adapter Pattern.

// Every business module calls the same interface
const response = await llmGateway.chat({
  userId: 'user123',
  modelType: 'deepseek-v3', // or 'qwen3', 'claude-3.5'
  messages: [
    { role: 'user', content: 'Please analyze this paper...' }
  ],
  stream: false
});

// Inside the LLM gateway:
// 1. Check the user's permissions (Feature Flags)
// 2. Check the quota
// 3. Pick the matching adapter
// 4. Call the vendor API
// 5. Record the cost
// 6. Return a response in the unified format

3. Cost Control

Core requirements:

  • Each user has a monthly quota
  • Requests beyond the quota are throttled automatically
  • Real-time cost statistics

Implementation:

// Check the quota before each call
async function checkQuota(userId: string): Promise<boolean> {
  const usage = await getMonthlyUsage(userId);
  const quota = await getUserQuota(userId);
  
  if (usage.tokenCount >= quota.maxTokens) {
    throw new QuotaExceededError('Your monthly quota is used up; please upgrade your plan');
  }
  
  return true;
}

// Record the cost after each call
async function recordUsage(userId: string, usage: {
  modelType: string;
  tokenCount: number;
  cost: number;
}) {
  await db.llmUsage.create({
    userId,
    modelType: usage.modelType,
    inputTokens: usage.tokenCount,
    cost: usage.cost,
    timestamp: new Date()
  });
}
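One subtlety of this check-then-record flow: the real token count is only known after the call returns, so parallel requests can overshoot the quota. A common mitigation is to reserve an estimate before the call and settle the difference afterwards. A minimal in-memory sketch of that pattern (QuotaAccount is illustrative; a production version would use an atomic database update):

```typescript
// In-memory sketch of reserve-then-settle quota accounting.
class QuotaAccount {
  constructor(private limit: number, private used = 0) {}

  // Reserve an estimated token count before calling the LLM.
  // Returns false (caller should reject the request) if it would exceed the limit.
  reserve(estimated: number): boolean {
    if (this.used + estimated > this.limit) return false;
    this.used += estimated;
    return true;
  }

  // After the call, replace the estimate with the actual usage.
  settle(estimated: number, actual: number): void {
    this.used += actual - estimated;
  }

  get remaining(): number {
    return this.limit - this.used;
  }
}
```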

4. Unified Streaming / Non-Streaming Handling

Scenarios:

  • AIA smart Q&A → needs streaming output (displayed in real time)
  • ASL literature screening → non-streaming (batch processing)

Unified interface:

interface ChatOptions {
  userId: string;
  modelType: ModelType;
  messages: Message[];
  stream: boolean;  // whether to stream the output
  temperature?: number;
  maxTokens?: number;
}

// Streaming
const stream = await llmGateway.chat({ ...options, stream: true });
for await (const chunk of stream) {
  console.log(chunk.content);
}

// Non-streaming
const response = await llmGateway.chat({ ...options, stream: false });
console.log(response.content);

🏗️ Technical Architecture

Directory structure

backend/src/modules/llm-gateway/
  ├── controllers/
  │   └── llmController.ts           # HTTP endpoints
  ├── services/
  │   ├── llmGatewayService.ts       # core service ⭐
  │   ├── featureFlagService.ts      # Feature Flag lookup
  │   ├── quotaService.ts            # quota management
  │   └── usageService.ts            # usage statistics
  ├── adapters/                      # Adapter Pattern ⭐
  │   ├── baseAdapter.ts
  │   ├── deepseekAdapter.ts
  │   ├── qwenAdapter.ts
  │   ├── claudeAdapter.ts
  │   └── openaiAdapter.ts
  ├── types/
  │   └── llm.types.ts
  └── routes/
      └── llmRoutes.ts

Core Class Design

1. LLMGatewayService (core)

class LLMGatewayService {
  private adapters: Map<ModelType, BaseLLMAdapter>;
  
  async chat(options: ChatOptions): Promise<ChatResponse | AsyncIterable<ChatChunk>> {
    // 1. Verify the user's permissions (Feature Flags)
    await this.checkAccess(options.userId, options.modelType);
    
    // 2. Check the quota
    await quotaService.checkQuota(options.userId);
    
    // 3. Pick the adapter
    const adapter = this.adapters.get(options.modelType);
    if (!adapter) {
      throw new Error(`No adapter registered for model: ${options.modelType}`);
    }
    
    // 4. Call the LLM API
    const response = await adapter.chat(options);
    
    // 5. Record the usage
    await usageService.record({
      userId: options.userId,
      modelType: options.modelType,
      tokenCount: response.tokenUsage,
      cost: this.calculateCost(options.modelType, response.tokenUsage)
    });
    
    // 6. Return the result
    return response;
  }
  
  private calculateCost(modelType: ModelType, tokens: number): number {
    const prices: Record<ModelType, number> = {
      'deepseek-v3': 0.000001,  // ¥1 per million tokens
      'qwen3-72b': 0.000005,    // ¥5 per million tokens
      'claude-3.5': 0.00003     // ≈ ¥30 per million tokens
    };
    return tokens * prices[modelType];
  }
}
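calculateCost above bills by total tokens, but most vendors price input and output tokens differently, and the llm_usage table stores them separately. A sketch of a split-rate variant (the per-token rates here are placeholders, not quoted vendor prices):

```typescript
// Per-token prices in ¥, split by direction (placeholder rates for illustration).
interface ModelPrice { input: number; output: number }

const PRICES: Record<string, ModelPrice> = {
  'deepseek-v3': { input: 0.000001, output: 0.000002 },
  'qwen3-72b':   { input: 0.000005, output: 0.000005 },
};

function calculateCostSplit(
  modelType: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICES[modelType];
  if (!price) throw new Error(`Unknown model: ${modelType}`);
  return inputTokens * price.input + outputTokens * price.output;
}
```

Splitting the rates keeps the stored cost consistent with whatever the vendor actually invoices.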

2. BaseLLMAdapter (adapter base class)

abstract class BaseLLMAdapter {
  abstract chat(options: ChatOptions): Promise<ChatResponse>;
  abstract chatStream(options: ChatOptions): AsyncIterable<ChatChunk>;
  
  protected abstract buildRequest(options: ChatOptions): any;
  protected abstract parseResponse(response: any): ChatResponse;
}

3. DeepSeekAdapter (example implementation)

class DeepSeekAdapter extends BaseLLMAdapter {
  private apiKey: string;
  private baseUrl = 'https://api.deepseek.com/v1';
  
  async chat(options: ChatOptions): Promise<ChatResponse> {
    const request = this.buildRequest(options);
    
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(request)
    });
    
    const data = await response.json();
    return this.parseResponse(data);
  }
  
  protected buildRequest(options: ChatOptions) {
    return {
      model: 'deepseek-chat',
      messages: options.messages,
      temperature: options.temperature ?? 0.7,
      max_tokens: options.maxTokens ?? 4096,
      stream: options.stream || false
    };
  }
  
  protected parseResponse(response: any): ChatResponse {
    return {
      content: response.choices[0].message.content,
      tokenUsage: response.usage.total_tokens,
      finishReason: response.choices[0].finish_reason
    };
  }
}
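chatStream is declared on the base class but not shown here. DeepSeek's streaming endpoint follows the OpenAI-compatible SSE format, so the adapter's main job is parsing `data:` lines from the response body. A sketch of that step (the delta field shapes follow the OpenAI-style format and should be verified against the vendor docs):

```typescript
interface StreamChunk { content: string; done: boolean }

// Parse one SSE line from an OpenAI-compatible streaming response.
// Returns null for lines that carry no delta (comments, empty lines).
function parseSseLine(line: string): StreamChunk | null {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice(5).trim();
  if (payload === '[DONE]') return { content: '', done: true };
  const json = JSON.parse(payload);
  const delta = json.choices?.[0]?.delta?.content ?? '';
  return { content: delta, done: false };
}
```

With this helper, chatStream reduces to reading the response body line by line and yielding each non-null chunk.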

📊 Database Design

platform_schema.llm_usage

CREATE TABLE platform_schema.llm_usage (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id),
  model_type VARCHAR(50) NOT NULL,        -- 'deepseek-v3', 'qwen3', 'claude-3.5'
  input_tokens INTEGER NOT NULL,
  output_tokens INTEGER NOT NULL,
  total_tokens INTEGER NOT NULL,
  cost DECIMAL(10, 6) NOT NULL,           -- actual cost (CNY)
  request_id VARCHAR(100),                -- request_id returned by the LLM API
  module VARCHAR(50),                     -- calling module: 'AIA', 'ASL', 'PKB', etc.
  created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL does not support inline INDEX clauses; create indexes separately
CREATE INDEX idx_user_created ON platform_schema.llm_usage (user_id, created_at);
CREATE INDEX idx_module ON platform_schema.llm_usage (module);

platform_schema.llm_quotas

CREATE TABLE platform_schema.llm_quotas (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id) UNIQUE,
  monthly_token_limit INTEGER NOT NULL,   -- monthly token quota
  monthly_cost_limit DECIMAL(10, 2),      -- monthly cost cap (optional)
  reset_day INTEGER DEFAULT 1,            -- day of month the quota resets (1-28)
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
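Given reset_day, the next quota reset date (the resetDate the quota API reports) can be computed like this. A sketch using UTC dates; it relies on the 1-28 constraint above to sidestep month-length edge cases:

```typescript
// Compute the next quota reset date after `from`, given the reset day-of-month (1-28).
function nextResetDate(resetDay: number, from: Date): Date {
  const y = from.getUTCFullYear();
  const m = from.getUTCMonth();
  const thisMonth = new Date(Date.UTC(y, m, resetDay));
  // If this month's reset day has already passed, roll over to next month.
  return from < thisMonth ? thisMonth : new Date(Date.UTC(y, m + 1, resetDay));
}
```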

📋 API Endpoints

1. Chat (non-streaming)

POST /api/v1/llm/chat

Request:
{
  "modelType": "deepseek-v3",
  "messages": [
    { "role": "user", "content": "Analyze this paper..." }
  ],
  "temperature": 0.7,
  "maxTokens": 4096
}

Response:
{
  "content": "Based on the content of the paper...",
  "tokenUsage": {
    "input": 150,
    "output": 500,
    "total": 650
  },
  "cost": 0.00065,
  "modelType": "deepseek-v3"
}

2. Chat (streaming)

POST /api/v1/llm/chat/stream

Request: same as above, plus "stream": true

Response: Server-Sent Events (SSE)
data: {"chunk": "Based", "tokenUsage": 1}
data: {"chunk": " on", "tokenUsage": 1}
...
data: {"done": true, "totalTokens": 650, "cost": 0.00065}

3. Quota Lookup

GET /api/v1/llm/quota

Response:
{
  "monthlyLimit": 1000000,
  "used": 245000,
  "remaining": 755000,
  "resetDate": "2025-12-01"
}

4. Usage Statistics

GET /api/v1/llm/usage?startDate=2025-11-01&endDate=2025-11-30

Response:
{
  "totalTokens": 245000,
  "totalCost": 1.23,
  "byModel": {
    "deepseek-v3": { "tokens": 200000, "cost": 0.20 },
    "qwen3-72b": { "tokens": 45000, "cost": 0.23 }
  },
  "byModule": {
    "AIA": 100000,
    "ASL": 120000,
    "PKB": 25000
  }
}
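The byModel/byModule breakdown can come from a GROUP BY over llm_usage; the equivalent in-process aggregation looks like this (a sketch over plain row objects whose fields mirror the table; summarizeUsage itself is illustrative):

```typescript
interface UsageRow {
  modelType: string;
  module: string;
  totalTokens: number;
  cost: number;
}

interface UsageSummary {
  totalTokens: number;
  totalCost: number;
  byModel: Record<string, { tokens: number; cost: number }>;
  byModule: Record<string, number>;
}

function summarizeUsage(rows: UsageRow[]): UsageSummary {
  const summary: UsageSummary = { totalTokens: 0, totalCost: 0, byModel: {}, byModule: {} };
  for (const row of rows) {
    summary.totalTokens += row.totalTokens;
    summary.totalCost += row.cost;
    let model = summary.byModel[row.modelType];
    if (!model) {
      model = { tokens: 0, cost: 0 };
      summary.byModel[row.modelType] = model;
    }
    model.tokens += row.totalTokens;
    model.cost += row.cost;
    summary.byModule[row.module] = (summary.byModule[row.module] ?? 0) + row.totalTokens;
  }
  return summary;
}
```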

⚠️ Key Technical Challenges

1. Implementing streaming output

Approach: Server-Sent Events (SSE)

// Backend (Fastify)
app.post('/api/v1/llm/chat/stream', async (req, reply) => {
  reply.raw.setHeader('Content-Type', 'text/event-stream');
  reply.raw.setHeader('Cache-Control', 'no-cache');
  reply.raw.setHeader('Connection', 'keep-alive');
  
  const stream = await llmGateway.chatStream(req.body);
  
  for await (const chunk of stream) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  
  reply.raw.end();
});

// Frontend (React)
// Note: EventSource only supports GET, so a POST stream is read via fetch
// (simplified: assumes each read delivers whole "data: ..." lines)
const response = await fetch('/api/v1/llm/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(chatOptions)
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const data = JSON.parse(line.slice(6));
    setMessages(prev => [...prev, data.chunk]);
  }
}

2. Error handling and retries

async function chatWithRetry(options: ChatOptions, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await llmGateway.chat(options);
    } catch (error) {
      if (error.code === 'RATE_LIMIT' && i < maxRetries - 1) {
        await sleep(2000 * 2 ** i); // exponential backoff: 2s, 4s, 8s...
        continue;
      }
      throw error;
    }
  }
}

3. Accurate Token Counting for Billing

Problem: different models use different tokenizers.

Solutions:

  • Prefer the token counts returned by each vendor's API (most accurate)
  • Fallback: the tiktoken library (OpenAI's tokenizer)

import { encoding_for_model } from 'tiktoken';

function estimateTokens(text: string, model: string): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}

📅 Development Plan (3 Days)

Day 1: Foundation (6-8 hours)

  • Create the directory structure
  • Implement the BaseLLMAdapter abstract class
  • Implement DeepSeekAdapter
  • Create the database tables (llm_usage, llm_quotas)
  • Basic API endpoints (non-streaming)

Day 2: Core Features (6-8 hours)

  • Feature Flag integration
  • Quota checking and usage recording
  • Implement QwenAdapter
  • Error handling and retry logic
  • Unit tests

Day 3: Streaming + Polish (6-8 hours)

  • Implement streaming output (SSE)
  • Frontend SSE handling
  • Cost statistics API
  • Quota lookup API
  • Integration tests
  • Finish the documentation

Development Checklist

Before starting, confirm:

  • The Feature Flag table exists (platform_schema.feature_flags)
  • The users table has a version field (professional/premium/enterprise)
  • API keys for every LLM vendor are configured
  • The Prisma schema is updated

During development:

  • Every adapter has complete error handling
  • Every LLM call is recorded in the llm_usage table
  • The quota check runs before every call
  • Both streaming and non-streaming paths are tested

After completion:

  • The ASL module can successfully call the LLM gateway
  • ADMIN can view per-user LLM usage statistics
  • Over-quota requests are correctly rejected
🔗 Related Documents

Depends on:

Depended on by:


Last updated: 2025-11-06
Maintainer: Technical Architect
Priority: P0