# [AI对接] LLM Gateway Quick Context

> **Reading time:** 5 minutes | **Token budget:** ~2000 tokens
> **Layer:** L2 | **Priority:** P0 ⭐⭐⭐⭐⭐
> **Prerequisite reading:** 02-通用能力层/[AI对接] 通用能力快速上下文.md

---

## 📋 Capability Positioning

**The LLM gateway is the platform's central hub for AI calls and the technical foundation of the business model.**

**Why it is P0 priority:**

- 71% of business modules depend on it (5 modules: AIA, ASL, PKB, DC, RVW)
- A **prerequisite** for developing the ASL module
- The **technical foundation** of the business model (feature flags + cost control)

**Status:** ❌ Not yet implemented
**Suggested timing:** develop in parallel with ASL Week 1 (Day 1-3)

---

## 🎯 Core Features

### 1. Model selection by user tier ⭐⭐⭐⭐⭐

**Business value:**

```
Professional (¥99/month) → DeepSeek-V3 (¥1 / million tokens)
Premium (¥299/month)     → DeepSeek + Qwen3-72B (¥5 / million tokens)
Enterprise (¥999/month)  → all models (including Claude/GPT)
```

**Approach:**

```typescript
// Look up the user's feature flags
const userFlags = await featureFlagService.getUserFlags(userId);

// Gate model access on feature flags
if (requestModel === 'claude-3.5' && !userFlags.includes('claude_access')) {
  throw new Error('Your plan does not include Claude models; please upgrade to Enterprise');
}

// ...or downgrade automatically
if (!userFlags.includes('claude_access')) {
  model = 'deepseek-v3'; // fall back to DeepSeek
}
```

---

### 2. Unified call interface ⭐⭐⭐⭐⭐

**Problem:** different LLM vendors expose different API formats

- OpenAI format
- Anthropic format
- Domestic models (DeepSeek, Qwen)

**Solution:** a unified interface plus the adapter pattern

```typescript
// Every business module calls the gateway the same way
const response = await llmGateway.chat({
  userId: 'user123',
  modelType: 'deepseek-v3', // or 'qwen3', 'claude-3.5'
  messages: [
    { role: 'user', content: 'Please analyze this paper...' }
  ],
  stream: false
});

// Inside the gateway:
// 1. Check user permissions (feature flags)
// 2. Check quota
// 3. Pick the matching adapter
// 4. Call the vendor API
// 5. Record cost
// 6. Return a unified response format
```

---

### 3. Cost control ⭐⭐⭐⭐

**Core requirements:**

- Every user has a monthly quota
- Requests are throttled once the quota is exhausted
- Real-time cost statistics

**Implementation:**

```typescript
// Check the quota before each call
async function checkQuota(userId: string): Promise<boolean> {
  const usage = await getMonthlyUsage(userId);
  const quota = await getUserQuota(userId);

  if (usage.tokenCount >= quota.maxTokens) {
    throw new QuotaExceededError('Your monthly quota is exhausted; please upgrade your plan');
  }
  return true;
}

// Record cost after each call
async function recordUsage(userId: string, usage: {
  modelType: string;
  tokenCount: number;
  cost: number;
}) {
  await db.llmUsage.create({
    userId,
    modelType: usage.modelType,
    inputTokens: usage.tokenCount,
    cost: usage.cost,
    timestamp: new Date()
  });
}
```

---

### 4. Unified streaming / non-streaming handling ⭐⭐⭐

**Scenarios:**

- AIA smart Q&A → streaming output (render in real time)
- ASL literature screening → non-streaming (batch processing)

**Unified interface:**

```typescript
interface ChatOptions {
  userId: string;
  modelType: ModelType;
  messages: Message[];
  stream: boolean;        // stream the output?
  temperature?: number;
  maxTokens?: number;
}

// Streaming
const stream = llmGateway.chatStream({ ...options, stream: true });
for await (const chunk of stream) {
  console.log(chunk.content);
}

// Non-streaming
const response = await llmGateway.chat({ ...options, stream: false });
console.log(response.content);
```

---

## 🏗️ Technical Architecture

### Directory structure

```
backend/src/modules/llm-gateway/
├── controllers/
│   └── llmController.ts          # HTTP endpoints
├── services/
│   ├── llmGatewayService.ts      # core service ⭐
│   ├── featureFlagService.ts     # feature flag lookups
│   ├── quotaService.ts           # quota management
│   └── usageService.ts           # usage statistics
├── adapters/                     # adapter pattern ⭐
│   ├── baseAdapter.ts
│   ├── deepseekAdapter.ts
│   ├── qwenAdapter.ts
│   ├── claudeAdapter.ts
│   └── openaiAdapter.ts
├── types/
│   └── llm.types.ts
└── routes/
    └── llmRoutes.ts
```

---

### Core class design

#### 1. LLMGatewayService (core)

```typescript
class LLMGatewayService {
  private adapters: Map<ModelType, BaseLLMAdapter>;

  async chat(options: ChatOptions): Promise<ChatResponse> {
    // 1. Verify access (feature flags)
    await this.checkAccess(options.userId, options.modelType);

    // 2. Check quota
    await quotaService.checkQuota(options.userId);

    // 3. Pick the adapter
    const adapter = this.adapters.get(options.modelType);
    if (!adapter) {
      throw new Error(`Unsupported model: ${options.modelType}`);
    }

    // 4. Call the LLM API
    const response = await adapter.chat(options);

    // 5. Record usage
    await usageService.record({
      userId: options.userId,
      modelType: options.modelType,
      tokenCount: response.tokenUsage,
      cost: this.calculateCost(options.modelType, response.tokenUsage)
    });

    // 6. Return the result
    return response;
  }

  private calculateCost(modelType: ModelType, tokens: number): number {
    // Per-token prices in CNY; illustrative values, verify against current vendor pricing
    const prices = {
      'deepseek-v3': 0.000001,  // ¥1 / million tokens
      'qwen3-72b':   0.000005,  // ¥5 / million tokens
      'claude-3.5':  0.00003    // ¥30 / million tokens (converted from USD pricing)
    };
    return tokens * prices[modelType];
  }
}
```
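A minimal sketch of how the gateway's adapter map could be populated at startup. The registry class, stub adapter, and names here are hypothetical illustrations, not part of the design above; real adapters would wrap vendor clients.

```typescript
// Hypothetical wiring sketch: a registry that maps model types to adapters
// and fails fast on unregistered models instead of returning undefined.
type ModelType = 'deepseek-v3' | 'qwen3-72b' | 'claude-3.5';

interface ChatResponse { content: string; tokenUsage: number; }

interface LLMAdapter {
  chat(prompt: string): Promise<ChatResponse>;
}

// Placeholder adapter: echoes the prompt instead of calling a vendor API
class StubAdapter implements LLMAdapter {
  constructor(private name: string) {}
  async chat(prompt: string): Promise<ChatResponse> {
    return { content: `[${this.name}] ${prompt}`, tokenUsage: prompt.length };
  }
}

class AdapterRegistry {
  private adapters = new Map<ModelType, LLMAdapter>();

  register(model: ModelType, adapter: LLMAdapter): void {
    this.adapters.set(model, adapter);
  }

  resolve(model: ModelType): LLMAdapter {
    const adapter = this.adapters.get(model);
    if (!adapter) throw new Error(`No adapter registered for ${model}`);
    return adapter;
  }
}

// Startup wiring: register one adapter per supported model
const registry = new AdapterRegistry();
registry.register('deepseek-v3', new StubAdapter('deepseek'));
registry.register('qwen3-72b', new StubAdapter('qwen'));
```

Centralizing the lookup this way keeps "which models exist" in one place, so adding a vendor means adding one `register` call rather than touching the call sites.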
#### 2. BaseLLMAdapter (adapter base class)

```typescript
abstract class BaseLLMAdapter {
  abstract chat(options: ChatOptions): Promise<ChatResponse>;
  abstract chatStream(options: ChatOptions): AsyncIterable<ChatChunk>;

  protected abstract buildRequest(options: ChatOptions): unknown;
  protected abstract parseResponse(response: unknown): ChatResponse;
}
```

#### 3. DeepSeekAdapter (sample implementation)

```typescript
class DeepSeekAdapter extends BaseLLMAdapter {
  private apiKey: string;
  private baseUrl = 'https://api.deepseek.com/v1';

  async chat(options: ChatOptions): Promise<ChatResponse> {
    const request = this.buildRequest(options);

    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(request)
    });

    if (!response.ok) {
      throw new Error(`DeepSeek API error: ${response.status}`);
    }

    const data = await response.json();
    return this.parseResponse(data);
  }

  protected buildRequest(options: ChatOptions) {
    return {
      model: 'deepseek-chat',
      messages: options.messages,
      temperature: options.temperature ?? 0.7,
      max_tokens: options.maxTokens ?? 4096,
      stream: options.stream ?? false
    };
  }

  protected parseResponse(response: any): ChatResponse {
    return {
      content: response.choices[0].message.content,
      tokenUsage: response.usage.total_tokens,
      finishReason: response.choices[0].finish_reason
    };
  }
}
```

---

## 📊 Database Design

### platform_schema.llm_usage

```sql
CREATE TABLE platform_schema.llm_usage (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id),
  model_type VARCHAR(50) NOT NULL,  -- 'deepseek-v3', 'qwen3', 'claude-3.5'
  input_tokens INTEGER NOT NULL,
  output_tokens INTEGER NOT NULL,
  total_tokens INTEGER NOT NULL,
  cost DECIMAL(10, 6) NOT NULL,     -- actual cost (CNY)
  request_id VARCHAR(100),          -- request_id returned by the LLM API
  module VARCHAR(50),               -- calling module: 'AIA', 'ASL', 'PKB', ...
  created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL declares indexes separately, not inline in CREATE TABLE
CREATE INDEX idx_user_created ON platform_schema.llm_usage (user_id, created_at);
CREATE INDEX idx_module ON platform_schema.llm_usage (module);
```

### platform_schema.llm_quotas

```sql
CREATE TABLE platform_schema.llm_quotas (
  id SERIAL PRIMARY KEY,
  user_id INTEGER UNIQUE REFERENCES platform_schema.users(id),
  monthly_token_limit INTEGER NOT NULL,  -- monthly token quota
  monthly_cost_limit DECIMAL(10, 2),     -- monthly cost cap (optional)
  reset_day INTEGER DEFAULT 1,           -- day of month the quota resets (1-28)
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```

---

## 📋 API Endpoints

### 1. Chat (non-streaming)

```
POST /api/v1/llm/chat

Request:
{
  "modelType": "deepseek-v3",
  "messages": [
    { "role": "user", "content": "Analyze this paper..." }
  ],
  "temperature": 0.7,
  "maxTokens": 4096
}

Response:
{
  "content": "Based on the paper's content...",
  "tokenUsage": { "input": 150, "output": 500, "total": 650 },
  "cost": 0.00065,
  "modelType": "deepseek-v3"
}
```

### 2. Chat (streaming)

```
POST /api/v1/llm/chat/stream

Request: same as above, plus "stream": true

Response: Server-Sent Events (SSE)
data: {"chunk": "Based", "tokenUsage": 1}
data: {"chunk": " on", "tokenUsage": 1}
...
data: {"done": true, "totalTokens": 650, "cost": 0.00065}
```

### 3. Quota query

```
GET /api/v1/llm/quota

Response:
{
  "monthlyLimit": 1000000,
  "used": 245000,
  "remaining": 755000,
  "resetDate": "2025-12-01"
}
```

### 4. Usage statistics

```
GET /api/v1/llm/usage?startDate=2025-11-01&endDate=2025-11-30

Response:
{
  "totalTokens": 245000,
  "totalCost": 0.43,
  "byModel": {
    "deepseek-v3": { "tokens": 200000, "cost": 0.20 },
    "qwen3-72b":   { "tokens": 45000,  "cost": 0.23 }
  },
  "byModule": {
    "AIA": 100000,
    "ASL": 120000,
    "PKB": 25000
  }
}
```

---
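The quota endpoint's `resetDate` has to be derived from the `reset_day` column in `llm_quotas`. A sketch of that derivation, under the assumption that usage rows with `created_at` on or after the window start count against the current quota (the helper name is hypothetical):

```typescript
// Compute the current billing window for a user given their reset_day (1-28).
// Because reset_day is capped at 28, the reset date exists in every month.
function billingWindow(now: Date, resetDay: number): { start: Date; nextReset: Date } {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const resetThisMonth = new Date(Date.UTC(year, month, resetDay));

  if (now >= resetThisMonth) {
    // We are past this month's reset: window runs until the same day next month
    return {
      start: resetThisMonth,
      nextReset: new Date(Date.UTC(year, month + 1, resetDay))
    };
  }
  // Still before this month's reset: window started on last month's reset day
  return {
    start: new Date(Date.UTC(year, month - 1, resetDay)),
    nextReset: resetThisMonth
  };
}
```

With `reset_day = 1` and a request on 2025-11-15, this yields a window starting 2025-11-01 and a `resetDate` of 2025-12-01, matching the quota-query example above. `Date.UTC` handles the month-1/month+1 rollover across year boundaries automatically.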
## ⚠️ Key Technical Challenges

### 1. Implementing streaming output

**Approach:** Server-Sent Events (SSE)

```typescript
// Backend (Fastify)
app.post('/api/v1/llm/chat/stream', async (req, reply) => {
  reply.raw.setHeader('Content-Type', 'text/event-stream');
  reply.raw.setHeader('Cache-Control', 'no-cache');
  reply.raw.setHeader('Connection', 'keep-alive');

  const stream = await llmGateway.chatStream(req.body);
  for await (const chunk of stream) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  reply.raw.end();
});

// Frontend (React)
// Note: EventSource only supports GET, so for this POST endpoint the SSE body
// has to be read with fetch and a stream reader instead.
const response = await fetch('/api/v1/llm/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(chatRequest)
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let { done, value } = await reader.read();
while (!done) {
  // Simplified: assumes each read ends on an event boundary
  for (const line of decoder.decode(value).split('\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice('data: '.length));
      setMessages(prev => [...prev, data.chunk]);
    }
  }
  ({ done, value } = await reader.read());
}
```

---

### 2. Error handling and retries

```typescript
async function chatWithRetry(options: ChatOptions, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await llmGateway.chat(options);
    } catch (error) {
      if (error.code === 'RATE_LIMIT' && i < maxRetries - 1) {
        await sleep(1000 * 2 ** i); // exponential backoff: 1s, 2s, 4s, ...
        continue;
      }
      throw error;
    }
  }
}
```

---

### 3. Token counting (accurate billing)

**Problem:** every model family uses a different tokenizer

**Solution:**

- Prefer the usage figures returned by each vendor's API (most accurate)
- Fallback: estimate with the tiktoken library (OpenAI's tokenizer)

```typescript
import { encoding_for_model } from 'tiktoken';

// Rough estimate only: tiktoken matches OpenAI models exactly but is just an
// approximation for DeepSeek/Qwen/Claude
function estimateTokens(text: string, model: string): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}
```

---

## 📅 Development Plan (3 days)

### Day 1: foundations (6-8 hours)

- [ ] Create the directory structure
- [ ] Implement the BaseLLMAdapter abstract class
- [ ] Implement DeepSeekAdapter
- [ ] Create the database tables (llm_usage, llm_quotas)
- [ ] Basic API endpoint (non-streaming)

### Day 2: core features (6-8 hours)

- [ ] Feature flag integration
- [ ] Quota checks and usage recording
- [ ] Implement QwenAdapter
- [ ] Error handling and retry logic
- [ ] Unit tests

### Day 3: streaming + polish (6-8 hours)

- [ ] Streaming output (SSE)
- [ ] Frontend SSE handling
- [ ] Cost statistics API
- [ ] Quota query API
- [ ] Integration tests
- [ ] Documentation

---

## ✅ Development Checklist

**Before starting, confirm:**

- [ ] Feature flag table exists (platform_schema.feature_flags)
- [ ] Users table has a version field (professional/premium/enterprise)
- [ ] API keys for each LLM vendor are configured
- [ ] Prisma schema is updated

**During development:**

- [ ] Every adapter has complete error handling
- [ ] Every LLM call is recorded in the llm_usage table
- [ ] The quota check runs before every call
- [ ] Both streaming and non-streaming paths are tested

**After completion:**

- [ ] The ASL module can call the LLM gateway successfully
- [ ] ADMIN can view per-user LLM usage statistics
- [ ] Over-quota requests are correctly rejected

---

## 🔗 Related Documents

**Depends on:**

- [用户与权限中心(UAM)](../../01-平台基础层/01-用户与权限中心(UAM)/README.md) - feature flags
- [运营管理端](../../03-业务模块/ADMIN-运营管理端/README.md) - LLM model management

**Depended on by:**

- [ASL-AI智能文献](../../03-业务模块/ASL-AI智能文献/README.md) ⭐ P0
- [AIA-AI智能问答](../../03-业务模块/AIA-AI智能问答/README.md)
- [PKB-个人知识库](../../03-业务模块/PKB-个人知识库/README.md)

---

**Last updated:** 2025-11-06
**Maintainer:** Technical architect
**Priority:** P0 ⭐⭐⭐⭐⭐