# [AI对接] LLM Gateway Quick Context

> **Reading time:** 5 minutes | **Token budget:** ~2000 tokens
> **Layer:** L2 | **Priority:** P0 ⭐⭐⭐⭐⭐
> **Prerequisite reading:** 02-通用能力层/[AI对接] 通用能力快速上下文.md

---

## 📋 Capability Positioning

**The LLM gateway is the platform's central hub for AI calls and the technical foundation of the business model.**

**Why it is P0 priority:**

- 71% of business modules depend on it (5 modules: AIA, ASL, PKB, DC, RVW)
- A **prerequisite** for developing the ASL module
- The **technical foundation** of the business model (feature flags + cost control)

**Status:** ❌ Not yet implemented
**Suggested timing:** develop in parallel with ASL Week 1 (Day 1-3)

---

## 🎯 Core Features

### 1. Model selection by user tier ⭐⭐⭐⭐⭐

**Business value:**

```
Professional (¥99/month) → DeepSeek-V3 (¥1 / million tokens)
Premium (¥299/month)     → DeepSeek + Qwen3-72B (¥5 / million tokens)
Enterprise (¥999/month)  → all models (including Claude/GPT)
```

**Approach:**

```typescript
// Look up the user's feature flags
const userFlags = await featureFlagService.getUserFlags(userId);

// Gate model access on feature flags
if (requestModel === 'claude-3.5' && !userFlags.includes('claude_access')) {
  throw new Error('Your plan does not include Claude models; please upgrade to Enterprise');
}

// ...or downgrade automatically
if (!userFlags.includes('claude_access')) {
  model = 'deepseek-v3'; // fall back to DeepSeek
}
```

---

### 2. Unified call interface ⭐⭐⭐⭐⭐

**Problem:** different LLM vendors expose different API formats

- OpenAI format
- Anthropic format
- Domestic models (DeepSeek, Qwen)

**Solution:** a unified interface plus the adapter pattern

```typescript
// Every business module calls the gateway the same way
const response = await llmGateway.chat({
  userId: 'user123',
  modelType: 'deepseek-v3', // or 'qwen3', 'claude-3.5'
  messages: [
    { role: 'user', content: 'Please analyze this paper...' }
  ],
  stream: false
});

// Inside the gateway:
// 1. Check user permissions (feature flags)
// 2. Check quota
// 3. Pick the matching adapter
// 4. Call the vendor API
// 5. Record cost
// 6. Return a unified response format
```

---

### 3. Cost control ⭐⭐⭐⭐

**Core requirements:**

- Every user has a monthly quota
- Requests are throttled once the quota is exhausted
- Real-time cost statistics

**Implementation:**

```typescript
// Check the quota before each call
async function checkQuota(userId: string): Promise<boolean> {
  const usage = await getMonthlyUsage(userId);
  const quota = await getUserQuota(userId);

  if (usage.tokenCount >= quota.maxTokens) {
    throw new QuotaExceededError('Your monthly quota is exhausted; please upgrade your plan');
  }
  return true;
}

// Record cost after each call
async function recordUsage(userId: string, usage: {
  modelType: string;
  tokenCount: number;
  cost: number;
}) {
  await db.llmUsage.create({
    userId,
    modelType: usage.modelType,
    inputTokens: usage.tokenCount,
    cost: usage.cost,
    timestamp: new Date()
  });
}
```

---

### 4. Unified streaming / non-streaming handling ⭐⭐⭐

**Scenarios:**

- AIA smart Q&A → streaming output (render in real time)
- ASL literature screening → non-streaming (batch processing)

**Unified interface:**

```typescript
interface ChatOptions {
  userId: string;
  modelType: ModelType;
  messages: Message[];
  stream: boolean;        // stream the output?
  temperature?: number;
  maxTokens?: number;
}

// Streaming
const stream = llmGateway.chatStream({ ...options, stream: true });
for await (const chunk of stream) {
  console.log(chunk.content);
}

// Non-streaming
const response = await llmGateway.chat({ ...options, stream: false });
console.log(response.content);
```

---

## 🏗️ Technical Architecture

### Directory structure

```
backend/src/modules/llm-gateway/
├── controllers/
│   └── llmController.ts          # HTTP endpoints
├── services/
│   ├── llmGatewayService.ts      # core service ⭐
│   ├── featureFlagService.ts     # feature flag lookups
│   ├── quotaService.ts           # quota management
│   └── usageService.ts           # usage statistics
├── adapters/                     # adapter pattern ⭐
│   ├── baseAdapter.ts
│   ├── deepseekAdapter.ts
│   ├── qwenAdapter.ts
│   ├── claudeAdapter.ts
│   └── openaiAdapter.ts
├── types/
│   └── llm.types.ts
└── routes/
    └── llmRoutes.ts
```

---

### Core class design

#### 1. LLMGatewayService (core)

```typescript
class LLMGatewayService {
  private adapters: Map<ModelType, BaseLLMAdapter>;

  async chat(options: ChatOptions): Promise<ChatResponse> {
    // 1. Verify access (feature flags)
    await this.checkAccess(options.userId, options.modelType);

    // 2. Check quota
    await quotaService.checkQuota(options.userId);

    // 3. Pick the adapter
    const adapter = this.adapters.get(options.modelType);
    if (!adapter) {
      throw new Error(`Unsupported model: ${options.modelType}`);
    }

    // 4. Call the LLM API
    const response = await adapter.chat(options);

    // 5. Record usage
    await usageService.record({
      userId: options.userId,
      modelType: options.modelType,
      tokenCount: response.tokenUsage,
      cost: this.calculateCost(options.modelType, response.tokenUsage)
    });

    // 6. Return the result
    return response;
  }

  private calculateCost(modelType: ModelType, tokens: number): number {
    // Per-token prices in CNY; illustrative values, verify against current vendor pricing
    const prices = {
      'deepseek-v3': 0.000001,  // ¥1 / million tokens
      'qwen3-72b':   0.000005,  // ¥5 / million tokens
      'claude-3.5':  0.00003    // ¥30 / million tokens (converted from USD pricing)
    };
    return tokens * prices[modelType];
  }
}
```
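A minimal sketch of how the gateway's adapter map could be populated at startup. The registry class, stub adapter, and names here are hypothetical illustrations, not part of the design above; real adapters would wrap vendor clients.

```typescript
// Hypothetical wiring sketch: a registry that maps model types to adapters
// and fails fast on unregistered models instead of returning undefined.
type ModelType = 'deepseek-v3' | 'qwen3-72b' | 'claude-3.5';

interface ChatResponse { content: string; tokenUsage: number; }

interface LLMAdapter {
  chat(prompt: string): Promise<ChatResponse>;
}

// Placeholder adapter: echoes the prompt instead of calling a vendor API
class StubAdapter implements LLMAdapter {
  constructor(private name: string) {}
  async chat(prompt: string): Promise<ChatResponse> {
    return { content: `[${this.name}] ${prompt}`, tokenUsage: prompt.length };
  }
}

class AdapterRegistry {
  private adapters = new Map<ModelType, LLMAdapter>();

  register(model: ModelType, adapter: LLMAdapter): void {
    this.adapters.set(model, adapter);
  }

  resolve(model: ModelType): LLMAdapter {
    const adapter = this.adapters.get(model);
    if (!adapter) throw new Error(`No adapter registered for ${model}`);
    return adapter;
  }
}

// Startup wiring: register one adapter per supported model
const registry = new AdapterRegistry();
registry.register('deepseek-v3', new StubAdapter('deepseek'));
registry.register('qwen3-72b', new StubAdapter('qwen'));
```

Centralizing the lookup this way keeps "which models exist" in one place, so adding a vendor means adding one `register` call rather than touching the call sites.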
#### 2. BaseLLMAdapter (adapter base class)

```typescript
abstract class BaseLLMAdapter {
  abstract chat(options: ChatOptions): Promise<ChatResponse>;
  abstract chatStream(options: ChatOptions): AsyncIterable<ChatChunk>;

  protected abstract buildRequest(options: ChatOptions): unknown;
  protected abstract parseResponse(response: unknown): ChatResponse;
}
```

#### 3. DeepSeekAdapter (sample implementation)

```typescript
class DeepSeekAdapter extends BaseLLMAdapter {
  private apiKey: string;
  private baseUrl = 'https://api.deepseek.com/v1';

  async chat(options: ChatOptions): Promise<ChatResponse> {
    const request = this.buildRequest(options);

    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(request)
    });

    if (!response.ok) {
      throw new Error(`DeepSeek API error: ${response.status}`);
    }

    const data = await response.json();
    return this.parseResponse(data);
  }

  protected buildRequest(options: ChatOptions) {
    return {
      model: 'deepseek-chat',
      messages: options.messages,
      temperature: options.temperature ?? 0.7,
      max_tokens: options.maxTokens ?? 4096,
      stream: options.stream ?? false
    };
  }

  protected parseResponse(response: any): ChatResponse {
    return {
      content: response.choices[0].message.content,
      tokenUsage: response.usage.total_tokens,
      finishReason: response.choices[0].finish_reason
    };
  }
}
```

---

## 📊 Database Design

### platform_schema.llm_usage

```sql
CREATE TABLE platform_schema.llm_usage (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id),
  model_type VARCHAR(50) NOT NULL,  -- 'deepseek-v3', 'qwen3', 'claude-3.5'
  input_tokens INTEGER NOT NULL,
  output_tokens INTEGER NOT NULL,
  total_tokens INTEGER NOT NULL,
  cost DECIMAL(10, 6) NOT NULL,     -- actual cost (CNY)
  request_id VARCHAR(100),          -- request_id returned by the LLM API
  module VARCHAR(50),               -- calling module: 'AIA', 'ASL', 'PKB', ...
  created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL declares indexes separately, not inline in CREATE TABLE
CREATE INDEX idx_user_created ON platform_schema.llm_usage (user_id, created_at);
CREATE INDEX idx_module ON platform_schema.llm_usage (module);
```

### platform_schema.llm_quotas

```sql
CREATE TABLE platform_schema.llm_quotas (
  id SERIAL PRIMARY KEY,
  user_id INTEGER UNIQUE REFERENCES platform_schema.users(id),
  monthly_token_limit INTEGER NOT NULL,  -- monthly token quota
  monthly_cost_limit DECIMAL(10, 2),     -- monthly cost cap (optional)
  reset_day INTEGER DEFAULT 1,           -- day of month the quota resets (1-28)
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```

---

## 📋 API Endpoints

### 1. Chat (non-streaming)

```
POST /api/v1/llm/chat

Request:
{
  "modelType": "deepseek-v3",
  "messages": [
    { "role": "user", "content": "Analyze this paper..." }
  ],
  "temperature": 0.7,
  "maxTokens": 4096
}

Response:
{
  "content": "Based on the paper's content...",
  "tokenUsage": { "input": 150, "output": 500, "total": 650 },
  "cost": 0.00065,
  "modelType": "deepseek-v3"
}
```

### 2. Chat (streaming)

```
POST /api/v1/llm/chat/stream

Request: same as above, plus "stream": true

Response: Server-Sent Events (SSE)
data: {"chunk": "Based", "tokenUsage": 1}
data: {"chunk": " on", "tokenUsage": 1}
...
data: {"done": true, "totalTokens": 650, "cost": 0.00065}
```

### 3. Quota query

```
GET /api/v1/llm/quota

Response:
{
  "monthlyLimit": 1000000,
  "used": 245000,
  "remaining": 755000,
  "resetDate": "2025-12-01"
}
```

### 4. Usage statistics

```
GET /api/v1/llm/usage?startDate=2025-11-01&endDate=2025-11-30

Response:
{
  "totalTokens": 245000,
  "totalCost": 0.43,
  "byModel": {
    "deepseek-v3": { "tokens": 200000, "cost": 0.20 },
    "qwen3-72b":   { "tokens": 45000,  "cost": 0.23 }
  },
  "byModule": {
    "AIA": 100000,
    "ASL": 120000,
    "PKB": 25000
  }
}
```

---
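The quota endpoint's `resetDate` has to be derived from the `reset_day` column in `llm_quotas`. A sketch of that derivation, under the assumption that usage rows with `created_at` on or after the window start count against the current quota (the helper name is hypothetical):

```typescript
// Compute the current billing window for a user given their reset_day (1-28).
// Because reset_day is capped at 28, the reset date exists in every month.
function billingWindow(now: Date, resetDay: number): { start: Date; nextReset: Date } {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const resetThisMonth = new Date(Date.UTC(year, month, resetDay));

  if (now >= resetThisMonth) {
    // We are past this month's reset: window runs until the same day next month
    return {
      start: resetThisMonth,
      nextReset: new Date(Date.UTC(year, month + 1, resetDay))
    };
  }
  // Still before this month's reset: window started on last month's reset day
  return {
    start: new Date(Date.UTC(year, month - 1, resetDay)),
    nextReset: resetThisMonth
  };
}
```

With `reset_day = 1` and a request on 2025-11-15, this yields a window starting 2025-11-01 and a `resetDate` of 2025-12-01, matching the quota-query example above. `Date.UTC` handles the month-1/month+1 rollover across year boundaries automatically.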
## ⚠️ Key Technical Challenges

### 1. Implementing streaming output

**Approach:** Server-Sent Events (SSE)

```typescript
// Backend (Fastify)
app.post('/api/v1/llm/chat/stream', async (req, reply) => {
  reply.raw.setHeader('Content-Type', 'text/event-stream');
  reply.raw.setHeader('Cache-Control', 'no-cache');
  reply.raw.setHeader('Connection', 'keep-alive');

  const stream = await llmGateway.chatStream(req.body);
  for await (const chunk of stream) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  reply.raw.end();
});

// Frontend (React)
// Note: EventSource only supports GET, so for this POST endpoint the SSE body
// has to be read with fetch and a stream reader instead.
const response = await fetch('/api/v1/llm/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(chatRequest)
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let { done, value } = await reader.read();
while (!done) {
  // Simplified: assumes each read ends on an event boundary
  for (const line of decoder.decode(value).split('\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice('data: '.length));
      setMessages(prev => [...prev, data.chunk]);
    }
  }
  ({ done, value } = await reader.read());
}
```

---

### 2. Error handling and retries

```typescript
async function chatWithRetry(options: ChatOptions, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await llmGateway.chat(options);
    } catch (error) {
      if (error.code === 'RATE_LIMIT' && i < maxRetries - 1) {
        await sleep(1000 * 2 ** i); // exponential backoff: 1s, 2s, 4s, ...
        continue;
      }
      throw error;
    }
  }
}
```

---

### 3. Token counting (accurate billing)

**Problem:** every model family uses a different tokenizer

**Solution:**

- Prefer the usage figures returned by each vendor's API (most accurate)
- Fallback: estimate with the tiktoken library (OpenAI's tokenizer)

```typescript
import { encoding_for_model } from 'tiktoken';

// Rough estimate only: tiktoken matches OpenAI models exactly but is just an
// approximation for DeepSeek/Qwen/Claude
function estimateTokens(text: string, model: string): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}
```

---

## 📅 Development Plan (3 days)

### Day 1: foundations (6-8 hours)

- [ ] Create the directory structure
- [ ] Implement the BaseLLMAdapter abstract class
- [ ] Implement DeepSeekAdapter
- [ ] Create the database tables (llm_usage, llm_quotas)
- [ ] Basic API endpoint (non-streaming)

### Day 2: core features (6-8 hours)

- [ ] Feature flag integration
- [ ] Quota checks and usage recording
- [ ] Implement QwenAdapter
- [ ] Error handling and retry logic
- [ ] Unit tests

### Day 3: streaming + polish (6-8 hours)

- [ ] Streaming output (SSE)
- [ ] Frontend SSE handling
- [ ] Cost statistics API
- [ ] Quota query API
- [ ] Integration tests
- [ ] Documentation

---

## ✅ Development Checklist

**Before starting, confirm:**

- [ ] Feature flag table exists (platform_schema.feature_flags)
- [ ] Users table has a version field (professional/premium/enterprise)
- [ ] API keys for each LLM vendor are configured
- [ ] Prisma schema is updated

**During development:**

- [ ] Every adapter has complete error handling
- [ ] Every LLM call is recorded in the llm_usage table
- [ ] The quota check runs before every call
- [ ] Both streaming and non-streaming paths are tested

**After completion:**

- [ ] The ASL module can call the LLM gateway successfully
- [ ] ADMIN can view per-user LLM usage statistics
- [ ] Over-quota requests are correctly rejected

---

## 🔗 Related Documents

**Depends on:**

- [用户与权限中心(UAM)](../../01-平台基础层/01-用户与权限中心(UAM)/README.md) - feature flags
- [运营管理端](../../03-业务模块/ADMIN-运营管理端/README.md) - LLM model management

**Depended on by:**

- [ASL-AI智能文献](../../03-业务模块/ASL-AI智能文献/README.md) ⭐ P0
- [AIA-AI智能问答](../../03-业务模块/AIA-AI智能问答/README.md)
- [PKB-个人知识库](../../03-业务模块/PKB-个人知识库/README.md)

---

**Last updated:** 2025-11-06
**Maintainer:** Technical architect
**Priority:** P0 ⭐⭐⭐⭐⭐