# [AI Integration] LLM Gateway Quick Context
> **Reading time:** 5 minutes | **Token budget:** ~2000 tokens
> **Layer:** L2 | **Priority:** P0 ⭐⭐⭐⭐⭐
> **Prerequisite reading:** 02-通用能力层/[AI对接] 通用能力快速上下文.md
---
## 📋 Capability positioning
**The LLM gateway is the platform's central hub for all AI calls and the technical foundation of the business model.**
**Why P0 priority:**
- 71% of business modules depend on it (5 modules: AIA, ASL, PKB, DC, RVW)
- A **prerequisite** for developing the ASL module
- The **technical foundation** of the business model (Feature Flags + cost control)
**Status:** ❌ Not yet implemented
**Suggested timing:** develop in parallel with ASL Week 1 (Day 1-3)
## 🎯 Core features
### 1. Model selection by subscription tier ⭐⭐⭐⭐⭐
**Business value:**
```
Professional (¥99/mo)  → DeepSeek-V3 (¥1 / 1M tokens)
Premium (¥299/mo)      → DeepSeek + Qwen3-72B (¥5 / 1M tokens)
Enterprise (¥999/mo)   → all models, including Claude/GPT
```
**Implementation approach:**
```typescript
// Look up the user's Feature Flags
const userFlags = await featureFlagService.getUserFlags(userId);
// Decide model availability based on the flags
if (requestModel === 'claude-3.5' && !userFlags.includes('claude_access')) {
  throw new Error('Your plan does not include Claude models; please upgrade to Enterprise');
}
// ...or downgrade automatically
if (!userFlags.includes('claude_access')) {
  model = 'deepseek-v3'; // fall back to DeepSeek
}
```
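The automatic-downgrade branch above generalizes to a tier-to-models lookup. A minimal sketch, assuming hypothetical tier names and model lists — in the real system the source of truth is the user's Feature Flags, not a static table:

```typescript
// Hypothetical tier → allowed-models table (illustration only; the real
// mapping comes from Feature Flags)
type Tier = 'professional' | 'premium' | 'enterprise';

const TIER_MODELS: Record<Tier, string[]> = {
  professional: ['deepseek-v3'],
  premium: ['deepseek-v3', 'qwen3-72b'],
  enterprise: ['deepseek-v3', 'qwen3-72b', 'claude-3.5'],
};

// Return the requested model if the tier allows it, else the tier's default.
function resolveModel(tier: Tier, requested: string): string {
  const allowed = TIER_MODELS[tier];
  return allowed.includes(requested) ? requested : allowed[0];
}
```

So `resolveModel('professional', 'claude-3.5')` silently downgrades to `'deepseek-v3'`, while an enterprise user keeps their requested model.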
---
### 2. Unified calling interface ⭐⭐⭐⭐⭐
**Problem:** every LLM vendor uses a different API format:
- OpenAI format
- Anthropic format
- Domestic model formats (DeepSeek, Qwen)
**Solution:** a unified interface plus the adapter pattern
```typescript
// Business modules all call the same entry point
const response = await llmGateway.chat({
  userId: 'user123',
  modelType: 'deepseek-v3', // or 'qwen3', 'claude-3.5'
  messages: [
    { role: 'user', content: 'Help me analyze this paper...' }
  ],
  stream: false
});
// Inside the LLM gateway:
// 1. Check user permissions (Feature Flags)
// 2. Check quota
// 3. Pick the matching adapter
// 4. Call the vendor API
// 5. Record cost
// 6. Return a unified response format
```
---
### 3. Cost control ⭐⭐⭐⭐
**Core requirements:**
- Each user has a monthly quota
- Requests are throttled automatically once the quota is exhausted
- Real-time cost statistics
**Implementation:**
```typescript
// Check quota before each call
async function checkQuota(userId: string): Promise<boolean> {
  const usage = await getMonthlyUsage(userId);
  const quota = await getUserQuota(userId);
  if (usage.tokenCount >= quota.maxTokens) {
    throw new QuotaExceededError('Your monthly quota is exhausted; please upgrade your plan');
  }
  return true;
}
// Record cost after each call
async function recordUsage(userId: string, usage: {
  modelType: string;
  tokenCount: number;
  cost: number;
}) {
  await db.llmUsage.create({
    userId,
    modelType: usage.modelType, // was a bare `modelType`, which is undefined here
    inputTokens: usage.tokenCount,
    cost: usage.cost,
    timestamp: new Date()
  });
}
```
---
### 4. Unified streaming / non-streaming handling ⭐⭐⭐
**Scenarios:**
- AIA smart Q&A → streaming output (rendered in real time)
- ASL literature screening → non-streaming (batch processing)
**Unified interface:**
```typescript
interface ChatOptions {
  userId: string;
  modelType: ModelType;
  messages: Message[];
  stream: boolean; // stream the output?
  temperature?: number;
  maxTokens?: number;
}
// Streaming
const stream = await llmGateway.chat({ ...options, stream: true });
for await (const chunk of stream) {
  console.log(chunk.content);
}
// Non-streaming
const response = await llmGateway.chat({ ...options, stream: false });
console.log(response.content);
```
---
## 🏗️ Technical architecture
### Directory layout
```
backend/src/modules/llm-gateway/
├── controllers/
│   └── llmController.ts        # HTTP endpoints
├── services/
│   ├── llmGatewayService.ts    # Core service ⭐
│   ├── featureFlagService.ts   # Feature Flag lookups
│   ├── quotaService.ts         # Quota management
│   └── usageService.ts         # Usage statistics
├── adapters/                   # Adapter pattern ⭐
│   ├── baseAdapter.ts
│   ├── deepseekAdapter.ts
│   ├── qwenAdapter.ts
│   ├── claudeAdapter.ts
│   └── openaiAdapter.ts
├── types/
│   └── llm.types.ts
└── routes/
    └── llmRoutes.ts
```
---
### Core class design
#### 1. LLMGatewayService (core)
```typescript
class LLMGatewayService {
  private adapters: Map<ModelType, BaseLLMAdapter>;
  async chat(options: ChatOptions): Promise<ChatResponse | AsyncIterator<ChatChunk>> {
    // 1. Verify user permissions (Feature Flags)
    await this.checkAccess(options.userId, options.modelType);
    // 2. Check quota
    await quotaService.checkQuota(options.userId);
    // 3. Pick the adapter
    const adapter = this.adapters.get(options.modelType);
    if (!adapter) throw new Error(`Unsupported model: ${options.modelType}`);
    // 4. Call the LLM API
    const response = await adapter.chat(options);
    // 5. Record usage
    await usageService.record({
      userId: options.userId,
      modelType: options.modelType,
      tokenCount: response.tokenUsage,
      cost: this.calculateCost(options.modelType, response.tokenUsage)
    });
    // 6. Return the result
    return response;
  }
  private calculateCost(modelType: ModelType, tokens: number): number {
    const prices = {
      'deepseek-v3': 0.000001, // ¥1 / 1M tokens
      'qwen3-72b': 0.000005,   // ¥5 / 1M tokens
      'claude-3.5': 0.00003    // ¥30 / 1M tokens
    };
    return tokens * prices[modelType];
  }
}
```
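As a sanity check on `calculateCost`, the price table can be pulled out as a standalone function with a worked example (`costOf` is a hypothetical name for this sketch; prices match the table above):

```typescript
// Per-token prices in CNY, same values as calculateCost above
const PRICE_PER_TOKEN: Record<string, number> = {
  'deepseek-v3': 0.000001, // ¥1 / 1M tokens
  'qwen3-72b': 0.000005,   // ¥5 / 1M tokens
  'claude-3.5': 0.00003,   // ¥30 / 1M tokens
};

function costOf(modelType: string, tokens: number): number {
  return tokens * (PRICE_PER_TOKEN[modelType] ?? 0);
}

// e.g. a 650-token deepseek-v3 call costs ≈ ¥0.00065
```

Note the `?? 0` guard: an unknown model yields zero cost rather than `NaN`, though in the gateway proper an unknown model should already have been rejected when the adapter was looked up.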
#### 2. BaseLLMAdapter (adapter base class)
```typescript
abstract class BaseLLMAdapter {
  abstract chat(options: ChatOptions): Promise<ChatResponse>;
  abstract chatStream(options: ChatOptions): AsyncIterator<ChatChunk>;
  protected abstract buildRequest(options: ChatOptions): any;
  protected abstract parseResponse(response: any): ChatResponse;
}
```
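The gateway's `adapters` map then needs a registration step and a fail-fast lookup. A self-contained sketch under assumed stub types (the real `ModelType` and `ChatResponse` live in `llm.types.ts`; `AdapterRegistry` and `StubAdapter` are hypothetical names for illustration):

```typescript
// Stub types so the sketch compiles on its own
type ModelType = string;
interface ChatResponse { content: string; tokenUsage: number; finishReason: string }

abstract class BaseLLMAdapter {
  abstract chat(options: unknown): Promise<ChatResponse>;
}

class StubAdapter extends BaseLLMAdapter {
  async chat(): Promise<ChatResponse> {
    return { content: 'ok', tokenUsage: 0, finishReason: 'stop' };
  }
}

class AdapterRegistry {
  private adapters = new Map<ModelType, BaseLLMAdapter>();
  register(model: ModelType, adapter: BaseLLMAdapter): void {
    this.adapters.set(model, adapter);
  }
  // Throw a clear error instead of letting `undefined` leak into the call path
  get(model: ModelType): BaseLLMAdapter {
    const a = this.adapters.get(model);
    if (!a) throw new Error(`No adapter registered for model: ${model}`);
    return a;
  }
}
```

Registering one adapter per `ModelType` at startup keeps the gateway's `chat()` free of per-vendor branching.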
#### 3. DeepSeekAdapter (sample implementation)
```typescript
class DeepSeekAdapter extends BaseLLMAdapter {
  private apiKey: string;
  private baseUrl = 'https://api.deepseek.com/v1';
  async chat(options: ChatOptions): Promise<ChatResponse> {
    const request = this.buildRequest(options);
    const response = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(request)
    });
    const data = await response.json();
    return this.parseResponse(data);
  }
  protected buildRequest(options: ChatOptions) {
    return {
      model: 'deepseek-chat',
      messages: options.messages,
      // `??` rather than `||`, so an explicit temperature of 0 is preserved
      temperature: options.temperature ?? 0.7,
      max_tokens: options.maxTokens ?? 4096,
      stream: options.stream ?? false
    };
  }
  protected parseResponse(response: any): ChatResponse {
    return {
      content: response.choices[0].message.content,
      tokenUsage: response.usage.total_tokens,
      finishReason: response.choices[0].finish_reason
    };
  }
}
```
---
## 📊 Database design
### platform_schema.llm_usage
```sql
CREATE TABLE platform_schema.llm_usage (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id),
  model_type VARCHAR(50) NOT NULL,  -- 'deepseek-v3', 'qwen3', 'claude-3.5'
  input_tokens INTEGER NOT NULL,
  output_tokens INTEGER NOT NULL,
  total_tokens INTEGER NOT NULL,
  cost DECIMAL(10, 6) NOT NULL,     -- actual cost (CNY)
  request_id VARCHAR(100),          -- request_id returned by the LLM API
  module VARCHAR(50),               -- calling module: 'AIA', 'ASL', 'PKB', ...
  created_at TIMESTAMP DEFAULT NOW()
);
-- Inline INDEX clauses are MySQL syntax; in PostgreSQL, indexes are created separately:
CREATE INDEX idx_user_created ON platform_schema.llm_usage (user_id, created_at);
CREATE INDEX idx_module ON platform_schema.llm_usage (module);
```
### platform_schema.llm_quotas
```sql
CREATE TABLE platform_schema.llm_quotas (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES platform_schema.users(id) UNIQUE,
  monthly_token_limit INTEGER NOT NULL,  -- monthly token quota
  monthly_cost_limit DECIMAL(10, 2),     -- monthly cost ceiling (optional)
  reset_day INTEGER DEFAULT 1,           -- day of month the quota resets (1-28)
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```
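Because of `reset_day`, the quota window is not a calendar month: usage queries must start from the most recent reset date. A sketch of that computation, assuming the window opens at local midnight on `reset_day` of the current month (or of the previous month when today is earlier than `reset_day`):

```typescript
// Given "now" and the user's reset_day (1-28), return the start of the
// current quota window. Local time; midnight boundary assumed.
function quotaWindowStart(now: Date, resetDay: number): Date {
  const start = new Date(now.getFullYear(), now.getMonth(), resetDay);
  // Before this month's reset day → the window opened last month
  if (now.getDate() < resetDay) start.setMonth(start.getMonth() - 1);
  return start;
}
```

Restricting `reset_day` to 1-28 (as the schema comment does) keeps the month-shift safe even for February.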
---
## 📋 API endpoints
### 1. Chat (non-streaming)
```
POST /api/v1/llm/chat
Request:
{
  "modelType": "deepseek-v3",
  "messages": [
    { "role": "user", "content": "Analyze this paper..." }
  ],
  "temperature": 0.7,
  "maxTokens": 4096
}
Response:
{
  "content": "Based on the paper's content...",
  "tokenUsage": {
    "input": 150,
    "output": 500,
    "total": 650
  },
  "cost": 0.00065,
  "modelType": "deepseek-v3"
}
```
### 2. Chat (streaming)
```
POST /api/v1/llm/chat/stream
Request: same as above, plus "stream": true
Response: Server-Sent Events (SSE)
data: {"chunk": "Based on", "tokenUsage": 1}
data: {"chunk": " the paper", "tokenUsage": 1}
...
data: {"done": true, "totalTokens": 650, "cost": 0.00065}
```
### 3. Query quota
```
GET /api/v1/llm/quota
Response:
{
"monthlyLimit": 1000000,
"used": 245000,
"remaining": 755000,
"resetDate": "2025-12-01"
}
```
### 4. Usage statistics
```
GET /api/v1/llm/usage?startDate=2025-11-01&endDate=2025-11-30
Response:
{
"totalTokens": 245000,
"totalCost": 1.23,
"byModel": {
"deepseek-v3": { "tokens": 200000, "cost": 0.20 },
"qwen3-72b": { "tokens": 45000, "cost": 0.23 }
},
"byModule": {
"AIA": 100000,
"ASL": 120000,
"PKB": 25000
}
}
```
---
## ⚠️ Key technical challenges
### 1. Implementing streaming output
**Approach:** Server-Sent Events (SSE)
```typescript
// Backend (Fastify)
app.post('/api/v1/llm/chat/stream', async (req, reply) => {
  reply.raw.setHeader('Content-Type', 'text/event-stream');
  reply.raw.setHeader('Cache-Control', 'no-cache');
  reply.raw.setHeader('Connection', 'keep-alive');
  const stream = await llmGateway.chatStream(req.body);
  for await (const chunk of stream) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  reply.raw.end();
});
// Frontend (React) — note: EventSource only supports GET, so for a POST
// endpoint read the SSE body through fetch instead
const res = await fetch('/api/v1/llm/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(chatOptions)
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Simplified: assumes each read() delivers whole "data: ...\n\n" events
  for (const line of decoder.decode(value).split('\n\n')) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessages(prev => [...prev, data.chunk]);
    }
  }
}
```
---
### 2. Error handling and retries
```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function chatWithRetry(options: ChatOptions, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await llmGateway.chat(options);
    } catch (error) {
      if (error.code === 'RATE_LIMIT' && i < maxRetries - 1) {
        await sleep(2000 * 2 ** i); // exponential backoff: 2s, 4s, 8s, ...
        continue;
      }
      throw error;
    }
  }
}
```
---
### 3. Token counting (accurate billing)
**Problem:** every model family uses a different tokenizer.
**Solution:**
- Prefer the usage figures returned by each vendor's API (most accurate)
- Fallback: the tiktoken library (OpenAI's tokenizer; only an approximation for non-OpenAI models)
```typescript
import { encoding_for_model } from 'tiktoken';
function estimateTokens(text: string, model: string): number {
  const encoder = encoding_for_model(model);
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}
```
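When no tokenizer is at hand (e.g. a cheap pre-flight length check before hitting the API), a character-based heuristic can stand in. The ratios below — roughly one token per CJK character, roughly four characters per token elsewhere — are assumptions for estimation only, never billing:

```typescript
// Very rough token estimate: CJK characters ≈ 1 token each, other text
// ≈ 4 characters per token. Heuristic only — always prefer vendor-reported
// usage for billing.
function roughTokenEstimate(text: string): number {
  const cjk = (text.match(/[\u4e00-\u9fff]/g) ?? []).length;
  const rest = text.length - cjk;
  return cjk + Math.ceil(rest / 4);
}
```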
---
## 📅 Development plan (3 days)
### Day 1: foundations (6-8 hours)
- [ ] Create the directory structure
- [ ] Implement the BaseLLMAdapter abstract class
- [ ] Implement DeepSeekAdapter
- [ ] Create the database tables (llm_usage, llm_quotas)
- [ ] Basic API endpoint (non-streaming)
### Day 2: core features (6-8 hours)
- [ ] Feature Flag integration
- [ ] Quota checking and usage recording
- [ ] Implement QwenAdapter
- [ ] Error handling and retry logic
- [ ] Unit tests
### Day 3: streaming + polish (6-8 hours)
- [ ] Implement streaming output (SSE)
- [ ] Frontend SSE handling
- [ ] Cost statistics API
- [ ] Quota query API
- [ ] Integration tests
- [ ] Finish documentation
---
## ✅ Development checklist
**Before starting, confirm:**
- [ ] The Feature Flag table exists (platform_schema.feature_flags)
- [ ] The users table has a version field (professional/premium/enterprise)
- [ ] API keys for each LLM vendor are configured
- [ ] The Prisma schema is up to date
**During development:**
- [ ] Every adapter has complete error handling
- [ ] Every LLM call is recorded in the llm_usage table
- [ ] The quota check runs before every call
- [ ] Both streaming and non-streaming paths are tested
**When done:**
- [ ] The ASL module can call the LLM gateway successfully
- [ ] ADMIN can view per-user LLM usage statistics
- [ ] Requests over quota are correctly rejected
## 🔗 Related documents
**Depends on:**
- [User & Access Management (UAM)](../../01-平台基础层/01-用户与权限中心(UAM)/README.md) - Feature Flags
- [Admin Console](../../03-业务模块/ADMIN-运营管理端/README.md) - LLM model management
**Depended on by:**
- [ASL - AI Literature Screening](../../03-业务模块/ASL-AI智能文献/README.md) ⭐ P0
- [AIA - AI Q&A](../../03-业务模块/AIA-AI智能问答/README.md)
- [PKB - Personal Knowledge Base](../../03-业务模块/PKB-个人知识库/README.md)
---
**Last updated:** 2025-11-06
**Maintainer:** Technical architect
**Priority:** P0 ⭐⭐⭐⭐⭐