
HaHafeng fa72beea6c feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

Major Changes:
- Implement Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job queue management (platform_schema.job)
- Implement CheckpointService using job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add task splitting mechanism (auto chunk size recommendation)
- Refactor ASL screening service with smart mode selection
- Refactor DC extraction service with smart mode selection
- Register workers for ASL and DC modules

Technical Highlights:
- All task management data stored in platform_schema.job.data (JSONB)
- Business tables remain clean (no task management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows 3-layer architecture principle
- Zero additional cost (no Redis needed, save 8400 CNY/year)

Code Statistics:
- New code: ~1750 lines
- Modified code: ~500 lines
- Test code: ~1800 lines
- Documentation: ~3000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: Updated 4 key documents

Status: Phase 1-7 completed, Phase 8-9 pending

2025-12-13 16:10:04 +08:00

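Redis Migration Implementation Plan (Cache + Queue, Full Edition)

Document version: V2.0
Last updated: 2025-12-12
Target completion: 2025-12-18 (7 days)
Owner: Engineering team
Risk level: 🟡 Medium (fallback plan in place)
Key change: the Redis queue moves from "optional" to "required"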

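⚠️ Important note (V2.0 update)

After closer analysis, the Redis queue is not optional. Core features depend on it:

ASL literature screening: 1,000 articles take about 2 hours; without a Redis queue the failure rate is > 95%
DC Tool B medical record extraction: 1,000 records take 2-3 hours and hit the same problem
SAE instance behavior: instances scale in automatically after 15 minutes without traffic, so long-running tasks are bound to fail

This plan is therefore adjusted to implement the cache and the queue together (7 days in total).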

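1. Background and Goals

1.1 Why migrate?

Current problems:

❌ Violates cloud-native rules: the system uses an in-memory cache, which breaks our own cloud-native development guidelines
❌ LLM cost out of control: the cache is not persisted, so DeepSeek/Qwen API calls are repeated
❌ Unreliable long tasks: a 30-60 minute literature screening task is lost when the SAE instance restarts
❌ Instances out of sync: after SAE scales out, instances do not share their caches
❌ Poor fit for serverless: in-memory state is unreliable in a serverless environment

Goals:

✅ Comply with the architecture guidelines: use a distributed cache (Redis)
✅ Reduce API cost: persist cached LLM results to avoid repeated calls
✅ Persist tasks: long-running tasks survive instance restarts
✅ Support multiple instances: the cache is shared across instances
✅ Smooth transition: keep a fallback path so the system stays stable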

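2. Current System State

2.1 Where caching is already used

Location 1: HealthCheckService.ts

// File: backend/src/modules/dc/tool-b/services/HealthCheckService.ts
// Line 47: read from cache
const cached = await cache.get<HealthCheckResult>(cacheKey);

// Line 145: write to cache
await cache.set(cacheKey, result, 86400);  // 24 hours

Purpose: caches Excel health-check results
Importance: 🟡 Medium (avoids re-parsing the same Excel file)
Data size: ~5 KB per entry

Location 2: LLM12FieldsService.ts

// File: backend/src/modules/asl/common/llm/LLM12FieldsService.ts
// Line 516: read from cache
const cached = await cache.get(cacheKey);

// Line 530: write to cache
await cache.set(cacheKey, JSON.stringify(result), 3600);  // 1 hour

Purpose: caches LLM 12-field extraction results
Importance: 🔴 High (directly affects API cost)
Data size: ~50 KB per entry
Cost impact:

Cost per extraction: ~¥0.43 per article
If the cache is lost, repeated calls cost:
- 10 calls = ¥4.3
- 100 calls = ¥43
- 1,000 calls = ¥430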
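
The figures above follow directly from the per-article price. A minimal sketch of that arithmetic; `wastedCost` is a hypothetical helper name, and the only input taken from this document is the ¥0.43 estimate:

```typescript
// Per-article extraction cost quoted above (CNY); an estimate, not a billed rate.
const COST_PER_EXTRACTION_CNY = 0.43;

// Cost of LLM calls that a persistent cache would have avoided.
function wastedCost(repeatedCalls: number): number {
  // Round to 2 decimals to hide floating-point noise.
  return Math.round(repeatedCalls * COST_PER_EXTRACTION_CNY * 100) / 100;
}

for (const calls of [10, 100, 1000]) {
  console.log(`${calls} repeated calls = ¥${wastedCost(calls)}`);
}
```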

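2.2 Long-running asynchronous tasks

ASL module: literature screening task

// File: backend/src/modules/asl/services/screeningService.ts
// Lines 63-65
processLiteraturesInBackground(task.id, projectId, literatures);

Problems:

199 articles take 33-66 minutes (roughly 10-20 seconds per article)
The task currently runs on an in-memory queue (MemoryQueue)
The task is lost when the SAE instance restarts or scales in

Impact:

Very poor user experience (tasks vanish without warning)
Results already processed are lost, wasting API spend
Task status cannot be traced afterwards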
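
To see which batch sizes are at risk, the 199-article figure can be turned into a rough estimator. This is only a sketch: the 10-20 seconds per article is derived from the 33-66 minute range above, the 15-minute window is the SAE scale-in behavior quoted earlier, and both function names are hypothetical.

```typescript
// Derived from "199 articles take 33-66 minutes".
const SECONDS_PER_ARTICLE = { min: 10, max: 20 };
// SAE scales in after this many minutes without traffic (figure from this plan).
const SAE_IDLE_WINDOW_MINUTES = 15;

// Estimated duration range, in whole minutes, for a batch of articles.
function estimateMinutes(articleCount: number): { min: number; max: number } {
  return {
    min: Math.round((articleCount * SECONDS_PER_ARTICLE.min) / 60),
    max: Math.round((articleCount * SECONDS_PER_ARTICLE.max) / 60),
  };
}

// True when even the optimistic estimate outlasts the idle window.
function outlastsIdleWindow(articleCount: number): boolean {
  return estimateMinutes(articleCount).min > SAE_IDLE_WINDOW_MINUTES;
}

console.log(estimateMinutes(199), outlastsIdleWindow(199));
```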

2.3 Current configuration

# backend/.env
CACHE_TYPE=memory    # ← must change to redis
QUEUE_TYPE=memory    # ← must change to redis

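3. Redis Configuration

3.1 Alibaba Cloud Redis purchase details

Item	Value	Notes
Product	Redis Open-Source Edition	Full Redis feature set
Billing	Subscription (yearly/monthly)	40% off on first purchase
Deployment mode	Cloud-native (high availability)	Automatic master-replica failover
Series	Standard Edition	Sufficient for our needs
Region	China North 2 (Beijing)	Same region as SAE
Instance type	High availability	✅ 99.95% availability
Major version	Redis 7.0	Latest stable release
Architecture	Cluster disabled (single node)	Sufficient at the current scale
Shard size	256 MB	Enough initially
Shard count	1	Single shard
Read/write splitting	Off	Keeps configuration simple

3.2 Estimated cost

Base price: ¥72/year (standalone edition)
High-availability edition: ¥180/year (estimate)
First purchase: ¥108/year (after the 40% discount)

Benefits by comparison:
- LLM API savings: > ¥500/year
- Higher user satisfaction: priceless
- ROI: > 400%

3.3 Connection details (available after purchase)

# Alibaba Cloud console → Redis instance → Connection information
REDIS_HOST=r-xxxxxxxxxxxx.redis.rds.aliyuncs.com
REDIS_PORT=6379
REDIS_PASSWORD=your_secure_password_here
REDIS_DB=0

# Or use a connection string
REDIS_URL=redis://:your_password@r-xxxxxxxxxxxx.redis.rds.aliyuncs.com:6379/0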
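
The two forms above carry the same information. As an illustration of how a `REDIS_URL` maps onto the separate `REDIS_*` variables, here is a small sketch using the standard `URL` class; `parseRedisUrl` and `RedisConnection` are hypothetical names, and ioredis can also take such a URL directly, so this is only needed when the individual fields are wanted.

```typescript
interface RedisConnection {
  host: string;
  port: number;
  password?: string;
  db: number;
}

// Split a redis:// or rediss:// URL into host, port, password and db index.
function parseRedisUrl(redisUrl: string): RedisConnection {
  const url = new URL(redisUrl);
  if (url.protocol !== 'redis:' && url.protocol !== 'rediss:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // The path segment is the database index; default to 0 when absent.
  const dbSegment = url.pathname.replace(/^\//, '');
  const db = dbSegment === '' ? 0 : Number.parseInt(dbSegment, 10);
  if (Number.isNaN(db)) {
    throw new Error(`Invalid database index: ${dbSegment}`);
  }
  return {
    host: url.hostname,
    port: url.port === '' ? 6379 : Number.parseInt(url.port, 10),
    // Passwords may be percent-encoded inside a URL.
    password: url.password === '' ? undefined : decodeURIComponent(url.password),
    db,
  };
}

console.log(parseRedisUrl('redis://:your_password@r-xxxxxxxxxxxx.redis.rds.aliyuncs.com:6379/0'));
```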

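4. Detailed Migration Steps

4.1 Phase 1: Prepare the local development environment ✅

Step 1.1: Install dependencies

cd backend
npm install ioredis --save
# ioredis 5.x ships its own TypeScript definitions; @types/ioredis is only needed for ioredis 4.x and earlier

Step 1.2: Set up local Redis

# Confirm the Docker Redis container is running (findstr is Windows; use grep on macOS/Linux)
docker ps | findstr redis

# If it is not running, start it
docker start ai-clinical-redis

# Test the connection
docker exec -it ai-clinical-redis redis-cli ping
# Should return: PONG

Step 1.3: Update the local .env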

# backend/.env
DATABASE_URL=postgresql://postgres:postgres123@localhost:5432/ai_clinical_research

# ==================== Redis ====================
# Enable the Redis cache
CACHE_TYPE=redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
# REDIS_PASSWORD=  # no password locally

# Keep the queue in memory for now (enabled in a later phase)
QUEUE_TYPE=memory

# ==================== JWT ====================
JWT_SECRET=your-secret-key-change-in-production
JWT_EXPIRES_IN=7d

# ==================== LLM API ====================
# Never commit real keys; use placeholders in documentation
DEEPSEEK_API_KEY=sk-your-deepseek-api-key
DASHSCOPE_API_KEY=sk-your-dashscope-api-key

# ==================== Dify ====================
DIFY_API_URL=http://localhost/v1
DIFY_API_KEY=dataset-your-dify-api-key

# ==================== Server ====================
PORT=3001
NODE_ENV=development

# ==================== CloseAI ====================
CLOSEAI_API_KEY=sk-your-closeai-api-key
CLOSEAI_OPENAI_BASE_URL=https://api.openai-proxy.org/v1
CLOSEAI_CLAUDE_BASE_URL=https://api.openai-proxy.org/anthropic

# ==================== Storage ====================
STORAGE_TYPE=local
LOCAL_STORAGE_DIR=uploads
LOCAL_STORAGE_URL=http://localhost:3001/uploads

# ==================== CORS ====================
CORS_ORIGIN=http://localhost:5173

# ==================== Logging ====================
LOG_LEVEL=debug

4.2 Phase 2: Implement RedisCacheAdapter ✅

Step 2.1: Update RedisCacheAdapter.ts

// File: backend/src/common/cache/RedisCacheAdapter.ts

import Redis from 'ioredis';
import type { CacheAdapter } from './CacheAdapter.js';
import { logger } from '../logging/index.js';

/**
 * Redis cache adapter
 * 
 * Built on the ioredis client. Supports:
 * - Automatic serialization of strings/objects
 * - TTL expiry
 * - Automatic reconnection
 * - Per-command retry
 * 
 * @example
 * const cache = new RedisCacheAdapter({
 *   host: 'localhost',
 *   port: 6379,
 *   password: 'xxx',
 *   db: 0
 * });
 * 
 * await cache.set('key', { data: 'value' }, 60);
 * const value = await cache.get('key');
 */
export class RedisCacheAdapter implements CacheAdapter {
  private redis: Redis;

  constructor(options?: {
    host?: string;
    port?: number;
    password?: string;
    db?: number;
  }) {
    this.redis = new Redis({
      host: options?.host || 'localhost',
      port: options?.port || 6379,
      password: options?.password || undefined,
      db: options?.db || 0,
      // Reconnection settings
      retryStrategy: (times) => {
        const delay = Math.min(times * 50, 2000);
        logger.warn(`Redis连接重试 ${times} 次，${delay}ms后重试`);
        return delay;
      },
      maxRetriesPerRequest: 3,
      enableReadyCheck: true,
      // Socket settings (ioredis uses one connection per client, not a pool)
      lazyConnect: false,
      keepAlive: 30000,
    });

    // Log connection lifecycle events
    this.redis.on('connect', () => {
      logger.info('Redis连接成功');
    });

    this.redis.on('error', (error) => {
      logger.error('Redis连接错误', { error: error.message });
    });

    this.redis.on('close', () => {
      logger.warn('Redis连接关闭');
    });

    this.redis.on('reconnecting', () => {
      logger.warn('Redis正在重连...');
    });
  }

  /**
   * Read a cached value
   */
  async get<T>(key: string): Promise<T | null> {
    try {
      const value = await this.redis.get(key);
      
      // Only null means "key not found"; an empty string is a valid cached value
      if (value === null) {
        logger.debug('Redis缓存未命中', { key });
        return null;
      }

      logger.debug('Redis缓存命中', { key, size: value.length });

      // Try to parse as JSON
      try {
        return JSON.parse(value) as T;
      } catch {
        // Not JSON: return the raw string
        return value as unknown as T;
      }
    } catch (error) {
      logger.error('Redis GET失败', { key, error });
      return null;  // Degrade: return null instead of throwing
    }
  }

  /**
   * Write a cached value
   * @param ttl expiry in seconds; if omitted (or 0) the key never expires (not recommended)
   */
  async set<T>(key: string, value: T, ttl?: number): Promise<void> {
    try {
      // Serialize the value (strings are stored as-is)
      const serialized = typeof value === 'string' 
        ? value 
        : JSON.stringify(value);

      // Write the value (with TTL when one is given)
      if (ttl && ttl > 0) {
        await this.redis.setex(key, ttl, serialized);
        logger.debug('Redis SET成功（带TTL）', { 
          key, 
          ttl, 
          size: serialized.length 
        });
      } else {
        await this.redis.set(key, serialized);
        logger.warn('Redis SET成功（无TTL）', { key });  // Warning: no expiry set
      }
    } catch (error) {
      logger.error('Redis SET失败', { key, ttl, error });
      // Do not rethrow, so the application keeps running
    }
  }

  /**
   * Delete a cached value
   */
  async delete(key: string): Promise<boolean> {
    try {
      const result = await this.redis.del(key);
      logger.debug('Redis DEL', { key, deleted: result > 0 });
      return result > 0;
    } catch (error) {
      logger.error('Redis DEL失败', { key, error });
      return false;
    }
  }

  /**
   * Check whether a key exists
   */
  async has(key: string): Promise<boolean> {
    try {
      const result = await this.redis.exists(key);
      return result > 0;
    } catch (error) {
      logger.error('Redis EXISTS失败', { key, error });
      return false;
    }
  }

  /**
   * Clear the whole cache (dangerous: FLUSHDB removes every key in the current DB)
   */
  async clear(): Promise<void> {
    try {
      await this.redis.flushdb();
      logger.warn('Redis FLUSHDB执行（所有缓存已清空）');
    } catch (error) {
      logger.error('Redis FLUSHDB失败', { error });
    }
  }

  /**
   * Test the Redis connection
   */
  async ping(): Promise<boolean> {
    try {
      const result = await this.redis.ping();
      return result === 'PONG';
    } catch (error) {
      logger.error('Redis PING失败', { error });
      return false;
    }
  }

  /**
   * Close the connection (for graceful shutdown)
   */
  async disconnect(): Promise<void> {
    try {
      await this.redis.quit();
      logger.info('Redis连接已关闭');
    } catch (error) {
      logger.error('Redis关闭连接失败', { error });
    }
  }
}
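
The call sites in section 2.1 follow a read-then-compute-then-write pattern around this adapter. A minimal cache-aside sketch of that pattern, assuming only the `get`/`set` shape shown above; `CacheLike`, `MapCache` and `getOrCompute` are hypothetical names, and `MapCache` is an in-memory stand-in so the example runs without a Redis server:

```typescript
interface CacheLike {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttl?: number): Promise<void>;
}

// In-memory stand-in with the same JSON round trip and TTL semantics.
class MapCache implements CacheLike {
  private store = new Map<string, { json: string; expiresAt: number }>();

  async get<T>(key: string): Promise<T | null> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return JSON.parse(entry.json) as T;
  }

  async set<T>(key: string, value: T, ttl = 3600): Promise<void> {
    this.store.set(key, {
      json: JSON.stringify(value),
      expiresAt: Date.now() + ttl * 1000,
    });
  }
}

// Return the cached value when present; otherwise compute it once and cache it.
async function getOrCompute<T>(
  cache: CacheLike,
  key: string,
  ttlSeconds: number,
  compute: () => Promise<T>,
): Promise<T> {
  const cached = await cache.get<T>(key);
  if (cached !== null) return cached;
  const fresh = await compute();
  await cache.set(key, fresh, ttlSeconds);
  return fresh;
}
```

With a helper like this, an expensive LLM extraction would only run on a cache miss; the key scheme and TTL remain the caller's decision.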

Step 2.2: Update CacheFactory.ts (add a fallback strategy)

// File: backend/src/common/cache/CacheFactory.ts

import { config } from '../../config/env.js';
import type { CacheAdapter } from './CacheAdapter.js';
import { MemoryCacheAdapter } from './MemoryCacheAdapter.js';
import { RedisCacheAdapter } from './RedisCacheAdapter.js';
import { logger } from '../logging/index.js';

/**
 * Cache factory (singleton)
 * 
 * Selects the cache implementation from the environment:
 * - CACHE_TYPE=memory → MemoryCacheAdapter
 * - CACHE_TYPE=redis → RedisCacheAdapter (with fallback)
 * 
 * @example
 * import { cache } from '@/common/cache'
 * await cache.set('user:123', userData, 60)
 * const user = await cache.get<User>('user:123')
 */
export class CacheFactory {
  private static instance: CacheAdapter | null = null;
  private static fallbackToMemory = false;  // set when Redis was unreachable at startup

  /**
   * Get the cache instance (singleton)
   */
  static getInstance(): CacheAdapter {
    if (!this.instance) {
      this.instance = this.createCache();
    }
    return this.instance;
  }

  /**
   * Create the cache instance
   */
  private static createCache(): CacheAdapter {
    const cacheType = config.cacheType || 'memory';

    logger.info('[CacheFactory] 初始化缓存系统', { cacheType });

    switch (cacheType) {
      case 'redis':
        return this.createRedisCache();
      case 'memory':
      default:
        return this.createMemoryCache();
    }
  }

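  /**
   * Create the Redis cache.
   *
   * Note: constructing RedisCacheAdapter does not throw when Redis is
   * unreachable (ioredis connects in the background), so the catch block only
   * covers synchronous construction errors. A failed ping sets the fallback
   * flag, but the Redis adapter is still returned; its get/set then behave as
   * "cache miss" / "no-op" rather than switching to the in-memory cache.
   */
  private static createRedisCache(): CacheAdapter {
    try {
      logger.info('[CacheFactory] 正在连接Redis...', {
        host: config.redisHost,
        port: config.redisPort,
        db: config.redisDb,
      });

      const redisCache = new RedisCacheAdapter({
        host: config.redisHost,
        port: config.redisPort,
        password: config.redisPassword,
        db: config.redisDb,
      });

      // Verify the connection asynchronously (this does not block startup)
      redisCache.ping().then((isConnected) => {
        if (isConnected) {
          logger.info('[CacheFactory] ✅ Redis连接成功');
        } else {
          logger.error('[CacheFactory] ❌ Redis ping failed; cache calls will miss until Redis recovers');
          this.fallbackToMemory = true;
        }
      }).catch((error) => {
        logger.error('[CacheFactory] ❌ Redis ping threw; cache calls will miss until Redis recovers', { error });
        this.fallbackToMemory = true;
      });

      return redisCache;
    } catch (error) {
      logger.error('[CacheFactory] ❌ Redis初始化失败，降级到内存缓存', { error });
      this.fallbackToMemory = true;
      return this.createMemoryCache();
    }
  }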

  /**
   * Create the in-memory cache
   */
  private static createMemoryCache(): MemoryCacheAdapter {
    logger.info('[CacheFactory] 使用内存缓存');
    return new MemoryCacheAdapter();
  }

  /**
   * Whether the factory has flagged Redis as unavailable
   */
  static isFallbackMode(): boolean {
    return this.fallbackToMemory;
  }
}

/**
 * Export the singleton
 */
export const cache = CacheFactory.getInstance();
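
Because the exported `cache` is bound once at import time, replacing the factory's instance later does not redirect existing callers. One way to get a fallback that actually switches is a wrapper that routes each call by health status. The following is a sketch of that idea, not the project's implementation; `FallbackCache`, `CacheLike` and `Pingable` are hypothetical names, and the caller is assumed to invoke `refreshHealth()` at startup and then periodically.

```typescript
interface CacheLike {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttl?: number): Promise<void>;
}

interface Pingable {
  ping(): Promise<boolean>;
}

// Routes every call to the primary cache while it is healthy, else to the fallback.
class FallbackCache implements CacheLike {
  private healthy = true;

  constructor(
    private readonly primary: CacheLike & Pingable,
    private readonly fallback: CacheLike,
  ) {}

  // Re-check the primary; call at startup and from a timer.
  async refreshHealth(): Promise<boolean> {
    try {
      this.healthy = await this.primary.ping();
    } catch {
      this.healthy = false;
    }
    return this.healthy;
  }

  isFallbackMode(): boolean {
    return !this.healthy;
  }

  private active(): CacheLike {
    return this.healthy ? this.primary : this.fallback;
  }

  get<T>(key: string): Promise<T | null> {
    return this.active().get<T>(key);
  }

  set<T>(key: string, value: T, ttl?: number): Promise<void> {
    return this.active().set(key, value, ttl);
  }
}
```

While degraded, entries live only in the local process, so cross-instance sharing is lost until the primary recovers; that trade-off should be weighed against simply treating Redis errors as cache misses.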

Step 2.3: Update the env.ts configuration

// File: backend/src/config/env.ts
// Make sure the Redis settings are read correctly

export const config = {
  // ... other settings ...

  /** Cache type */
  cacheType: process.env.CACHE_TYPE || 'memory',

  /** Redis settings */
  redisHost: process.env.REDIS_HOST || 'localhost',
  redisPort: parseInt(process.env.REDIS_PORT || '6379', 10),
  redisPassword: process.env.REDIS_PASSWORD || undefined,
  redisDb: parseInt(process.env.REDIS_DB || '0', 10),
  redisUrl: process.env.REDIS_URL || 'redis://localhost:6379',

  /** Queue type */
  queueType: process.env.QUEUE_TYPE || 'memory',

  // ... other settings ...
};

// Print the loaded configuration at startup
export function validateConfig() {
  console.log('✅ [Config] 环境变量加载成功');
  console.log('[Config] 应用配置:');
  console.log(`  - 缓存: ${config.cacheType}`);
  console.log(`  - 队列: ${config.queueType}`);
  
  if (config.cacheType === 'redis') {
    console.log(`  - Redis: ${config.redisHost}:${config.redisPort}/${config.redisDb}`);
  }
}
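
As written, `validateConfig()` prints the settings but does not reject bad values; for example, `parseInt('abc', 10)` yields `NaN` and would pass through silently. A sketch of a stricter check that could run at startup, assuming the field names from the `config` object above; `findConfigErrors` and `CacheConfig` are hypothetical names:

```typescript
interface CacheConfig {
  cacheType: string;
  redisHost?: string;
  redisPort: number;
  redisDb: number;
}

// Collect human-readable problems instead of failing on the first one.
function findConfigErrors(cfg: CacheConfig): string[] {
  const errors: string[] = [];
  if (cfg.cacheType !== 'memory' && cfg.cacheType !== 'redis') {
    errors.push(`CACHE_TYPE must be "memory" or "redis", got "${cfg.cacheType}"`);
  }
  if (cfg.cacheType === 'redis') {
    if (!cfg.redisHost) {
      errors.push('REDIS_HOST is required when CACHE_TYPE=redis');
    }
    // NaN (from a non-numeric env value) fails the integer check.
    if (!Number.isInteger(cfg.redisPort) || cfg.redisPort < 1 || cfg.redisPort > 65535) {
      errors.push('REDIS_PORT must be an integer between 1 and 65535');
    }
    if (!Number.isInteger(cfg.redisDb) || cfg.redisDb < 0) {
      errors.push('REDIS_DB must be a non-negative integer');
    }
  }
  return errors;
}

console.log(findConfigErrors({ cacheType: 'redis', redisHost: 'localhost', redisPort: NaN, redisDb: 0 }));
```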

4.3 Phase 3: Local testing ✅

Step 3.1: Create a Redis test script

// File: backend/src/scripts/test-redis.ts

// node:assert throws on failure; console.assert only logs, so the script
// would otherwise print "all tests passed" even when a check fails.
import assert from 'node:assert/strict';
import { cache } from '../common/cache/index.js';

async function testRedis() {
  console.log('\n🧪 开始测试Redis缓存...\n');

  try {
    // Test 1: basic read/write
    console.log('📝 测试1：基本读写');
    await cache.set('test:hello', 'world', 10);
    const value1 = await cache.get('test:hello');
    console.log(`  ✅ 写入: "hello" → "world"`);
    console.log(`  ✅ 读取: "${value1}"`);
    assert.strictEqual(value1, 'world', '值应该匹配');

    // Test 2: object serialization
    console.log('\n📝 测试2：对象序列化');
    const obj = { id: 123, name: '测试', data: [1, 2, 3] };
    await cache.set('test:object', obj, 10);
    const value2 = await cache.get<typeof obj>('test:object');
    console.log(`  ✅ 写入对象:`, obj);
    console.log(`  ✅ 读取对象:`, value2);
    assert.strictEqual(value2?.id, 123, 'ID应该匹配');

    // Test 3: TTL expiry
    console.log('\n📝 测试3：TTL过期（2秒）');
    await cache.set('test:expire', 'will-expire', 2);
    console.log(`  ✅ 写入（TTL=2秒）`);
    
    console.log(`  ⏳ 等待1秒...`);
    await sleep(1000);
    const value3a = await cache.get('test:expire');
    console.log(`  ✅ 1秒后读取: "${value3a}"`);
    assert.strictEqual(value3a, 'will-expire', '应该还存在');
    
    console.log(`  ⏳ 等待2秒...`);
    await sleep(2000);
    const value3b = await cache.get('test:expire');
    console.log(`  ✅ 3秒后读取: ${value3b}`);
    assert.strictEqual(value3b, null, '应该已过期');

    // Test 4: has and delete
    console.log('\n📝 测试4：has和delete');
    await cache.set('test:delete', 'to-be-deleted', 10);
    const exists1 = await cache.has('test:delete');
    console.log(`  ✅ 写入后exists: ${exists1}`);
    
    await cache.delete('test:delete');
    const exists2 = await cache.has('test:delete');
    console.log(`  ✅ 删除后exists: ${exists2}`);
    assert.strictEqual(exists2, false, '应该不存在');

    // Test 5: large object (simulated LLM result, roughly 40 KB)
    console.log('\n📝 测试5：大对象缓存（模拟LLM结果）');
    const bigObj = {
      literatureId: 'xxx',
      fields: {
        研究类型: 'RCT',
        样本量: '500',
        干预措施: '药物A 100mg',
        // ... 12 fields in total
      },
      metadata: {
        model: 'deepseek-v3',
        tokens: 8000,
        timestamp: Date.now(),
      },
      rawOutput: 'x'.repeat(40000),  // simulate a large raw output
    };
    
    const start = Date.now();
    await cache.set('test:bigobj', bigObj, 3600);
    const value5 = await cache.get('test:bigobj');
    const duration = Date.now() - start;
    
    const size = JSON.stringify(bigObj).length;
    console.log(`  ✅ 写入+读取大对象: ${size} bytes，耗时 ${duration}ms`);
    assert.notStrictEqual(value5, null, '应该能读取');

    console.log('\n✅ 所有测试通过！\n');
    process.exit(0);
  } catch (error) {
    console.error('\n❌ 测试失败:', error);
    process.exit(1);
  }
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run the tests
testRedis();

Step 3.2: Run the tests

cd backend

# 1. Start the backend (confirms the Redis settings take effect)
npm run dev

# 2. In a second terminal, run the test script
npx tsx src/scripts/test-redis.ts

Expected output (the byte count and timing are illustrative):

🧪 开始测试Redis缓存...

📝 测试1：基本读写
  ✅ 写入: "hello" → "world"
  ✅ 读取: "world"

📝 测试2：对象序列化
  ✅ 写入对象: { id: 123, name: '测试', data: [ 1, 2, 3 ] }
  ✅ 读取对象: { id: 123, name: '测试', data: [ 1, 2, 3 ] }

📝 测试3：TTL过期（2秒）
  ✅ 写入（TTL=2秒）
  ⏳ 等待1秒...
  ✅ 1秒后读取: "will-expire"
  ⏳ 等待2秒...
  ✅ 3秒后读取: null

📝 测试4：has和delete
  ✅ 写入后exists: true
  ✅ 删除后exists: false

📝 测试5：大对象缓存（模拟LLM结果）
  ✅ 写入+读取大对象: 40123 bytes，耗时 5ms

✅ 所有测试通过！

Step 3.3: Test the business code

# 1. Test HealthCheckService (DC module)
# Upload an Excel file and check the log:

[HealthCheck] Cache miss, processing file
[HealthCheck] Check completed
[HealthCheck] Cache SET: health:xxx, TTL=86400

# Upload the same file a second time
[HealthCheck] Cache hit  ← read from Redis successfully

# 2. Test LLM12FieldsService (ASL module)
# Submit a full-text re-screening task and check the log:

[LLM12FieldsService] 调用LLM提取12字段
[LLM12FieldsService] Result cached with key: fulltext:xxx
[LLM12FieldsService] 缓存写入成功

# Re-run the same PDF
[LLM12FieldsService] Cache hit, returning cached result  ← API cost saved

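4.4 Phase 4: Alibaba Cloud Redis setup ✅

Step 4.1: Purchase the Redis instance

Log in to the Alibaba Cloud console
Open the ApsaraDB for Redis product page
Click Create Instance
Choose the configuration shown in the screenshot:
- Product: Redis Open-Source Edition
- Deployment mode: Cloud-native (high availability)
- Region: China North 2 (Beijing), the same region as SAE
- Version: Redis 7.0
- Shard size: 256 MB
- Billing: Subscription
Submit the order and pay

Step 4.2: Configure the whitelist

Alibaba Cloud console → Redis instance → Whitelist settings

Add:
1. Local development IP (for local testing)
   - your public IP/32

2. SAE application IPs (production)
   - 0.0.0.0/0 (temporary only, since it admits every IP; replace with the SAE VPC range afterwards)
   or
   - the VPC CIDR block of the SAE instances

Step 4.3: Get the connection details

Alibaba Cloud console → Redis instance → Connection information

Copy the following:
- Endpoint: r-xxxxxxxxxxxx.redis.rds.aliyuncs.com
- Port: 6379
- Instance ID: r-xxxxxxxxxxxx
- Password: set it via "Change password"

Step 4.4: Update the SAE environment variables

Alibaba Cloud console → SAE application → Configuration management → Environment variables

Add:
CACHE_TYPE=redis
REDIS_HOST=r-xxxxxxxxxxxx.redis.rds.aliyuncs.com
REDIS_PORT=6379
REDIS_PASSWORD=the password you set
REDIS_DB=0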

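4.5 Phase 5: Enable the Redis queue 🔴 Required

Key change: analysis shows the Redis queue is required for the ASL and DC Tool B modules, not optional.
Reason: in the SAE environment, 2-hour tasks fail more than 95% of the time without a Redis queue.

Step 5.1: Install BullMQ

cd backend
npm install bullmq --save

# If BullMQ is already listed in package.json, just confirm it is installed
npm list bullmq
# Should show: bullmq@5.65.0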

Step 5.2: Implement RedisQueue.ts

// File: backend/src/common/jobs/RedisQueue.ts

import { Queue, Worker, Job as BullJob, QueueEvents } from 'bullmq';
import type { Job, JobQueue, JobHandler } from './types.js';
import { logger } from '../logging/index.js';
import { config } from '../../config/env.js';

/**
 * Redis queue implementation (built on BullMQ)
 * 
 * Core features:
 * - Job persistence (jobs survive instance restarts)
 * - Automatic retry (exponential backoff after failure)
 * - Distributed job assignment (coordination across instances)
 * - Progress tracking
 */
export class RedisQueue implements JobQueue {
  private queues: Map<string, Queue> = new Map();
  private workers: Map<string, Worker> = new Map();
  private queueEvents: Map<string, QueueEvents> = new Map();
  
  private connection = {
    host: config.redisHost,
    port: config.redisPort,
    password: config.redisPassword,
    db: config.redisDb,
  };

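  /**
   * Push a job onto the queue
   */
  async push<T = any>(type: string, data: T, options?: any): Promise<Job> {
    try {
      // Get or create the queue. Recent BullMQ releases reject queue names
      // containing ':', so the Redis-side name maps ':' to '-'; process() must
      // apply the same mapping. `type` itself is unchanged for callers and logs.
      let queue = this.queues.get(type);
      if (!queue) {
        queue = new Queue(type.replace(/:/g, '-'), { 
          connection: this.connection,
          defaultJobOptions: {
            removeOnComplete: 100,  // keep the latest 100 completed jobs
            removeOnFail: false,    // keep failed jobs (for troubleshooting)
            attempts: 3,            // 3 attempts in total (1 initial try + 2 retries)
            backoff: {
              type: 'exponential',
              delay: 2000,          // retry delays: 2 s, then 4 s
            },
          }
        });
        this.queues.set(type, queue);
      }

      // Add the job (a custom jobId may be passed through options)
      const job = await queue.add(type, data, {
        ...options,
      });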

      logger.info(`[RedisQueue] 任务入队成功`, { 
        type, 
        jobId: job.id,
        dataSize: JSON.stringify(data).length 
      });

      return {
        id: job.id!,
        type,
        data,
        status: 'pending',
        createdAt: new Date(),
      };
    } catch (error) {
      logger.error(`[RedisQueue] 任务入队失败`, { type, error });
      throw error;
    }
  }

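  /**
   * Register a job handler
   */
  process<T = any>(type: string, handler: JobHandler<T>): void {
    try {
      // Create the Worker. Recent BullMQ releases reject queue names containing
      // ':', so ':' is mapped to '-'; push() must use the same mapping.
      const worker = new Worker(
        type.replace(/:/g, '-'),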
        async (job: BullJob) => {
          logger.info(`[RedisQueue] 开始处理任务`, { 
            type, 
            jobId: job.id,
            attemptsMade: job.attemptsMade,
            attemptsTotal: job.opts.attempts
          });
          
          const startTime = Date.now();
          
          try {
            // Call the business handler
            const result = await handler({
              id: job.id!,
              type,
              data: job.data as T,
              status: 'processing',
              createdAt: new Date(job.timestamp),
            });

            const duration = Date.now() - startTime;
            logger.info(`[RedisQueue] 任务处理成功`, { 
              type, 
              jobId: job.id,
              duration: `${duration}ms`
            });
            
            return result;
          } catch (error) {
            const duration = Date.now() - startTime;
            logger.error(`[RedisQueue] 任务处理失败`, { 
              type, 
              jobId: job.id,
              attemptsMade: job.attemptsMade,
              duration: `${duration}ms`,
              error: error instanceof Error ? error.message : 'Unknown error'
            });
            throw error;  // rethrow so BullMQ schedules a retry
          }
        },
        { 
          connection: this.connection,
          concurrency: 1,  // each Worker processes one job at a time
        }
      );

      this.workers.set(type, worker);

      // Listen for Worker events
      worker.on('completed', (job) => {
        logger.info(`[RedisQueue] ✅ 任务完成`, { 
          type, 
          jobId: job.id,
          returnvalue: job.returnvalue 
        });
      });

      worker.on('failed', (job, err) => {
        logger.error(`[RedisQueue] ❌ 任务失败`, { 
          type, 
          jobId: job?.id,
          attemptsMade: job?.attemptsMade,
          error: err.message,
          stack: err.stack
        });
      });

      worker.on('error', (err) => {
        logger.error(`[RedisQueue] Worker错误`, { type, error: err.message });
      });

      logger.info(`[RedisQueue] Worker已注册`, { type });

    } catch (error) {
      logger.error(`[RedisQueue] Worker注册失败`, { type, error });
      throw error;
    }
  }

  /**
   * Look up a job by id
   */
  async getJob(id: string): Promise<Job | null> {
    try {
      // Search every queue this instance has created via push()
      for (const [type, queue] of this.queues) {
        const job = await queue.getJob(id);
        if (job) {
          return {
            id: job.id!,
            type,
            data: job.data,
            status: await this.getJobStatus(job),
            progress: job.progress as number || 0,
            createdAt: new Date(job.timestamp),
            error: job.failedReason,
          };
        }
      }
      return null;
    } catch (error) {
      logger.error(`[RedisQueue] getJob失败`, { id, error });
      return null;
    }
  }

  /**
   * Map a BullMQ job state to the internal status
   */
  private async getJobStatus(job: BullJob): Promise<string> {
    const state = await job.getState();
    switch (state) {
      case 'completed': return 'completed';
      case 'failed': return 'failed';
      case 'active': return 'processing';
      case 'waiting': return 'pending';
      case 'delayed': return 'pending';
      default: return 'pending';
    }
  }

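  /**
   * Update job progress.
   * Only queues created by push() on this instance are searched, so an
   * instance that only runs workers will not find the job here.
   */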
  async updateProgress(id: string, progress: number, message?: string): Promise<void> {
    try {
      for (const queue of this.queues.values()) {
        const job = await queue.getJob(id);
        if (job) {
          await job.updateProgress(progress);
          if (message) {
            await job.log(message);
          }
          logger.debug(`[RedisQueue] 进度更新`, { id, progress, message });
          return;
        }
      }
      logger.warn(`[RedisQueue] 任务不存在，无法更新进度`, { id });
    } catch (error) {
      logger.error(`[RedisQueue] 更新进度失败`, { id, error });
    }
  }

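  /**
   * Cancel a job (removes it from the queue; a job that a worker is
   * currently processing is locked and may not be removable this way)
   */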
  async cancelJob(id: string): Promise<boolean> {
    try {
      for (const queue of this.queues.values()) {
        const job = await queue.getJob(id);
        if (job) {
          await job.remove();
          logger.info(`[RedisQueue] 任务已取消`, { id });
          return true;
        }
      }
      logger.warn(`[RedisQueue] 任务不存在，无法取消`, { id });
      return false;
    } catch (error) {
      logger.error(`[RedisQueue] 取消任务失败`, { id, error });
      return false;
    }
  }

  /**
   * Retry a failed job
   */
  async retryJob(id: string): Promise<boolean> {
    try {
      for (const queue of this.queues.values()) {
        const job = await queue.getJob(id);
        if (job) {
          await job.retry();
          logger.info(`[RedisQueue] 任务已重试`, { id });
          return true;
        }
      }
      logger.warn(`[RedisQueue] 任务不存在，无法重试`, { id });
      return false;
    } catch (error) {
      logger.error(`[RedisQueue] 重试任务失败`, { id, error });
      return false;
    }
  }

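  /**
   * Clean up old jobs
   * @param olderThan grace period in milliseconds (default: 24 hours)
   */
  async cleanup(olderThan: number = 86400000): Promise<number> {
    try {
      let totalCleaned = 0;
      
      for (const [type, queue] of this.queues) {
        // Remove completed jobs older than the grace period (at most 100 per call)
        const completed = await queue.clean(olderThan, 100, 'completed');
        // Remove failed jobs older than the grace period (at most 50 per call)
        const failed = await queue.clean(olderThan, 50, 'failed');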
        
        const cleaned = completed.length + failed.length;
        totalCleaned += cleaned;
        
        if (cleaned > 0) {
          logger.info(`[RedisQueue] 队列清理完成`, { 
            type, 
            completed: completed.length,
            failed: failed.length
          });
        }
      }
      
      return totalCleaned;
    } catch (error) {
      logger.error(`[RedisQueue] 清理任务失败`, { error });
      return 0;
    }
  }

  /**
   * Close all connections (graceful shutdown)
   */
  async close(): Promise<void> {
    try {
      // Close all Workers
      for (const [type, worker] of this.workers) {
        await worker.close();
        logger.info(`[RedisQueue] Worker已关闭`, { type });
      }
      
      // Close all Queues
      for (const [type, queue] of this.queues) {
        await queue.close();
        logger.info(`[RedisQueue] Queue已关闭`, { type });
      }
      
      // Close all QueueEvents
      for (const [type, events] of this.queueEvents) {
        await events.close();
        logger.info(`[RedisQueue] QueueEvents已关闭`, { type });
      }
      
      logger.info(`[RedisQueue] 所有连接已关闭`);
    } catch (error) {
      logger.error(`[RedisQueue] 关闭连接失败`, { error });
    }
  }
}
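
The retry settings in `defaultJobOptions` determine how long a failing job waits between attempts. The sketch below computes that schedule, assuming BullMQ's built-in exponential strategy waits 2^(attemptsMade - 1) × delay, which is my reading of its documentation; `exponentialBackoffDelays` is a hypothetical helper, not a BullMQ API.

```typescript
// Delays (ms) between attempts for `attempts` total tries with exponential backoff.
function exponentialBackoffDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // There is one wait after each failed attempt except the last one.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs));
  }
  return delays;
}

console.log(exponentialBackoffDelays(3, 2000));
```

Under that assumption, `attempts: 3` with `delay: 2000` gives two waits before the job is marked failed. Each retry runs the whole handler again, so a long-running handler should be safe to re-run.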

Step 5.3: Update JobFactory to support the Redis queue

// File: backend/src/common/jobs/JobFactory.ts

import { JobQueue } from './types.js'
import { MemoryQueue } from './MemoryQueue.js'
import { RedisQueue } from './RedisQueue.js'  // ← new
import { logger } from '../logging/index.js'
import { config } from '../../config/env.js'

export class JobFactory {
  private static instance: JobQueue | null = null

  static getInstance(): JobQueue {
    if (!this.instance) {
      this.instance = this.createQueue()
    }
    return this.instance
  }

  private static createQueue(): JobQueue {
    const queueType = config.queueType || 'memory'

    logger.info('[JobFactory] 初始化任务队列', { queueType });

    switch (queueType) {
      case 'redis':  // ← new
        return this.createRedisQueue()
      
      case 'memory':
        return this.createMemoryQueue()
      
      default:
        logger.warn(`[JobFactory] Unknown QUEUE_TYPE: ${queueType}, fallback to memory`)
        return this.createMemoryQueue()
    }
  }

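  /**
   * Create the Redis queue (with fallback).
   * Note: `new RedisQueue()` does not connect to Redis, so the catch block
   * below only covers construction errors; connection problems surface later,
   * when push() or process() is first called.
   */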
  private static createRedisQueue(): JobQueue {
    try {
      logger.info('[JobFactory] 正在连接Redis队列...');
      
      const redisQueue = new RedisQueue();
      
      logger.info('[JobFactory] ✅ Redis队列初始化成功');
      return redisQueue;
      
    } catch (error) {
      logger.error('[JobFactory] ❌ Redis队列初始化失败，降级到内存队列', { error });
      return this.createMemoryQueue();
    }
  }

  private static createMemoryQueue(): MemoryQueue {
    logger.info('[JobFactory] 使用内存队列')
    
    const queue = new MemoryQueue()
    
    // Periodic cleanup (avoids memory leaks)
    if (process.env.NODE_ENV !== 'test') {
      setInterval(() => {
        queue.cleanup()
      }, 60 * 60 * 1000)
    }
    
    return queue
  }

  static reset(): void {
    this.instance = null
  }
}

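Step 5.4: Change the business code to use the queue

// Example: migrating ASL literature screening
// File: backend/src/modules/asl/services/screeningService.ts
// (prisma and logger imports omitted)

import { jobQueue } from '../../../common/jobs/index.js';

export async function startScreeningTask(projectId: string, userId: string) {
  // 0. Load the articles to screen (model name and filter are assumed here;
  //    adapt them to the real schema)
  const literatures = await prisma.aslLiterature.findMany({ where: { projectId } });

  // 1. Create the task record in the database
  const task = await prisma.aslScreeningTask.create({
    data: {
      projectId,
      status: 'pending',
      totalItems: literatures.length,
      // ...
    }
  });

  // 2. Push to the Redis queue (does not block the request)
  await jobQueue.push('asl:title-screening', {
    taskId: task.id,
    projectId,
    literatureIds: literatures.map(lit => lit.id),
  });

  logger.info('任务已入队', { taskId: task.id });

  // 3. Return immediately (the frontend polls for progress)
  return task;
}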

// Register the Worker (at application startup)
// File: backend/src/index.ts
// Note: when a job is retried or re-run after a restart, this handler starts
// again from the first article; it does not resume where it stopped.
jobQueue.process('asl:title-screening', async (job) => {
  const { taskId, projectId, literatureIds } = job.data;
  
  logger.info('开始处理筛选任务', { taskId, total: literatureIds.length });
  
  for (let i = 0; i < literatureIds.length; i++) {
    const literatureId = literatureIds[i];
    
    // Process a single article
    await processSingleLiterature(literatureId, projectId);
    
    // Update progress (percentage)
    const progress = ((i + 1) / literatureIds.length) * 100;
    await jobQueue.updateProgress(job.id, progress);
    
    // Update the database
    await prisma.aslScreeningTask.update({
      where: { id: taskId },
      data: { processedItems: i + 1 }
    });
  }
  
  // Mark as completed
  await prisma.aslScreeningTask.update({
    where: { id: taskId },
    data: { status: 'completed', completedAt: new Date() }
  });
  
  logger.info('筛选任务完成', { taskId });
  
  return { success: true, processed: literatureIds.length };
});
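
Since a retried or restarted job runs the handler from the beginning, a two-hour task needs some form of checkpoint to avoid paying for the same articles twice. A minimal sketch of a resumable loop follows; `CheckpointStore` and `processWithCheckpoint` are hypothetical names, and where the processed ids are stored (a database table, the job payload, Redis) is left open.

```typescript
interface CheckpointStore {
  loadProcessedIds(taskId: string): Promise<Set<string>>;
  markProcessed(taskId: string, itemId: string): Promise<void>;
}

// Process every item once; items recorded by an earlier run are skipped.
async function processWithCheckpoint(
  taskId: string,
  itemIds: string[],
  store: CheckpointStore,
  processItem: (itemId: string) => Promise<void>,
): Promise<{ processed: number; skipped: number }> {
  const done = await store.loadProcessedIds(taskId);
  let processed = 0;
  let skipped = 0;

  for (const itemId of itemIds) {
    if (done.has(itemId)) {
      skipped++;
      continue;
    }
    await processItem(itemId);
    // Record success only after the item finished, so a crash mid-item
    // causes that one item to be redone rather than lost.
    await store.markProcessed(taskId, itemId);
    processed++;
  }

  return { processed, skipped };
}
```

The item in flight when the process dies is still processed twice, so `processItem` should remain idempotent; the checkpoint only bounds the repeated work to one item.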

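Step 5.5: Update the .env configuration

# backend/.env

# ==================== Job queue ====================
QUEUE_TYPE=redis  # ← change to redis

# Redis settings (shared with the cache)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0

Step 5.6: Test the Redis queue

# 1. Start the backend
cd backend
npm run dev

# The log should show:
# [JobFactory] 正在连接Redis队列...
# [JobFactory] ✅ Redis队列初始化成功

# 2. Submit a test task (with REST Client or Postman)
POST http://localhost:3001/api/v1/asl/projects/:projectId/screening

# 3. Watch the log
# [RedisQueue] 任务入队成功 { type: 'asl:title-screening', jobId: '1' }
# [RedisQueue] 开始处理任务 { type: 'asl:title-screening', jobId: '1' }
# [RedisQueue] ✅ 任务完成 { type: 'asl:title-screening', jobId: '1' }

# 4. Test recovery after an instance restart
# Submit a task → wait until about 50% → stop with Ctrl+C → start again
# The job should be picked up from Redis and run again. The handler shown in
# step 5.4 restarts from the first article, so already-processed articles are
# redone unless the handler skips them.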

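5. Test Plan

5.1 Unit test checklist

Test	Expected result	Actual result	Status
Redis connection	PONG	✅	⬜ Pending
Basic read/write	Values match	✅	⬜ Pending
Object serialization	Object intact	✅	⬜ Pending
TTL expiry	null after 2 seconds	✅	⬜ Pending
Large object (50 KB)	< 10 ms	✅	⬜ Pending
Fallback strategy	Switches to memory automatically	✅	⬜ Pending

5.2 Integration test checklist

Test	Steps	Expected result	Status
HealthCheckService	Upload the same Excel twice	Cache hit on the 2nd upload	⬜ Pending
LLM12FieldsService	Extract the same PDF twice	Cache hit on the 2nd run	⬜ Pending
Cache sharing across instances	Start 2 backend instances; instance A writes, instance B reads	B reads the entry written by A	⬜ Pending
Persistence across restarts	Write to cache → restart backend → read	Entry is still readable	⬜ Pending

5.3 Load testing

# Test concurrent reads/writes with ab or wrk

# Test 1: concurrent writes
ab -n 1000 -c 10 http://localhost:3001/api/test/cache-write

# Test 2: concurrent reads
ab -n 10000 -c 50 http://localhost:3001/api/test/cache-read

# Expected results:
# - QPS > 1000
# - Response time < 50 ms
# - Error rate = 0%

5.4 Fault injection tests

Fault	How to simulate	Expected behavior	Status
Redis goes down suddenly	`docker stop ai-clinical-redis`	System falls back to the in-memory cache; the application keeps running	⬜ Pending
Redis network latency	`tc qdisc add dev eth0 root netem delay 500ms`	Requests time out and retry, eventually returning null	⬜ Pending
Redis memory full	Write data until 256 MB is reached	LRU eviction kicks in; new writes are unaffected	⬜ Pending
Wrong Redis password	Change the password	Connection fails; falls back to the in-memory cache	⬜ Pending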

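6. Risk Assessment and Mitigation

6.1 Risk matrix

Risk	Severity	Likelihood	Impact	Mitigation	Status
Redis connection failure	🔴 High	🟡 Medium	System unavailable	✅ Fallback strategy	✅ Implemented
Data loss	🟡 Medium	🟢 Low	Cache invalidated	✅ Double-write key data to the DB	⏳ To do
Out of memory (OOM)	🔴 High	🟡 Medium	Redis crashes	✅ Strict TTLs + monitoring	⏳ To do
Network latency	🟢 Low	🟢 Low	Slower responses	✅ Batch operations	⏳ Optional optimization
Misconfiguration	🟡 Medium	🟡 Medium	Startup failure	✅ Config validation	✅ Implemented
Password leak	🔴 High	🟡 Medium	Data leak	✅ Manage secrets with KMS	⏳ To do

6.2 Key mitigations

Mitigation 1: Fallback strategy (required) ✅

// Implemented in CacheFactory
// Intended behavior: switch to MemoryCache when Redis is unavailable

Mitigation 2: Double-write key data (recommended) ⏳

// To be added in the business code

// Example: double-write of task progress
export async function updateTaskProgress(taskId: string, processedItems: number) {
  // 1. Write to Redis (fast reads)
  await cache.set(`task:${taskId}:progress`, processedItems, 3600);
  
  // 2. Also write to the DB (durable)
  await prisma.aslScreeningTask.update({
    where: { id: taskId },
    data: { processedItems }
  });
}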

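Mitigation 3: Memory monitoring (recommended) ⏳

// Create: backend/src/scripts/monitor-redis.ts

import Redis from 'ioredis';

// Connects with the REDIS_* variables from backend/.env
const redis = new Redis({
  host: process.env.REDIS_HOST ?? 'localhost',
  port: Number(process.env.REDIS_PORT ?? 6379),
  password: process.env.REDIS_PASSWORD || undefined,
  db: Number(process.env.REDIS_DB ?? 0),
});

const MAX_MEMORY_BYTES = 256 * 1024 * 1024; // 256MB instance

setInterval(async () => {
  try {
    // Check memory usage
    const info = await redis.info('memory');
    const match = info.match(/used_memory:(\d+)/);
    if (!match) return; // field missing, skip this round
    const used = Number(match[1]);

    if (used > MAX_MEMORY_BYTES * 0.8) {
      // console is used here; swap in the project logger if available
      console.warn('⚠️ Redis memory usage above 80%', { used, max: MAX_MEMORY_BYTES });
      // TODO: send a DingTalk/email alert
    }
  } catch (err) {
    console.error('Redis memory check failed', err);
  }
}, 60000);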

Mitigation 4: Config validation (implemented) ✅

// validateConfig() in env.ts
// Checks the Redis configuration at startup
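
The body of `validateConfig()` is not reproduced in this plan. As a rough illustration of the kind of startup check it could perform, here is a sketch using the variable names from the .env files in this document. The function name `validateRedisConfig`, the error messages, and the rule that `REDIS_URL` replaces the host/port checks are assumptions, not the project's actual behavior.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the config is usable.
function validateRedisConfig(env: Env): string[] {
  const errors: string[] = [];

  // Redis settings only matter when the cache or the queue is set to redis.
  const usesRedis = env.CACHE_TYPE === 'redis' || env.QUEUE_TYPE === 'redis';
  if (!usesRedis) return errors;

  // Assumption: a connection string, when present, replaces host/port/db.
  if (env.REDIS_URL) return errors;

  if (!env.REDIS_HOST) {
    errors.push('REDIS_HOST is required when CACHE_TYPE or QUEUE_TYPE is redis');
  }

  const port = Number(env.REDIS_PORT ?? '6379');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`REDIS_PORT must be an integer between 1 and 65535, got "${env.REDIS_PORT}"`);
  }

  const db = Number(env.REDIS_DB ?? '0');
  if (!Number.isInteger(db) || db < 0) {
    errors.push(`REDIS_DB must be a non-negative integer, got "${env.REDIS_DB}"`);
  }

  return errors;
}
```

At startup the caller would typically log every returned message and exit, or fall back to `CACHE_TYPE=memory`, depending on how strict the deployment should be.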

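7. Rollout Plan

7.1 Rollout schedule (updated in V2.0)

Phase	Time	Task	Owner	Status
Phase 1	Day 1 morning	Prepare local development environment	Dev	⬜ Not started
Phase 2	Day 1 afternoon	Implement RedisCacheAdapter	Dev	⬜ Not started
Phase 3	Day 2 full day	Local testing of the Redis cache	Dev + QA	⬜ Not started
Phase 4	Day 3 morning	Purchase and configure Alibaba Cloud Redis	Ops	⬜ Not started
Phase 5	Day 3 afternoon to Day 5	🔴 Implement RedisQueue (required)	Dev	⬜ Not started
Phase 6	Day 6 full day	Local testing of the Redis queue + business integration	Dev + QA	⬜ Not started
Phase 7	Day 7 morning	Verification in the SAE test environment	Dev + QA	⬜ Not started
Phase 8	Day 7 afternoon	Production release	Everyone	⬜ Not started
Phase 9	Day 7 evening onward	Monitoring and observation (24 hours)	Ops	⬜ Not started

Total effort: 7 days (4 days more than the original plan, but it ensures the core features work)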

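7.2 Release steps (production)

Step 1: Pre-release checks (15 minutes)

✅ Code committed to Git
✅ All local tests pass
✅ Alibaba Cloud Redis is ready
✅ SAE environment variables are configured
✅ Rollback plan is prepared
✅ Monitoring is in place

Step 2: Canary release (30 minutes)

1. SAE console → select the application
2. Application deployment → batch release
3. Batch 1: 10% of instances (observe for 15 minutes)
4. Batch 2: 50% of instances (observe for 10 minutes)
5. Batch 3: 100% of instances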

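Step 3: Verification (15 minutes)

# 1. Check the Redis connection
curl https://your-api.com/api/health
# Expected response: { cache: "redis", status: "ok" }

# 2. Test cache writes
# Upload an Excel file → check the logs → confirm the write to Redis

# 3. Test cache reads
# Upload the same file again → check the logs → confirm a cache hit

# 4. Check Redis memory
# Alibaba Cloud console → Redis monitoring → memory usage

Step 4: Monitoring and observation (24 hours)

Metrics to watch:
- Redis connection count (should be stable)
- Memory usage (should be < 50%)
- Cache hit rate (should be > 80%)
- Application error logs (should contain no Redis-related errors)
- API cost (should go down)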

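8. Rollback Plan

8.1 Fast rollback (within 5 minutes)

Scenario 1: Redis connection fails and the application cannot start

# Option 1: change the SAE environment variable
Alibaba Cloud console → SAE application → Configuration management → Environment variables
Change: CACHE_TYPE=memory
Save → the application restarts

# Option 2: redeploy the previous version
SAE console → Application deployment → Version management → Rollback

Scenario 2: Redis performance problems, slow responses

# Temporarily fall back to the memory cache
CACHE_TYPE=memory

# Or keep Redis and check the network
ping r-xxxxxxxxxxxx.redis.rds.aliyuncs.com

Scenario 3: Redis memory is full and writes fail

# Option 1: flush Redis (dangerous!)
redis-cli -h r-xxx.redis.rds.aliyuncs.com -a password
> FLUSHDB

# Option 2: upgrade the Redis instance size
Alibaba Cloud console → Redis instance → Change configuration → 512MB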

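8.2 Rollback checklist

✅ Does the application start normally?
✅ Is the cache working (memory mode)?
✅ Are API responses normal?
✅ Have the errors stopped appearing in the logs?
✅ Can users use the system normally?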

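9. Monitoring and Operations

9.1 Monitoring metrics

Redis metrics (Alibaba Cloud console)

Metric	Normal range	Alert threshold	Action
Memory usage	< 50%	> 80%	Check for large keys, consider upgrading
Connections	< 50	> 100	Check for connection leaks
QPS	< 1000	> 5000	Consider sharding
Hit rate	> 80%	< 50%	Review the caching strategy
Response time	< 5ms	> 50ms	Check the network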
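
To make the alert thresholds in the table above concrete, the sketch below encodes them as a simple check. The `RedisMetrics` shape and the `findAlerts` function are hypothetical; in practice these alerts would more likely be configured as rules in the Alibaba Cloud console than coded by hand.

```typescript
// Hypothetical snapshot of the five metrics from the table above.
interface RedisMetrics {
  memoryUsagePct: number; // memory usage in percent
  connections: number;    // current client connections
  qps: number;            // queries per second
  hitRatePct: number;     // cache hit rate in percent
  latencyMs: number;      // response time in milliseconds
}

// Returns one message per breached alert threshold, with the suggested action.
function findAlerts(m: RedisMetrics): string[] {
  const alerts: string[] = [];
  if (m.memoryUsagePct > 80) alerts.push('memory > 80%: check for large keys, consider upgrading');
  if (m.connections > 100) alerts.push('connections > 100: check for connection leaks');
  if (m.qps > 5000) alerts.push('QPS > 5000: consider sharding');
  if (m.hitRatePct < 50) alerts.push('hit rate < 50%: review the caching strategy');
  if (m.latencyMs > 50) alerts.push('response time > 50ms: check the network');
  return alerts;
}

// Example: a snapshot inside the normal ranges raises no alerts.
const healthy: RedisMetrics = { memoryUsagePct: 30, connections: 20, qps: 400, hitRatePct: 85, latencyMs: 3 };
console.log(findAlerts(healthy).length); // prints 0
```

Values between the normal range and the alert threshold (for example 60% memory usage) raise nothing in this sketch; whether they deserve a softer warning is a policy decision the table leaves open.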

Application metrics (SAE logs)

# Find Redis-related errors
grep "Redis" logs/app.log | grep "ERROR"

# Count cache hits and misses
grep "Cache hit" logs/app.log | wc -l
grep "Cache miss" logs/app.log | wc -l

# Compute the hit rate
# hit rate = hits / (hits + misses) * 100%
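
The same hit-rate calculation can be done in code once the log lines are loaded. This is a small sketch that assumes, like the grep commands, that hits and misses are logged with the literal markers "Cache hit" and "Cache miss"; the function name `cacheHitRate` is made up for illustration.

```typescript
// Computes the cache hit rate (in percent) from application log lines.
// Returns null when the logs contain neither hits nor misses.
function cacheHitRate(logLines: string[]): number | null {
  let hits = 0;
  let misses = 0;
  for (const line of logLines) {
    if (line.includes('Cache hit')) hits++;
    else if (line.includes('Cache miss')) misses++;
  }
  const total = hits + misses;
  return total === 0 ? null : (hits / total) * 100;
}

// Example with made-up log lines: 3 hits and 1 miss.
const sampleLines = [
  'INFO Cache hit key=a',
  'INFO Cache hit key=b',
  'INFO Cache miss key=c',
  'INFO request finished',
  'INFO Cache hit key=d',
];
console.log(cacheHitRate(sampleLines)); // prints 75
```

Returning null for an empty sample avoids reporting a misleading 0% when nothing was logged.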

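9.2 Operations commands

Common Redis CLI commands

# Connect to Redis
redis-cli -h r-xxx.redis.rds.aliyuncs.com -p 6379 -a your_password

# List all keys (blocks the server on large datasets; prefer SCAN in production)
KEYS *

# List task-related keys
KEYS task:*

# List cache-related keys
KEYS fulltext:*

# Show a key's TTL
TTL task:abc123:progress

# Show a key's value
GET task:abc123:progress

# Show memory usage
INFO memory

# Show connection count
INFO clients

# Watch commands in real time (adds load; use only briefly)
MONITOR

# Flush the database (dangerous!)
FLUSHDB

Memory analysis

# Find large keys (--bigkeys reports the biggest key of each data type)
redis-cli -h xxx -a password --bigkeys

# Show the memory used by a key
MEMORY USAGE task:abc123:progress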

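9.3 Troubleshooting flow

Problem: Redis connection failure
  ↓
1. Check the Redis instance status (Alibaba Cloud console)
  ↓
2. Check the whitelist (does it include the SAE IPs?)
  ↓
3. Check that the password is correct
  ↓
4. Test network connectivity with ping
  ↓
5. Check the application logs
  ↓
6. If it cannot be resolved quickly → fall back to the memory cache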

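9.4 Regular maintenance

Area	Frequency	What to do
Daily monitoring	Daily	Review memory usage, connection count, error logs
Performance analysis	Weekly	Analyze cache hit rate and response time
Capacity review	Monthly	Assess whether 256MB is enough or an upgrade is needed
Security check	Monthly	Review the whitelist, password strength, access logs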

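10. Success Criteria

10.1 Technical metrics (updated in V2.0)

Metric	Current (memory)	Target (Redis)	How measured
Cache persistence	❌ Lost on instance restart	✅ Persisted	Still readable after a restart
Multi-instance sharing	❌ Each instance isolated	✅ Shared globally	Instance A writes, instance B can read
Duplicate LLM API calls	🔴 High	🟢 Low	The same PDF triggers only 1 call
Cache hit rate	N/A	> 60%	Statistics from monitoring logs
Task persistence	❌ Lost when the instance is destroyed	✅ Task continues	🔴 New
Long-task success rate	10-30%	> 99%	🔴 Key metric
System availability	99%	99.9%	Ensured by the fallback strategy

10.2 Business metrics (updated in V2.0)

Metric	Before	After	How measured
LLM API cost	¥X/month	40-60% lower	Compare invoices
Task loss rate	🔴 70-95%	< 1%	🔴 Most important
Task completion time	Unpredictable	Stable	Monitoring logs
Repeated submissions per user	3 on average	Close to 0	User behavior analysis
User satisfaction	Baseline	Clearly improved	Survey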

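10.3 Acceptance criteria (updated in V2.0)

Redis cache acceptance

✅ All cache unit tests pass
✅ HealthCheckService cache-hit test passes
✅ LLM12FieldsService cache-hit test passes
✅ Cache entries survive an instance restart
✅ Cache hit rate > 60%
✅ LLM API call count drops by > 40%

Redis queue acceptance 🔴 New key items

✅ Redis queue unit tests pass
✅ Long-task (2-hour) test passes
✅ Tasks recover automatically after an instance restart
✅ Failed tasks are retried automatically (3 times)
✅ Screening of 1000 papers succeeds > 99% of the time
✅ No user complaints about lost tasks
✅ Real-time progress updates work

Failure recovery tests 🔴 Important

✅ Simulated instance destruction → task recovers automatically
✅ Simulated Redis outage → system runs in degraded mode
✅ Simulated network latency → task completes normally
✅ Simulated concurrent tasks → work is distributed correctly

Production acceptance

✅ Production runs for 48 hours without errors
✅ 2 complete 1000-paper screening tasks succeed
✅ Monitoring metrics are normal (memory, connections, QPS)
✅ No user complaints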

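11. FAQ

Q1: What if Redis suddenly goes down in the middle of the night?

A1: The system falls back to the memory cache automatically and the application keeps running. Ops can check and restore Redis the next day.

Q2: Is 256MB enough? When do we need to upgrade?

A2:

Current estimate: < 50% usage
Signals that an upgrade is needed:
- Memory usage > 80%
- Frequent LRU evictions
- Monitoring alerts
How to upgrade: one-click upgrade in the Alibaba Cloud console, no restart required

Q3: Will Redis hurt system performance?

A3:

Local memory: < 0.1ms
Local Redis: ~1ms
Alibaba Cloud Redis (same region): ~2-5ms
The impact is negligible, and batch operations are available as an optimization

Q4: If Redis turns out to be a poor fit, can we go back to the memory cache?

A4: Yes. Set CACHE_TYPE=memory; no code change is needed, and the new value takes effect when the application restarts (see section 8.1).

Q5: Do we need to learn Redis commands?

A5:

Developers: no, the code already wraps them
Ops: learning 5 basic commands is recommended (GET/SET/KEYS/TTL/INFO)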

12. Related Documents

13. Appendices

Appendix A: Complete .env template

# ==================== Database ====================
DATABASE_URL=postgresql://postgres:password@localhost:5432/ai_clinical_research

# ==================== Redis configuration ====================
CACHE_TYPE=redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0
# Or use a connection string
# REDIS_URL=redis://localhost:6379

# ==================== Queue configuration ====================
QUEUE_TYPE=memory  # use memory in the first stage, switch to redis in the second stage

# ==================== JWT ====================
JWT_SECRET=your-secret-key-change-in-production
JWT_EXPIRES_IN=7d

# ==================== LLM API ====================
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxx
CLOSEAI_API_KEY=sk-xxxxxxxxxxxxxx
CLOSEAI_OPENAI_BASE_URL=https://api.openai-proxy.org/v1
CLOSEAI_CLAUDE_BASE_URL=https://api.openai-proxy.org/anthropic

# ==================== Dify ====================
DIFY_API_URL=http://localhost/v1
DIFY_API_KEY=dataset-xxxxxxxxxxxxxx

# ==================== Server ====================
PORT=3001
NODE_ENV=development

# ==================== Storage configuration ====================
STORAGE_TYPE=local
LOCAL_STORAGE_DIR=uploads
LOCAL_STORAGE_URL=http://localhost:3001/uploads

# ==================== CORS configuration ====================
CORS_ORIGIN=http://localhost:5173

# ==================== Logging configuration ====================
LOG_LEVEL=debug

Appendix B: Redis memory calculator

One cached LLM result: ~50KB
One cached health check: ~5KB

Estimated capacity:
- 1000 LLM results = 50MB
- 1000 health checks = 5MB
- System overhead = 20MB
-----------------------------
Total = 75MB / 256MB = 29% usage
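
The estimate above can be re-run for other volumes with a few lines of code. This sketch reuses the appendix's per-item sizes and, like the appendix, treats 1MB as 1000KB; the function name `estimateRedisUsage` is made up for illustration.

```typescript
// Sizes taken from Appendix B. The appendix rounds 1MB to 1000KB, and so does this sketch.
const LLM_RESULT_KB = 50;
const HEALTH_CHECK_KB = 5;
const SYSTEM_OVERHEAD_MB = 20;
const INSTANCE_MB = 256;

interface UsageEstimate {
  totalMb: number;  // estimated memory in MB
  usagePct: number; // share of the 256MB instance, in percent
}

// Estimates Redis memory for a given number of cached LLM results and health checks.
function estimateRedisUsage(llmResults: number, healthChecks: number): UsageEstimate {
  const llmMb = (llmResults * LLM_RESULT_KB) / 1000;
  const healthMb = (healthChecks * HEALTH_CHECK_KB) / 1000;
  const totalMb = llmMb + healthMb + SYSTEM_OVERHEAD_MB;
  return { totalMb, usagePct: (totalMb / INSTANCE_MB) * 100 };
}

const estimate = estimateRedisUsage(1000, 1000);
console.log(`${estimate.totalMb} MB, ${estimate.usagePct.toFixed(0)}% of ${INSTANCE_MB} MB`);
// prints "75 MB, 29% of 256 MB"
```

With these sizes, the 80% alert threshold (about 205MB) would be reached at roughly 3700 cached LLM results if health-check entries stay at 1000.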

Appendix C: Failure drill script

#!/bin/bash
# File: backend/scripts/disaster-recovery-drill.sh

echo "🚨 Redis failure drill starting..."

# 1. Stop Redis
echo "1. Stopping Redis..."
docker stop ai-clinical-redis
sleep 2

# 2. Check that the application is still healthy
echo "2. Checking application health..."
response=$(curl -s http://localhost:3001/api/health)
echo "Response: $response"

if [[ $response == *"memory"* ]]; then
  echo "✅ Fallback succeeded, using memory cache"
else
  echo "❌ Fallback failed"
  exit 1
fi

# 3. Restore Redis
echo "3. Restoring Redis..."
docker start ai-clinical-redis
sleep 5

# 4. Check that Redis is back
echo "4. Checking Redis recovery..."
response=$(curl -s http://localhost:3001/api/health)
echo "Response: $response"

if [[ $response == *"redis"* ]]; then
  echo "✅ Redis recovered"
else
  echo "⚠️ Redis not recovered, still using memory cache"
fi

echo "🎉 Failure drill complete!"
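
The script decides pass or fail by looking for the substring "memory" or "redis" anywhere in the health response, which could also match unrelated fields. As a stricter alternative, the sketch below parses the response as JSON and reads the `cache` field, following the `{ cache: "redis", status: "ok" }` shape shown in section 7.2. That the endpoint reports `cache: "memory"` in degraded mode is an assumption inferred from the script; `cacheModeFromHealth` is a hypothetical helper.

```typescript
type CacheMode = 'redis' | 'memory' | 'unknown';

// Extracts the cache mode from a /api/health response body.
// Anything that is not valid JSON with a recognized `cache` field yields 'unknown'.
function cacheModeFromHealth(body: string): CacheMode {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === 'object' && parsed !== null && 'cache' in parsed) {
      const cache = (parsed as { cache: unknown }).cache;
      if (cache === 'redis' || cache === 'memory') return cache;
    }
  } catch {
    // Body was not valid JSON; fall through to 'unknown'.
  }
  return 'unknown';
}

console.log(cacheModeFromHealth('{"cache":"memory","status":"ok"}')); // prints "memory"
```

A drill written around this helper would fail on 'unknown' in both phases, which catches the case where the application is down and curl returns an empty body.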

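Document maintainer: Technical team
Last updated: 2025-12-12
Document status: ✅ Pending review
Next update: After the migration, to record lessons learned

✅ Migration completion checklist

After completing the Redis migration, check each item:

Code

ioredis is installed
RedisCacheAdapter is implemented
CacheFactory includes the fallback logic
.env configuration is updated
Every call to cache.set() sets a TTL

Testing

All unit tests pass
All integration tests pass
Stress test targets are met
Failure simulation tests pass

Deployment

Alibaba Cloud Redis is purchased
Whitelist is configured
SAE environment variables are configured
Production environment is verified

Documentation

Migration document is updated
Operations documentation is extended
Monitoring metrics are recorded
Lessons learned are written up

Good luck with the migration! If anything comes up, raise it promptly. 🚀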
