feat(asl): Add Deep Research V2.0 development plan and Unifuncs API site coverage testing

Completed:
- Unifuncs DeepSearch API site coverage test (18 medical sites, 9 tier-1 available)
- ClinicalTrials.gov dedicated test (4 strategies, English query + depth>=10 works best)
- Deep Research V2.0 development plan (5-day phased delivery)
- DeepResearch engine capability guide (docs/02-common-capability/)
- Test scripts: test-unifuncs-site-coverage.ts, test-unifuncs-clinicaltrials.ts

Key findings:
- Tier-1 sites: PubMed(28), ClinicalTrials(38), NCBI(18), Scholar(10), Cochrane(4), CNKI(7), SinoMed(9), GeenMedical(5), VIP(1)
- Paid databases (WoS/Embase/Scopus/Ovid) cannot be accessed (no credential support)
- ClinicalTrials.gov requires English queries with max_depth>=10

Updated: ASL module status doc, system status doc, common capability list

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-22 22:44:41 +08:00
parent 3446909ff7
commit b06daecacd
12 changed files with 2662 additions and 27 deletions
--- a/docs/02-通用能力层/04-DeepResearch引擎/01-Unifuncs
+++ b/docs/02-通用能力层/04-DeepResearch引擎/01-Unifuncs
@@ -0,0 +1,295 @@
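+# Unifuncs DeepSearch API Usage Guide
+
+> **Document version:** v1.0  
+> **Created:** 2026-02-22  
+> **Maintainer:** Development team  
+> **Purpose:** Guide business modules in using the Unifuncs DeepSearch API correctly, and document which sites are searchable and which strategies work best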
+
+---
+
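+## 1. Overview
+
+Unifuncs DeepSearch is an AI-driven deep search engine that automatically searches, reads, and summarizes information within a specified set of websites. On this platform it is part of the **common capability layer** and provides the underlying search capability for scenarios such as literature retrieval and clinical trial lookup.
+
+### Core capabilities
+- Natural-language input → AI generates the search strategy automatically
+- Multi-round iterative search (maximum depth is configurable)
+- Automatically reads web pages and extracts key information
+- Returns structured results plus a synthesized report
+
+### API basics
+
+| Item | Value |
+|------|---|
+| Base URL | `https://api.unifuncs.com/deepsearch/v1` |
+| Model | `s2` |
+| Authentication | `Authorization: Bearer {UNIFUNCS_API_KEY}` |
+| Environment variable | `UNIFUNCS_API_KEY` (already configured in `backend/.env`) |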
+
+---
+
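+## 2. Site Coverage (measured 2026-02-22)
+
+### 2.1 Test conditions
+
+- **Query** (issued in Chinese): randomized controlled trials and meta-analyses of statins for preventing cardiovascular disease, high-quality studies from the last 5 years
+- **Configuration**: max_depth=5, async mode (create_task + query_task)
+- **ClinicalTrials.gov dedicated test**: comparison of 4 strategies, max_depth=5~15
+
+### 2.2 Availability tiers
+
+#### Tier 1: confirmed searchable (returns direct in-site links)
+
+| Site | Domain | Language | In-site links | Searches/reads | Best strategy |
+|------|------|------|-----------|-----------|---------|
+| **PubMed** | pubmed.ncbi.nlm.nih.gov | English | 28 | 9/29 | Chinese or English queries both work; best results overall |
+| **NCBI/PMC** | www.ncbi.nlm.nih.gov | English | 18 | 24/19 | Includes PMC full-text links |
+| **ClinicalTrials.gov** | clinicaltrials.gov | English | 38 | 6/24 | **English queries required**, max_depth≥10 |
+| **Google Scholar** | scholar.google.com | English | 10 | 22/26 | Cross-database aggregated search |
+| **CBM/SinoMed** | www.sinomed.ac.cn | Chinese | 9 | 17/12 | Chinese biomedical literature database |
+| **CNKI** | www.cnki.net | Chinese | 7 | 40/6 | Chinese core journals |
+| **GeenMedical** | www.geenmedical.com | English | 5 | 38/3 | Medical search aggregator |
+| **Cochrane Library** | www.cochranelibrary.com | English | 4 | 38/12 | Gold standard for systematic reviews |
+| **VIP (Weipu)** | www.cqvip.com | Chinese | 1 | 33/3 | Usable, but returns few links |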
+
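+#### Tier 2: reachable, but links are indirect (content is found, but returned links do not point to the site's own domain)
+
+| Site | Domain | Language | Other links | Notes |
+|------|------|------|-----------|------|
+| Chinese Medical Journal Network | medjournals.cn | Chinese | 12 | Active searching (41 searches); rich content, but links redirect |
+| Wanfang Data | www.wanfangdata.com.cn | Chinese | 7 | Active searching (42 searches); links may redirect |
+| Chinese Clinical Trial Registry | www.chictr.org.cn | Chinese | 7 | Produces content, but links point to other sites |
+| China Traditional Chinese Medicine Database | cintmed.cintcm.cn | Chinese | 22 | Richest content (8,631 characters), but links are not direct |
+| Scopus | www.scopus.com | English | 15 | Paywalled; content comes from external citations |
+| Embase | www.embase.com | English | 14 | Requires institutional login |
+| Web of Science | www.webofscience.com | English | 6 | Paywalled |
+
+#### Tier 3: unavailable or restricted
+
+| Site | Domain | Notes |
+|------|------|------|
+| Ovid | ovidsp.ovid.com | Searched only, no content read; requires institutional login |
+| NSTL | www.nstl.gov.cn | Found by search, but no usable content or links |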
+
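+### 2.3 Key findings
+
+1. **Paid databases cannot be reached**: Unifuncs can only access publicly reachable web content and does not support passing a username and password. Databases that require an institutional IP or an account login, such as Web of Science, Embase, Scopus, and Ovid, cannot be searched directly.
+
+2. **ClinicalTrials.gov requires English**: The site is English-only, and Chinese queries are very inefficient. With an English query and max_depth≥10, it reliably returns 30+ NCT numbers and links.
+
+3. **Chinese databases perform unevenly**: CNKI and SinoMed work well and return in-site links directly; Wanfang and the Chinese Medical Journal Network are reachable, but their links are not direct.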
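
The tier-1 versus tier-2 split in section 2.2 depends on whether returned links point at the target site's own domain. A minimal sketch of that check follows. The helper names and the exact classification rule (any in-site link means tier 1, only off-site links means tier 2, no links means tier 3) are assumptions made for illustration and are not taken from the project's test scripts.

```typescript
type Tier = 'tier1' | 'tier2' | 'tier3';

// Hypothetical helper: true when the link's host is the site domain or one of its subdomains.
function isInSiteLink(link: string, siteDomain: string): boolean {
  let host: string;
  try {
    host = new URL(link).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  const domain = siteDomain.toLowerCase();
  return host === domain || host.endsWith(`.${domain}`);
}

// Hypothetical rule (an assumption): classify a site from the links one search returned.
function classifySite(
  links: string[],
  siteDomain: string,
): { tier: Tier; inSite: number; other: number } {
  const inSite = links.filter((link) => isInSiteLink(link, siteDomain)).length;
  const other = links.length - inSite;
  const tier: Tier = inSite > 0 ? 'tier1' : other > 0 ? 'tier2' : 'tier3';
  return { tier, inSite, other };
}
```

Note that under this rule a link on `pubmed.ncbi.nlm.nih.gov` does not count as in-site for `www.ncbi.nlm.nih.gov`, which matches the table treating the two as separate sites.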
+
+---
+
+## 3. Two Invocation Modes
+
+### 3.1 OpenAI-compatible protocol (streaming, suited to real-time display)
+
+```typescript
+import OpenAI from 'openai';
+
+const client = new OpenAI({
+  baseURL: 'https://api.unifuncs.com/deepsearch/v1',
+  apiKey: process.env.UNIFUNCS_API_KEY,
+});
+
+const stream = await client.chat.completions.create({
+  model: 's2',
+  messages: [{ role: 'user', content: query }],
+  stream: true,
+  introduction: 'You are a professional clinical research literature search specialist',
+  max_depth: 15,
+  domain_scope: ['https://pubmed.ncbi.nlm.nih.gov/'],
+  domain_blacklist: [],
+  reference_style: 'link',
+} as any);
+
+for await (const chunk of stream) {
+  const delta = chunk.choices[0]?.delta;
+  if ((delta as any)?.reasoning_content) {
+    // AI reasoning process (streamed incrementally)
+  }
+  if (delta?.content) {
+    // Final result content (streamed incrementally)
+  }
+}
+```
+
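+**Pros:** Shows the AI reasoning process in real time; good user experience  
+**Cons:** Unstable connection; the task is lost when the user leaves the page; long tasks are prone to timeouts
+
+### 3.2 Async mode (recommended for V2.0)
+
+#### Create a task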
+
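+```typescript
+// Read the API key from the environment (configured in backend/.env)
+const UNIFUNCS_API_KEY = process.env.UNIFUNCS_API_KEY;
+
+const payload = {
+  model: 's2',
+  messages: [{ role: 'user', content: query }],
+  introduction: 'You are a professional clinical research literature search specialist',
+  max_depth: 15,
+  domain_scope: ['https://pubmed.ncbi.nlm.nih.gov/'],
+  domain_blacklist: [],
+  reference_style: 'link',
+  generate_summary: true,
+  output_prompt: 'Please output a structured report and a list of references',
+};
+
+const res = await fetch('https://api.unifuncs.com/deepsearch/v1/create_task', {
+  method: 'POST',
+  headers: {
+    'Authorization': `Bearer ${UNIFUNCS_API_KEY}`,
+    'Content-Type': 'application/json',
+  },
+  body: JSON.stringify(payload),
+});
+// Fail fast on HTTP errors before parsing the body
+if (!res.ok) throw new Error(`create_task failed: HTTP ${res.status}`);
+
+const { data } = await res.json();
+// data.task_id → persist to the database
+```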
+
+#### Poll the task
+
+```typescript
+const params = new URLSearchParams({ task_id: taskId });
+const res = await fetch(
+  `https://api.unifuncs.com/deepsearch/v1/query_task?${params}`,
+  { headers: { 'Authorization': `Bearer ${UNIFUNCS_API_KEY}` } }
+);
+
+const { data } = await res.json();
+// data.status: pending / processing / completed / failed
+// data.result.content: final result
+// data.result.reasoning_content: AI reasoning process (incremental)
+// data.progress: { current, total, message }
+// data.statistics: { iterations, search_count, read_count, token_usage }
+```
+
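+**Pros:** Tasks are persisted, keep running when the user leaves the page, can be resumed, and suit long-running jobs  
+**Cons:** Not real-time; progress has to be polled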
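
Since async mode relies on polling, a small loop that stops on `completed` or `failed` may be useful. The sketch below is illustrative only: `pollUntilDone`, `QueryTask`, and `PollOptions` are hypothetical names, and the `queryTask` callback is assumed to wrap the `query_task` request shown above. The four status values are the ones listed in the polling example.

```typescript
type TaskStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface TaskSnapshot {
  status: TaskStatus;
  result?: { content?: string; reasoning_content?: string };
}

// Hypothetical signature: the caller supplies a function that performs the query_task request.
type QueryTask = (taskId: string) => Promise<TaskSnapshot>;

interface PollOptions {
  intervalMs: number;  // delay between two polls
  maxAttempts: number; // upper bound on the number of polls
}

async function pollUntilDone(
  taskId: string,
  queryTask: QueryTask,
  opts: PollOptions,
): Promise<TaskSnapshot> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const snapshot = await queryTask(taskId);
    if (snapshot.status === 'completed') return snapshot;
    if (snapshot.status === 'failed') throw new Error(`Task ${taskId} failed`);
    // Still pending or processing: wait before the next poll, unless this was the last one.
    if (attempt < opts.maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
    }
  }
  throw new Error(`Task ${taskId} did not finish after ${opts.maxAttempts} polls`);
}
```

Injecting `queryTask` keeps the loop independent of `fetch`, so it can be exercised with a fake in unit tests and reused inside a worker.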
+
+---
+
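+## 4. Key Parameters
+
+| Parameter | Type | Recommended value | Description |
+|------|------|--------|------|
+| `model` | string | `"s2"` | Fixed value |
+| `max_depth` | number | 10~25 | Search depth. Use 5 for testing and 15~25 in production. Larger values are more thorough but slower |
+| `domain_scope` | string[] | As needed | Restricts the search scope. Leave empty for no restriction |
+| `domain_blacklist` | string[] | `[]` | Excludes specific sites |
+| `introduction` | string | See below | Sets the AI role and search guidance |
+| `reference_style` | string | `"link"` | Citation format, `link` or `character` |
+| `output_prompt` | string | Optional | Custom prompt for the output format |
+| `generate_summary` | boolean | `true` | In async mode, generates a summary automatically on completion |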
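
The parameter table can be mirrored in a typed payload builder so that mistakes surface before a request is sent. This is a sketch under assumptions: `DeepSearchPayload`, `PayloadOptions`, and `buildPayload` are hypothetical names, the default depth of 15 is taken from the production recommendation above, and the positive-integer check is a local sanity check, not a documented API limit.

```typescript
interface DeepSearchPayload {
  model: 's2';
  messages: { role: 'user'; content: string }[];
  max_depth: number;
  domain_scope: string[];
  domain_blacklist: string[];
  reference_style: 'link' | 'character';
  introduction?: string;
  output_prompt?: string;
  generate_summary?: boolean;
}

// Hypothetical options object; only `query` is mandatory.
interface PayloadOptions {
  query: string;
  maxDepth?: number;
  domainScope?: string[];
  domainBlacklist?: string[];
  referenceStyle?: 'link' | 'character';
  introduction?: string;
  outputPrompt?: string;
}

function buildPayload(opts: PayloadOptions): DeepSearchPayload {
  const maxDepth = opts.maxDepth ?? 15;
  if (opts.query.trim() === '') throw new Error('query must not be empty');
  // Assumption: depth must be a positive integer; the real API bounds are not documented here.
  if (!Number.isInteger(maxDepth) || maxDepth < 1) {
    throw new RangeError(`max_depth must be a positive integer, got ${maxDepth}`);
  }
  return {
    model: 's2',
    messages: [{ role: 'user', content: opts.query }],
    max_depth: maxDepth,
    domain_scope: opts.domainScope ?? [],
    domain_blacklist: opts.domainBlacklist ?? [],
    reference_style: opts.referenceStyle ?? 'link',
    introduction: opts.introduction,
    output_prompt: opts.outputPrompt,
    generate_summary: true,
  };
}
```

Because `JSON.stringify` omits properties whose value is `undefined`, unset optional fields do not appear in the request body.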
+
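+### Recommended introduction template
+
+```
+You are a professional clinical research literature search specialist.
+Based on the user's research needs, systematically search the specified databases for relevant literature.
+
+Search requirements:
+1. Prioritize high-quality studies: systematic reviews, meta-analyses, RCTs
+2. Pay attention to the PICOS elements (population, intervention, comparison, outcome, study design)
+3. Prioritize studies from the last 5 years
+4. Return complete metadata for each paper (title, authors, journal, year, link)
+
+Output requirements:
+1. Group results by study type
+2. Attach a direct link to each paper
+3. End with a synthesized research overview
+```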
+
+---
+
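+## 5. Best-Practice Strategies
+
+### 5.1 Per-site strategies
+
+| Target site | Query language | max_depth | Notes |
+|---------|---------|-----------|---------|
+| PubMed / NCBI | Chinese or English | 15~25 | Best results; core data source |
+| ClinicalTrials.gov | **English required** | 10~15 | Chinese queries are extremely slow and may time out |
+| Cochrane Library | English preferred | 10~15 | Dedicated to systematic reviews |
+| Google Scholar | Chinese or English | 10~15 | Cross-database aggregation; may contain duplicates |
+| CNKI / SinoMed | Chinese | 10~15 | First choice for Chinese-language literature |
+| GeenMedical | English preferred | 5~10 | Aggregated search; fast |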
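
The per-site table can be encoded as a lookup that derives a combined plan for a multi-site `domain_scope`. The sketch below is an illustration under assumptions: `SITE_STRATEGIES`, `planSearch`, and `DEFAULT_DEPTH` are hypothetical names, and the combining rule (use the largest recommended lower bound among the selected sites, and translate to English whenever any selected site requires it) is one possible policy, not something the test results prescribe.

```typescript
type QueryLanguage = 'any' | 'english-required' | 'english-preferred' | 'chinese';

interface SiteStrategy {
  queryLanguage: QueryLanguage;
  minDepth: number;
  maxDepth: number;
}

// Values copied from the per-site strategy table; keyed by hostname.
const SITE_STRATEGIES: Record<string, SiteStrategy> = {
  'pubmed.ncbi.nlm.nih.gov': { queryLanguage: 'any', minDepth: 15, maxDepth: 25 },
  'www.ncbi.nlm.nih.gov': { queryLanguage: 'any', minDepth: 15, maxDepth: 25 },
  'clinicaltrials.gov': { queryLanguage: 'english-required', minDepth: 10, maxDepth: 15 },
  'www.cochranelibrary.com': { queryLanguage: 'english-preferred', minDepth: 10, maxDepth: 15 },
  'scholar.google.com': { queryLanguage: 'any', minDepth: 10, maxDepth: 15 },
  'www.cnki.net': { queryLanguage: 'chinese', minDepth: 10, maxDepth: 15 },
  'www.sinomed.ac.cn': { queryLanguage: 'chinese', minDepth: 10, maxDepth: 15 },
  'www.geenmedical.com': { queryLanguage: 'english-preferred', minDepth: 5, maxDepth: 10 },
};

const DEFAULT_DEPTH = 10; // assumption: fallback when no selected site is in the table

// Expects absolute URLs as used in domain_scope; `new URL` throws on anything else.
function planSearch(domainScope: string[]): { maxDepth: number; translateToEnglish: boolean } {
  const known = domainScope
    .map((url) => SITE_STRATEGIES[new URL(url).hostname])
    .filter((s): s is SiteStrategy => s !== undefined);
  const maxDepth =
    known.length > 0 ? Math.max(...known.map((s) => s.minDepth)) : DEFAULT_DEPTH;
  const translateToEnglish = known.some((s) => s.queryLanguage === 'english-required');
  return { maxDepth, translateToEnglish };
}
```

For example, selecting PubMed together with ClinicalTrials.gov yields a depth of 15 and flags the query for translation into English.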
+
+### 5.2 Multi-site combined search
+
+```typescript
+// Recommended for V2.0: the user selects multiple data sources, which are merged into domain_scope
+const domainScope = [
+  'https://pubmed.ncbi.nlm.nih.gov/',
+  'https://www.cochranelibrary.com/',
+  'https://scholar.google.com/',
+];
+
+// If ClinicalTrials.gov is included, translate the query to English automatically during requirement expansion
+```
+
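+### 5.3 Expected performance
+
+| max_depth | Estimated time | Searches / reads | Use case |
+|-----------|---------|------------|---------|
+| 5 | 1~3 min | 10~40 / 0~20 | Quick exploration |
+| 10 | 2~5 min | 20~50 / 10~30 | Routine retrieval |
+| 15 | 3~8 min | 30~80 / 20~50 | In-depth retrieval |
+| 25 | 5~15 min | 50~150 / 30~80 | Comprehensive research |
+
+### 5.4 Cost estimate
+
+- Token consumption per search: 50,000~300,000 tokens (depends on depth and number of sites)
+- Estimated cost: about ¥0.1~0.5 per search (based on Unifuncs pricing)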
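
As a rough worked example of the estimate above, the per-search range can be scaled to a number of searches. `estimateBudgetYuan` is a hypothetical helper, and the figures are only the ¥0.1~0.5 range quoted here, not actual billing data. Amounts are computed in fen (1/100 yuan) to avoid floating-point drift.

```typescript
const MIN_FEN_PER_SEARCH = 10; // ¥0.1, lower end of the estimate above
const MAX_FEN_PER_SEARCH = 50; // ¥0.5, upper end of the estimate above

// Returns the estimated cost range in yuan for a given number of searches.
function estimateBudgetYuan(searches: number): { min: number; max: number } {
  if (!Number.isInteger(searches) || searches < 0) {
    throw new RangeError(`searches must be a non-negative integer, got ${searches}`);
  }
  return {
    min: (searches * MIN_FEN_PER_SEARCH) / 100,
    max: (searches * MAX_FEN_PER_SEARCH) / 100,
  };
}

// 200 searches fall between ¥20 and ¥100 under this estimate.
const monthly = estimateBudgetYuan(200);
```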
+
+---
+
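+## 6. Platform Integration
+
+### Current usage (V1.x - ASL module)
+
+```
+researchService.ts → OpenAI SDK → SSE streaming
+researchWorker.ts → pg-boss → async execution
+```
+
+### Planned upgrade (V2.0 - ASL Deep Research)
+
+```
+requirementExpansionService.ts → DeepSeek-V3 requirement expansion
+unifuncsAsyncClient.ts → create_task / query_task async mode
+deepResearchV2Worker.ts → pg-boss Worker → polling + log parsing
+```
+
+### Reuse scenarios for other modules
+
+| Module | Potential use |
+|------|---------|
+| AIA Intelligent Q&A | Web-search augmentation for agents |
+| PKB Personal Knowledge Base | Automatically supplementing the knowledge base with literature |
+| RVW Manuscript Review | Automatically finding references for verification |
+| IIT Research Management | Automatically retrieving similar clinical trials |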
+
+---
+
+## 7. Test Scripts
+
+The project provides three test scripts:
+
+| Script | Path | Purpose |
+|------|------|------|
+| Full site coverage test | `backend/scripts/test-unifuncs-site-coverage.ts` | Tests the search capability of 18 medical sites in parallel |
+| ClinicalTrials dedicated test | `backend/scripts/test-unifuncs-clinicaltrials.ts` | Compares 4 strategies against ClinicalTrials.gov |
+| Quick check | `backend/scripts/test-unifuncs-deepsearch.ts` | Quick single-site SSE streaming test |
+
+```bash
+cd backend
+npx tsx scripts/test-unifuncs-site-coverage.ts
+npx tsx scripts/test-unifuncs-clinicaltrials.ts
+```
+
+---
+
+**Maintainer:** Development team  
+**Last updated:** 2026-02-22