AIclinicalresearch/tests/QUICKSTART_快速开始.md
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
2026-01-21 20:24:29 +08:00


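🚀 Quick Start - Run the Tests in 1 Minute

Windows users

Method 1: Double-click (simplest)

  1. Double-click run_tests.bat
  2. Wait for the tests to finish

Method 2: Command line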

cd AIclinicalresearch\tests
run_tests.bat

Linux/Mac users

cd AIclinicalresearch/tests
chmod +x run_tests.sh
./run_tests.sh

⚠️ Prerequisites

The Python service must be started first

# Open a new terminal
cd AIclinicalresearch/extraction_service
python main.py

Seeing these lines means the service started successfully:

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8001
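
Before launching the tests, it can help to confirm that the service is actually accepting connections. The following is a minimal sketch, assuming the service is reachable on the local machine at port 8001 as shown in the startup log. The helper name `is_port_open` is hypothetical and is not part of the test suite.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Port 8001 is taken from the Uvicorn startup log; adjust if yours differs.
    if is_port_open("127.0.0.1", 8001):
        print("Service is reachable, tests can be started.")
    else:
        print("Service is not reachable, start it with: python main.py")
```

This only checks that something is listening on the port. It does not verify that the listener is the extraction service itself.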

📊 Expected Results

All passed

Total tests: 18
✅ Passed: 18
❌ Failed: 0
Pass rate: 100.0%

🎉 All tests passed!
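
The pass rate in a summary like this is the number of passed tests divided by the total, shown with one decimal place. A small sketch of that arithmetic follows. The function name `summarize` and its output format are assumptions for illustration and do not reproduce the actual test runner.

```python
def summarize(passed: int, failed: int) -> str:
    """Build a one-line summary with the pass rate rounded to one decimal."""
    total = passed + failed
    # Guard against division by zero when no tests were collected.
    rate = 100.0 * passed / total if total else 0.0
    return f"total={total} passed={passed} failed={failed} pass_rate={rate:.1f}%"


if __name__ == "__main__":
    print(summarize(18, 0))
```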

⚠️ Partial failures

  • Look at the error messages shown in red
  • Check which specific tests failed
  • Check the Python service logs

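🎯 What the Tests Cover

  • 6 simple imputation methods (mean, median, mode, constant value, forward fill, backward fill)
  • MICE multiple imputation (single column, multiple columns)
  • Edge cases (100% missing, 0% missing, special characters)
  • Various data types (numeric, categorical, mixed)
  • Performance test (1000 rows of data)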
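
To make the six simple imputation methods concrete, here is a minimal pure-Python sketch over a single column, with missing values represented as `None`. It is an illustration only and not the service's implementation. The function name `impute` and the method labels (`"mean"`, `"ffill"`, and so on) are assumptions chosen for this example.

```python
from statistics import mean, median, mode
from typing import List, Optional

Value = Optional[float]


def _carry_forward(values: List[Value]) -> List[Value]:
    """Replace each None with the most recent non-None value seen so far."""
    filled: List[Value] = []
    last: Value = None
    for v in values:
        if v is not None:
            last = v
        # Leading None entries stay None because nothing precedes them.
        filled.append(last)
    return filled


def impute(values: List[Value], method: str, fill_value: float = 0.0) -> List[Value]:
    """Fill None entries in values using one of six simple strategies."""
    observed = [v for v in values if v is not None]

    if method in ("mean", "median", "mode"):
        if not observed:
            # 100% missing: there is nothing to compute a statistic from.
            return list(values)
        stat = {"mean": mean, "median": median, "mode": mode}[method](observed)
        return [stat if v is None else v for v in values]

    if method == "constant":
        return [fill_value if v is None else v for v in values]

    if method == "ffill":
        return _carry_forward(values)

    if method == "bfill":
        # Backward fill is forward fill applied to the reversed sequence.
        return _carry_forward(values[::-1])[::-1]

    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    column = [None, 1.0, None, 3.0, None]
    for name in ("mean", "median", "mode", "constant", "ffill", "bfill"):
        print(name, impute(column, name))
```

The sketch also touches two of the edge cases listed above: a fully missing column is returned unchanged by the statistic-based methods, and a column with no missing values passes through every method untouched.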

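💡 Tips

  • The first run automatically installs the dependencies (pandas, numpy, requests)
  • The tests take about 45-60 seconds
  • Test data is generated automatically; no manual preparation is needed
  • Color output: green = passed, red = failed, yellow = warning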

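🆘 Running into Problems?

Problem 1: Cannot connect to the service

Solution: Make sure the Python service is running (python main.py)

Problem 2: Dependency installation fails

Solution: Install manually with pip install pandas numpy requests

Problem 3: Tests fail

Solution: Read the error message and check the code logic


Ready? Start the service and run the tests! 🚀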