Files
AIclinicalresearch/extraction_service/test_execute_simple.py
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00

110 lines
1.4 KiB
Python

"""简单的代码执行测试"""
import requests
import json
# 测试数据
test_data = [
{"patient_id": "P001", "age": 25, "gender": ""},
{"patient_id": "P002", "age": 65, "gender": ""},
{"patient_id": "P003", "age": 45, "gender": ""},
]
# 测试代码
test_code = """
df['age_group'] = df['age'].apply(lambda x: '老年' if x > 60 else '非老年')
print(f"处理完成,共 {len(df)} 行")
"""
print("=" * 60)
print("测试: Pandas代码执行")
print("=" * 60)
try:
response = requests.post(
"http://localhost:8000/api/dc/execute",
json={"data": test_data, "code": test_code},
timeout=10
)
print(f"\n状态码: {response.status_code}")
result = response.json()
print(json.dumps(result, indent=2, ensure_ascii=False))
if result.get("success"):
print("\n✅ 代码执行成功!")
print(f"结果数据: {len(result.get('result_data', []))}")
print(f"执行时间: {result.get('execution_time', 0):.3f}")
print(f"\n打印输出:\n{result.get('output', '')}")
print(f"\n结果数据示例:")
for row in result.get('result_data', [])[:3]:
print(f" {row}")
else:
print(f"\n❌ 代码执行失败: {result.get('error')}")
except Exception as e:
print(f"\n❌ 测试异常: {str(e)}")