Files
AIclinicalresearch/extraction_service/test_module.py
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00

90 lines
914 B
Python

"""测试dc_executor模块"""
print("测试dc_executor模块导入...")
try:
from services.dc_executor import validate_code, execute_pandas_code
print("✅ 模块导入成功")
# 测试验证功能
print("\n测试validate_code...")
result = validate_code("df['x'] = 1")
print(f"✅ validate_code成功: {result}")
# 测试执行功能
print("\n测试execute_pandas_code...")
test_data = [{"age": 25}, {"age": 65}]
result = execute_pandas_code(test_data, "df['old'] = df['age'] > 60")
print(f"✅ execute_pandas_code成功: success={result['success']}")
if result['success']:
print(f" 结果: {result['result_data']}")
print("\n🎉 所有模块测试通过!")
except Exception as e:
print(f"❌ 测试失败: {e}")
import traceback
traceback.print_exc()