AIclinicalresearch/backend/migrations/add_data_stats_to_tool_c_session.sql
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00

-- Add the dataStats field to DcToolCSession
-- Caches data statistics to support the data-exploration Q&A feature
--
-- Usage: psql -d airesearch_v2 -f add_data_stats_to_tool_c_session.sql
\c airesearch_v2
-- Add the column
ALTER TABLE dc_schema.dc_tool_c_sessions
ADD COLUMN IF NOT EXISTS data_stats JSONB NULL;
-- Add a column comment
COMMENT ON COLUMN dc_schema.dc_tool_c_sessions.data_stats IS 'Cached data statistics (for data-exploration Q&A): {totalRows, columnStats}';
-- Verify
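-- Example read query (a sketch, commented out so it does not run with the migration).
-- It assumes data_stats holds a JSON object whose top-level keys are totalRows and
-- columnStats, as the column comment above describes. The structure inside columnStats
-- is not specified here, so it is returned as raw JSONB.
-- SELECT (data_stats->>'totalRows')::bigint AS total_rows,
--        data_stats->'columnStats'          AS column_stats
-- FROM dc_schema.dc_tool_c_sessions
-- WHERE data_stats IS NOT NULL;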
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'dc_schema'
AND table_name = 'dc_tool_c_sessions'
AND column_name = 'data_stats';
\echo '✅ Column data_stats was successfully added to table dc_tool_c_sessions'