Files
AIclinicalresearch/backend/prisma/migrations/create_tool_c_session.sql
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00

97 lines
1.3 KiB
SQL
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
-- 创建 Tool C Session 表
-- 日期: 2025-12-06
-- 用途: 科研数据编辑器会话管理
CREATE TABLE IF NOT EXISTS dc_schema.dc_tool_c_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id VARCHAR(255) NOT NULL,
file_name VARCHAR(500) NOT NULL,
file_key VARCHAR(500) NOT NULL,
-- 数据元信息
total_rows INTEGER NOT NULL,
total_cols INTEGER NOT NULL,
columns JSONB NOT NULL,
encoding VARCHAR(50),
file_size INTEGER NOT NULL,
-- 时间戳
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP NOT NULL
);
-- 创建索引
CREATE INDEX IF NOT EXISTS idx_dc_tool_c_sessions_user_id ON dc_schema.dc_tool_c_sessions(user_id);
CREATE INDEX IF NOT EXISTS idx_dc_tool_c_sessions_expires_at ON dc_schema.dc_tool_c_sessions(expires_at);
-- 添加注释
COMMENT ON TABLE dc_schema.dc_tool_c_sessions IS 'Tool C (科研数据编辑器) Session会话表';
COMMENT ON COLUMN dc_schema.dc_tool_c_sessions.file_key IS 'OSS存储路径: dc/tool-c/sessions/{timestamp}-{fileName}';
COMMENT ON COLUMN dc_schema.dc_tool_c_sessions.columns IS '列名数组 ["age", "gender", "diagnosis"]';
COMMENT ON COLUMN dc_schema.dc_tool_c_sessions.expires_at IS '过期时间创建后10分钟';