Files
AIclinicalresearch/frontend-v2/docker-entrypoint.sh
HaHafeng 40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00

76 lines
1.2 KiB
Bash

#!/bin/bash
set -e
# 鈿狅笍 鍏抽敭锛氫笉缁欓粯璁ゅ€硷紝寮哄埗鍦?SAE 鎺у埗鍙伴厤缃?
# 濡傛灉鏈厤缃紝鎶ラ敊閫€鍑猴紙閬垮厤浣跨敤閿欒鐨勫悗绔湴鍧€锛?
if [ -z "$BACKEND_SERVICE_HOST" ]; then
echo "鉂?ERROR: BACKEND_SERVICE_HOST environment variable is required!"
echo "Please configure it in SAE console with backend internal IP (e.g., 172.17.x.x)"
exit 1
fi
if [ -z "$BACKEND_SERVICE_PORT" ]; then
echo "鈿狅笍 WARNING: BACKEND_SERVICE_PORT not set, using default: 3001"
export BACKEND_SERVICE_PORT=3001
fi
echo "============================================"
echo "Starting Frontend Nginx Service"
echo "Backend Service: ${BACKEND_SERVICE_HOST}:${BACKEND_SERVICE_PORT}"
echo "Container Timezone: $(cat /etc/timezone)"
echo "Current Time: $(date)"
echo "============================================"
# 浣跨敤 envsubst 鏇挎崲 Nginx 閰嶇疆涓殑鐜鍙橀噺
envsubst '${BACKEND_SERVICE_HOST} ${BACKEND_SERVICE_PORT}' \
< /etc/nginx/templates/nginx.conf.template \
> /etc/nginx/nginx.conf
# 楠岃瘉 Nginx 閰嶇疆
nginx -t
# 鍚姩 Nginx
exec nginx -g 'daemon off;'