Commit Graph

5 Commits

Author SHA1 Message Date
61cdc97eeb feat(platform): Fix pg-boss queue conflict and add safety standards
Summary:
- Fix pg-boss queue conflict (duplicate key violation on queue_pkey)
- Add global error listener to prevent process crash
- Reduce connection pool from 10 to 4
- Add graceful shutdown handling (SIGTERM/SIGINT)
- Fix researchWorker recursive call bug in catch block
- Make screeningWorker idempotent using upsert

Security Standards (v1.1):
- Prohibit recursive retry in Worker catch blocks
- Prohibit payload bloat (only store fileKey/ID in job.data)
- Require Worker idempotency (upsert + unique constraint)
- Recommend task-specific expireInSeconds settings
- Document graceful shutdown pattern

New Features:
- PKB signed URL endpoint for document preview/download
- pg_bigm installation guide for Docker
- Dockerfile.postgres-with-extensions for pgvector + pg_bigm

Documentation:
- Update Postgres-Only async task processing guide (v1.1)
- Add troubleshooting SQL queries
- Update safety checklist

Tested: Local verification passed
2026-01-23 22:07:26 +08:00
9c96f75c52 feat(storage): integrate Alibaba Cloud OSS for file persistence - Add OSSAdapter and LocalAdapter with StorageFactory pattern - Integrate PKB module with OSS upload - Rename difyDocumentId to storageKey - Create 4 OSS buckets and development specification 2026-01-22 22:02:20 +08:00
40c2f8e148 feat(rag): Complete RAG engine implementation with pgvector
Major Features:
- Created ekb_schema (13th schema) with 3 tables: KB/Document/Chunk
- Implemented EmbeddingService (text-embedding-v4, 1024-dim vectors)
- Implemented ChunkService (smart Markdown chunking)
- Implemented VectorSearchService (multi-query + hybrid search)
- Implemented RerankService (qwen3-rerank)
- Integrated DeepSeek V3 QueryRewriter for cross-language search
- Python service: Added pymupdf4llm for PDF-to-Markdown conversion
- PKB: Dual-mode adapter (pgvector/dify/hybrid)

Architecture:
- Brain-Hand Model: Business layer (DeepSeek) + Engine layer (pgvector)
- Cross-language support: Chinese query matches English documents
- Small Embedding (1024) + Strong Reranker strategy

Performance:
- End-to-end latency: 2.5s
- Cost per query: 0.0025 RMB
- Accuracy improvement: +20.5% (cross-language)

Tests:
- test-embedding-service.ts: Vector embedding verified
- test-rag-e2e.ts: Full pipeline tested
- test-rerank.ts: Rerank quality validated
- test-query-rewrite.ts: Cross-language search verified
- test-pdf-ingest.ts: Real PDF document tested (Dongen 2003.pdf)

Documentation:
- Added 05-RAG-Engine-User-Guide.md
- Added 02-Document-Processing-User-Guide.md
- Updated system status documentation

Status: Production ready
2026-01-21 20:24:29 +08:00
f01981bf78 feat(dc/tool-c): 完成AI代码生成服务(Day 3 MVP)
核心功能:
- 新增AICodeService(550行):AI代码生成核心服务
- 新增AIController(257行):4个API端点
- 新增dc_tool_c_ai_history表:存储对话历史
- 实现自我修正机制:最多3次智能重试
- 集成LLMFactory:复用通用能力层
- 10个Few-shot示例:覆盖Level 1-4场景

技术优化:
- 修复NaN序列化问题(Python端转None)
- 修复数据传递问题(从Session获取真实数据)
- 优化System Prompt(明确环境信息)
- 调整Few-shot示例(移除import语句)

测试结果:
- 通过率:9/11(81.8%) 达到MVP标准
- 成功场景:缺失值处理、编码、分箱、BMI、筛选、填补、统计、分类
- 待优化:数值清洗、智能去重(已记录技术债务TD-C-006)

API端点:
- POST /api/v1/dc/tool-c/ai/generate(生成代码)
- POST /api/v1/dc/tool-c/ai/execute(执行代码)
- POST /api/v1/dc/tool-c/ai/process(生成并执行,一步到位)
- GET /api/v1/dc/tool-c/ai/history/:sessionId(对话历史)

文档更新:
- 新增Day 3开发完成总结(770行)
- 新增复杂场景优化技术债务(TD-C-006)
- 更新工具C当前状态文档
- 更新技术债务清单

影响范围:
- backend/src/modules/dc/tool-c/*(新增2个文件,更新1个文件)
- backend/scripts/create-tool-c-ai-history-table.mjs(新增)
- backend/prisma/schema.prisma(新增DcToolCAiHistory模型)
- extraction_service/services/dc_executor.py(NaN序列化修复)
- docs/03-业务模块/DC-数据清洗整理/*(5份文档更新)

Breaking Changes: 无

总代码行数:+950行

Refs: #Tool-C-Day3
2025-12-07 16:21:32 +08:00
AI Clinical Dev Team
39eb62ee79 feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv 2025-11-16 15:32:44 +08:00