AIclinicalresearch

HaHafeng fa72beea6c feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

Major Changes:
- Implement Platform-Only architecture pattern (unified task management)
- Add PostgresCacheAdapter for unified caching (platform_schema.app_cache)
- Add PgBossQueue for job queue management (platform_schema.job)
- Implement CheckpointService using job.data (generic for all modules)
- Add intelligent threshold-based dual-mode processing (THRESHOLD=50)
- Add task splitting mechanism (auto chunk size recommendation)
- Refactor ASL screening service with smart mode selection
- Refactor DC extraction service with smart mode selection
- Register workers for ASL and DC modules
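The threshold-based dual-mode processing and task splitting listed above can be illustrated with a short sketch. It is a hypothetical Python illustration, not the platform's code (the PgBossQueue side does not live in this directory). Only `THRESHOLD = 50` comes from the commit message. The even-split chunk-size heuristic, the `MAX_CHUNK_SIZE` value, whether the comparison is `<` or `<=`, and the names `plan_tasks` and `recommend_chunk_size` are all assumptions.

```python
from dataclasses import dataclass

THRESHOLD = 50        # from the commit message
MAX_CHUNK_SIZE = 50   # hypothetical upper bound per queued job


@dataclass
class Plan:
    mode: str                 # "direct" (process inline) or "queue" (hand off to a job queue)
    chunks: list[list[int]]   # item ids grouped into the units of work


def recommend_chunk_size(total: int, max_chunk: int = MAX_CHUNK_SIZE) -> int:
    """Assumed heuristic: use the fewest chunks that respect max_chunk, then split evenly."""
    n_chunks = -(-total // max_chunk)  # ceiling division
    return -(-total // n_chunks)


def plan_tasks(item_ids: list[int], threshold: int = THRESHOLD) -> Plan:
    if len(item_ids) < threshold:
        # Small batch: process synchronously as a single chunk.
        return Plan("direct", [item_ids])
    # Large batch: split into evenly sized chunks for queued workers.
    size = recommend_chunk_size(len(item_ids))
    chunks = [item_ids[i:i + size] for i in range(0, len(item_ids), size)]
    return Plan("queue", chunks)
```

For 120 items this plan selects queue mode with three chunks of 40; for 10 items it stays in direct mode.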

Technical Highlights:
- All task management data stored in platform_schema.job.data (JSONB)
- Business tables remain clean (no task management fields)
- CheckpointService is generic (shared by all modules)
- Zero code duplication (DRY principle)
- Follows 3-layer architecture principle
- Zero additional cost (no Redis needed, save 8400 CNY/year)
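Keeping checkpoints in `job.data` instead of in business tables can be illustrated with a minimal sketch. It assumes `job.data` round-trips as a JSON object (as a JSONB column would). The `checkpoint` key and the helper names are hypothetical; the real CheckpointService API is not shown in this commit message.

```python
import json
from typing import Any


def save_checkpoint(job_data: dict[str, Any], processed: int, total: int) -> str:
    """Record progress under a hypothetical "checkpoint" key and return the JSON
    that would be written back to the job's data column."""
    job_data["checkpoint"] = {"processed": processed, "total": total}
    return json.dumps(job_data)


def load_checkpoint(raw: str) -> int:
    """Return how many items were already processed (0 if no checkpoint exists)."""
    return json.loads(raw).get("checkpoint", {}).get("processed", 0)


def run_with_checkpoints(items: list[str], raw_job_data: str, batch: int = 2) -> str:
    data = json.loads(raw_job_data)
    raw = raw_job_data
    # Resume from the last saved position instead of starting over.
    for start in range(load_checkpoint(raw_job_data), len(items), batch):
        done = min(start + batch, len(items))
        # Processing of items[start:done] would happen here (omitted in this sketch).
        raw = save_checkpoint(data, done, len(items))
    return raw
```

Because progress sits next to the job payload, a retried job can resume from `processed` without adding task-management columns to business tables.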

Code Statistics:
- New code: ~1750 lines
- Modified code: ~500 lines
- Test code: ~1800 lines
- Documentation: ~3000 lines

Testing:
- Unit tests: 8/8 passed
- Integration tests: 2/2 passed
- Architecture validation: passed
- Linter errors: 0

Files:
- Platform layer: PostgresCacheAdapter, PgBossQueue, CheckpointService, utils
- ASL module: screeningService, screeningWorker
- DC module: ExtractionController, extractionWorker
- Tests: 11 test files
- Docs: Updated 4 key documents

Status: Phase 1-7 completed, Phase 8-9 pending

2025-12-13 16:10:04 +08:00

operations

feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

2025-12-13 16:10:04 +08:00

services

feat(dc/tool-c): Complete AI code generation service (Day 3 MVP)

2025-12-07 16:21:32 +08:00

test_files

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

.gitignore

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

install_nougat.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

install.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

main.py

feat(dc-tool-c): Major Tool C UX improvements - column header filters / row numbers / scrollbars / full dataset

2025-12-10 18:02:42 +08:00

quick_test.py

feat(dc/tool-c): Complete AI code generation service (Day 3 MVP)

2025-12-07 16:21:32 +08:00

README.md

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

requirements.txt

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

start.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

test_dc_api.py

feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

2025-12-13 16:10:04 +08:00

test_execute_simple.py

feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

2025-12-13 16:10:04 +08:00

test_module.py

feat(platform): Complete Postgres-Only architecture refactoring (Phase 1-7)

2025-12-13 16:10:04 +08:00

test_service.py

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

Library	Version	Purpose
fastapi	0.104.1	Web framework
uvicorn	0.24.0	ASGI server
PyMuPDF	1.23.8	PDF text extraction
pdfplumber	0.10.3	PDF language detection
mammoth	1.6.0	Docx extraction
langdetect	1.0.9	Language detection
loguru	0.7.2	Logging
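The table pairs each format with its library; a minimal routing sketch follows. No library is listed for Txt, so the plain-text path here is an assumption (standard-library decoding, UTF-8 first, then GB18030 for Chinese-encoded files). `EXTRACTORS`, `pick_extractor` and `decode_txt` are hypothetical names, not the service's code.

```python
from pathlib import Path

# Format-to-library mapping taken from the dependency table; "stdlib" for Txt is an assumption.
EXTRACTORS = {".pdf": "PyMuPDF", ".docx": "mammoth", ".txt": "stdlib"}


def pick_extractor(filename: str) -> str:
    """Return the library that handles this file's extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {filename}")
    return EXTRACTORS[suffix]


def decode_txt(data: bytes) -> str:
    """Assumed Txt path: UTF-8 (BOM tolerated) first, then GB18030."""
    for encoding in ("utf-8-sig", "gb18030"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: substitute replacement characters rather than failing.
    return data.decode("utf-8", errors="replace")
```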

Operation	Target time
20-page PDF (PyMuPDF)	< 30 s
10-page Docx	< 10 s
1 MB Txt	< 5 s
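One way to check these targets is to time a single extraction call against its budget. In this sketch the case keys and the `within_target` helper are hypothetical; only the second values come from the table.

```python
import time
from typing import Callable

# Target times in seconds, from the performance table; the keys are hypothetical labels.
TARGETS_S = {"pdf_20_pages": 30.0, "docx_10_pages": 10.0, "txt_1mb": 5.0}


def within_target(case: str, extract: Callable[[], object]) -> tuple[bool, float]:
    """Time one extraction call and report whether it beat the table's target."""
    start = time.perf_counter()
    extract()
    elapsed = time.perf_counter() - start
    return elapsed < TARGETS_S[case], elapsed
```

For example, `within_target("txt_1mb", lambda: (b"x" * 1_000_000).decode("utf-8"))` times a 1 MB decode against the 5 s budget.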

README.md

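Document Extraction Microservice

Features

Quick Start

1. Install dependencies

2. Configure environment variables

3. Start the service

4. Test the service

Health check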
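The health check can be scripted with the standard library. The URL below (port 8000, route `/api/health`) is a placeholder assumption, since the README's actual route is not reproduced here; `opener` is injectable so the function can be exercised without a running server.

```python
import json
import urllib.request
from typing import Any, Callable

# Hypothetical default: replace with the service's real host, port and route.
DEFAULT_HEALTH_URL = "http://localhost:8000/api/health"


def check_health(url: str = DEFAULT_HEALTH_URL,
                 opener: Callable[..., Any] = urllib.request.urlopen,
                 timeout: float = 5.0) -> tuple[int, Any]:
    """Return (HTTP status, parsed JSON body) from the health endpoint.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned status is normally 2xx.
    """
    with opener(url, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))
```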

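PDF text extraction

API documentation

Project structure

Development plan

✅ Day 1 (completed)

⏳ Day 2 (in progress)

⏳ Day 3

Dependencies

Performance targets

FAQ

Q: PyMuPDF fails to install?

Q: The service won't start?

Q: Where are the temporary files?

License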
