AIclinicalresearch

HaHafeng 75ceeb0653 hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

Critical fixes:
1. Compute column: Add Chinese comma support in formula validation
   - Problem: Formula with Chinese comma failed validation
   - Fix: Add Chinese comma character to allowed_chars regex
   - Example: Support formulas like 'col1（kg）+ col2，col3'

2. Binning operation: Fix NaN serialization error
   - Problem: 'Out of range float values are not JSON compliant: nan'
   - Fix: Enhanced NaN/inf handling in binning endpoint
   - Added np.inf/-np.inf replacement before JSON serialization
   - Added manual JSON serialization with NaN->null conversion

3. Enhanced all operation endpoints for consistency
   - Updated conditional, dropna endpoints with same NaN/inf handling
   - Ensures all operations return JSON-compliant data
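
A minimal sketch of the character-whitelist check described in item 1, assuming the validator tests the whole formula against one allowed-character class. The commit does not show the actual pattern in `compute.py`, so the character set and the helper name `is_formula_allowed` below are illustrative assumptions; only the idea of adding the full-width comma `，` to `allowed_chars` comes from the commit message.

```python
import re

# Hypothetical whitelist; the real pattern in compute.py is not shown in the commit.
# \w covers ASCII and CJK word characters in Python 3 str patterns.
# The full-width parentheses （） and the full-width comma ， are listed explicitly.
allowed_chars = re.compile(r"[\w\s+\-*/().,（），]+")


def is_formula_allowed(formula: str) -> bool:
    """Return True only if every character of the formula is in the whitelist."""
    return allowed_chars.fullmatch(formula) is not None


if __name__ == "__main__":
    print(is_formula_allowed("col1（kg）+ col2，col3"))
    print(is_formula_allowed("col1; import os"))
```

The check rejects a formula as soon as any character falls outside the class, so a semicolon or quote character still fails while the Chinese comma passes.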

Modified files:
- extraction_service/operations/compute.py: Add Chinese comma to regex
- extraction_service/main.py: Enhanced NaN handling in binning/conditional/dropna
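
The NaN handling listed for `main.py` can be illustrated with a standard-library-only sketch. The commit itself replaces `np.inf`/`-np.inf` before serializing; the version below instead walks the payload and maps every non-finite float to `None`, which has the same effect on the output. The helper names `to_json_safe` and `dumps_json_safe` are hypothetical and not taken from the repository.

```python
import json
import math


def to_json_safe(value):
    """Recursively replace NaN, inf and -inf with None (JSON null)."""
    # numpy.float64 subclasses Python float, so it is caught here as well.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    return value


def dumps_json_safe(payload) -> str:
    """Serialize payload; allow_nan=False makes any missed NaN raise instead of leaking."""
    return json.dumps(to_json_safe(payload), allow_nan=False, ensure_ascii=False)


if __name__ == "__main__":
    rows = [{"age": 42.0, "age_bin": float("nan")}, {"age": float("inf"), "age_bin": 3.0}]
    print(dumps_json_safe(rows))
```

Passing `allow_nan=False` is a deliberate choice here: a non-finite value that slips past the cleanup fails loudly in the service rather than reaching the client as invalid JSON.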

Status: Hotfix complete, ready for testing

2025-12-09 08:45:27 +08:00

operations

hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

2025-12-09 08:45:27 +08:00

services

feat(dc/tool-c): Complete AI code generation service (Day 3 MVP)

2025-12-07 16:21:32 +08:00

test_files

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

.gitignore

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

install_nougat.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

install.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

main.py

hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

2025-12-09 08:45:27 +08:00

quick_test.py

feat(dc/tool-c): Complete AI code generation service (Day 3 MVP)

2025-12-07 16:21:32 +08:00

README.md

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

requirements.txt

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

start.bat

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

test_dc_api.py

hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

2025-12-09 08:45:27 +08:00

test_execute_simple.py

hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

2025-12-09 08:45:27 +08:00

test_module.py

hotfix(dc/tool-c): Fix compute formula validation and binning NaN serialization

2025-12-09 08:45:27 +08:00

test_service.py

feat: add extraction_service (PDF/Docx/Txt) and update .gitignore to exclude venv

2025-11-16 15:32:44 +08:00

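| Library | Version | Purpose |
|---|---|---|
| fastapi | 0.104.1 | Web framework |
| uvicorn | 0.24.0 | ASGI server |
| PyMuPDF | 1.23.8 | PDF text extraction |
| pdfplumber | 0.10.3 | PDF language detection |
| mammoth | 1.6.0 | Docx extraction |
| langdetect | 1.0.9 | Language detection |
| loguru | 0.7.2 | Logging |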

| Operation | Target time |
|---|---|
| 20-page PDF (PyMuPDF) | <30 s |
| 10-page Docx | <10 s |
| 1 MB Txt | <5 s |

README.md

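Document Extraction Microservice

Features

Quick Start

1. Install dependencies

2. Configure environment variables

3. Start the service

4. Test the service

Health check

PDF text extraction

API Documentation

Project Structure

Development Plan

✅ Day 1 (completed)

⏳ Day 2 (in progress)

⏳ Day 3

Dependencies

Performance Targets

FAQ

Q: PyMuPDF installation fails?

Q: The service won't start?

Q: Where are temporary files stored?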

License
