Files
AIclinicalresearch/extraction_service/requirements-prod.txt
HaHafeng e785969e54 feat(rvw): Implement RVW V2.0 Data Forensics Module - Day 6 StatValidator
Summary:
- Implement L2 Statistical Validator (CI-P consistency, T-test reverse)
- Implement L2.5 Consistency Forensics (SE Triangle, SD>Mean check)
- Add error/warning severity classification with tolerance thresholds
- Support 5+ CI formats parsing (parentheses, brackets, 95% CI prefix)
- Complete Python forensics service (types, config, validator, extractor)

V2.0 Development Progress (Week 2 Day 6):
- Day 1-5: Python service setup, Word table extraction, L1 arithmetic validator
- Day 6: L2 StatValidator + L2.5 consistency forensics (promoted from V2.1)

Test Results:
- Unit tests: 4/4 passed (CI-P, SE Triangle, SD>Mean, T-test)
- Real document tests: 5/5 successful, 2 reasonable WARNINGs

Status: Day 6 completed, ready for Day 7 (Skills Framework)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 22:15:27 +08:00

57 lines
1.4 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ========================================
# 生产环境依赖 (2026-01-26 更新)
# 移除 Nougat使用 pymupdf4llm 替代
# ========================================
# Web框架
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
# 数据处理 (DC工具必需)
pandas>=2.0.0
numpy>=1.24.0
polars>=0.19.0
scipy>=1.11.0 # 统计验证RVW V2.0 数据侦探T检验、卡方检验
# PDF处理 - 使用 pymupdf4llm替代 nougat更轻量
PyMuPDF>=1.24.0 # PDF 核心库(代码中 import fitz 使用)
pymupdf4llm>=0.0.17 # PDF → Markdown
pdfplumber==0.10.3 # 备用 PDF 处理
# Word处理
mammoth==1.6.0 # Docx → Markdown
python-docx==1.1.0 # Docx 读取
pypandoc>=1.13 # Markdown → Docx (需要系统安装 pandoc)
# Excel/CSV处理
openpyxl>=3.1.2 # Excel 读取
tabulate>=0.9.0 # DataFrame → Markdown
# PPT处理
python-pptx>=0.6.23 # PPT 读取
# 语言检测
langdetect==1.0.9
# 编码检测
chardet==5.2.0
# 工具
python-dotenv==1.0.0
pydantic>=2.10.0
# 日志
loguru==0.7.2
# 测试工具
requests==2.31.0
# ========================================
# 注意:生产环境已移除以下重量级依赖
# - nougat-ocr==0.1.17 (约1.5GB)
# - albumentations==1.3.1 (Nougat依赖)
#
# 已使用 pymupdf4llm 替代,功能相似但更轻量
# ========================================