AIclinicalresearch/extraction_service/requirements.txt
HaHafeng e785969e54 feat(rvw): Implement RVW V2.0 Data Forensics Module - Day 6 StatValidator
Summary:
- Implement L2 Statistical Validator (CI-P consistency, reverse T-test calculation)
- Implement L2.5 Consistency Forensics (SE Triangle, SD>Mean check)
- Add error/warning severity classification with tolerance thresholds
- Support parsing of 5+ CI formats (parentheses, brackets, 95% CI prefix)
- Complete Python forensics service (types, config, validator, extractor)
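
For illustration only, the CI parsing and CI-P consistency check named above could be sketched roughly as follows. This is not the validator's actual code: the names `parse_ci`, `p_from_ci`, and `check_ci_p`, the regular expression, the normal approximation, and the 0.02 tolerance are all assumptions made for this sketch, and it uses only the standard library rather than scipy.

```python
import math
import re

# Hypothetical pattern (an assumption, not the project's parser). It accepts
# forms such as "(1.05, 4.20)", "[1.05-4.20]" and "95% CI 1.05 to 4.20".
_NUM = r"(-?\d+(?:\.\d+)?)"
_CI_RE = re.compile(
    r"(?:95\s*%\s*CI[:\s]*)?[\(\[]?\s*" + _NUM
    + r"\s*(?:,|;|to|-|–|~)\s*" + _NUM + r"\s*[\)\]]?",
    re.IGNORECASE,
)


def parse_ci(text):
    """Return (lower, upper) as floats, or None if no interval is found."""
    m = _CI_RE.search(text)
    if m is None:
        return None
    lo, hi = float(m.group(1)), float(m.group(2))
    return (lo, hi) if lo <= hi else None


def p_from_ci(estimate, lo, hi, ratio=True):
    """Approximate a two-sided p-value from a 95% CI (normal approximation).

    For ratio measures (OR, HR, RR) the calculation is done on the log scale,
    so estimate, lo and hi must all be positive.
    """
    if ratio:
        estimate, lo, hi = math.log(estimate), math.log(lo), math.log(hi)
    se = (hi - lo) / (2 * 1.96)  # a 95% CI spans about 2 * 1.96 standard errors
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


def check_ci_p(estimate, ci_text, reported_p, ratio=True, tol=0.02):
    """Classify a reported (estimate, CI, p) triple as OK, WARNING or ERROR."""
    ci = parse_ci(ci_text)
    if ci is None:
        return "UNPARSED"
    lo, hi = ci
    null = 1.0 if ratio else 0.0
    excludes_null = not (lo <= null <= hi)
    # ERROR: the CI and the p-value disagree about significance at 0.05.
    if excludes_null != (reported_p < 0.05):
        return "ERROR"
    # WARNING: same conclusion, but the p-value is far from the CI-implied one.
    if abs(p_from_ci(estimate, lo, hi, ratio) - reported_p) > tol:
        return "WARNING"
    return "OK"
```

With these assumed thresholds, `check_ci_p(2.10, "(1.05, 4.20)", 0.20)` is classified as an ERROR because the interval excludes 1 while the reported p-value is not significant.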

V2.0 Development Progress (Week 2 Day 6):
- Day 1-5: Python service setup, Word table extraction, L1 arithmetic validator
- Day 6: L2 StatValidator + L2.5 consistency forensics (promoted from V2.1)

Test Results:
- Unit tests: 4/4 passed (CI-P, SE Triangle, SD>Mean, T-test)
- Real document tests: 5/5 successful, 2 reasonable WARNINGs
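
As a rough sketch of what two of the checks covered by these tests might look like: the commit message does not define "SE Triangle", so the code below assumes it refers to the relation SE = SD / sqrt(n). The function names, the 5% relative tolerance, and the severity labels are likewise hypothetical.

```python
import math


def check_sd_gt_mean(mean, sd):
    """Return WARNING when the SD exceeds the mean of a positive measure.

    For quantities that cannot be negative (age, counts, durations), SD > mean
    often indicates skewed data reported as mean ± SD, so it is flagged for
    review rather than treated as an error.
    """
    if mean > 0 and sd > mean:
        return "WARNING"
    return "OK"


def check_se_triangle(sd, n, reported_se, rel_tol=0.05):
    """Compare a reported SE with SD / sqrt(n) (assumed meaning of 'SE Triangle')."""
    expected = sd / math.sqrt(n)
    if math.isclose(expected, reported_se, rel_tol=rel_tol):
        return "OK"
    return "ERROR"
```

Under these assumptions, a table cell reporting SD = 10 and n = 100 should carry an SE close to 1.0, and a reported SE of 2.0 would be flagged.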

Status: Day 6 completed, ready for Day 7 (Skills Framework)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 22:15:27 +08:00

# FastAPI core dependencies
fastapi==0.104.1
uvicorn[standard]==0.24.0
python-multipart==0.0.6
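# PDF processing - uses pymupdf4llm instead of nougat (lighter weight)
pymupdf4llm>=0.0.17 # PDF → Markdown (pulls in pymupdf automatically)
pdfplumber==0.10.3 # Fallback PDF processing
# Word processing
mammoth==1.6.0 # Docx → Markdown
python-docx==1.1.0 # Docx reading
pypandoc>=1.13 # Markdown → Docx (requires a system pandoc installation)
# Excel/CSV processing
pandas>=2.0.0 # Tabular data processing
openpyxl>=3.1.2 # Excel reading
# Statistical validation (RVW V2.0 data forensics)
scipy>=1.11.0 # Reverse calculation for t-tests and chi-square tests
tabulate>=0.9.0 # DataFrame → Markdown
# PPT processing
python-pptx>=0.6.23 # PPT reading
# Language detection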
langdetect==1.0.9
# Encoding detection
chardet==5.2.0
# Utilities
python-dotenv==1.0.0
pydantic>=2.10.0
# Logging
loguru==0.7.2
# Testing tools
requests==2.31.0