feat(rvw): Implement RVW V2.0 Data Forensics Module - Day 6 StatValidator

Summary:
- Implement L2 Statistical Validator (CI-P consistency, T-test reverse)
- Implement L2.5 Consistency Forensics (SE Triangle, SD>Mean check)
- Add error/warning severity classification with tolerance thresholds
- Support 5+ CI formats parsing (parentheses, brackets, 95% CI prefix)
- Complete Python forensics service (types, config, validator, extractor)

V2.0 Development Progress (Week 2 Day 6):
- Day 1-5: Python service setup, Word table extraction, L1 arithmetic validator
- Day 6: L2 StatValidator + L2.5 consistency forensics (promoted from V2.1)

Test Results:
- Unit tests: 4/4 passed (CI-P, SE Triangle, SD>Mean, T-test)
- Real document tests: 5/5 successful, 2 reasonable WARNINGs

Status: Day 6 completed, ready for Day 7 (Skills Framework)
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
2026-02-17 22:15:27 +08:00
parent 7a299e8562
commit e785969e54
31 changed files with 5925 additions and 15 deletions

View File

@@ -52,6 +52,9 @@ app.add_middleware(
TEMP_DIR = Path(os.getenv("TEMP_DIR", "/tmp/extraction_service"))
TEMP_DIR.mkdir(parents=True, exist_ok=True)
# 注册 RVW V2.0 数据侦探路由
app.include_router(forensics_router)
# 导入服务模块
from services.pdf_extractor import extract_pdf_pymupdf
from services.pdf_processor import extract_pdf, get_pdf_processing_strategy
@@ -66,6 +69,9 @@ from services.pdf_markdown_processor import PdfMarkdownProcessor, extract_pdf_to
# 新增文档导出服务Markdown → Word
from services.doc_export_service import check_pandoc_available, convert_markdown_to_docx, create_protocol_docx
# 新增RVW V2.0 数据侦探模块
from forensics.api import router as forensics_router
# 兼容nougat 相关(已废弃,保留空实现避免报错)
def check_nougat_available(): return False
def get_nougat_info(): return {"available": False, "reason": "已废弃,使用 pymupdf4llm 替代"}