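Document Processing Engine
Capability tier: general-purpose capability layer
Reuse rate: 86% (6 modules depend on it)
Priority: P0
Status: ✅ Implemented (Python microservice)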
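📋 Capability Overview
The Document Processing Engine is a core foundational capability of the platform. It is responsible for:
- Text extraction from multiple document formats (PDF, Docx, Txt, Excel)
- OCR processing
- Table extraction
- Language detection
- Quality assessment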
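📊 Dependent Modules
6 modules depend on this engine (86% reuse rate):
- ASL - AI Smart Literature (literature PDF extraction)
- PKB - Personal Knowledge Base (knowledge base document upload)
- DC - Data Cleaning (Excel/Docx data import)
- SSA - Smart Statistical Analysis (data import)
- ST - Statistical Analysis Tools (data import)
- RVW - Manuscript Review (manuscript document extraction)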
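💡 Core Features
1. PDF extraction
- Nougat: English academic papers (high quality)
- PyMuPDF: Chinese PDFs, plus the fallback path (fast)
- Language detection: automatically distinguishes Chinese from English
- Quality assessment: scores the quality of the extracted text
2. Docx extraction
- Mammoth: converts to Markdown
- python-docx: structured reading
3. Txt extraction
- Multi-encoding support: UTF-8, GBK, and others
- chardet: automatic encoding detection
4. Excel processing
- openpyxl: reads Excel files
- pandas: data processing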
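The Txt extraction item above (multi-encoding support with chardet detection) can be sketched roughly as follows. This is a minimal illustration, not the code in txt_extractor.py: the function name `decode_text`, the 0.8 confidence threshold, and the exact fallback order are assumptions. `chardet.detect()` is the library's real entry point; the sketch skips it when chardet is not installed.

```python
from typing import Tuple

try:
    import chardet  # optional: the detection step is skipped when absent
except ImportError:
    chardet = None


def decode_text(raw: bytes, use_detector: bool = True,
                min_confidence: float = 0.8) -> Tuple[str, str]:
    """Return (text, encoding_used) for raw file bytes (hypothetical helper)."""
    # 1. Try strict UTF-8 first; "utf-8-sig" also strips a leading BOM.
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass

    # 2. Ask chardet, but only trust a confident guess (threshold is assumed).
    if use_detector and chardet is not None:
        guess = chardet.detect(raw)
        encoding = guess.get("encoding")
        if encoding and (guess.get("confidence") or 0.0) >= min_confidence:
            try:
                return raw.decode(encoding), encoding
            except (UnicodeDecodeError, LookupError):
                pass

    # 3. Try GB18030, which is a superset of GBK.
    try:
        return raw.decode("gb18030"), "gb18030"
    except UnicodeDecodeError:
        pass

    # 4. Last resort: never raise, mark undecodable bytes instead.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

Trying strict UTF-8 before the detector keeps the common case deterministic, since short inputs can give a statistical detector too little evidence.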
🏗️ Technical Architecture
Python microservice (FastAPI):
extraction_service/
├── main.py (509 lines) - FastAPI main service
├── services/
│ ├── pdf_extractor.py (242 lines) - PDF extraction coordinator
│ ├── pdf_processor.py (280 lines) - PyMuPDF implementation
│ ├── language_detector.py (120 lines) - Language detection
│ ├── nougat_extractor.py (242 lines) - Nougat implementation
│ ├── docx_extractor.py (253 lines) - Docx extraction
│ └── txt_extractor.py (316 lines) - Txt extraction (multi-encoding)
└── requirements.txt
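The tree lists pdf_extractor.py as the coordinator over the Nougat and PyMuPDF implementations, with language detection deciding between them. One way that routing could look is sketched below. Everything in it is illustrative: `detect_language`, `extract_pdf`, the 0.3 CJK ratio, and the two extractor callables are hypothetical stand-ins rather than the service's actual functions, and the real language detector is likely more sophisticated than a character-range count.

```python
from typing import Callable, Dict

Extractor = Callable[[bytes], str]  # takes PDF bytes, returns extracted text


def detect_language(sample: str) -> str:
    """Rough heuristic: 'zh' when CJK ideographs are a large share of letters."""
    letters = sum(1 for ch in sample if ch.isalpha())
    if letters == 0:
        return "unknown"
    cjk = sum(1 for ch in sample if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / letters > 0.3 else "en"  # 0.3 is an assumed cutoff


def extract_pdf(pdf_bytes: bytes, fast_extract: Extractor,
                nougat_extract: Extractor) -> Dict[str, str]:
    """Route English PDFs to the high-quality path, everything else to the fast path."""
    # A quick pass (PyMuPDF in the real service) supplies text for detection
    # and doubles as the fallback result.
    fast_text = fast_extract(pdf_bytes)
    language = detect_language(fast_text[:2000])

    if language == "en":
        try:
            text = nougat_extract(pdf_bytes)
            if text.strip():
                return {"method": "nougat", "language": language, "text": text}
        except Exception:
            pass  # fall through to the fast result

    return {"method": "pymupdf", "language": language, "text": fast_text}
```

Passing the extractors in as callables keeps the routing logic testable without loading either library.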
📚 API Endpoints
POST /api/extract/pdf - PDF text extraction
POST /api/extract/docx - Docx text extraction
POST /api/extract/txt - Txt text extraction
POST /api/extract/excel - Excel table extraction
GET /health - Health check
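As a rough illustration of calling the extraction endpoints listed above, the sketch below builds a multipart upload request using only the standard library. The request shape is an assumption: this document does not specify the upload field name (`file` here), the content type, or the host and port, so check them against main.py before relying on this. `build_extract_request` is a hypothetical helper name.

```python
import urllib.request
import uuid

SUPPORTED_KINDS = {"pdf", "docx", "txt", "excel"}  # from the endpoint list


def build_extract_request(base_url: str, kind: str, filename: str,
                          content: bytes,
                          field_name: str = "file") -> urllib.request.Request:
    """Build (but do not send) a POST to /api/extract/<kind>."""
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported document kind: {kind}")

    # Hand-rolled multipart/form-data body with a single file part.
    boundary = uuid.uuid4().hex
    disposition = (f'Content-Disposition: form-data; name="{field_name}"; '
                   f'filename="{filename}"')
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/extract/{kind}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

The returned request can be sent with `urllib.request.urlopen(req)`. The base URL, for example `http://localhost:8000`, depends on how the service is deployed.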
🔗 Related Documents
Last updated: 2025-11-06
Maintainer: Technical Architect