feat(ssa): Complete Phase 2A frontend integration - multi-step workflow end-to-end

Phase 2A: WorkflowPlannerService, WorkflowExecutorService, Python data quality, 6 bug fixes, DescriptiveResultView, multi-step R code/Word export, MVP UI reuse. V11 UI: Gemini-style, multi-task, single-page scroll, Word export. Architecture: Block-based rendering consensus (4 block types). New R tools: chi_square, correlation, descriptive, logistic_binary, mann_whitney, t_test_paired. Docs: dev summary, block-based plan, status updates, task list v2.0. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-20 23:09:27 +08:00
parent 23b422f758
commit 428a22adf2
62 changed files with 15416 additions and 299 deletions
--- a/docs/02-通用能力层/06-R统计引擎/01-R统计引擎架构与部署指南.md
+++ b/docs/02-通用能力层/06-R统计引擎/01-R统计引擎架构与部署指南.md
@@ -1,9 +1,9 @@
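 # R Statistics Engine Architecture and Deployment Guide

-> **Version:** v1.0  
-> **Created:** 2026-02-19  
+> **Version:** v1.1  
+> **Updated:** 2026-02-20  
 > **Maintainer:** SSA-Pro development team  
-> **Status:** ✅ Production-ready
+> **Status:** ✅ Production-ready (Phase 2A complete)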

 ---

@@ -109,6 +109,38 @@ The R statistics engine uses a **Brain-Hand separation architecture**:
 }
 ```

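+#### 2.2.1 Inline data format in detail
+
+The R data loader (`utils/data_loader.R`) accepts two JSON data formats:
+
+| Format | Description | Example |
+|------|------|------|
+| **Row format** | A JSON array of objects, one object per row | `[{"sex": 1, "age": 25}, {"sex": 2, "age": 30}]` |
+| **Column format** | A JSON object, one property per column | `{"sex": [1, 2], "age": [25, 30]}` |
+
+> **Recommended:** use the **row format**, which matches how data is usually handled in JavaScript/TypeScript.
+
+**Node.js example:**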
+```typescript
+// Recommended: row format (array of objects)
+const data = [
+  { sex: 1, age: 25, bmi: 22.5 },
+  { sex: 2, age: 30, bmi: 24.1 },
+  // ...
+];
+
+const response = await axios.post('http://localhost:8082/api/v1/skills/ST_T_TEST_IND', {
+  data_source: {
+    type: 'inline',
+    data: data  // pass the array directly
+  },
+  params: {
+    group_var: 'sex',
+    value_var: 'age'
+  }
+});
+```
+
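For callers that already hold data in the column format, a small conversion step keeps requests on the recommended row format. The sketch below is illustrative only: `columnsToRows` and its types are hypothetical names, not part of the service or its client code.

```typescript
// Hypothetical helper (not part of the project): converts column-format
// data into the row format recommended above.
type Cell = number | string | boolean | null;
type ColumnData = { [column: string]: Cell[] };
type Row = { [column: string]: Cell };

function columnsToRows(columns: ColumnData): Row[] {
  const keys = Object.keys(columns);
  const lengths = keys.map((key) => columns[key].length);
  const rowCount = lengths.length > 0 ? lengths[0] : 0;

  // Every column must contribute exactly one value per row.
  if (lengths.some((n) => n !== rowCount)) {
    throw new Error("All columns must have the same number of values");
  }

  const rows: Row[] = [];
  for (let i = 0; i < rowCount; i++) {
    const row: Row = {};
    for (const key of keys) {
      row[key] = columns[key][i];
    }
    rows.push(row);
  }
  return rows;
}

const converted = columnsToRows({ sex: [1, 2], age: [25, 30] });
console.log(JSON.stringify(converted));
// prints [{"sex":1,"age":25},{"sex":2,"age":30}]
```

The result can be passed as `data_source.data` in the request shown above.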
 ### 2.3 Security design

 | Security measure | Implementation |
@@ -241,8 +273,6 @@ ssa-r-statistics   1.0.1     xxxxxxxxxxxx   x minutes ago   1.81GB

 ```yaml
 # r-statistics-service/docker-compose.yml
-version: '3.8'
-
 services:
  ssa-r-service:
    build: .
@@ -253,12 +283,13 @@ services:
      - DEV_MODE=true
    volumes:
       # Development mounts: enable hot reload
+      - ./plumber.R:/app/plumber.R  # ⚠️ Important: the API entry point must be mounted too
      - ./tools:/app/tools
      - ./utils:/app/utils
      - ./tests:/app/tests
    restart: unless-stopped
    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]  # still 8080 inside the container
+      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
@@ -271,6 +302,19 @@ cd r-statistics-service
 docker-compose up -d
 ```

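+#### 4.1.1 Hot reload in detail
+
+| File type | Hot reload | Notes |
+|----------|-----------|------|
+| `tools/*.R` | ✅ Automatic | Reloaded on every request when DEV_MODE=true |
+| `utils/*.R` | ⚠️ Restart required | Loaded at service startup; run `docker-compose restart` after changes |
+| `plumber.R` | ⚠️ Restart required | Defines the API routes; run `docker-compose restart` after changes |
+
+**Best practices:**
+- When developing a new tool, edit only the `tools/` directory; no restart is needed
+- After changing `utils/` or `plumber.R`, run `docker-compose restart`
+- After adding a new API endpoint, run `docker-compose up -d --force-recreate`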
+
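The table above can be read as a lookup from a changed file to the action it requires. A minimal sketch, assuming `DEV_MODE=true` and paths relative to `r-statistics-service/`; `reloadActionFor` is a hypothetical helper, not something the project ships.

```typescript
// Hypothetical helper: maps a changed file to the reload action from the table.
type ReloadAction = "none" | "restart" | "unknown";

function reloadActionFor(changedPath: string): ReloadAction {
  // tools/*.R is re-sourced on every request in DEV_MODE.
  if (/^tools\/[^/]+\.R$/.test(changedPath)) return "none";
  // utils/*.R and plumber.R are only loaded at startup.
  if (/^utils\/[^/]+\.R$/.test(changedPath) || changedPath === "plumber.R") return "restart";
  // Anything else is not covered by the table.
  return "unknown";
}

for (const changed of ["tools/chi_square.R", "utils/data_loader.R", "plumber.R"]) {
  console.log(changed + " -> " + reloadActionFor(changed));
}
```

Here `"restart"` stands for `docker-compose restart`; files outside the table deliberately return `"unknown"` rather than a guess.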
 ### 4.2 Production (SAE)

 ```yaml
@@ -335,11 +379,31 @@ GET /api/v1/tools
 ```json
 {
  "status": "ok",
-  "tools": ["t_test_ind", "anova_oneway"],
-  "count": 2
+  "tools": [
+    "chi_square",
+    "correlation", 
+    "descriptive",
+    "logistic_binary",
+    "mann_whitney",
+    "t_test_ind",
+    "t_test_paired"
+  ],
+  "count": 7
 }
 ```

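+#### Implemented statistical tools (Phase 2A)
+
+| tool_code | Name | Use case |
+|-----------|------|------|
+| `ST_T_TEST_IND` | Independent-samples t-test | Compare a continuous variable between two groups (normal data) |
+| `ST_MANN_WHITNEY` | Mann-Whitney U test | Compare a continuous variable between two groups (non-parametric) |
+| `ST_T_TEST_PAIRED` | Paired t-test | Before/after comparison |
+| `ST_CHI_SQUARE` | Chi-square test | Association between categorical variables |
+| `ST_CORRELATION` | Correlation analysis | Pearson/Spearman correlation |
+| `ST_LOGISTIC_BINARY` | Binary logistic regression | Multivariable analysis |
+| `ST_DESCRIPTIVE` | Descriptive statistics | Baseline tables, data overview |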
+
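For the seven tools listed, each `tool_code` looks like the name returned by `GET /api/v1/tools`, upper-cased and prefixed with `ST_`. The sketch below encodes that observation. It is an assumption drawn from the table, not a documented rule, and `toolCodeToName` is a hypothetical helper.

```typescript
// Hypothetical helper: derives the tool name (as listed by /api/v1/tools)
// from a tool_code, assuming the ST_ + upper-case pattern seen in the table.
function toolCodeToName(toolCode: string): string {
  if (toolCode.slice(0, 3) !== "ST_") {
    throw new Error("Unexpected tool_code: " + toolCode);
  }
  return toolCode.slice(3).toLowerCase();
}

const toolCodes = [
  "ST_T_TEST_IND",
  "ST_MANN_WHITNEY",
  "ST_T_TEST_PAIRED",
  "ST_CHI_SQUARE",
  "ST_CORRELATION",
  "ST_LOGISTIC_BINARY",
  "ST_DESCRIPTIVE",
];

console.log(toolCodes.map(toolCodeToName).join(", "));
// prints t_test_ind, mann_whitney, t_test_paired, chi_square, correlation, logistic_binary, descriptive
```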
 ### 5.3 Execute a skill

 ```http
@@ -394,6 +458,59 @@ Content-Type: application/json
 }
 ```

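+### 5.4 JIT guardrail check (new in Phase 2A)
+
+Call this endpoint before running a core statistical tool to test its statistical assumptions (normality, homogeneity of variance, and so on).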
+
+```http
+POST /api/v1/guardrails/jit
+Content-Type: application/json
+```
+
+**Request body:**
+```json
+{
+  "data_source": {
+    "type": "inline",
+    "data": [...]
+  },
+  "tool_code": "ST_T_TEST_IND",
+  "params": {
+    "group_var": "sex",
+    "value_var": "age"
+  }
+}
+```
+
+**Response:**
+```json
+{
+  "status": "success",
+  "checks": [
+    {
+      "check_name": "Normality test (group: 1)",
+      "passed": true,
+      "p_value": 0.234,
+      "recommendation": "Normality assumption satisfied"
+    },
+    {
+      "check_name": "Homogeneity of variance test (Levene)",
+      "passed": false,
+      "p_value": 0.012,
+      "recommendation": "Consider the Welch correction"
+    }
+  ],
+  "suggested_tool": "ST_MANN_WHITNEY",
+  "can_proceed": true,
+  "all_checks_passed": false
+}
+```
+
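+**Usage:**
+- The workflow executor calls the JIT guardrail before invoking a core statistical method
+- It switches automatically to a more suitable method based on `suggested_tool`
+- The `checks` results are shown to the user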
+
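The three usage points can be sketched as a small decision function on the caller side. The field names follow the sample response; everything else is an assumption for illustration: `decideTool`, `ToolDecision`, and in particular the choice to stop when `can_proceed` is false or `status` is not `"success"` are hypothetical, not the workflow executor's actual logic.

```typescript
// Field names mirror the sample JIT response; the decision logic is a sketch.
interface JitCheck {
  check_name: string;
  passed: boolean;
  p_value: number;
  recommendation: string;
}

interface JitResponse {
  status: string;
  checks: JitCheck[];
  suggested_tool?: string;
  can_proceed: boolean;
  all_checks_passed: boolean;
}

interface ToolDecision {
  toolCode: string | null; // null: do not run any tool (assumed behaviour)
  switched: boolean;       // true when suggested_tool replaced the requested tool
  notes: string[];         // failed checks, ready to show to the user
}

function decideTool(requested: string, jit: JitResponse): ToolDecision {
  const notes = jit.checks
    .filter((check) => !check.passed)
    .map((check) => check.check_name + ": " + check.recommendation);

  if (jit.status !== "success" || !jit.can_proceed) {
    return { toolCode: null, switched: false, notes };
  }
  const toolCode = jit.suggested_tool ?? requested;
  return { toolCode, switched: toolCode !== requested, notes };
}

const sampleJit: JitResponse = {
  status: "success",
  checks: [
    { check_name: "Normality test (group: 1)", passed: true, p_value: 0.234, recommendation: "Normality assumption satisfied" },
    { check_name: "Homogeneity of variance test (Levene)", passed: false, p_value: 0.012, recommendation: "Consider the Welch correction" },
  ],
  suggested_tool: "ST_MANN_WHITNEY",
  can_proceed: true,
  all_checks_passed: false,
};

const decision = decideTool("ST_T_TEST_IND", sampleJit);
console.log(decision.toolCode, decision.switched, decision.notes);
```

Keeping the decision in a pure function makes it testable without a running R service.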
 ---

 ## 6. Development guide
@@ -408,37 +525,106 @@ Content-Type: application/json
 #' @tool_code ST_MY_ANALYSIS
 #' @name My analysis tool
 #' @version 1.0.0
+#' @description Tool description
+#' @author SSA-Pro Team
+
+library(glue)
+library(ggplot2)
+library(base64enc)

-# Unified entry point
 run_analysis <- function(input) {
-  # Load the data
-  df <- load_input_data(input)
+  # ===== Initialize the log =====
+  logs <- c()
+  log_add <- function(msg) { logs <<- c(logs, paste0("[", Sys.time(), "] ", msg)) }
   
-  # Parameters
+  # ===== Load the data =====
+  log_add("Loading input data")
+  df <- tryCatch(
+    load_input_data(input),
+    error = function(e) {
+      log_add(paste("Data loading failed:", e$message))
+      return(NULL)
+    }
+  )
+  
+  if (is.null(df)) {
+    return(make_error(ERROR_CODES$E100_INTERNAL_ERROR, details = "Data loading failed"))
+  }
+  log_add(glue("Data loaded: {nrow(df)} rows, {ncol(df)} columns"))
+  
+  # ===== Extract parameters =====
  p <- input$params
+  my_var <- p$my_var
  
-  # Guardrail checks
-  # ...
+  # ===== Validate parameters =====
+  if (!(my_var %in% names(df))) {
+    return(make_error(ERROR_CODES$E001_COLUMN_NOT_FOUND, col = my_var))
+  }
   
-  # Core computation
-  # ...
+  # ===== Guardrail checks =====
+  guardrail_results <- list()
+  warnings_list <- c()
+  
+  sample_check <- check_sample_size(nrow(df), min_required = 10, action = ACTION_WARN)
+  guardrail_results <- c(guardrail_results, list(sample_check))
+  
+  guardrail_status <- run_guardrail_chain(guardrail_results)
+  if (guardrail_status$status == "blocked") {
+    return(list(status = "blocked", message = guardrail_status$reason, trace_log = logs))
+  }
+  
+  # ===== Core computation =====
+  log_add("Running the analysis...")
+  # result <- your_analysis_function(df, ...)
+  
+  # ===== Generate the plot =====
+  plot_base64 <- tryCatch({
+    # Named plot_obj so it does not overwrite the params list `p`; .data[[...]] looks the column up by name
+    plot_obj <- ggplot(df, aes(x = .data[[my_var]])) + geom_histogram() + theme_minimal()
+    tmp_file <- tempfile(fileext = ".png")
+    ggsave(tmp_file, plot_obj, width = 7, height = 5, dpi = 100)
+    base64_str <- base64encode(tmp_file)
+    unlink(tmp_file)
+    paste0("data:image/png;base64,", base64_str)
+  }, error = function(e) NULL)
+  
+  # ===== Generate reproducible code =====
+  reproducible_code <- glue('
+# Code generated automatically by SSA-Pro
+# Tool: My analysis tool
+# Time: {Sys.time()}
+# ================================
+
+df <- read.csv("data.csv")
+# Your analysis code...
+')
+  
+  # ===== Return the result =====
+  log_add("Analysis complete")
   
-  # Return the result
   return(list(
     status = "success",
     message = "Analysis complete",
-    results = list(...)
+    warnings = if (length(warnings_list) > 0) warnings_list else NULL,
+    results = list(
+      # Statistical results (jsonlite::unbox keeps single values from being wrapped in arrays)
+      statistic = jsonlite::unbox(1.234),
+      p_value = jsonlite::unbox(0.05),
+      p_value_fmt = format_p_value(0.05)
+    ),
+    plots = if (!is.null(plot_base64)) list(plot_base64) else list(),
+    trace_log = logs,
+    reproducible_code = as.character(reproducible_code)
  ))
 }
 ```

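-2. Restart the service (not needed in development mode)
+2. **Development mode**: after changing files under `tools/`, no restart is needed; the next request loads them automatically

 3. Test: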
 ```bash
 curl -X POST http://localhost:8082/api/v1/skills/ST_MY_ANALYSIS \
  -H "Content-Type: application/json" \
-  -d '{"data_source": {...}, "params": {...}}'
+  -d '{"data_source": {"type": "inline", "data": [{"x": 1}, {"x": 2}]}, "params": {"my_var": "x"}}'
 ```

 ### 6.2 Tool naming conventions
@@ -550,6 +736,122 @@ volumes:
 2. Whether the health check passes
 3. Check the container logs

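+### Q6: Data loading fails (inline mode)
+
+**Error:** `内部错误: 数据加载失败` ("Internal error: data loading failed")
+
+**Cause:** The data format is invalid, or the data is empty
+
+**Fix:**
+1. Make sure `data_source.data` is a valid JSON array
+2. Row format: `[{"col1": val1}, {"col1": val2}]`
+3. Check for empty data or columns that are entirely NA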
+
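+### Q7: R script syntax errors
+
+**Error:** `unexpected symbol` or `lexical error`
+
+**Common causes:**
+1. Escaping with `\'` inside a `glue()` string (use `'` directly)
+2. Encoding problems in Chinese comments
+3. Unbalanced braces in a code block
+
+**Fix:**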
+```r
+# Wrong: escaping inside glue
+glue("# Cramer\'s V = ...")  # ❌
+
+# Right: use the single quote directly, or avoid it
+glue("# Cramer V = ...")     # ✅
+```
+
+### Q8: JSON serialization fails
+
+**Error:** `No method asJSON S3 class: table`
+
+**Cause:** R `table` objects cannot be serialized to JSON directly
+
+**Fix:**
+```r
+# Wrong
+observed = as.matrix(contingency_table)  # ❌ may keep the table attributes
+
+# Right: convert explicitly to a plain numeric matrix
+observed = matrix(
+  as.numeric(contingency_table),
+  nrow = nrow(contingency_table),
+  ncol = ncol(contingency_table)
+)  # ✅
+```
+
+### Q9: New endpoint returns 404
+
+**Cause:** The service was not restarted after `plumber.R` was changed
+
+**Fix:**
+```bash
+# A restart is required after changing plumber.R
+docker-compose restart
+
+# If docker-compose.yml changed (for example, a new volume was added)
+docker-compose up -d --force-recreate
+```
+
+### Q10: Variable type check fails (missing value where TRUE/FALSE needed)
+
+**Cause:** A boolean comparison was made on a value that may be NA
+
+**Fix:**
+```r
+# Wrong
+if (var_type == "numeric") { ... }  # var_type may be NA
+
+# Right
+if (identical(var_type, "numeric")) { ... }  # ✅ handles NA
+```
+
+---
+
+## 9. Testing guide
+
+### 9.1 Single-tool test
+
+```bash
+# Test the t-test
+curl -s -X POST "http://localhost:8082/api/v1/skills/ST_T_TEST_IND" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "data_source": {
+      "type": "inline",
+      "data": [
+        {"group": "A", "value": 23}, {"group": "A", "value": 25},
+        {"group": "B", "value": 30}, {"group": "B", "value": 32}
+      ]
+    },
+    "params": {"group_var": "group", "value_var": "value"}
+  }'
+```
+
+### 9.2 Health check
+
+```bash
+curl -s http://localhost:8082/health | jq
+```
+
+### 9.3 End-to-end test script
+
+The project provides a complete end-to-end test script:
+
+```bash
+cd docs/03-业务模块/SSA-智能统计分析/05-测试文档
+node run_e2e_test.js
+```
+
+The tests cover:
+- The 7 statistical tools
+- The JIT guardrail check
+- Data loading (row format and column format)
+
 ---

 ## Appendix: file structure
@@ -557,17 +859,23 @@ volumes:
 ```
 r-statistics-service/
 ├── Dockerfile              # Production image definition
-├── docker-compose.yml      # Development environment orchestration
+├── docker-compose.yml      # Development environment orchestration (with volume mounts)
 ├── renv.lock               # R package version lock (backup)
 ├── .Rprofile               # R startup configuration (backup)
-├── plumber.R               # API entry point
+├── plumber.R               # API entry point (includes the JIT guardrail endpoint)
 ├── utils/
-│   ├── data_loader.R       # Data loading (presigned URLs)
-│   ├── guardrails.R        # Statistical guardrails
+│   ├── data_loader.R       # Data loading (row and column formats)
+│   ├── guardrails.R        # Statistical guardrails + JIT checks
 │   ├── error_codes.R       # Error mapping
 │   └── result_formatter.R  # Result formatting
-├── tools/
-│   └── t_test_ind.R        # Independent-samples t-test
+├── tools/                  # Statistical tools (Phase 2A: 7)
+│   ├── t_test_ind.R        # Independent-samples t-test
+│   ├── t_test_paired.R     # Paired t-test
+│   ├── mann_whitney.R      # Mann-Whitney U test
+│   ├── chi_square.R        # Chi-square test
+│   ├── correlation.R       # Correlation analysis
+│   ├── logistic_binary.R   # Binary logistic regression
+│   └── descriptive.R       # Descriptive statistics
 ├── tests/
 │   └── fixtures/
 │       └── normal_data.csv # Test data
@@ -577,4 +885,13 @@ r-statistics-service/

 ---

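+## Changelog
+
+| Version | Date | Changes |
+|------|------|----------|
+| v1.1 | 2026-02-20 | Phase 2A complete: 7 statistical tools, JIT guardrail, hot reload notes, additional FAQ entries |
+| v1.0 | 2026-02-19 | Initial version: architecture design, deployment guide, t-test tool |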
+
+---
+
 **End of document**