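main: Add core files and initialize the project

Added:
- Create the basic project structure.
- Add `.gitignore` and `.dockerignore` files.
- Write `pyproject.toml` and the dependency files.
- Add the algorithm module and a sample algorithm.
- Implement the core modules (logging, error handling, metrics).
- Add the scripts and documentation needed for development and running.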
2026-02-02 10:46:01 +08:00
commit 31af5e2286
54 changed files with 5726 additions and 0 deletions
--- a/docs/grafana-dashboard-guide.md
+++ b/docs/grafana-dashboard-guide.md
@@ -0,0 +1,237 @@
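+# Grafana Dashboard Import and Usage Guide
+
+## Dashboard Overview
+
+The new dashboard contains 10 panels that cover the application's monitoring metrics:
+
+### Row 1: Core performance metrics
+1. **HTTP request rate (QPS)** - requests per second, grouped by endpoint and method
+2. **HTTP request latency (P50/P95/P99)** - percentiles of request response time
+
+### Row 2: Key metrics
+3. **Request success rate** - share of successful requests (gauge)
+4. **Current concurrent requests** - real-time concurrency (gauge)
+5. **Total HTTP requests** - cumulative request count (stat card)
+6. **Total algorithm executions** - cumulative algorithm call count (stat card)
+
+### Row 3: Algorithm performance
+7. **Algorithm execution rate** - algorithm executions per second
+8. **Algorithm execution latency (P50/P95/P99)** - percentiles of algorithm execution time
+
+### Row 4: Distribution analysis
+9. **Request distribution (by endpoint)** - pie chart of each endpoint's share of requests
+10. **Request status distribution** - pie chart of successful vs. failed requests
+
+## Import Steps
+
+### 1. Configure the Prometheus data source
+
+First make sure the Prometheus data source is configured correctly:
+
+1. Open Grafana: http://localhost:3000
+2. Log in (default: admin/admin)
+3. Go to **Configuration** → **Data Sources**
+4. Click **Add data source**
+5. Select **Prometheus**
+6. Configure:
+   - **Name**: `Prometheus` (must be exactly this name)
+   - **URL**: `http://prometheus:9090` (note: use the service name, not localhost)
+   - **Access**: Server (default)
+7. Click **Save & Test** and confirm that the green success message appears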
+
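+### 2. Import the dashboard
+
+There are two ways to import the dashboard:
+
+#### Option 1: Import from a JSON file (recommended)
+
+1. In the Grafana left-hand menu, click **Dashboards** → **Import**
+2. Click **Upload JSON file**
+3. Select the file: `monitoring/grafana/dashboard.json`
+4. On the import page:
+   - **Name**: FunctionalScaffold Monitoring Dashboard
+   - **Folder**: General (or create a new folder)
+   - **Prometheus**: select the Prometheus data source configured above
+5. Click **Import**
+
+#### Option 2: Import from JSON content
+
+1. In the Grafana left-hand menu, click **Dashboards** → **Import**
+2. Copy the entire contents of `monitoring/grafana/dashboard.json`
+3. Paste it into the **Import via panel json** text box
+4. Click **Load**
+5. Configure the data source and click **Import**
+
+### 3. Verify the dashboard
+
+After a successful import, you should see:
+
+- ✅ All panels render normally
+- ✅ Panels that have data show charts and values
+- ✅ Auto-refresh (5 seconds) is shown in the top-right corner
+- ✅ The time range defaults to the last 1 hour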
+
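+## Generating Test Data
+
+If the dashboard shows little or no data, run the traffic generation script:
+
+```bash
+# Start the traffic generator
+./scripts/generate_traffic.sh
+```
+
+The script keeps sending requests to the application, which produces monitoring data. After 1-2 minutes the dashboard should show populated charts.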
+
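+## Dashboard Features
+
+### Auto-refresh
+
+The dashboard is configured to refresh automatically, every 5 seconds by default. You can change the refresh interval in the top-right corner:
+- 5s (default)
+- 10s
+- 30s
+- 1m
+- 5m
+
+### Time range
+
+The last 1 hour of data is shown by default. You can change the time range in the top-right corner:
+- Last 5 minutes
+- Last 15 minutes
+- Last 30 minutes
+- Last 1 hour (default)
+- Last 3 hours
+- Last 6 hours
+- Last 12 hours
+- Last 24 hours
+- Or a custom time range
+
+### Live mode
+
+The dashboard has **Live** mode enabled (the Live button in the top-right corner), so you can watch the latest data in real time.
+
+### Interactions
+
+- **Zoom**: drag to select a region on a time-series chart to zoom in
+- **Legend click**: click a legend entry to hide or show the corresponding series
+- **Tooltip**: hover over a chart to see detailed values
+- **Full-screen panel**: click the icon next to the panel title to view the panel full screen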
+
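+## Common Issues
+
+### Issue 1: Data source connection fails
+
+**Error message**: `dial tcp [::1]:9090: connect: connection refused`
+
+**Solution**:
+- Make sure the Prometheus URL is `http://prometheus:9090` (the service name)
+- Do not use `http://localhost:9090` (Prometheus is not reachable at that address from inside the Grafana container)
+
+### Issue 2: Panels show "No data"
+
+**Possible causes**:
+1. The application has not received any requests yet
+2. Prometheus has not scraped any data yet
+3. The selected time range is unsuitable
+
+**Solution**:
+1. Send a few test requests:
+   ```bash
+   curl -X POST http://localhost:8111/invoke \
+     -H "Content-Type: application/json" \
+     -d '{"number": 17}'
+   ```
+2. Wait 15-30 seconds for Prometheus to scrape the data
+3. Set the time range to "Last 5 minutes"
+4. Run the traffic generation script: `./scripts/generate_traffic.sh`
+
+### Issue 3: Latency charts show "NaN" or empty values
+
+**Cause**: there is not enough histogram data to compute percentiles
+
+**Solution**:
+- Send more requests to accumulate enough data
+- Wait a few minutes for data to accumulate
+- Use the traffic generation script to send requests continuously
+
+### Issue 4: Data source variable is not set correctly
+
+**Error message**: panels show "Datasource not found"
+
+**Solution**:
+1. Make sure the Prometheus data source is named `Prometheus`
+2. Or re-select the data source in the dashboard settings:
+   - Click the gear icon in the top-right corner (Dashboard settings)
+   - Open the **Variables** tab
+   - Edit the `DS_PROMETHEUS` variable
+   - Select the correct Prometheus data source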
+
+## PromQL Query Reference
+
+The main PromQL queries used by the dashboard:
+
+### HTTP request rate
+```promql
+sum(rate(http_requests_total[1m])) by (endpoint, method)
+```
+
+### HTTP request latency P95
+```promql
+histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1m])) by (le, endpoint, method))
+```
+
+### Request success rate
+```promql
+sum(rate(http_requests_total{status="success"}[5m])) / sum(rate(http_requests_total[5m]))
+```
+
+### Algorithm execution rate
+```promql
+sum(rate(algorithm_executions_total[1m])) by (algorithm, status)
+```
+
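+## Customizing the Dashboard
+
+You can customize the dashboard as needed:
+
+1. **Add a new panel**: click the "Add panel" button in the top-right corner
+2. **Edit a panel**: click the panel title → Edit
+3. **Adjust the layout**: drag panels to change their position and size
+4. **Save changes**: click the save icon in the top-right corner
+
+## Exporting and Sharing
+
+### Export the dashboard
+
+1. Click the share icon in the top-right corner
+2. Select the **Export** tab
+3. Click **Save to file** to download the JSON file
+
+### Share the dashboard
+
+1. Click the share icon in the top-right corner
+2. Select the **Link** tab
+3. Copy the link and share it with your team
+
+## Alerting (Optional)
+
+You can configure alert rules for a panel:
+
+1. Edit the panel
+2. Switch to the **Alert** tab
+3. Click **Create alert rule from this panel**
+4. Configure the alert conditions and notification channels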
+
+## Related Resources
+
+- Grafana documentation: https://grafana.com/docs/
+- Prometheus query language: https://prometheus.io/docs/prometheus/latest/querying/basics/
+- Dashboard best practices: https://grafana.com/docs/grafana/latest/best-practices/
+
+## Support
+
+If you run into problems:
+1. Check that Prometheus is running: http://localhost:9090
+2. Check the application's metrics endpoint: http://localhost:8111/metrics
+3. View the Grafana logs: `docker-compose logs grafana`
+4. View the Prometheus logs: `docker-compose logs prometheus`
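The "NaN" case described under issue 3 of `docs/grafana-dashboard-guide.md` follows from how `histogram_quantile` works on cumulative `le` buckets: with zero observations in the window there is nothing to interpolate. The following is a simplified sketch of that interpolation for classic histograms; `estimate_quantile` is a hypothetical helper written for illustration and the bucket counts are made up, so treat it as an aid to reading the latency panels rather than as Prometheus's actual code.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window: nothing to interpolate
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls into the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with buckets `[(0.1, 2), (0.5, 8), (1.0, 10), (math.inf, 10)]` the 0.95 quantile lands three quarters of the way through the 0.5-1.0 bucket, while buckets whose counts are all zero yield NaN, which is what the panel shows until traffic arrives.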
--- a/docs/metrics-guide.md
+++ b/docs/metrics-guide.md
@@ -0,0 +1,346 @@
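+# Metrics Recording Options: Comparison and Usage Guide
+
+## Background
+
+In multi-instance deployments (Kubernetes, Serverless), the original in-memory metrics storage has the following problems:
+
+1. **Scattered metrics**: each instance records its metrics independently, so they cannot be aggregated
+2. **Data loss**: metrics are lost when an instance is destroyed
+3. **Inaccurate statistics**: there is no accurate global view of the metrics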
+
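+## Comparison of Options
+
+### Option 1: Pushgateway (recommended)
+
+**How it works:** the application pushes metrics to the Pushgateway, and Prometheus scrapes them from the Pushgateway
+
+**Pros:**
+- ✅ Officially supported by Prometheus, with a mature ecosystem
+- ✅ Simple to implement, with small code changes
+- ✅ Suited to short-lived jobs (Serverless, batch processing)
+- ✅ Supports persistence, so data survives restarts
+
+**Cons:**
+- ⚠️ Single point of failure risk (can be mitigated with a high-availability deployment)
+- ⚠️ Not suited to very high-frequency pushes (thousands per second)
+
+**Suitable for:**
+- Serverless functions
+- Batch jobs
+- Short-lived containers
+- Deployments where the number of instances changes dynamically
+
+### Option 2: Redis + custom Exporter
+
+**How it works:** the application writes metrics to Redis, and a custom Exporter reads them from Redis and converts them to the Prometheus format
+
+**Pros:**
+- ✅ Flexible and controllable, supports complex aggregation logic
+- ✅ Redis is fast and supports highly concurrent writes
+- ✅ Allows custom metric calculations
+
+**Cons:**
+- ⚠️ You have to implement the Exporter yourself, so maintenance cost is high
+- ⚠️ Adds system complexity
+- ⚠️ Redis adds operational overhead
+
+**Suitable for:**
+- Custom metric aggregation logic
+- Very high-frequency metric writes (tens of thousands per second)
+- Real-time queries against the metric data
+
+### Option 3: Standard Prometheus pull model (not recommended)
+
+**How it works:** Prometheus scrapes metrics from every instance and aggregates them at query time
+
+**Pros:**
+- ✅ The standard Prometheus approach
+- ✅ No extra components
+
+**Cons:**
+- ❌ Requires a service discovery mechanism (Kubernetes Service Discovery)
+- ❌ Short-lived instances may disappear before they are scraped
+- ❌ Data is lost when an instance is destroyed
+
+**Suitable for:**
+- Long-lived services
+- A relatively fixed number of instances
+- Environments with a mature service discovery mechanism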
+
+## Usage Guide
+
+### Option 1: Pushgateway (recommended)
+
+#### 1. Start the services
+
+```bash
+cd deployment
+docker-compose up -d redis pushgateway prometheus grafana
+```
+
+#### 2. Modify the code
+
+In `src/functional_scaffold/api/routes.py`:
+
+```python
+# Replace the import
+from functional_scaffold.core.metrics_pushgateway import (
+    track_request,
+    track_algorithm_execution,
+)
+
+# Usage is unchanged
+@router.post("/invoke")
+@track_request("POST", "/invoke")
+async def invoke_algorithm(request: InvokeRequest):
+    # ... business logic
+```
+
+#### 3. Configure environment variables
+
+In the `.env` file:
+
+```bash
+PUSHGATEWAY_URL=localhost:9091
+METRICS_JOB_NAME=functional_scaffold
+INSTANCE_ID=instance-1  # Optional; defaults to HOSTNAME
+```
+
+#### 4. Verify
+
+```bash
+# View the Pushgateway metrics
+curl http://localhost:9091/metrics
+
+# Open Prometheus
+open http://localhost:9090
+
+# Example query (enter it in the Prometheus UI, not the shell):
+#   http_requests_total{job="functional_scaffold"}
+```
+
+### Option 2: Redis + Exporter
+
+#### 1. Start the services
+
+```bash
+cd deployment
+docker-compose up -d redis redis-exporter prometheus grafana
+```
+
+#### 2. Modify the code
+
+In `src/functional_scaffold/api/routes.py`:
+
+```python
+# Replace the import
+from functional_scaffold.core.metrics_redis import (
+    track_request,
+    track_algorithm_execution,
+)
+
+# Usage is unchanged
+@router.post("/invoke")
+@track_request("POST", "/invoke")
+async def invoke_algorithm(request: InvokeRequest):
+    # ... business logic
+```
+
+#### 3. Configure environment variables
+
+In the `.env` file:
+
+```bash
+REDIS_HOST=localhost
+REDIS_PORT=6379
+REDIS_METRICS_DB=0
+REDIS_PASSWORD=  # Optional
+INSTANCE_ID=instance-1  # Optional
+```
+
+#### 4. Install the Redis dependency
+
+```bash
+pip install redis
+```
+
+Or add it to `requirements.txt`:
+
+```
+redis>=5.0.0
+```
+
+#### 5. Verify
+
+```bash
+# View the metrics in Redis
+redis-cli
+> HGETALL metrics:request_counter
+
+# View the Exporter output
+curl http://localhost:8001/metrics
+
+# Open Prometheus
+open http://localhost:9090
+```
+
+## Performance Comparison
+
+| Metric | Pushgateway | Redis + Exporter | Standard Pull |
+|------|-------------|------------------|-----------|
+| Write latency | ~5ms | ~1ms | N/A |
+| Query latency | ~10ms | ~20ms | ~5ms |
+| Throughput | ~1000 req/s | ~10000 req/s | ~500 req/s |
+| Memory usage | Low | Medium | Low |
+| Complexity | Low | High | Low |
+
+## Migration Steps
+
+### Migrating from the original approach to Pushgateway
+
+1. **Install dependencies** (if needed):
+   ```bash
+   pip install prometheus-client
+   ```
+
+2. **Replace the import**:
+   ```python
+   # Old code
+   from functional_scaffold.core.metrics import track_request
+
+   # New code
+   from functional_scaffold.core.metrics_pushgateway import track_request
+   ```
+
+3. **Configure environment variables**:
+   ```bash
+   export PUSHGATEWAY_URL=localhost:9091
+   ```
+
+4. **Start the Pushgateway**:
+   ```bash
+   docker-compose up -d pushgateway
+   ```
+
+5. **Update the Prometheus configuration** (already included in `monitoring/prometheus.yml`)
+
+6. **Test and verify**:
+   ```bash
+   # Send a request
+   curl -X POST http://localhost:8000/invoke \
+     -H "Content-Type: application/json" \
+     -d '{"number": 17}'
+
+   # View the metrics
+   curl http://localhost:9091/metrics | grep http_requests_total
+   ```
+
+### Migrating from the original approach to Redis
+
+1. **Install dependencies**:
+   ```bash
+   pip install redis
+   ```
+
+2. **Replace the import**:
+   ```python
+   # Old code
+   from functional_scaffold.core.metrics import track_request
+
+   # New code
+   from functional_scaffold.core.metrics_redis import track_request
+   ```
+
+3. **Configure environment variables**:
+   ```bash
+   export REDIS_HOST=localhost
+   export REDIS_PORT=6379
+   ```
+
+4. **Start Redis and the Exporter**:
+   ```bash
+   docker-compose up -d redis redis-exporter
+   ```
+
+5. **Test and verify**:
+   ```bash
+   # Send a request
+   curl -X POST http://localhost:8000/invoke \
+     -H "Content-Type: application/json" \
+     -d '{"number": 17}'
+
+   # View Redis
+   redis-cli HGETALL metrics:request_counter
+
+   # View the Exporter
+   curl http://localhost:8001/metrics
+   ```
+
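+## FAQ
+
+### Q1: Will the Pushgateway become a single point of failure?
+
+A: This can be addressed by:
+- Deploying multiple Pushgateway instances (behind a load balancer)
+- Using persistent storage (already configured)
+- Falling back to local logs when a push fails
+
+### Q2: How does the Redis option perform?
+
+A: A single Redis instance can handle 100,000+ QPS, which is enough for most scenarios. If you need more, you can:
+- Use Redis Cluster
+- Batch writes (fewer network round trips)
+- Use pipelining
+
+### Q3: How do I use this on Kubernetes?
+
+A:
+- **Pushgateway**: deploy it as a Service; the application reaches it by Service name
+- **Redis**: use a StatefulSet or a managed Redis service
+
+### Q4: Will metric data be lost?
+
+A:
+- **Pushgateway**: supports persistence, so data survives restarts
+- **Redis**: AOF persistence is configured, so data survives restarts
+- **Standard Pull**: data is lost when an instance is destroyed
+
+### Q5: How do I choose an option?
+
+Recommendations:
+- **Serverless / short-lived** → Pushgateway
+- **Very high concurrency / custom logic** → Redis
+- **Long-lived / K8s** → Standard Pull (requires service discovery)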
+
+## Monitoring and Alerting
+
+### Grafana dashboard
+
+Visit http://localhost:3000 (admin/admin)
+
+Preconfigured panels:
+- Total HTTP requests
+- HTTP request latency (P50/P95/P99)
+- Algorithm execution count
+- Algorithm execution latency
+- Error rate
+
+### Alert rules
+
+Configure them in `monitoring/alerts/rules.yaml`:
+
+```yaml
+groups:
+  - name: functional_scaffold
+    rules:
+      - alert: HighErrorRate
+        expr: sum(rate(http_requests_total{status="error"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "High error rate"
+          description: "Error rate is above 5%"
+```
+
+## References
+
+- [Prometheus Pushgateway documentation](https://github.com/prometheus/pushgateway)
+- [Prometheus best practices](https://prometheus.io/docs/practices/)
+- [Redis documentation](https://redis.io/documentation)
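For option 1 in `docs/metrics-guide.md`, it may help to see what "the application pushes metrics to the Pushgateway" looks like on the wire. The sketch below builds, but does not send, a `PUT` to the Pushgateway's `/metrics/job/<job>/instance/<instance>` path with a body in the Prometheus text format. It uses only the standard library; `build_push_request` is a hypothetical helper, and the source of `metrics_pushgateway` is not shown in this diff, so this is not a description of how that module is implemented.

```python
import urllib.request


def build_push_request(gateway, job, instance, samples):
    """Build (but do not send) a PUT request that replaces one metric group.

    `samples` is a list of (labels_dict, value) pairs for http_requests_total.
    """
    lines = ["# TYPE http_requests_total counter"]
    for labels, value in samples:
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"http_requests_total{{{label_text}}} {value}")
    # The exposition text must end with a newline.
    body = ("\n".join(lines) + "\n").encode("utf-8")
    # PUSHGATEWAY_URL in the guide has no scheme, so http:// is assumed here.
    url = f"http://{gateway}/metrics/job/{job}/instance/{instance}"
    return urllib.request.Request(url, data=body, method="PUT")


req = build_push_request(
    "localhost:9091",
    "functional_scaffold",
    "instance-1",
    [({"method": "POST", "endpoint": "/invoke", "status": "success"}, 3)],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Because `job` and `instance` form the grouping key, each instance overwrites only its own group, and Prometheus can still aggregate across instances at query time.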
--- a/docs/metrics-improvement-summary.md
+++ b/docs/metrics-improvement-summary.md
@@ -0,0 +1,227 @@
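+# Summary of the Prometheus Metrics Recording Fix
+
+## Problem Description
+
+Prometheus was not recording the application's traffic data. The `/metrics` endpoint was reachable and all metric types were defined, but none of the metrics had any values.
+
+## Root Cause
+
+1. **HTTP request metrics were not recorded**: the route handlers in `api/routes.py` did not use the `@track_request` decorator to record HTTP request metrics
+2. **Algorithm execution metrics were not recorded**: the `execute()` method in `algorithms/base.py` did not call the metrics module to record algorithm execution metrics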
+
+## Solution
+
+### 1. Add an HTTP request metrics middleware
+
+**File**: `src/functional_scaffold/main.py`
+
+**Changes**:
+- Import the metrics objects: `request_counter`, `request_latency`, `in_progress_requests`
+- Add the `track_metrics` middleware, which automatically tracks all HTTP requests
+
+**Benefits**:
+- Automatic: no need to add a decorator to each route by hand
+- Consistent: every endpoint uses the same metrics recording logic
+- Easy to maintain: new endpoints get metrics tracking automatically
+
+**Implementation**:
+```python
+@app.middleware("http")
+async def track_metrics(request: Request, call_next):
+    """Record metrics for every HTTP request."""
+    if not settings.metrics_enabled:
+        return await call_next(request)
+
+    # Skip the /metrics endpoint itself so that scrapes are not recorded
+    if request.url.path == "/metrics":
+        return await call_next(request)
+
+    in_progress_requests.inc()
+    start_time = time.time()
+    status = "success"
+
+    try:
+        response = await call_next(request)
+        if response.status_code >= 400:
+            status = "error"
+        return response
+    except Exception:
+        status = "error"
+        raise  # re-raise with the original traceback
+    finally:
+        elapsed = time.time() - start_time
+        request_counter.labels(
+            method=request.method,
+            endpoint=request.url.path,
+            status=status
+        ).inc()
+        request_latency.labels(
+            method=request.method,
+            endpoint=request.url.path
+        ).observe(elapsed)
+        in_progress_requests.dec()
+```
+
+### 2. Record algorithm execution metrics
+
+**File**: `src/functional_scaffold/algorithms/base.py`
+
+**Changes**:
+- Import `algorithm_counter` and `algorithm_latency` inside the `execute()` method
+- Record the algorithm execution metrics in the `finally` block
+
+**Implementation**:
+```python
+def execute(self, *args, **kwargs) -> Dict[str, Any]:
+    from ..core.metrics import algorithm_counter, algorithm_latency
+
+    start_time = time.time()
+    status = "success"
+
+    try:
+        # ... algorithm execution logic ...
+    except Exception as e:
+        status = "error"
+        # ... error handling ...
+    finally:
+        elapsed_time = time.time() - start_time
+        algorithm_counter.labels(algorithm=self.name, status=status).inc()
+        algorithm_latency.labels(algorithm=self.name).observe(elapsed_time)
+```
+
+## Verification Results
+
+### 1. Application /metrics endpoint
+
+After the fix, the `/metrics` endpoint returns metric data as expected:
+
+```
+# HTTP request metrics
+http_requests_total{endpoint="/healthz",method="GET",status="success"} 3.0
+http_requests_total{endpoint="/invoke",method="POST",status="success"} 2.0
+http_requests_total{endpoint="/readyz",method="GET",status="success"} 1.0
+
+# HTTP request latency
+http_request_duration_seconds_sum{endpoint="/invoke",method="POST"} 0.0065615177154541016
+http_request_duration_seconds_count{endpoint="/invoke",method="POST"} 2.0
+
+# Algorithm execution metrics
+algorithm_executions_total{algorithm="PrimeChecker",status="success"} 2.0
+algorithm_execution_duration_seconds_sum{algorithm="PrimeChecker"} 0.00023603439331054688
+algorithm_execution_duration_seconds_count{algorithm="PrimeChecker"} 2.0
+
+# Requests currently in progress
+http_requests_in_progress 0.0
+```
+
+### 2. Prometheus queries
+
+Prometheus scrapes and stores the metric data successfully:
+
+```bash
+# Query the total number of HTTP requests
+curl 'http://localhost:9090/api/v1/query?query=http_requests_total'
+
+# Query the total number of algorithm executions
+curl 'http://localhost:9090/api/v1/query?query=algorithm_executions_total'
+```
+
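+## Available Metrics
+
+After the fix, the following metrics are available in Prometheus and Grafana:
+
+### HTTP request metrics
+
+1. **http_requests_total** (Counter)
+   - Labels: `method`, `endpoint`, `status`
+   - Description: total number of HTTP requests
+   - Use: request volume and success rate per endpoint
+
+2. **http_request_duration_seconds** (Histogram)
+   - Labels: `method`, `endpoint`
+   - Description: distribution of HTTP request latency
+   - Use: analyze response times and P50/P95/P99 latency
+
+3. **http_requests_in_progress** (Gauge)
+   - Description: number of requests currently being processed
+   - Use: monitor concurrency and load
+
+### Algorithm execution metrics
+
+1. **algorithm_executions_total** (Counter)
+   - Labels: `algorithm`, `status`
+   - Description: total number of algorithm executions
+   - Use: algorithm call volume and success rate
+
+2. **algorithm_execution_duration_seconds** (Histogram)
+   - Labels: `algorithm`
+   - Description: distribution of algorithm execution latency
+   - Use: analyze algorithm performance and find bottlenecks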
+
+## Usage Examples
+
+### Prometheus query examples
+
+```promql
+# Requests per second (QPS)
+rate(http_requests_total[5m])
+
+# Request success rate
+sum(rate(http_requests_total{status="success"}[5m])) / sum(rate(http_requests_total[5m]))
+
+# P95 latency
+histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
+
+# Algorithm execution failure rate
+sum(rate(algorithm_executions_total{status="error"}[5m])) / sum(rate(algorithm_executions_total[5m]))
+```
+
+### Generate test traffic
+
+Use the provided script to generate test traffic:
+
+```bash
+# Start the traffic generator
+./scripts/generate_traffic.sh
+
+# In another terminal, watch the metrics live
+watch -n 1 'curl -s http://localhost:8111/metrics | grep http_requests_total'
+```
+
+## Grafana Dashboard
+
+Open Grafana to view the metrics visually:
+
+1. Open http://localhost:3000 in a browser
+2. Log in (default username/password: admin/admin)
+3. Import the dashboard: `monitoring/grafana/dashboard.json`
+
+The dashboard contains the following panels:
+- Request rate (QPS)
+- Request latency (P50/P95/P99)
+- Error rate
+- Algorithm execution statistics
+- Concurrent requests
+
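+## Notes
+
+1. **Middleware order**: register the metrics middleware after the logging middleware so that every request is recorded
+2. **/metrics endpoint**: the middleware skips the `/metrics` endpoint itself so that scrapes are not recorded
+3. **Error status**: HTTP status codes >= 400 are labeled `status="error"`
+4. **Performance impact**: the overhead of recording a metric is very small (on the order of microseconds) and should not noticeably affect application performance
+
+## Suggested Follow-up Improvements
+
+1. **Add more dimensions**: labels such as `user_id` or `region` allow finer-grained analysis; keep label cardinality in mind, since every distinct label value creates a new time series
+2. **Custom metrics**: add custom metrics for business needs (for example cache hit rate or number of external API calls)
+3. **Alert rules**: configure Prometheus alert rules to send notifications when metrics are abnormal
+4. **Long-term storage**: consider Thanos or Cortex for long-term metric storage and querying
+
+## Related Files
+
+- `src/functional_scaffold/main.py` - HTTP request metrics middleware
+- `src/functional_scaffold/algorithms/base.py` - algorithm execution metrics
+- `src/functional_scaffold/core/metrics.py` - metric definitions
+- `monitoring/prometheus.yml` - Prometheus configuration
+- `monitoring/grafana/dashboard.json` - Grafana dashboard
+- `scripts/generate_traffic.sh` - traffic generation script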
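Both snippets in `docs/metrics-improvement-summary.md` rely on the same pattern: set a status, run the work inside `try`, and record the counter and latency in `finally` so that failures are counted too. Below is a minimal standard-library sketch of that pattern. The two `Counter` objects merely stand in for the Prometheus metrics, and `tracked` and `is_prime` are hypothetical names used for illustration, not code from this commit.

```python
import functools
import time
from collections import Counter

executions = Counter()   # stands in for algorithm_executions_total
latency_sum = Counter()  # stands in for algorithm_execution_duration_seconds_sum


def tracked(name):
    """Decorator sketch: record status and elapsed time even when the call raises."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            status = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # keep the original traceback
            finally:
                # Runs on both the return path and the exception path.
                executions[(name, status)] += 1
                latency_sum[name] += time.perf_counter() - start
        return inner
    return wrap


@tracked("PrimeChecker")
def is_prime(n):
    """Toy algorithm; rejects n < 2 only to exercise the error path."""
    if n < 2:
        raise ValueError("n must be >= 2")
    return all(n % d for d in range(2, int(n ** 0.5) + 1))
```

Calling `is_prime(17)` and then `is_prime(1)` leaves one `success` and one `error` entry in `executions`, which is the property the `finally` block exists to guarantee.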
--- a/docs/swagger/README.md
+++ b/docs/swagger/README.md
@@ -0,0 +1,107 @@
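+# Swagger Documentation
+
+This directory contains the automatically generated OpenAPI specification.
+
+## Generating the Documentation
+
+Run the following command to generate or update the OpenAPI specification:
+
+```bash
+python scripts/export_openapi.py
+```
+
+This generates the `openapi.json` file, which contains the complete API specification.
+
+## Viewing the Documentation
+
+### Online
+
+After starting the application, visit:
+
+- **Swagger UI**: http://localhost:8000/docs
+- **ReDoc**: http://localhost:8000/redoc
+
+### Offline
+
+Open `openapi.json` in Swagger Editor or another OpenAPI tool.
+
+## API Specification
+
+### Endpoints
+
+#### Algorithm
+
+- `POST /invoke` - invoke the algorithm synchronously
+  - Request body: `{"number": integer}`
+  - Response: the algorithm execution result
+
+- `POST /jobs` - asynchronous job endpoint (reserved)
+  - Currently returns 501 Not Implemented
+
+#### Health checks
+
+- `GET /healthz` - liveness check
+  - Response: `{"status": "healthy", "timestamp": float}`
+
+- `GET /readyz` - readiness check
+  - Response: `{"status": "ready", "timestamp": float, "checks": {...}}`
+
+#### Monitoring
+
+- `GET /metrics` - Prometheus metrics
+  - Response: Prometheus text format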
+
+### Data models
+
+#### InvokeRequest
+
+```json
+{
+  "number": 17
+}
+```
+
+#### InvokeResponse
+
+```json
+{
+  "request_id": "uuid",
+  "status": "success",
+  "result": {
+    "number": 17,
+    "is_prime": true,
+    "factors": [],
+    "algorithm": "trial_division"
+  },
+  "metadata": {
+    "algorithm": "PrimeChecker",
+    "version": "1.0.0",
+    "elapsed_time": 0.001
+  }
+}
+```
+
+#### ErrorResponse
+
+```json
+{
+  "error": "ERROR_CODE",
+  "message": "Error description",
+  "details": {},
+  "request_id": "uuid"
+}
+```
+
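+## Updating the Documentation
+
+After changing the API, regenerate the documentation:
+
+1. Change the code (routes, models, and so on)
+2. Run `python scripts/export_openapi.py`
+3. Commit the updated `openapi.json`
+
+## Notes
+
+- `openapi.json` is generated automatically; do not edit it by hand
+- Make all API changes in code, then regenerate the documentation
+- Make sure the Pydantic models include complete docstrings and examples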
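The `POST /invoke` contract in `docs/swagger/README.md` can be exercised from a small client. The sketch below uses only the standard library and assumes the base URL `http://localhost:8000` from the "Online" section; the optional `X-Request-Id` header comes from `openapi.json` in the same commit. `build_invoke_request` and `parse_invoke_response` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request


def build_invoke_request(number, base_url="http://localhost:8000", request_id=None):
    """Build (but do not send) a POST /invoke request carrying an InvokeRequest body."""
    headers = {"Content-Type": "application/json"}
    if request_id is not None:
        headers["X-Request-Id"] = request_id
    body = json.dumps({"number": number}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invoke", data=body, headers=headers, method="POST"
    )


def parse_invoke_response(raw):
    """Return (request_id, is_prime) from an InvokeResponse JSON document."""
    payload = json.loads(raw)
    return payload["request_id"], payload["result"]["is_prime"]
```

Passing the result of `build_invoke_request(17)` to `urllib.request.urlopen` would send the call against a running instance; the response body can then be handed to `parse_invoke_response`.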
--- a/docs/swagger/openapi.json
+++ b/docs/swagger/openapi.json
@@ -0,0 +1,404 @@
+{
+  "openapi": "3.1.0",
+  "info": {
+    "title": "FunctionalScaffold",
+    "description": "Serverless scaffold for algorithm engineering - provides a standardized algorithm service interface",
+    "version": "1.0.0"
+  },
+  "paths": {
+    "/invoke": {
+      "post": {
+        "tags": [
+          "Algorithm"
+        ],
+        "summary": "Invoke the algorithm synchronously",
+        "description": "Synchronously invoke the prime-checking algorithm and return the result immediately",
+        "operationId": "invoke_algorithm_invoke_post",
+        "parameters": [
+          {
+            "name": "x-request-id",
+            "in": "header",
+            "required": false,
+            "schema": {
+              "anyOf": [
+                {
+                  "type": "string"
+                },
+                {
+                  "type": "null"
+                }
+              ],
+              "title": "X-Request-Id"
+            }
+          }
+        ],
+        "requestBody": {
+          "required": true,
+          "content": {
+            "application/json": {
+              "schema": {
+                "$ref": "#/components/schemas/InvokeRequest"
+              }
+            }
+          }
+        },
+        "responses": {
+          "200": {
+            "description": "Success",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/InvokeResponse"
+                }
+              }
+            }
+          },
+          "400": {
+            "description": "Invalid request parameters",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ErrorResponse"
+                }
+              }
+            }
+          },
+          "500": {
+            "description": "Internal server error",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ErrorResponse"
+                }
+              }
+            }
+          },
+          "422": {
+            "description": "Validation Error",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/HTTPValidationError"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
+    "/healthz": {
+      "get": {
+        "tags": [
+          "Algorithm"
+        ],
+        "summary": "Health check",
+        "description": "Check whether the service is alive",
+        "operationId": "health_check_healthz_get",
+        "responses": {
+          "200": {
+            "description": "Successful Response",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/HealthResponse"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
+    "/readyz": {
+      "get": {
+        "tags": [
+          "Algorithm"
+        ],
+        "summary": "Readiness check",
+        "description": "Check whether the service is ready",
+        "operationId": "readiness_check_readyz_get",
+        "responses": {
+          "200": {
+            "description": "Successful Response",
+            "content": {
+              "application/json": {
+                "schema": {
+                  "$ref": "#/components/schemas/ReadinessResponse"
+                }
+              }
+            }
+          }
+        }
+      }
+    },
+    "/jobs": {
+      "post": {
+        "tags": [
+          "Algorithm"
+        ],
+        "summary": "Asynchronous job endpoint (reserved)",
+        "description": "Asynchronous job endpoint; not implemented in the current version",
+        "operationId": "create_job_jobs_post",
+        "responses": {
+          "501": {
+            "description": "Successful Response",
+            "content": {
+              "application/json": {
+                "schema": {}
+              }
+            }
+          }
+        }
+      }
+    },
+    "/metrics": {
+      "get": {
+        "tags": [
+          "Monitoring"
+        ],
+        "summary": "Prometheus metrics",
+        "description": "Export monitoring metrics in Prometheus format",
+        "operationId": "metrics_metrics_get",
+        "responses": {
+          "200": {
+            "description": "Successful Response",
+            "content": {
+              "application/json": {
+                "schema": {}
+              }
+            }
+          }
+        }
+      }
+    }
+  },
+  "components": {
+    "schemas": {
+      "ErrorResponse": {
+        "properties": {
+          "error": {
+            "type": "string",
+            "title": "Error",
+            "description": "Error code"
+          },
+          "message": {
+            "type": "string",
+            "title": "Message",
+            "description": "Error message"
+          },
+          "details": {
+            "anyOf": [
+              {
+                "additionalProperties": true,
+                "type": "object"
+              },
+              {
+                "type": "null"
+              }
+            ],
+            "title": "Details",
+            "description": "Error details"
+          },
+          "request_id": {
+            "anyOf": [
+              {
+                "type": "string"
+              },
+              {
+                "type": "null"
+              }
+            ],
+            "title": "Request Id",
+            "description": "Request ID"
+          }
+        },
+        "type": "object",
+        "required": [
+          "error",
+          "message"
+        ],
+        "title": "ErrorResponse",
+        "description": "Error response",
+        "example": {
+          "details": {
+            "field": "number",
+            "value": "abc"
+          },
+          "error": "VALIDATION_ERROR",
+          "message": "number must be an integer",
+          "request_id": "550e8400-e29b-41d4-a716-446655440000"
+        }
+      },
+      "HTTPValidationError": {
+        "properties": {
+          "detail": {
+            "items": {
+              "$ref": "#/components/schemas/ValidationError"
+            },
+            "type": "array",
+            "title": "Detail"
+          }
+        },
+        "type": "object",
+        "title": "HTTPValidationError"
+      },
+      "HealthResponse": {
+        "properties": {
+          "status": {
+            "type": "string",
+            "title": "Status",
+            "description": "Health status"
+          },
+          "timestamp": {
+            "type": "number",
+            "title": "Timestamp",
+            "description": "Timestamp"
+          }
+        },
+        "type": "object",
+        "required": [
+          "status",
+          "timestamp"
+        ],
+        "title": "HealthResponse",
+        "description": "Health check response"
+      },
+      "InvokeRequest": {
+        "properties": {
+          "number": {
+            "type": "integer",
+            "title": "Number",
+            "description": "Integer to check"
+          }
+        },
+        "type": "object",
+        "required": [
+          "number"
+        ],
+        "title": "InvokeRequest",
+        "description": "Synchronous invocation request",
+        "example": {
+          "number": 17
+        }
+      },
+      "InvokeResponse": {
+        "properties": {
+          "request_id": {
+            "type": "string",
+            "title": "Request Id",
+            "description": "Unique request identifier"
+          },
+          "status": {
+            "type": "string",
+            "title": "Status",
+            "description": "Processing status"
+          },
+          "result": {
+            "additionalProperties": true,
+            "type": "object",
+            "title": "Result",
+            "description": "Algorithm execution result"
+          },
+          "metadata": {
+            "additionalProperties": true,
+            "type": "object",
+            "title": "Metadata",
+            "description": "Metadata"
+          }
+        },
+        "type": "object",
+        "required": [
+          "request_id",
+          "status",
+          "result",
+          "metadata"
+        ],
+        "title": "InvokeResponse",
+        "description": "Synchronous invocation response",
+        "example": {
+          "metadata": {
+            "algorithm": "PrimeChecker",
+            "elapsed_time": 0.001,
+            "version": "1.0.0"
+          },
+          "request_id": "550e8400-e29b-41d4-a716-446655440000",
+          "result": {
+            "algorithm": "trial_division",
+            "factors": [],
+            "is_prime": true,
+            "number": 17
+          },
+          "status": "success"
+        }
+      },
+      "ReadinessResponse": {
+        "properties": {
+          "status": {
+            "type": "string",
+            "title": "Status",
+            "description": "Readiness status"
+          },
+          "timestamp": {
+            "type": "number",
+            "title": "Timestamp",
+            "description": "Timestamp"
+          },
+          "checks": {
+            "anyOf": [
+              {
+                "additionalProperties": {
+                  "type": "boolean"
+                },
+                "type": "object"
+              },
+              {
+                "type": "null"
+              }
+            ],
+            "title": "Checks",
+            "description": "Results of the individual checks"
+          }
+        },
+        "type": "object",
+        "required": [
+          "status",
+          "timestamp"
+        ],
+        "title": "ReadinessResponse",
+        "description": "Readiness check response"
+      },
+      "ValidationError": {
+        "properties": {
+          "loc": {
+            "items": {
+              "anyOf": [
+                {
+                  "type": "string"
+                },
+                {
+                  "type": "integer"
+                }
+              ]
+            },
+            "type": "array",
+            "title": "Location"
+          },
+          "msg": {
+            "type": "string",
+            "title": "Message"
+          },
+          "type": {
+            "type": "string",
+            "title": "Error Type"
+          }
+        },
+        "type": "object",
+        "required": [
+          "loc",
+          "msg",
+          "type"
+        ],
+        "title": "ValidationError"
+      }
+    }
+  }
+}
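Since `docs/swagger/openapi.json` is generated, a quick way to review it after regeneration is to list the operations it declares. The following is a small standard-library sketch; `list_operations` is a hypothetical helper and not the project's `scripts/export_openapi.py`, whose contents are not shown in this part of the diff.

```python
import json

# Keys of a path item that denote HTTP operations (other keys such as
# "parameters" or "summary" are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec):
    """Return sorted (METHOD, path, operationId) tuples from an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method in HTTP_METHODS:
                operations.append(
                    (method.upper(), path, operation.get("operationId", ""))
                )
    return sorted(operations, key=lambda op: (op[1], op[0]))


def load_spec(text):
    """Parse the JSON text of an OpenAPI document."""
    return json.loads(text)
```

Reading the file with `open("docs/swagger/openapi.json", encoding="utf-8")` and passing its text through `load_spec` and `list_operations` should list the five paths defined above, which makes an unexpected addition or removal easy to spot in review.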