main:新增 Kubernetes 部署配置及文档

变更内容:
- 添加 Kubernetes 部署配置文件,包括 API Deployment、Worker Deployment 和 Redis Deployment。
- 新增 Service 定义,支持 API、Metrics 和 Redis 的集群访问。
- 配置 ConfigMap,用于全局共享环境变量。
- 编写 Kubernetes 部署指南文档,包含快速部署步骤、建议配置及故障排查方法。
- 提升系统的可扩展性和容器编排能力,适配生产环境使用。
This commit is contained in:
2026-02-03 16:30:48 +08:00
parent b77e736790
commit c7626723a3
5 changed files with 502 additions and 247 deletions

View File

@@ -25,6 +25,7 @@
| [API 参考](docs/api-reference.md) | 完整的 API 文档 | | [API 参考](docs/api-reference.md) | 完整的 API 文档 |
| [监控指南](docs/monitoring.md) | 监控和告警配置 | | [监控指南](docs/monitoring.md) | 监控和告警配置 |
| [API 规范](docs/api/README.md) | OpenAPI 规范说明 | | [API 规范](docs/api/README.md) | OpenAPI 规范说明 |
| [Kubernetes 部署](docs/kubernetes-deployment.md) | K8s 集群部署指南 |
| [日志集成(Loki)](docs/loki-quick-reference.md) | 日志收集部署说明 | | [日志集成(Loki)](docs/loki-quick-reference.md) | 日志收集部署说明 |
## 快速开始 ## 快速开始

View File

@@ -1,33 +1,70 @@
# Kubernetes 部署配置
# 包含ConfigMap、API Deployment、Worker Deployment、Redis Deployment
---
# ConfigMap - 共享配置
apiVersion: v1
kind: ConfigMap
metadata:
name: functional-scaffold-config
labels:
app: functional-scaffold
data:
APP_ENV: "production"
LOG_LEVEL: "INFO"
LOG_FORMAT: "json"
METRICS_ENABLED: "true"
# Redis 配置(指向集群内 Redis 服务)
REDIS_HOST: "functional-scaffold-redis"
REDIS_PORT: "6379"
REDIS_DB: "0"
# 异步任务配置
MAX_CONCURRENT_JOBS: "10"
JOB_RESULT_TTL: "1800"
WEBHOOK_MAX_RETRIES: "3"
WEBHOOK_TIMEOUT: "10"
# Worker 配置
WORKER_POLL_INTERVAL: "1.0"
JOB_QUEUE_KEY: "job:queue"
JOB_CONCURRENCY_KEY: "job:concurrency"
JOB_LOCK_TTL: "300"
JOB_MAX_RETRIES: "3"
JOB_EXECUTION_TIMEOUT: "300"
---
# API Deployment - HTTP 服务
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
name: functional-scaffold name: functional-scaffold-api
labels: labels:
app: functional-scaffold app: functional-scaffold
component: api
spec: spec:
replicas: 3 replicas: 3
selector: selector:
matchLabels: matchLabels:
app: functional-scaffold app: functional-scaffold
component: api
template: template:
metadata: metadata:
labels: labels:
app: functional-scaffold app: functional-scaffold
component: api
spec: spec:
containers: containers:
- name: functional-scaffold - name: api
image: functional-scaffold:latest image: functional-scaffold:latest
imagePullPolicy: IfNotPresent imagePullPolicy: IfNotPresent
ports: ports:
- containerPort: 8000 - containerPort: 8000
name: http name: http
env: env:
- name: APP_ENV - name: RUN_MODE
value: "production" value: "api"
- name: LOG_LEVEL envFrom:
value: "INFO" - configMapRef:
- name: METRICS_ENABLED name: functional-scaffold-config
value: "true"
resources: resources:
requests: requests:
memory: "256Mi" memory: "256Mi"
@@ -51,3 +88,116 @@ spec:
periodSeconds: 10 periodSeconds: 10
timeoutSeconds: 3 timeoutSeconds: 3
failureThreshold: 3 failureThreshold: 3
---
# Worker Deployment - 异步任务处理
apiVersion: apps/v1
kind: Deployment
metadata:
name: functional-scaffold-worker
labels:
app: functional-scaffold
component: worker
spec:
replicas: 2
selector:
matchLabels:
app: functional-scaffold
component: worker
template:
metadata:
labels:
app: functional-scaffold
component: worker
spec:
containers:
- name: worker
image: functional-scaffold:latest
imagePullPolicy: IfNotPresent
env:
- name: RUN_MODE
value: "worker"
envFrom:
- configMapRef:
name: functional-scaffold-config
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Worker 没有 HTTP 端口,使用命令探针
livenessProbe:
exec:
command:
- python
- -c
- "import redis; r = redis.Redis(host='functional-scaffold-redis'); r.ping()"
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
---
# Redis Deployment - 任务队列和状态存储
apiVersion: apps/v1
kind: Deployment
metadata:
name: functional-scaffold-redis
labels:
app: functional-scaffold
component: redis
spec:
replicas: 1
selector:
matchLabels:
app: functional-scaffold
component: redis
template:
metadata:
labels:
app: functional-scaffold
component: redis
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
name: redis
command:
- redis-server
- --appendonly
- "yes"
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
readinessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
volumeMounts:
- name: redis-data
mountPath: /data
volumes:
- name: redis-data
emptyDir: {}

View File

@@ -1,9 +1,15 @@
# Kubernetes Service 配置
# 包含API Service、Metrics Service、Redis Service
---
# API Service - 对外暴露 HTTP 服务
apiVersion: v1 apiVersion: v1
kind: Service kind: Service
metadata: metadata:
name: functional-scaffold name: functional-scaffold-api
labels: labels:
app: functional-scaffold app: functional-scaffold
component: api
spec: spec:
type: ClusterIP type: ClusterIP
ports: ports:
@@ -13,13 +19,21 @@ spec:
name: http name: http
selector: selector:
app: functional-scaffold app: functional-scaffold
component: api
--- ---
# Metrics Service - Prometheus 抓取指标
apiVersion: v1 apiVersion: v1
kind: Service kind: Service
metadata: metadata:
name: functional-scaffold-metrics name: functional-scaffold-metrics
labels: labels:
app: functional-scaffold app: functional-scaffold
component: api
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
spec: spec:
type: ClusterIP type: ClusterIP
ports: ports:
@@ -29,3 +43,24 @@ spec:
name: metrics name: metrics
selector: selector:
app: functional-scaffold app: functional-scaffold
component: api
---
# Redis Service - 内部 Redis 服务
apiVersion: v1
kind: Service
metadata:
name: functional-scaffold-redis
labels:
app: functional-scaffold
component: redis
spec:
type: ClusterIP
ports:
- port: 6379
targetPort: 6379
protocol: TCP
name: redis
selector:
app: functional-scaffold
component: redis

View File

@@ -0,0 +1,307 @@
# Kubernetes 部署指南
本文档介绍如何在 Kubernetes 集群中部署 FunctionalScaffold 服务。
## 架构概览
```
┌─────────────────┐
│ Ingress/LB │
└────────┬────────┘
┌────────▼────────┐
│ API Service │
│ (ClusterIP) │
└────────┬────────┘
┌──────────────┼──────────────┐
│ │ │
┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ API Pod 1 │ │ API Pod 2 │ │ API Pod 3 │
└─────────────┘ └───────────┘ └───────────┘
┌────────▼────────┐
│ Redis Service │
└────────┬────────┘
┌──────────────┼──────────────┐
│ │ │
┌──────▼──────┐ ┌─────▼─────┐ │
│ Worker Pod 1│ │Worker Pod2│ │
└─────────────┘ └───────────┘ │
┌──────▼──────┐
│ Redis Pod │
└─────────────┘
```
## 组件说明
| 组件 | 副本数 | 说明 |
|------|--------|------|
| **API Deployment** | 3 | HTTP 服务,处理同步请求和任务创建 |
| **Worker Deployment** | 2 | 异步任务处理,从 Redis 队列消费任务 |
| **Redis Deployment** | 1 | 任务队列和状态存储 |
| **ConfigMap** | - | 共享配置管理 |
## 快速部署
```bash
# 部署所有资源
kubectl apply -f deployment/kubernetes/deployment.yaml
kubectl apply -f deployment/kubernetes/service.yaml
# 查看部署状态
kubectl get pods -l app=functional-scaffold
kubectl get svc -l app=functional-scaffold
```
## 配置文件说明
### deployment.yaml
包含以下资源:
#### ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: functional-scaffold-config
data:
APP_ENV: "production"
LOG_LEVEL: "INFO"
REDIS_HOST: "functional-scaffold-redis"
# ... 更多配置
```
主要配置项:
| 配置项 | 默认值 | 说明 |
|--------|--------|------|
| `APP_ENV` | production | 运行环境 |
| `LOG_LEVEL` | INFO | 日志级别 |
| `REDIS_HOST` | functional-scaffold-redis | Redis 服务地址 |
| `MAX_CONCURRENT_JOBS` | 10 | 最大并发任务数 |
| `JOB_EXECUTION_TIMEOUT` | 300 | 任务执行超时(秒) |
#### API Deployment
- **副本数**: 3
- **资源限制**: 256Mi-512Mi 内存250m-500m CPU
- **健康检查**: `/healthz`(存活)、`/readyz`(就绪)
- **环境变量**: `RUN_MODE=api`
#### Worker Deployment
- **副本数**: 2
- **资源限制**: 256Mi-512Mi 内存250m-500m CPU
- **健康检查**: exec 探针检查 Redis 连接
- **环境变量**: `RUN_MODE=worker`
#### Redis Deployment
- **副本数**: 1
- **资源限制**: 128Mi-256Mi 内存100m-200m CPU
- **持久化**: AOF 模式appendonly yes
- **存储**: emptyDir开发环境
### service.yaml
| Service | 类型 | 端口 | 说明 |
|---------|------|------|------|
| `functional-scaffold-api` | ClusterIP | 80 → 8000 | API 服务 |
| `functional-scaffold-metrics` | ClusterIP | 8000 | Prometheus 指标 |
| `functional-scaffold-redis` | ClusterIP | 6379 | Redis 服务 |
## 生产环境建议
### 1. 使用外部 Redis
生产环境建议使用托管 Redis 服务(如阿里云 Redis、AWS ElastiCache
```yaml
# 修改 ConfigMap
data:
REDIS_HOST: "r-xxxxx.redis.rds.aliyuncs.com"
REDIS_PORT: "6379"
REDIS_PASSWORD: "" # 使用 Secret 管理
```
### 2. 使用 Secret 管理敏感信息
```yaml
apiVersion: v1
kind: Secret
metadata:
name: functional-scaffold-secrets
type: Opaque
stringData:
REDIS_PASSWORD: "your-password"
DATABASE_URL: "postgresql://..."
```
在 Deployment 中引用:
```yaml
envFrom:
- configMapRef:
name: functional-scaffold-config
- secretRef:
name: functional-scaffold-secrets
```
### 3. 配置 HPA 自动扩缩容
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: functional-scaffold-api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: functional-scaffold-api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
```
### 4. 配置 PDB 保证可用性
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: functional-scaffold-api-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: functional-scaffold
component: api
```
### 5. 使用 PVC 持久化 Redis 数据
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-data-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
```
## 监控集成
### Prometheus 抓取配置
`functional-scaffold-metrics` Service 已添加 Prometheus 注解:
```yaml
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8000"
prometheus.io/path: "/metrics"
```
### ServiceMonitor如使用 Prometheus Operator
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: functional-scaffold
spec:
selector:
matchLabels:
app: functional-scaffold
component: api
endpoints:
- port: metrics
path: /metrics
interval: 30s
```
## 常用命令
```bash
# 查看所有资源
kubectl get all -l app=functional-scaffold
# 查看 Pod 日志
kubectl logs -l app=functional-scaffold,component=api -f
kubectl logs -l app=functional-scaffold,component=worker -f
# 扩缩容
kubectl scale deployment functional-scaffold-api --replicas=5
kubectl scale deployment functional-scaffold-worker --replicas=3
# 滚动更新
kubectl set image deployment/functional-scaffold-api \
api=functional-scaffold:v2.0.0
# 回滚
kubectl rollout undo deployment/functional-scaffold-api
# 查看部署历史
kubectl rollout history deployment/functional-scaffold-api
# 进入 Pod 调试
kubectl exec -it <pod-name> -- /bin/sh
# 端口转发(本地调试)
kubectl port-forward svc/functional-scaffold-api 8000:80
```
## 故障排查
### Pod 启动失败
```bash
# 查看 Pod 事件
kubectl describe pod <pod-name>
# 查看 Pod 日志
kubectl logs <pod-name> --previous
```
### Redis 连接失败
```bash
# 检查 Redis Service
kubectl get svc functional-scaffold-redis
# 测试 Redis 连接
kubectl run redis-test --rm -it --image=redis:7-alpine -- \
redis-cli -h functional-scaffold-redis ping
```
### Worker 不消费任务
```bash
# 检查 Worker 日志
kubectl logs -l component=worker -f
# 检查 Redis 队列
kubectl exec -it <redis-pod> -- redis-cli LLEN job:queue
```
## 相关文档
- [快速入门](getting-started.md)
- [监控指南](monitoring.md)
- [并发控制](concurrency-control.md)
- [日志集成](loki-quick-reference.md)

View File

@@ -1,238 +0,0 @@
# Loki 日志收集系统集成 - 实施总结
## 实施完成
已成功集成 Grafana Loki 日志收集系统到 FunctionalScaffold 项目。
## 新增文件
### 1. 监控配置文件
| 文件 | 说明 |
|------|------|
| `monitoring/loki.yaml` | Loki 服务配置7天保留期10MB/s速率限制|
| `monitoring/promtail.yaml` | Promtail 日志采集配置(支持 Docker stdio 和文件两种模式)|
### 2. Grafana Provisioning
| 文件 | 说明 |
|------|------|
| `monitoring/grafana/datasources/prometheus.yaml` | Prometheus 数据源自动配置 |
| `monitoring/grafana/datasources/loki.yaml` | Loki 数据源自动配置 |
| `monitoring/grafana/dashboards/provider.yaml` | Dashboard 自动加载配置 |
| `monitoring/grafana/dashboards/logs-dashboard.json` | 日志监控仪表板 |
| `monitoring/grafana/dashboards/dashboard.json` | 原有监控仪表板(已移动)|
### 3. 文档和脚本
| 文件 | 说明 |
|------|------|
| `docs/loki-integration.md` | Loki 使用完整文档(包含查询示例、故障排查等)|
| `scripts/verify_loki.sh` | Loki 集成验证脚本 |
## 修改文件
### 1. Docker Compose 配置
**文件**: `deployment/docker-compose.yml`
**变更**:
- 添加 `loki` 服务(端口 3100
- 添加 `promtail` 服务(端口 9080
- 更新 `app` 服务:
- 添加日志文件配置环境变量
- 添加 `app_logs` 卷挂载
- 添加 Promtail 标签
- 更新 `grafana` 服务:
- 修改 provisioning 卷挂载结构
- 添加对 Loki 的依赖
- 添加 `loki_data``app_logs`
### 2. 应用代码
**文件**: `src/functional_scaffold/core/logging.py`
**变更**:
- 添加 `file_path` 参数支持
- 实现 `RotatingFileHandler`100MB5个备份
- 支持同时输出到控制台和文件
**文件**: `src/functional_scaffold/config.py`
**变更**:
- 添加 `log_file_enabled` 配置(默认 False
- 添加 `log_file_path` 配置(默认 `/var/log/app/app.log`
**文件**: `src/functional_scaffold/main.py`
**变更**:
- 更新 `setup_logging()` 调用,传入文件路径参数
## 架构特点
### 1. 双模式日志收集
**模式 1: Docker stdio 收集(默认)**
- ✅ 无需修改应用代码
- ✅ 自动收集容器标准输出
- ✅ 性能影响极小
- ✅ 推荐用于生产环境
**模式 2: 文件收集(备用)**
- ✅ 日志持久化到文件
- ✅ 支持日志轮转
- ✅ 适合需要本地日志的场景
- ⚙️ 需要设置 `LOG_FILE_ENABLED=true`
### 2. 自动化配置
- ✅ Grafana 数据源自动加载
- ✅ Dashboard 自动加载
- ✅ 无需手动配置
### 3. 结构化日志
- ✅ JSON 格式日志
- ✅ 自动提取字段level, logger, request_id 等)
- ✅ 支持 LogQL 查询
## 使用方式
### 快速启动
```bash
cd deployment
docker-compose up -d
```
### 访问服务
- **Grafana**: http://localhost:3000 (admin/admin)
- **Loki API**: http://localhost:3100
- **Promtail**: http://localhost:9080
### 查看日志
**方式 1: Grafana 日志仪表板**
1. 访问 http://localhost:3000
2. 进入 "日志监控" 仪表板
**方式 2: Grafana Explore**
1. 访问 http://localhost:3000/explore
2. 选择 Loki 数据源
3. 输入查询: `{job="functional-scaffold-app"}`
### 验证集成
```bash
./scripts/verify_loki.sh
```
## LogQL 查询示例
```logql
# 查询所有日志
{job="functional-scaffold-app"}
# 查询错误日志
{job="functional-scaffold-app", level="ERROR"}
# 按 request_id 过滤
{job="functional-scaffold-app"} | json | request_id = "abc123"
# 统计日志量
sum by (level) (count_over_time({job="functional-scaffold-app"}[5m]))
```
## 配置说明
### 日志保留期
默认 7 天,可在 `monitoring/loki.yaml` 中修改:
```yaml
limits_config:
retention_period: 168h # 7 天
```
### 日志文件模式
`deployment/docker-compose.yml` 中启用:
```yaml
environment:
- LOG_FILE_ENABLED=true
- LOG_FILE_PATH=/var/log/app/app.log
```
### 日志级别
`deployment/docker-compose.yml` 中调整:
```yaml
environment:
- LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL
```
## 监控指标
Loki 集成后,可以在 Grafana 中查看:
- **日志流**: 实时日志流
- **日志量趋势**: 按时间和级别统计
- **日志级别分布**: INFO/WARNING/ERROR 分布
- **错误日志**: 只显示 ERROR 级别
## 故障排查
### 看不到日志
1. 检查服务状态: `docker-compose ps`
2. 查看 Promtail 日志: `docker-compose logs promtail`
3. 验证容器标签: `docker inspect <container> | grep Labels`
4. 查询 Loki API: `curl http://localhost:3100/loki/api/v1/label/job/values`
### Docker socket 权限问题
```bash
sudo chmod 666 /var/run/docker.sock
```
### 日志量过大
1. 调整保留期为 3 天
2. 降低摄入速率限制
3. 添加日志过滤规则
详细故障排查请参考 `docs/loki-integration.md`
## 性能影响
- **CPU**: < 5% 额外开销
- **内存**: Loki ~200MB, Promtail ~50MB
- **磁盘**: 取决于日志量7天约 1-5GB
- **网络**: 本地通信影响极小
## 下一步
可选的增强功能
1. **告警规则**: 配置基于日志的告警
2. **日志导出**: 定期导出日志到对象存储
3. **多租户**: 配置 Loki 多租户模式
4. **长期存储**: 配置 S3/OSS 作为后端存储
## 参考文档
- 完整使用文档: `docs/loki-integration.md`
- Loki 官方文档: https://grafana.com/docs/loki/latest/
- LogQL 查询语言: https://grafana.com/docs/loki/latest/logql/
## 总结
**完成**: Loki 日志收集系统已成功集成
**测试**: 可通过 `./scripts/verify_loki.sh` 验证
**文档**: 提供完整的使用和故障排查文档
**生产就绪**: 支持双模式收集配置灵活
集成已完成可以开始使用 Loki 进行日志收集和分析