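main: Delete the Grafana dashboard configuration file

Changes:
- Remove the `dashboard.json` file to clean up a Grafana dashboard configuration that is no longer needed.
- Simplify the project directory structure by deleting redundant monitoring configuration, which eases maintenance.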
2026-02-02 18:40:16 +08:00
parent 3e1d850954
commit 683bf8a6ca
20 changed files with 2103 additions and 18 deletions

monitoring/README.md Normal file

@@ -0,0 +1,258 @@
# Monitoring Directory Overview
This directory contains all configuration files for monitoring and log collection.
## Directory Structure
```
monitoring/
├── alerts/                      # Prometheus alerting rules
│   └── rules.yaml               # Alerting rule definitions
├── grafana/                     # Grafana configuration
│   ├── datasources/             # Auto-provisioned data sources
│   │   ├── prometheus.yaml      # Prometheus data source
│   │   └── loki.yaml            # Loki data source
│   └── dashboards/              # Auto-loaded dashboards
│       ├── provider.yaml        # Dashboard provider configuration
│       ├── dashboard.json       # Metrics dashboard
│       └── logs-dashboard.json  # Logs dashboard
├── loki.yaml                    # Loki log storage configuration
├── promtail.yaml                # Promtail log collection configuration
└── prometheus.yml               # Prometheus metrics collection configuration
```
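## Configuration Files
### Prometheus Configuration
**File**: `prometheus.yml`
Prometheus metrics collection configuration, including:
- Scrape interval: 5 seconds
- Target: the `/metrics` endpoint of the app service
- Alerting rules: loaded from the `alerts/` directory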
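`prometheus.yml` itself is not reproduced here, so the following is only a minimal sketch of how those three settings might fit together. The job name `app`, the target `app:8111` (the port is borrowed from the troubleshooting `curl` command), and the rule path inside the container are all assumptions.

```yaml
# Illustrative sketch only; the real prometheus.yml may differ.
global:
  scrape_interval: 5s                  # the 5-second scrape interval

rule_files:
  - /etc/prometheus/alerts/*.yaml      # assumed mount point of alerts/

scrape_configs:
  - job_name: app                      # assumed job name
    metrics_path: /metrics
    static_configs:
      - targets: ["app:8111"]          # assumed host:port of the app service
```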
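### Loki Configuration
**File**: `loki.yaml`
Loki log storage configuration, including:
- Storage: local filesystem
- Log retention: 7 days
- Ingestion rate limit: 10 MB/s
- Automatic compaction and cleanup
**Key settings**:
```yaml
limits_config:
  retention_period: 168h  # 7 days
  ingestion_rate_mb: 10   # 10 MB/s
```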
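### Promtail Configuration
**File**: `promtail.yaml`
Promtail log collection configuration. Two modes are supported:
**Mode 1: Docker stdio collection (default)**
- Discovers containers automatically through the Docker API
- Keeps only containers labeled `logging=promtail`
- Parses JSON logs automatically
**Mode 2: File collection (fallback)**
- Reads log files from `/var/log/app/*.log`
- Supports log rotation
- Requires `LOG_FILE_ENABLED=true`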
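For mode 1, a container is only picked up if it carries the expected Docker labels. The fragment below is a sketch of what that could look like in a Compose file. The service name `app` is an assumption; the `logging_jobname` label is inferred from the `__meta_docker_container_label_logging_jobname` relabel rule in `promtail.yaml`, and the job value is the one the logs dashboard queries.

```yaml
# docker-compose.yml fragment (illustrative; the service name is an assumption)
services:
  app:
    labels:
      logging: "promtail"                         # matched by the `logging=promtail` filter
      logging_jobname: "functional-scaffold-app"  # copied into the `job` label by relabeling
```

Containers without the `logging` label are ignored by the Docker discovery filter, so other services stay out of Loki unless they opt in.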
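### Grafana Provisioning
**Data sources** (`grafana/datasources/`)
Grafana data sources are provisioned automatically:
- `prometheus.yaml`: Prometheus data source (default)
- `loki.yaml`: Loki data source
**Dashboards** (`grafana/dashboards/`)
Grafana dashboards are loaded automatically:
- `provider.yaml`: Dashboard provider configuration
- `dashboard.json`: Metrics dashboard (HTTP requests, algorithm execution, and so on)
- `logs-dashboard.json`: Logs dashboard (log stream, error logs, and so on)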
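The logs dashboard refers to its data source as `uid: Loki`, while the provisioned data source only sets a name. If that reference does not resolve on a given Grafana version, one option is to pin the uid in the provisioning file. This is a sketch of that idea, not the configuration shipped in this commit; the `uid` line is the only addition.

```yaml
# grafana/datasources/loki.yaml with an explicit uid (illustrative)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    uid: Loki              # assumption: pinned to match the dashboard's datasource uid
    access: proxy
    url: http://loki:3100
    isDefault: false
    editable: false
    jsonData:
      maxLines: 1000
```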
### Alerting Rules
**File**: `alerts/rules.yaml`
Prometheus alerting rules, including:
- High error rate alert
- High latency alert
- Service unavailable alert
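`alerts/rules.yaml` is not shown here, so as a hedged illustration of the third category, a service-unavailable rule can be built on Prometheus's built-in `up` metric. The group name, alert name, job label `app`, and the one-minute window are assumptions.

```yaml
# Illustrative rule; names and thresholds are assumptions.
groups:
  - name: availability
    rules:
      - alert: ServiceDown
        expr: up{job="app"} == 0     # `up` is 0 when the last scrape of the target failed
        for: 1m                      # must stay down for 1 minute before firing
        labels:
          severity: critical
        annotations:
          summary: "app target is not responding to scrapes"
```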
## Changing the Configuration
### Adjusting the log retention period
Edit `loki.yaml`:
```yaml
limits_config:
  retention_period: 72h  # change to 3 days
```
Restart Loki:
```bash
cd deployment
docker-compose restart loki
```
### Adjusting the metrics scrape interval
Edit `prometheus.yml`:
```yaml
global:
  scrape_interval: 10s  # change to 10 seconds
```
Restart Prometheus:
```bash
cd deployment
docker-compose restart prometheus
```
### Adding a new alerting rule
Edit `alerts/rules.yaml` and add the new rule:
```yaml
groups:
  - name: my_alerts
    rules:
      - alert: MyAlert
        expr: my_metric > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "My alert"
```
Restart Prometheus:
```bash
cd deployment
docker-compose restart prometheus
```
### Adding a new dashboard
1. Create the dashboard in the Grafana UI
2. Export it as JSON
3. Save it to `grafana/dashboards/my-dashboard.json`
4. Restart Grafana (or wait for the automatic reload)
```bash
cd deployment
docker-compose restart grafana
```
## Verifying the Configuration
### Checking the Prometheus configuration
```bash
# Open the Prometheus UI
open http://localhost:9090
# Check target status
open http://localhost:9090/targets
# Check alerting rules
open http://localhost:9090/alerts
```
### Checking the Loki configuration
```bash
# Check Loki readiness
curl http://localhost:3100/ready
# List the values of the job label
curl -s "http://localhost:3100/loki/api/v1/label/job/values" | jq
```
### Checking the Grafana configuration
```bash
# Open the Grafana UI
open http://localhost:3000
# Check data sources
curl -s -u admin:admin http://localhost:3000/api/datasources | jq
# Check dashboards
curl -s -u admin:admin http://localhost:3000/api/search | jq
```
## Troubleshooting
### Prometheus cannot scrape metrics
1. Check that the app service is running: `docker-compose ps app`
2. Check the metrics endpoint: `curl http://localhost:8111/metrics`
3. Review the Prometheus logs: `docker-compose logs prometheus`
### Loki is not receiving logs
1. Check that Promtail is running: `docker-compose ps promtail`
2. Review the Promtail logs: `docker-compose logs promtail`
3. Check the container labels: `docker inspect <container> | grep Labels`
### Grafana data sources are not loaded
1. Check the provisioning directory mounts: `docker-compose config | grep grafana -A 10`
2. Review the Grafana logs: `docker-compose logs grafana`
3. Restart Grafana manually: `docker-compose restart grafana`
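## Related Documentation
- [Loki integration guide](../docs/loki-integration.md) - complete Loki usage documentation
- [Loki quick reference](../docs/loki-quick-reference.md) - common commands and queries
- [Loki implementation summary](../docs/loki-implementation-summary.md) - implementation details and architecture notes
- [Prometheus official documentation](https://prometheus.io/docs/)
- [Loki official documentation](https://grafana.com/docs/loki/latest/)
- [Grafana official documentation](https://grafana.com/docs/grafana/latest/)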
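## Performance Recommendations
### Controlling log volume
- Raise the log level to WARNING or ERROR
- Filter out unnecessary logs (such as health checks)
- Shorten the log retention period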
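Filtering can also happen in Promtail before logs reach Loki. The fragment below is a sketch using Promtail's `drop` stage. It assumes the stage sits after the `json` stages that extract `level`, and the `/health` path is a hypothetical health check endpoint.

```yaml
# Fragment for a scrape config in promtail.yaml (illustrative)
pipeline_stages:
  # ... the existing json stages that extract `level` come first ...
  - drop:
      source: level                   # value extracted by the json stage
      expression: "(DEBUG|INFO)"      # drop entries at these levels
  - drop:
      expression: ".*GET /health.*"   # assumed health check path, matched against the log line
```

Dropping at the collector reduces both ingestion rate and storage, whereas lowering the application's log level also saves the cost of producing the lines in the first place.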
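### Metrics optimization
- Increase the scrape interval (for example, to 15s or 30s)
- Reduce metric cardinality (avoid high-cardinality labels)
- Clean up old data regularly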
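The first two points can be sketched as a per-job override in `prometheus.yml`. The job name `app` and the `request_id` label are hypothetical; `labeldrop` is a standard relabel action, but removing a label can make previously distinct series collide, so it is worth checking the result on the `/targets` page.

```yaml
# Fragment for prometheus.yml (illustrative)
scrape_configs:
  - job_name: app                  # assumed job name
    scrape_interval: 30s           # per-job override of the global interval
    metric_relabel_configs:
      - action: labeldrop
        regex: request_id          # hypothetical high-cardinality label
```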
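### Storage optimization
- Monitor disk usage: `docker-compose exec loki du -sh /loki`
- Back up important data regularly
- Consider object storage (S3/OSS) as the backend
## Summary
This directory contains the complete monitoring and log collection configuration:
- **Prometheus** - metrics collection and alerting
- **Loki** - log storage and querying
- **Promtail** - log collection
- **Grafana** - visualization and dashboards
All of this configuration is loaded automatically, with no manual setup required.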


@@ -0,0 +1,292 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"dedupStrategy": "none",
"enableLogDetails": true,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": false
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"editorMode": "code",
"expr": "{job=\"functional-scaffold-app\"}",
"queryType": "range",
"refId": "A"
}
],
"title": "Log stream (live)",
"type": "logs"
},
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"tooltip": false,
"viz": false,
"legend": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 10
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"editorMode": "code",
"expr": "sum by (level) (count_over_time({job=\"functional-scaffold-app\"}[1m]))",
"queryType": "range",
"refId": "A"
}
],
"title": "Log volume trend (by level)",
"type": "timeseries"
},
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 10
},
{
"color": "red",
"value": 50
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 10
},
"id": 3,
"options": {
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": [
"lastNotNull"
],
"fields": ""
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "9.5.3",
"targets": [
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"editorMode": "code",
"expr": "sum by (level) (count_over_time({job=\"functional-scaffold-app\"}[$__range]))",
"queryType": "range",
"refId": "A"
}
],
"title": "Log level distribution",
"type": "gauge"
},
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 18
},
"id": 4,
"options": {
"dedupStrategy": "none",
"enableLogDetails": true,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": false
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "Loki"
},
"editorMode": "code",
"expr": "{job=\"functional-scaffold-app\", level=\"ERROR\"}",
"queryType": "range",
"refId": "A"
}
],
"title": "Error logs",
"type": "logs"
}
],
"refresh": "5s",
"schemaVersion": 38,
"style": "dark",
"tags": ["logs", "loki"],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "",
"value": ""
},
"hide": 0,
"label": "Request ID",
"name": "request_id",
"options": [
{
"selected": true,
"text": "",
"value": ""
}
],
"query": "",
"skipUrlSync": false,
"type": "textbox"
}
]
},
"time": {
"from": "now-15m",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Log monitoring",
"uid": "logs-dashboard",
"version": 0,
"weekStart": ""
}


@@ -0,0 +1,13 @@
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: true


@@ -0,0 +1,11 @@
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
    editable: false
    jsonData:
      maxLines: 1000


@@ -0,0 +1,11 @@
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: "5s"

monitoring/loki.yaml Normal file

@@ -0,0 +1,39 @@
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
limits_config:
  retention_period: 168h  # 7 days
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h

monitoring/promtail.yaml Normal file

@@ -0,0 +1,71 @@
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  # Scenario 1: Docker stdio collection (primary method)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["logging=promtail"]
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_label_logging_jobname']
        target_label: 'job'
      - source_labels: ['__meta_docker_container_id']
        target_label: '__path__'
        replacement: '/var/lib/docker/containers/$1/*.log'
    pipeline_stages:
      - json:
          expressions:
            log: log
            stream: stream
            time: time
      - json:
          source: log
          expressions:
            level: levelname
            logger: name
            message: message
            request_id: request_id
      - labels:
          level:
          logger:
      - output:
          source: log
  # Scenario 2: log file collection (fallback)
  - job_name: app_files
    static_configs:
      - targets:
          - localhost
        labels:
          job: functional-scaffold-app-files
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: asctime
            level: levelname
            logger: name
            message: message
            request_id: request_id
      - timestamp:
          source: timestamp
          format: "2006-01-02 15:04:05,000"
      - labels:
          level:
          logger:
      - output:
          source: message