Monitoring

Current stack

| Tool | Status | Purpose |
| --- | --- | --- |
| Prometheus / node_exporter | Deployed | Host metrics (CPU, memory, disk, network) |
| Fail2ban | Deployed | Intrusion detection, SSH brute-force protection |
| Uptime Kuma | Not deployed | External uptime monitoring, status page |
| Grafana | Not deployed | Metrics visualisation and dashboards |
| Promtail / Loki | Partial | Log shipping (Promtail container deployed; Loki not yet) |

Dashboard widgets

Metrics are surfaced in the ops dashboard via dedicated widgets:

| Widget | Data source | API route |
| --- | --- | --- |
| Prometheus | node_exporter scrape | /api/widgets/prometheus |
| Fail2ban | log / socket | /api/widgets/fail2ban |
| Uptime Kuma | REST API | /api/widgets/uptime-kuma |

See Widgets for the full widget reference.


node_exporter

Exposes host metrics from the web VPS. The Prometheus widget polls the node_exporter /metrics endpoint directly from the Next.js container.

# Environment variable
PROMETHEUS_URL=http://<host>:9100/metrics

Metrics exposed: CPU usage, memory, disk I/O, filesystem usage, network throughput.
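
As an illustration of what polling the endpoint involves, here is a minimal Python sketch that parses the Prometheus text exposition format and derives a memory usage percentage. It is not the widget's actual code. The function names `parse_metrics` and `memory_used_percent` are hypothetical, and the sample payload is made up; `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes` are the gauges node_exporter reports on Linux hosts.

```python
import re

# One sample line looks like: metric_name{label="value"} 1.23
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {metric name plus labels: float value}, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples


def memory_used_percent(samples):
    """Memory in use as a percentage of total, from node_exporter gauges."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 100.0 * (total - available) / total


# Made-up payload in the shape node_exporter returns.
example = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.0e+09
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2.0e+09
"""

print(memory_used_percent(parse_metrics(example)))  # 75.0
```

In practice the text would come from an HTTP GET against PROMETHEUS_URL rather than from an inline string.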


Fail2ban

Monitors /var/log/auth.log and other log sources. Bans IPs that exceed failed authentication thresholds.

The Fail2ban widget reads ban counts and recently banned IPs, either through the fail2ban socket or by parsing the Fail2ban log.
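
The log-parsing route can be sketched as follows. This is a rough Python illustration under assumptions, not the widget's implementation: `summarise_bans` is a hypothetical helper, and the sample lines only imitate the usual fail2ban.log layout (timestamp, component, PID, level, jail, action, IP), which can vary between versions and configurations.

```python
import re
from collections import Counter

# Matches "[jail] Ban <ip>". "Unban" and "Restore Ban" entries do not match,
# because "Ban" must directly follow the bracketed jail name.
BAN_RE = re.compile(r"\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)")


def summarise_bans(log_text, recent=5):
    """Return (ban count per jail, the most recently banned IPs)."""
    counts = Counter()
    banned = []
    for line in log_text.splitlines():
        match = BAN_RE.search(line)
        if match:
            counts[match.group("jail")] += 1
            banned.append(match.group("ip"))
    return dict(counts), banned[-recent:]


# Illustrative lines only; the addresses are documentation-range IPs.
sample_log = """\
2024-05-01 12:00:01,123 fail2ban.filter  [812]: INFO    [sshd] Found 203.0.113.7
2024-05-01 12:00:02,456 fail2ban.actions [812]: NOTICE  [sshd] Ban 203.0.113.7
2024-05-01 12:10:02,789 fail2ban.actions [812]: NOTICE  [sshd] Unban 203.0.113.7
2024-05-01 12:15:40,001 fail2ban.actions [812]: NOTICE  [sshd] Ban 198.51.100.23
"""

counts, recent_ips = summarise_bans(sample_log)
print(counts)      # {'sshd': 2}
print(recent_ips)  # ['203.0.113.7', '198.51.100.23']
```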


Uptime Kuma

[placeholder — not yet deployed. See Uptime Kuma for planned deployment.]

Target monitors once deployed:

  • web.level147.net — ops dashboard (HTTP 200)
  • docs.level147.net — docs site (HTTP 200)
  • Gitea internal health endpoint
  • Woodpecker CI

Alerting

[placeholder — define alerting channels and thresholds]

Proposed alerting rules:

  • Dashboard unreachable > 5 minutes → immediate
  • Disk usage > 85% → warn; > 95% → critical
  • Memory usage > 90% for 10 minutes → warn
  • Failed SSH logins spike (> 50/min) → immediate
  • CI pipeline failure → notify via [channel TBD]
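
To make the proposed disk and memory thresholds concrete, here is a small Python sketch of how they could be evaluated. The function names, and the assumption that memory samples arrive as (unix timestamp, used percent) pairs, are illustrative only; nothing here is tied to a particular alerting tool.

```python
def disk_severity(used_percent):
    """Map disk usage to the proposed levels: > 85% warn, > 95% critical."""
    if used_percent > 95:
        return "critical"
    if used_percent > 85:
        return "warn"
    return "ok"


def memory_sustained(samples, threshold=90.0, window_seconds=600):
    """True if usage has stayed above the threshold for the whole window.

    samples: list of (unix_timestamp, used_percent), oldest first.
    """
    if not samples:
        return False
    streak_start = None
    # Walk back from the newest sample until usage drops to the threshold.
    for timestamp, used in reversed(samples):
        if used <= threshold:
            break
        streak_start = timestamp
    if streak_start is None:
        return False  # the newest sample is not above the threshold
    return samples[-1][0] - streak_start >= window_seconds


# One sample per minute for ten minutes, all at 95% memory usage.
history = [(60 * i, 95.0) for i in range(11)]

print(disk_severity(90))          # warn
print(memory_sustained(history))  # True
```

Requiring an unbroken streak means a single dip below 90% resets the ten-minute clock, which keeps short spikes from raising a warning.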

Log retention

| Log type | Retention |
| --- | --- |
| Docker container logs | 14 days (Docker log driver) |
| Woodpecker pipeline logs | 90 days |
| Fail2ban logs | 30 days |
| System auth logs | 30 days |

Health check endpoints

| Service | Endpoint | Expected response |
| --- | --- | --- |
| Ops dashboard | http://localhost:3000/api/health | 200 OK |
| Cloudflare tunnel | $CLOUDFLARED_METRICS_URL | metrics text |
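
A health probe against an endpoint like these could look like the Python sketch below. It assumes a plain HTTP GET in which only a 200 status counts as healthy; `is_healthy` and its `opener` parameter are hypothetical names, and the injectable opener exists only so the function can be exercised without a live service.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_healthy("http://localhost:3000/api/health")` from the host would cover the ops dashboard row. The Cloudflare tunnel row would additionally need a check that the body contains metrics text.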