Monitoring

Current stack

Tool	Status	Purpose
Prometheus / node_exporter	Deployed	Host metrics (CPU, memory, disk, network)
Fail2ban	Deployed	Intrusion detection, SSH brute-force protection
Uptime Kuma	Integrated	External uptime monitoring, status page
Grafana	Not deployed	Metrics visualisation and dashboards
Promtail / Loki	Partial	Log shipping (Promtail container deployed; Loki not yet deployed)

Metrics are surfaced in the ops dashboard via dedicated widgets:

Widget	Data source	API route
Host Resources	node_exporter scrape	`/api/widgets/node-exporter`
Infrastructure	Prometheus / node_exporter	`/api/widgets/prometheus-hosts`
Security	Fail2ban via node_exporter textfile	`/api/widgets/security`
Uptime Kuma	REST API	`/api/widgets/uptime-kuma`

See Widgets for the full widget reference.

Prometheus / node_exporter

Scrapes host metrics from the web VPS. The Prometheus widget polls node_exporter directly from the Next.js container.

# Environment variable
PROMETHEUS_URL=http://<host>:9100/metrics

Metrics exposed: CPU usage, memory, disk I/O, filesystem usage, network throughput.
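
As an illustration of what polling the `/metrics` endpoint involves, here is a minimal sketch of parsing the Prometheus text exposition format that node_exporter returns. The function names `parseMetrics` and `memoryUsedPercent` are hypothetical and not taken from the dashboard code. The metric names `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes` are the ones node_exporter publishes on Linux hosts. The parser skips timestamps and label unescaping, so treat it as a starting point rather than a complete implementation.

```typescript
// One sample line from the Prometheus text exposition format.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Parse exposition text into samples. Comment lines (# HELP, # TYPE) and blank
// lines are skipped. This sketch ignores optional trailing timestamps, does not
// unescape label values, and yields NaN for special values such as +Inf.
function parseMetrics(text: string): Sample[] {
  const samples: Sample[] = [];
  const linePattern = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const m = linePattern.exec(line);
    if (m === null) continue;
    const labels: Record<string, string> = {};
    // A fresh global regex per line keeps lastIndex from leaking between lines.
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    let l: RegExpExecArray | null;
    while ((l = labelPattern.exec(m[2] ?? "")) !== null) {
      labels[l[1]] = l[2];
    }
    samples.push({ name: m[1], labels, value: Number(m[3]) });
  }
  return samples;
}

// Derive memory usage as a percentage, or null if either gauge is missing.
function memoryUsedPercent(samples: Sample[]): number | null {
  const find = (name: string): number | undefined => {
    for (const s of samples) if (s.name === name) return s.value;
    return undefined;
  };
  const total = find("node_memory_MemTotal_bytes");
  const available = find("node_memory_MemAvailable_bytes");
  if (total === undefined || available === undefined || total <= 0) return null;
  return ((total - available) / total) * 100;
}
```

In a widget route, the text passed to `parseMetrics` would be the response body fetched from `PROMETHEUS_URL`. Keeping the parser free of network code makes it easy to test against a saved scrape.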

Fail2ban

Monitors /var/log/auth.log and other log sources, and bans IPs that exceed failed authentication thresholds.

The Fail2ban widget reads ban counts and recently banned IPs via the fail2ban socket or log scrape.
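
For the log-scrape option, a rough sketch of extracting ban events might look like the following. It assumes the default fail2ban log line layout, where action lines end in `[jail] Ban <ip>` or `[jail] Unban <ip>`. The names `parseBanEvents` and `activeBans` are hypothetical, and the actual widget may read its data differently.

```typescript
interface BanEvent {
  jail: string;
  action: "Ban" | "Unban";
  ip: string;
}

// Extract Ban/Unban events from fail2ban log text. Lines that do not match
// (for example "Found" or "Restore Ban" entries) are ignored in this sketch.
function parseBanEvents(log: string): BanEvent[] {
  const events: BanEvent[] = [];
  const pattern = /\[([^\]]+)\]\s+(Ban|Unban)\s+(\S+)/;
  for (const line of log.split("\n")) {
    const m = pattern.exec(line);
    if (m !== null) {
      events.push({ jail: m[1], action: m[2] as "Ban" | "Unban", ip: m[3] });
    }
  }
  return events;
}

// Replay events in order to find which IPs are still banned in each jail.
function activeBans(events: BanEvent[]): Record<string, string[]> {
  const banned: Record<string, string[]> = {};
  for (const e of events) {
    const current = banned[e.jail] ?? [];
    const without = current.filter((ip) => ip !== e.ip);
    banned[e.jail] = e.action === "Ban" ? without.concat(e.ip) : without;
  }
  return banned;
}
```

Replaying events in order means an IP that was banned and later unbanned drops out of the result. The ban count per jail is then the length of each list.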

Uptime Kuma

Integrated via the Uptime Kuma widget (`/api/widgets/uptime-kuma`). The dashboard shows live monitor status from the Uptime Kuma status page API.
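
To make the "live monitor status" step concrete, here is a hedged sketch of reducing a status page heartbeat payload to one state per monitor. It assumes a payload with a `heartbeatList` object keyed by monitor ID, each holding heartbeats ordered oldest to newest, and numeric status codes of 0 (down), 1 (up), 2 (pending), and 3 (maintenance). Verify both assumptions against the Uptime Kuma version in use. The function names are hypothetical.

```typescript
// Assumed shape of one heartbeat entry; only `status` is used below.
interface Heartbeat {
  status: number;
  time: string;
  msg?: string;
  ping?: number | null;
}

// Assumed shape of the heartbeat payload returned for a status page.
interface HeartbeatPayload {
  heartbeatList: Record<string, Heartbeat[]>;
}

type MonitorState = "up" | "down" | "pending" | "maintenance" | "unknown";

// Map the assumed numeric status codes to readable states.
function toState(status: number): MonitorState {
  switch (status) {
    case 0:
      return "down";
    case 1:
      return "up";
    case 2:
      return "pending";
    case 3:
      return "maintenance";
    default:
      return "unknown";
  }
}

// Take the last heartbeat of each monitor as its current state.
function summarize(payload: HeartbeatPayload): Record<string, MonitorState> {
  const result: Record<string, MonitorState> = {};
  for (const id of Object.keys(payload.heartbeatList)) {
    const beats = payload.heartbeatList[id];
    result[id] =
      beats.length > 0 ? toState(beats[beats.length - 1].status) : "unknown";
  }
  return result;
}
```

Monitors with no heartbeats yet are reported as `unknown` rather than `down`, which avoids showing a false outage for a newly added monitor.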

Target monitors:

Service	Endpoint	Expected response
Ops dashboard	`http://localhost:3000/api/health`	`200 OK`
Cloudflare tunnel	`$CLOUDFLARED_METRICS_URL`	metrics text

Proposed alerting rules:

[placeholder: define alerting channels and thresholds]
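
The service and endpoint table above could be expressed as data plus a small evaluation function, sketched below. The type and function names are hypothetical. Treating "metrics text" as "status 200 and at least one `# HELP` or `# TYPE` line" is an assumption about what a valid metrics response looks like, not a rule taken from this project.

```typescript
type Expectation = "status-200" | "metrics-text";

interface Monitor {
  service: string;
  url: string;
  expect: Expectation;
}

// Minimal view of an HTTP response, so the check stays independent of the
// HTTP client used to obtain it.
interface HttpResult {
  status: number;
  body: string;
}

// Build the monitor list. The Cloudflare tunnel URL is passed in because it
// comes from the CLOUDFLARED_METRICS_URL environment variable.
function buildMonitors(cloudflaredMetricsUrl: string): Monitor[] {
  return [
    {
      service: "Ops dashboard",
      url: "http://localhost:3000/api/health",
      expect: "status-200",
    },
    {
      service: "Cloudflare tunnel",
      url: cloudflaredMetricsUrl,
      expect: "metrics-text",
    },
  ];
}

// Decide whether a response satisfies the monitor's expectation.
function isHealthy(monitor: Monitor, result: HttpResult): boolean {
  if (result.status !== 200) return false;
  if (monitor.expect === "metrics-text") {
    // Assumption: a Prometheus-style body contains HELP or TYPE comment lines.
    return /^# (HELP|TYPE) /m.test(result.body);
  }
  return true;
}
```

A scheduled job or an Uptime Kuma HTTP monitor could perform the actual requests. `isHealthy` only captures the pass or fail decision for each row.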

v0.1.0 · d0d7a20 · 2026-06-26