This guide documents the comprehensive monitoring setup across the homelab infrastructure, including servers, workstations, and family devices.
The homelab monitoring stack provides:
- https://influxdb.speicher.family
- https://grafana.speicher.family
- https://seq.speicher.family
- https://dozzle.speicher.family
- https://beszel.speicher.family
- https://uptime.dratspiker.com

All Linux servers (lucille3, lucille4, lucille5, loose-seal) run:
1. **Beszel Agent**:
   ```yaml
   beszel-agent:
     image: henrygd/beszel-agent
     container_name: beszel-agent
     restart: unless-stopped
     network_mode: host
     volumes:
   ```
2. **Dozzle Agent**:
   ```yaml
   dozzle-agent:
     image: amir20/dozzle:latest
     container_name: dozzle-agent
     restart: unless-stopped
     ports:
       - "7007:7007"
     volumes:
       - /var/run/docker.sock:/var/run/docker.sock:ro
     environment:
       DOZZLE_MODE: agent
   ```

NAS02 uses QNAP-specific monitoring:
Deploy Beszel agent on macOS:

```bash
# Install via Homebrew
brew tap henrygd/tap
brew install beszel-agent

# Configure and start
beszel-agent serve --port 45876 --key "your-key-here"
```

The family MacBook requires comprehensive monitoring for safety and support:

```bash
brew install osquery
```

1. **Screen Time Integration**:
   - Native macOS parental controls
   - API integration for usage tracking
   - Real-time alerts for violations
2. **Network Monitoring**:
   - Little Snitch for application firewall
   - DNS query logging via Pi-hole
   - SSL inspection for content filtering
3. **Log Forwarding**:
   ```bash
   # Forward system logs to Seq
   log stream --predicate 'process == "family-safety"' | \
     nc -u seq.speicher.family 12201
   ```

Configure Docker logging driver:

```json
{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://seq.speicher.family:12201",
    "tag": "{{.Name}}/{{.ID}}"
  }
}
```

For non-containerized applications:
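
```bash
# Sending a simple message
# (logger writes to the local syslog, not to stdout, so its output cannot be
# piped to nc; a minimal GELF JSON datagram is sent instead)
printf '{"version":"1.1","host":"%s","short_message":"Application message","level":6}' "$(hostname)" | \
  nc -u -w1 seq.speicher.family 12201

# Using structured logging libraries
# Python: structlog with GELF handler
# Node.js: winston with GELF transport
# Go: logrus with GELF hook
```
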
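For applications that cannot use one of these libraries, a GELF message can also be built by hand. The following is a minimal sketch using only the Python standard library; it assumes the Seq GELF input listens on UDP port 12201 as in the Docker configuration above, and that uncompressed, unchunked datagrams are acceptable. The function names and the `app` field are illustrative, not part of any existing setup.

```python
import json
import socket
import time

# Assumed endpoint, taken from the Docker "gelf-address" setting in this guide.
GELF_HOST = "seq.speicher.family"
GELF_PORT = 12201


def build_gelf_message(short_message, host, level=6, **fields):
    """Return a GELF 1.1 payload as UTF-8 encoded JSON bytes."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 = informational
    }
    # GELF expects additional fields to be prefixed with an underscore.
    for key, value in fields.items():
        payload["_" + key] = value
    return json.dumps(payload).encode("utf-8")


def send_gelf(message, host=GELF_HOST, port=GELF_PORT):
    """Send one GELF datagram over UDP (fire and forget, no delivery check)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

A call such as `send_gelf(build_gelf_message("Application message", socket.gethostname(), app="myapp"))` would emit one event. UDP gives no delivery guarantee, so this suits best-effort logging only.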
1. **Infrastructure Overview**:
   - All servers at a glance
   - Resource utilization trends
   - Service health status
   - Network traffic patterns
2. **Server Detail Dashboards**:
   - CPU, Memory, Disk metrics
   - Container resource usage
   - Network connections
   - Process monitoring
3. **Family Safety Dashboard**:
   - Screen time usage
   - Application activity
   - Web browsing patterns
   - Alert history
4. **Service Performance**:
   - Response times
   - Error rates
   - Request volumes
   - Database performance

https://grafana.speicher.family

```sql
SELECT mean("cpu_usage")
FROM "system_metrics"
WHERE "host" = 'lucille4' AND time > now() - 1h
GROUP BY time(5m)
```

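The same query can be run outside Grafana to confirm that data is arriving. The sketch below assumes an InfluxDB 1.x instance exposing the `/query` HTTP endpoint at the URL listed earlier in this guide; the database name `homelab` is hypothetical, and authentication is omitted. The `host` value is interpolated directly into the query, so it should only come from trusted input.

```python
import json
import urllib.parse
import urllib.request

INFLUX_URL = "https://influxdb.speicher.family"  # from the service list in this guide
DATABASE = "homelab"  # hypothetical database name


def build_cpu_query(host, window="1h", interval="5m"):
    """Build InfluxQL for the mean CPU usage of one host over a bounded window."""
    return (
        'SELECT mean("cpu_usage") FROM "system_metrics" '
        f"WHERE \"host\" = '{host}' AND time > now() - {window} "
        f"GROUP BY time({interval})"
    )


def query_url(influxql, base_url=INFLUX_URL, db=DATABASE):
    """Return the InfluxDB 1.x /query URL for an InfluxQL statement."""
    params = urllib.parse.urlencode({"db": db, "q": influxql})
    return f"{base_url}/query?{params}"


def run_query(influxql):
    """Execute the query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(influxql), timeout=10) as resp:
        return json.load(resp)
```

Calling `run_query(build_cpu_query("lucille4"))` would return the raw JSON series. Bounding the query with a time window keeps the number of `GROUP BY time()` buckets small.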
1. **High CPU Usage**:
   - Threshold: >90% for 5 minutes
   - Targets: All servers
2. **Low Disk Space**:
   - Threshold: <10% free
   - Targets: All servers and NAS
3. **Service Down**:
   - Check: HTTP response code
   - Frequency: Every 60 seconds
4. **Family Safety Alerts**:
   - Inappropriate content access
   - Exceeded screen time
   - Unauthorized application use

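To make the CPU and disk thresholds concrete, here is a minimal sketch of how the two rules could be evaluated. It is not how Grafana or Beszel implement alerting; the function names and the `(timestamp, percent)` sample format are assumptions chosen for illustration.

```python
import shutil

CPU_THRESHOLD = 90.0      # percent, from the High CPU Usage rule
CPU_WINDOW_SECONDS = 300  # 5 minutes
DISK_FREE_MIN = 10.0      # percent free, from the Low Disk Space rule


def cpu_alert(samples, threshold=CPU_THRESHOLD, window=CPU_WINDOW_SECONDS):
    """Return True if CPU stayed above the threshold for a full window.

    samples: list of (unix_timestamp, cpu_percent) tuples sorted by time.
    """
    breach_start = None
    for ts, cpu in samples:
        if cpu > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach
            if ts - breach_start >= window:
                return True
        else:
            breach_start = None  # any dip below the threshold resets the timer
    return False


def disk_free_percent(path="/"):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def disk_alert(free_percent, minimum=DISK_FREE_MIN):
    """Return True if free space has fallen below the minimum percentage."""
    return free_percent < minimum
```

Requiring the breach to last the whole window is what keeps short CPU spikes from firing an alert.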
1. **Daily**:
   - Review alert summary
   - Check dashboard anomalies
   - Verify all agents running
2. **Weekly**:
   - Review log retention
   - Update dashboard layouts
   - Test alert channels
3. **Monthly**:
   - Prune old metrics data
   - Update monitoring thresholds
   - Review and optimize queries

1. **Metric Retention**:
   - High-frequency: 7 days
   - Medium-frequency: 30 days
   - Low-frequency: 1 year
2. **Log Retention**:
   - Application logs: 30 days
   - Security logs: 90 days
   - Audit logs: 1 year
3. **Query Optimization**:
   - Use time ranges in queries
   - Aggregate data for long periods
   - Index frequently searched fields

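The retention tiers can be expressed as a small lookup table, which is handy when scripting a pruning job. This is a sketch only: the category keys are hypothetical labels, and "1 year" is taken to mean 365 days.

```python
from datetime import datetime, timedelta, timezone

# Retention periods from the lists above; keys are illustrative labels.
METRIC_RETENTION = {
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=365),  # "1 year" assumed to be 365 days
}
LOG_RETENTION = {
    "application": timedelta(days=30),
    "security": timedelta(days=90),
    "audit": timedelta(days=365),
}


def is_expired(recorded_at, category, policy, now=None):
    """Return True if a record is older than its category's retention period."""
    now = now or datetime.now(timezone.utc)
    return now - recorded_at > policy[category]
```

For example, `is_expired(ts, "security", LOG_RETENTION)` reports whether a security log entry with timezone-aware timestamp `ts` is past its 90-day window.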
1. **Verify network connectivity**:
   ```bash
   telnet beszel.speicher.family 45876
   ```
2. **Check agent logs**:
   ```bash
   docker logs beszel-agent
   journalctl -u beszel-agent
   ```
3. **Validate agent key**:

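The connectivity check can also be scripted across every server. The sketch below tests whether a TCP port accepts connections; it assumes the agents listen on port 45876 as in the macOS example, and that the bare hostnames from this guide resolve on the local network.

```python
import socket

AGENT_PORT = 45876  # Beszel agent port used elsewhere in this guide
SERVERS = ["lucille3", "lucille4", "lucille5", "loose-seal"]  # assumed resolvable


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_agents(hosts=SERVERS, port=AGENT_PORT):
    """Return a mapping of hostname to reachability of the agent port."""
    return {host: port_open(host, port) for host in hosts}
```

A result such as `{"lucille3": True, "lucille4": False, ...}` narrows the problem to specific hosts before their agent logs are inspected.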
1. **Check data source**:
   - Verify InfluxDB is receiving data
   - Test with manual query
2. **Validate collection**:
   - Ensure agents are running
   - Check network connectivity
3. **Review retention policy**:
   - Verify data hasn't been pruned
   - Check downsampling rules