Monitoring a Homelab with Prometheus and Grafana

Implementing monitoring in a homelab using Prometheus, Grafana, Node Exporter, and cAdvisor for system and container visibility.

This post covers how monitoring and observability are implemented in my homelab using Prometheus and Grafana.

Once multiple services are running, visibility becomes critical. Without monitoring, it is difficult to understand system behaviour, detect issues early, or troubleshoot effectively.

Why Monitoring Matters

Monitoring provides insight into:

System performance (CPU, memory, disk)
Service health and availability
Resource usage trends over time

Instead of only reacting to failures after they happen, monitoring makes it possible to detect problems early and troubleshoot proactively.

Monitoring Stack Overview

The monitoring stack consists of:

Prometheus → metrics collection and storage
Node Exporter → host-level metrics
cAdvisor → container-level metrics
Grafana → visualisation and dashboards

Node Exporter and cAdvisor expose metrics over HTTP, Prometheus scrapes and stores them as time series, and Grafana queries Prometheus to build dashboards.

Architecture

Metrics flow through the system as follows:

Host System        Docker Containers
     ↓                     ↓
 Node Exporter         cAdvisor
         \             /
          \           /
           Prometheus
               ↓
            Grafana

Prometheus Configuration

Prometheus pulls (scrapes) metrics over HTTP from the targets defined in its configuration file.

Example configuration:

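global:
  scrape_interval: 15s  # how often Prometheus scrapes each target

scrape_configs:
  # Host-level metrics from Node Exporter (default port 9100),
  # reached from the Prometheus container via the Docker bridge gateway
  - job_name: 'node'
    static_configs:
      - targets: ['172.17.0.1:9100']

  # Container-level metrics from cAdvisor, addressed by container name;
  # assumes Prometheus and cAdvisor share a user-defined Docker network
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']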

Key points:

Node Exporter runs on the host
cAdvisor runs as a container
Prometheus scrapes both at regular intervals

Docker Networking Considerations

One issue encountered was how containers access host services.

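On Linux, Docker Engine does not define host.docker.internal by default (Docker Desktop does), so:

The default Docker bridge gateway (172.17.0.1) was used
Prometheus connects to Node Exporter via this address

The gateway address can differ if the default bridge has been reconfigured, so it is worth verifying rather than assuming. Node Exporter also has to listen on an address that is reachable from the bridge, not only on localhost. An alternative on Docker 20.10 and later is to map the name explicitly with --add-host=host.docker.internal:host-gateway.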
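One way to check which gateway a container actually sees is to read its routing table. The following is a minimal Python sketch, not part of the original setup: the function name and the sample routing table are hypothetical, and it assumes a little-endian host (x86 or ARM), where /proc/net/route prints addresses as little-endian hex.

```python
import socket
import struct
from typing import Optional


def default_gateway(route_table: str) -> Optional[str]:
    """Return the default gateway from text in /proc/net/route format."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        destination, gateway = fields[1], fields[2]
        if destination == "00000000":  # 0.0.0.0 marks the default route
            # Assumption: little-endian host, so the hex value is byte-swapped.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


# Hypothetical sample of what a container on the default bridge might show.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)

print(default_gateway(SAMPLE))  # 172.17.0.1
```

Inside a real container, the input would come from reading /proc/net/route. If Prometheus sits on a user-defined network, the gateway it reports will likely be that network's gateway rather than 172.17.0.1, although both addresses belong to the host.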

Grafana Dashboards

Grafana is used to visualise metrics collected by Prometheus.

Dashboards include:

CPU usage (rate-based calculations)
Memory utilisation
Disk usage by mount point
Container resource usage

Prebuilt dashboards were used initially and then customised to focus on relevant metrics.

Example Metric Query

CPU usage is calculated using PromQL:

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

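The rate() of the idle counter is the fraction of each second a core spent idle over the last minute. Averaging that across cores per instance and subtracting from 100 gives the percentage of CPU time spent doing work.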
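To make the arithmetic behind the query concrete, here is a simplified Python sketch that performs the same calculation on two samples per core. The function name and sample values are made up for illustration, and unlike Prometheus's rate() it does not handle counter resets or extrapolate to the window edges.

```python
def cpu_usage_percent(idle_samples):
    """Compute CPU usage from idle-counter samples.

    idle_samples maps a CPU label to ((t0, v0), (t1, v1)): two timestamped
    readings of node_cpu_seconds_total{mode="idle"} for that core.
    """
    idle_rates = []
    for (t0, v0), (t1, v1) in idle_samples.values():
        # Per-second increase of the idle counter = fraction of time idle.
        idle_rates.append((v1 - v0) / (t1 - t0))
    avg_idle = sum(idle_rates) / len(idle_rates)  # avg by(instance)
    return 100 - avg_idle * 100


# Hypothetical readings taken 60 seconds apart on a two-core host.
samples = {
    "0": ((0.0, 1000.0), (60.0, 1045.0)),  # idle for 45 of 60 s -> 0.75
    "1": ((0.0, 2000.0), (60.0, 2030.0)),  # idle for 30 of 60 s -> 0.50
}

print(cpu_usage_percent(samples))  # 37.5
```

The cores are idle 62.5% of the time on average, so the query would report 37.5% usage for this instance.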

Observability Approach

The goal is not just to collect data, but to make it actionable.

This includes:

Highlighting critical metrics clearly
Avoiding overly complex dashboards
Focusing on signals that indicate real issues

For example:

High disk usage on storage volumes
Sustained CPU load spikes
Container resource exhaustion
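As an illustration of what separates a sustained load spike from a momentary one, the sketch below only flags a series when every recent sample is above a threshold. The function name, threshold, and sample values are hypothetical. The `for:` clause in Prometheus alerting rules serves a similar purpose.

```python
def is_sustained(values, threshold, min_samples):
    """Return True if the last `min_samples` values all exceed `threshold`."""
    if min_samples < 1:
        raise ValueError("min_samples must be at least 1")
    if len(values) < min_samples:
        return False  # not enough history to call it sustained
    return all(v > threshold for v in values[-min_samples:])


# Hypothetical CPU usage percentages, one per 15-second scrape.
brief_spike = [20, 25, 95, 30, 22, 24]
sustained = [20, 91, 93, 96, 92, 94]

print(is_sustained(brief_spike, threshold=90, min_samples=4))  # False
print(is_sustained(sustained, threshold=90, min_samples=4))    # True
```

With a 15-second scrape interval, four samples cover roughly one minute. A longer window trades faster detection for fewer false alarms.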

Issues Encountered

Several practical issues came up during setup:

Docker networking and host access
Missing or stale metrics
Incorrect Prometheus scrape targets
Dashboard data not updating correctly

These required debugging across multiple layers, including containers, networking, and configuration.
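A quick way to narrow down scrape-target problems is to ask Prometheus which targets it considers down, using its /api/v1/targets HTTP endpoint. The sketch below is one possible approach rather than the method used in this setup. The function names, the base URL (http://localhost:9090), and the sample response are assumptions.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape URL, last error) for each active target not 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "?")
            problems.append(
                (job, target.get("scrapeUrl", "?"), target.get("lastError", ""))
            )
    return problems


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch target status; base_url is an assumption about where Prometheus listens."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical response, trimmed to the fields used above.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node", "instance": "172.17.0.1:9100"},
                "scrapeUrl": "http://172.17.0.1:9100/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
            {
                "labels": {"job": "cadvisor", "instance": "cadvisor:8080"},
                "scrapeUrl": "http://cadvisor:8080/metrics",
                "health": "up",
                "lastError": "",
            },
        ]
    },
}

for job, url, error in unhealthy_targets(SAMPLE):
    print(f"{job}: {url} is down ({error})")
```

Against a live instance, `unhealthy_targets(fetch_targets())` would produce the same kind of report. The Status > Targets page in the Prometheus web UI shows the same information.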

Key Learnings

Monitoring is essential once systems grow beyond a few services
Prometheus requires careful configuration of scrape targets
Docker networking can introduce unexpected complexity
Dashboards should prioritise clarity over quantity

What’s Next

The next step is exploring storage and data management, including RAID, filesystem design, and ensuring data integrity.