Monitoring a Homelab with Prometheus and Grafana

Implementing monitoring in a homelab using Prometheus, Grafana, Node Exporter, and cAdvisor for system and container visibility.

This post walks through that setup: what each component does, how Prometheus is configured, and the Docker networking and dashboard issues that came up along the way.

Once multiple services are running, visibility becomes critical. Without monitoring, it is difficult to understand system behaviour, detect issues early, or troubleshoot effectively.


Why Monitoring Matters

Monitoring provides insight into:

  • System performance (CPU, memory, disk)
  • Service health and availability
  • Resource usage trends over time

Monitoring makes it possible to detect problems early and troubleshoot proactively, rather than only reacting to failures.


Monitoring Stack Overview

The monitoring stack consists of:

  • Prometheus → metrics collection and storage
  • Node Exporter → host-level metrics
  • cAdvisor → container-level metrics
  • Grafana → visualisation and dashboards

The two exporters expose metrics over HTTP, Prometheus pulls and stores them, and Grafana queries Prometheus to render dashboards.


Architecture

Metrics flow through the system as follows:

Host System        Docker Containers
     ↓                     ↓
 Node Exporter         cAdvisor
         \             /
          \           /
           Prometheus
               ↓
            Grafana

Prometheus Configuration

Prometheus pulls (scrapes) metrics over HTTP from the targets listed in its configuration file.

Example configuration:

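global:
  scrape_interval: 15s  # default interval between scrapes for every job

scrape_configs:
  # Node Exporter on the host, reached via the Docker bridge gateway
  - job_name: 'node'
    static_configs:
      - targets: ['172.17.0.1:9100']

  # cAdvisor container, addressed by name
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']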

Key points:

  • Node Exporter runs on the host
  • cAdvisor runs as a container
  • Prometheus scrapes both every 15 seconds, as set by scrape_interval
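
To check that both jobs are actually being scraped, Prometheus exposes target state through its HTTP API at /api/v1/targets. The following is a minimal sketch rather than part of the setup described here: the address http://localhost:9090 is an assumption, and the helper names fetch_targets and unhealthy_targets are illustrative.

```python
import json
from urllib.request import urlopen

# Assumed address; adjust to wherever Prometheus is published.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Fetch the raw /api/v1/targets response from Prometheus."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list[dict]:
    """Return job, scrape URL and last error for every target not 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            problems.append({
                "job": target.get("labels", {}).get("job", ""),
                "url": target.get("scrapeUrl", ""),
                "error": target.get("lastError", ""),
            })
    return problems
```

Calling unhealthy_targets(fetch_targets()) against a running instance should return an empty list when both targets are up. The same information is available on the targets page of the Prometheus web UI.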

Docker Networking Considerations

One issue I ran into was how a container reaches a service running on the host.

On Linux, Docker Engine does not define host.docker.internal unless it is added explicitly, so:

  • The Docker bridge gateway (172.17.0.1) was used
  • Prometheus connects to Node Exporter via this address

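The address a container uses to reach the host is easy to overlook when designing a monitoring setup in Docker, and a wrong one only shows up as a scrape target that Prometheus reports as down.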
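
A quick way to test whether an address like this is reachable is a plain TCP connection attempt. The sketch below is illustrative only: split_target and can_connect are hypothetical helper names, and the check only proves that something accepts connections on the port, not that it serves metrics.

```python
import socket


def split_target(target: str) -> tuple[str, int]:
    """Split a 'host:port' scrape target into its host and port."""
    host, _, port = target.rpartition(":")
    return host, int(port)


def can_connect(target: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the target succeeds."""
    host, port = split_target(target)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For the result to be meaningful, can_connect("172.17.0.1:9100") has to run from a container attached to the same Docker network as Prometheus, since the host itself can always reach its own bridge address.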


Grafana Dashboards

Grafana is used to visualise metrics collected by Prometheus.

Dashboards include:

  • CPU usage (rate-based calculations)
  • Memory utilisation
  • Disk usage by mount point
  • Container resource usage

Prebuilt dashboards were used initially and then customised to focus on relevant metrics.


Example Metric Query

CPU usage is calculated using PromQL:

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

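The inner rate() gives the fraction of each second that every core spent idle over the last minute. Averaging across cores per instance and subtracting the result from 100 gives the percentage of CPU time spent doing work.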
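
The arithmetic behind the query can be worked through by hand. The sketch below approximates it from two counter samples per core. It is a simplification: the real rate() function also handles counter resets and extrapolates to the window boundaries, and the sample values are made up for illustration.

```python
def cpu_usage_percent(idle_start: list[float], idle_end: list[float],
                      window_seconds: float) -> float:
    """Approximate 100 - avg(rate(idle)) * 100 from two samples per core."""
    # Per-core idle rate: idle seconds gained per second of wall-clock time.
    idle_rates = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    # Average across cores, as avg by(instance) does, then invert.
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Two cores over a 60 s window: one idle for 54 s, the other for 30 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1054.0, 2030.0], 60.0)
print(round(usage, 1))  # 30.0
```

The first core was idle 90% of the time and the second 50%, so the average idle fraction is 70% and the reported usage is 30%.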


Observability Approach

The goal is not just to collect data, but to make it actionable.

This includes:

  • Highlighting critical metrics clearly
  • Avoiding overly complex dashboards
  • Focusing on signals that indicate real issues

Examples of such signals:

  • High disk usage on storage volumes
  • Sustained high CPU load
  • Container resource exhaustion
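
The difference between a sustained condition and a brief spike can be made concrete. The sketch below is a minimal illustration, not how my dashboards are built: the is_sustained helper, the 90% threshold, and the 20-sample window are all assumptions chosen for the example.

```python
def is_sustained(samples: list[float], threshold: float,
                 min_samples: int) -> bool:
    """Return True if the most recent min_samples values all exceed threshold."""
    if min_samples <= 0 or len(samples) < min_samples:
        return False
    return all(value > threshold for value in samples[-min_samples:])


# With a 15 s scrape interval, 20 samples cover roughly five minutes.
brief_spike = [20.0] * 19 + [95.0]
sustained_load = [95.0] * 20
print(is_sustained(brief_spike, 90.0, 20))     # False
print(is_sustained(sustained_load, 90.0, 20))  # True
```

Requiring every sample in the window to exceed the threshold filters out short bursts, which is the same idea as attaching a duration to an alert condition.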

Issues Encountered

Several practical issues came up during setup:

  • Docker networking and host access
  • Missing or stale metrics
  • Incorrect Prometheus scrape targets
  • Dashboard data not updating correctly

These required debugging across multiple layers, including containers, networking, and configuration.
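
When a dashboard panel stays empty, one useful check is whether the exporter exposes the metric at all. The sketch below parses the text format that exporters serve at /metrics and reports which expected names are absent. The helper names are hypothetical, and the sample text is hand-written for illustration rather than captured from my system.

```python
def metric_names(exposition_text: str) -> set[str]:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text: str, expected: list[str]) -> list[str]:
    """Return the expected metric names absent from the scraped text."""
    present = metric_names(exposition_text)
    return [name for name in expected if name not in present]


sample = """# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_memory_MemAvailable_bytes 8.0e+09
"""
print(missing_metrics(sample, ["node_cpu_seconds_total",
                               "node_filesystem_avail_bytes"]))
# ['node_filesystem_avail_bytes']
```

If a name is missing at the exporter, the problem lies in the exporter or its configuration. If it is present there but absent in Grafana, the scrape target or the dashboard query is the more likely cause.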


Key Learnings

  • Monitoring is essential once systems grow beyond a few services
  • Prometheus requires careful configuration of scrape targets
  • Docker networking can introduce unexpected complexity
  • Dashboards should prioritise clarity over quantity

What’s Next

The next step is exploring storage and data management, including RAID, filesystem design, and ensuring data integrity.