Setting Up Monitoring with Prometheus and Grafana - Part 1

Introduction

A monitoring stack is only useful when it is deployed consistently and remains easy to maintain over time. Manual installations and one-off configurations tend to drift quickly, which makes upgrades and troubleshooting harder than they need to be.

In this series, we build the stack component by component using Ansible roles. This first post focuses only on Prometheus Node Exporter, a lightweight collector that exposes a wide variety of hardware- and kernel-related metrics. We will go over installing it, running it as a hardened systemd service, and exposing its metrics on port 9100 for later scraping by Prometheus.

Node Exporter

It is recommended to run Node Exporter directly on the host rather than in a container, since the exporter needs access to host-level kernel metrics. Running it in a container is possible, but the host's /proc, /sys and other directories have to be mounted into the container for it to function properly. In this post, we will focus on the host-based setup.
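For comparison only, a containerized deployment could look roughly like the Docker Compose sketch below. It follows the pattern documented in the Node Exporter README (host root mounted read-only and passed via --path.rootfs); the service name and the latest image tag are assumptions, and this series does not use this approach.

```yaml
# docker-compose.yml (illustrative sketch, not part of the role)
services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest  # assumed tag; pin a version in practice
    command:
      - "--path.rootfs=/host"   # read host metrics from the mounted root filesystem
    network_mode: host          # expose port 9100 on the host network
    pid: host                   # see host processes rather than the container's
    restart: unless-stopped
    volumes:
      - "/:/host:ro,rslave"     # host root filesystem, mounted read-only
```

The extra mounts and namespace settings are the overhead that the host-based setup avoids.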

The role is broken down into four phases:

Preparation: Create user, group and working directory
Installation: Download and install the Node Exporter binary
Configure: Create and manage systemd service for the exporter
Cleanup: Remove temporary artifacts

Core Variables

Rather than hard-coding installation details into the role, we keep the behavior driven by defaults/main.yml. That way, we get a clean base role that can be reused across hosts and environments with only variable overrides.

The core variables for this role are defined as:

---
# Core configuration (commonly overridden)
node_exporter_version: "1.10.2"
node_exporter_user: "node_exporter"
node_exporter_group: "node_exporter"
node_exporter_work_dir: "/tmp/node_exporter"

node_exporter_arch: "linux-amd64"
node_exporter_archive_name: "node_exporter-{{ node_exporter_version }}.{{ node_exporter_arch }}.tar.gz"

node_exporter_port: 9100
node_exporter_listen_address: "{{ ansible_host }}:{{ node_exporter_port }}"
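
As an illustration of such overrides, a group variables file could adjust the defaults for a set of hosts without touching the role. This is a sketch only: the file name group_vars/target_hosts.yml and the values below are assumptions chosen for the example.

```yaml
# group_vars/target_hosts.yml (hypothetical file name and values)
# Use the ARM64 release archive instead of the amd64 default.
node_exporter_arch: "linux-arm64"
# Listen on all interfaces instead of only the ansible_host address.
node_exporter_listen_address: "0.0.0.0:{{ node_exporter_port }}"
```

Because node_exporter_archive_name is built from node_exporter_version and node_exporter_arch, changing the architecture here also changes which archive is referenced.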

Preparation

Before jumping into the installation, it is worth preparing the environment so that we avoid cluttering the host and maintain a clean state. This preparation also gives us:

Security: The Node Exporter should not run as root or with escalated privileges. Instead, we will create a system user and group for the exporter.
Isolation: All operations such as download and extraction take place in a temporary work directory, keeping the rest of the host system clean and unaffected.

---
- name: Create node_exporter group
  ansible.builtin.group:
    name: "{{ node_exporter_group }}"
    system: true

- name: Create node_exporter user
  ansible.builtin.user:
    name: "{{ node_exporter_user }}"
    group: "{{ node_exporter_group }}"
    system: true
    shell: /usr/sbin/nologin
    create_home: false

- name: Create Node Exporter work directory
  ansible.builtin.file:
    path: "{{ node_exporter_work_dir }}"
    state: directory
    mode: "0755"

These tasks ensure that Node Exporter runs as a non-privileged system user, while the work directory keeps all temporary files contained in one place.

Installation

Download

We can use Ansible’s built-in get_url module for downloading the release archive:

---
- name: Get Node Exporter release
  ansible.builtin.get_url:
    url: "{{ node_exporter_release_url }}"
    dest: "{{ node_exporter_archive_path }}"
    mode: "0644"

As an added layer of safety, especially in production, add a checksum value to verify the downloaded archive.
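
A minimal sketch of the checksum variant is shown below. It assumes a node_exporter_checksum variable added to defaults/main.yml, and it assumes the release publishes a sha256sums.txt file next to the archive so that get_url can look up the digest by file name. A literal value in the form sha256:<hex digest> works as well.

```yaml
---
# defaults/main.yml (assumed addition): point at the release's checksum file
node_exporter_checksum: "sha256:https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
---
# Download task with verification enabled
- name: Get Node Exporter release
  ansible.builtin.get_url:
    url: "{{ node_exporter_release_url }}"
    dest: "{{ node_exporter_archive_path }}"
    mode: "0644"
    # The task fails if the downloaded archive does not match the digest.
    checksum: "{{ node_exporter_checksum }}"
```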

The values for url and dest are derived variables, defined in the defaults/main.yml as:

node_exporter_release_url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/{{ node_exporter_archive_name }}"
node_exporter_archive_path: "{{ node_exporter_work_dir }}/{{ node_exporter_archive_name }}"

Unarchiving

The downloaded release is a gzip-compressed tar archive, so the binary has to be extracted before it can be installed. For this, we use Ansible’s built-in unarchive module:

- name: Unarchive Node Exporter
  ansible.builtin.unarchive:
    src: "{{ node_exporter_archive_path }}"
    dest: "{{ node_exporter_work_dir }}"
    remote_src: true

Setting remote_src: true tells Ansible that the archive already exists on the remote host, so the unarchiving process happens on the remote host itself.

Installation

The final step in this phase is installing the exporter binary. The derived helper variables for this step are defined in defaults/main.yml:

node_exporter_extracted_name: "node_exporter-{{ node_exporter_version }}.{{ node_exporter_arch }}"
node_exporter_extracted_dir_path: "{{ node_exporter_work_dir }}/{{ node_exporter_extracted_name }}"
node_exporter_extracted_bin_path: "{{ node_exporter_extracted_dir_path }}/node_exporter"
node_exporter_bin_path: "/usr/local/bin/node_exporter"

Installing to /usr/local/bin makes Node Exporter available system-wide; it is the conventional location for locally installed binaries.

We copy the exporter from our temp work directory to the /usr/local/bin directory:

- name: Install the Node Exporter binary
  ansible.builtin.copy:
    src: "{{ node_exporter_extracted_bin_path }}"
    dest: "{{ node_exporter_bin_path }}"
    remote_src: true
    mode: "0755"
  notify: Restart node_exporter

Setting mode: "0755" ensures the binary is executable by all users while remaining writable only by the owner.

Notice the notify key at the end of the task. Because a handler runs only when the notifying task reports a change, the service is restarted only when the binary actually changes. The handler Restart node_exporter is implemented in handlers/main.yml:

---
- name: Restart node_exporter
  ansible.builtin.systemd:
    name: "{{ node_exporter_service_name }}"
    state: restarted

Configure

At this point, we have downloaded, extracted and installed the exporter’s binary. The next step is to create a systemd unit for the exporter. Using systemd ensures the exporter is managed consistently, starts on boot, and can be controlled via standard system tooling.

This can be done through a two-step process:

Create the systemd unit file
Enable and start the service

Systemd Unit File

The systemd unit file is implemented as a Jinja2 template in templates/node_exporter.j2.service:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User={{ node_exporter_user }}
Group={{ node_exporter_group }}
Type=simple
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=full
ProtectHome=yes
ExecStart={{ node_exporter_bin_path }} --web.listen-address={{ node_exporter_listen_address }}

[Install]
WantedBy=multi-user.target

Additional systemd options such as NoNewPrivileges and ProtectSystem are used to provide basic service hardening and reduce the potential attack surface.

The tasks in configure.yml then render the template and enable and start the service:

---
- name: Create Node Exporter systemd service
  ansible.builtin.template:
    src: node_exporter.j2.service
    dest: "{{ node_exporter_unit_path }}"
    mode: "0644"
  notify:
    - Restart node_exporter

- name: Enable and start {{ node_exporter_service_name }}
  ansible.builtin.systemd:
    name: "{{ node_exporter_service_name }}"
    enabled: true
    state: started
    daemon_reload: true

The variables needed here can be defined as:

node_exporter_unit_path: "/etc/systemd/system/{{ node_exporter_service_name }}.service"
node_exporter_service_name: "node_exporter"

Cleanup

The final stage for this role is to perform cleanup. Since all intermediate artifacts are stored in a single working directory, it can be removed entirely through a single task:

---
- name: Remove Node Exporter work directory
  ansible.builtin.file:
    path: "{{ node_exporter_work_dir }}"
    state: absent
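
One side effect of removing the work directory is that every run downloads and unpacks the archive again. A possible refinement, sketched below under assumptions, is to check the installed version first and run the install phase only when it differs. The registered variable name node_exporter_installed is hypothetical, the sketch assumes the binary reports its version via --version, and include_tasks would take the place of the static import of install.yml in tasks/main.yml.

```yaml
---
- name: Check the installed Node Exporter version
  ansible.builtin.command: "{{ node_exporter_bin_path }} --version"
  register: node_exporter_installed   # hypothetical variable name
  changed_when: false                 # a read-only check never changes the host
  failed_when: false                  # a missing binary just means "not installed"

- name: Run the install phase only when the version differs
  ansible.builtin.include_tasks: install.yml
  # Simple substring match on the version output; stdout and stderr are both
  # checked because the output stream may vary between versions.
  when: >-
    node_exporter_version not in
    ((node_exporter_installed.stdout | default(''))
    ~ (node_exporter_installed.stderr | default('')))
```

With this in place, repeated runs against an up-to-date host skip the download entirely, and the restart handler is not notified.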

Main.yml

The role ties the phases together in tasks/main.yml:

---
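- name: Prepare user, group and work directory
  ansible.builtin.import_tasks: prepare.yml
- name: Download and install the binary
  ansible.builtin.import_tasks: install.yml
- name: Configure the systemd service
  ansible.builtin.import_tasks: configure.yml
- name: Clean up temporary artifacts
  ansible.builtin.import_tasks: cleanup.yml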

This completes our Node Exporter role.

Example Playbook

The final step is to actually use our role and run it on our target hosts. To do that, we will create playbooks/node_exporter.yml:

---
- name: Set up Node Exporter on target nodes
  hosts: target_hosts
  become: true
  roles:
    - node_exporter
  tags:
    - node_exporter

Then we run our playbook using the command:

ansible-playbook -i inventory playbooks/node_exporter.yml --tags=node_exporter
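
The command above assumes an inventory that defines the target_hosts group. A minimal YAML inventory sketch could look like the following; the file name inventory/hosts.yml, the host name vm-1 as a group member, and the documentation-range IP address are assumptions for illustration.

```yaml
# inventory/hosts.yml (hypothetical file; address is a placeholder)
all:
  children:
    target_hosts:
      hosts:
        vm-1:
          # Used for the SSH connection and, by default, in
          # node_exporter_listen_address.
          ansible_host: 192.0.2.10
```

Setting ansible_host explicitly matters here because the role's default listen address is derived from it.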

Installing a Different Version

You can override the default version, 1.10.2, without modifying the role. For example, to install 1.10.0 instead, override node_exporter_version and, if the download task verifies a checksum, node_exporter_checksum as well:

- name: Install Node Exporter
  hosts: vm-1
  become: true
  gather_facts: true
  vars:
    node_exporter_version: "1.10.0"
    node_exporter_checksum: "sha256:6c2a4d886fc82743f7f97b8e0a0cd80ab0199939edce3cf0b9b36dad8d48d9b4"
  roles:
    - node_exporter
  tags:
    - node_exporter

Verification

Once the playbook finishes, we can verify that the exporter is set up correctly with a few checks:

Check that the service is running

sudo systemctl status node_exporter

Confirm the exporter is listening on port 9100

ss -ltn | grep 9100 

Query the endpoint

curl http://<ansible_host>:9100/metrics

We can also create a dedicated Ansible task for performing the verification for us, making it more suitable for a production environment.
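
A sketch of such verification tasks is shown below. It assumes the tasks run on the target host after the service has been started and that they reuse the role's variables; the registered variable name node_exporter_metrics is hypothetical, and the check relies on the exporter publishing a node_exporter_build_info metric.

```yaml
---
- name: Wait for Node Exporter to accept connections
  ansible.builtin.wait_for:
    host: "{{ ansible_host }}"
    port: "{{ node_exporter_port }}"
    timeout: 30

- name: Query the metrics endpoint
  ansible.builtin.uri:
    url: "http://{{ node_exporter_listen_address }}/metrics"
    return_content: true              # keep the response body for the assertion
  register: node_exporter_metrics     # hypothetical variable name

- name: Assert that exporter metrics are present
  ansible.builtin.assert:
    that:
      - "'node_exporter_build_info' in node_exporter_metrics.content"
```

These tasks mirror the three manual checks: the port is open, the endpoint answers with HTTP 200 (the uri module's default expectation), and the response contains exporter metrics.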

Conclusion

In this post, we built a reusable Ansible role for deploying Node Exporter. Structuring the role into clear phases keeps it maintainable, and driving it through variables lets us adjust it per environment without changing the internal implementation.

The complete source code for this post can be found on GitHub.

In the next post, we will build on this foundation by deploying Prometheus and configuring it to scrape metrics from our exporters.