[{"content":"Introduction In the previous post, we discussed how to use the AWS VPN client to connect to private subnets in a VPC. We used Entra ID to perform federated user-based authentication. This technique is useful for enterprises using federated authentication, but there is another way to achieve the same result using AWS Systems Manager Session Manager (SSM) and port forwarding.\nArchitecture The diagram above shows the main components used for the SSM port forwarding session setup. The diagram shows the following components:\nThe bastion host EC2 instance that is located in the private subnet and that is used to initiate the port forwarding session with the attached IAM role The EC2 instance security group that allows inbound traffic only from the VPC CIDR range The RDS instance in the private subnet that will be used as the target for the port forwarding session The RDS security group that allows inbound traffic on the DB port (5432) only from the bastion host EC2 instance The SSM session manager document that will be used to initiate the port forwarding session The NAT gateway that will be used to route traffic from the private subnet to the internet The development local machine that will be used to connect to the RDS instance RDS DB Connection with SSM Port Forwarding The following parameters are required for the port forwarding session:\nAWS Region (e.g. eu-central-1) EC2 instance id (e.g. i-04e29d6b82ca52fbc) RDS host (e.g. mytestdb-instance-1.clzjs8sy98st.eu-central-1.rds.amazonaws.com) RDS port (e.g. 5432) RDS username (e.g. postgres) RDS password (e.g. arn:aws:secretsmanager:eu-central-1:123456789012:secret:rds!) RDS cluster identifier (e.g. cluster-506ac9d4-c6f0-5421-911f-85dec405f14a-A12ncL) RDS DB name (e.g. mytestdb) Install the session manager aws cli plugin locally and start a new session. 
Keep the session running and execute further commands from a new terminal session:\n$ aws ssm start-session --target i-04e29d6b82ca52fbc --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters \u0026#39;{\u0026#34;portNumber\u0026#34;:[\u0026#34;5432\u0026#34;],\u0026#34;localPortNumber\u0026#34;:[\u0026#34;1053\u0026#34;],\u0026#34;host\u0026#34;:[\u0026#34;mytestdb-instance-1.clzjs8sy98st.eu-central-1.rds.amazonaws.com\u0026#34;]}\u0026#39; Starting session with SessionId: 429fec6e-29cc-422b-892f-9b8a6973c131-a5xit52zpidhlt7bb95rglflzy Port 1053 opened for sessionId 429fec6e-29cc-422b-892f-9b8a6973c131-a5xit52zpidhlt7bb95rglflzy. Waiting for connections... Export the correct AWS region and RDS parameters in a new terminal session to connect from a local client:\nexport AWS_REGION=\u0026#34;eu-central-1\u0026#34; export RDS_HOST=\u0026#34;localhost\u0026#34; export RDS_PORT=\u0026#34;1053\u0026#34; export RDS_USERNAME=\u0026#34;postgres\u0026#34; export RDS_DB=\u0026#34;mytestdb\u0026#34; export PGPASSWORD=\u0026#34;$(aws secretsmanager get-secret-value --secret-id \u0026#39;arn:aws:secretsmanager:eu-central-1:123456789012:secret:rds!cluster-506ac9d4-c6f0-5421-911f-85dec405f14a-A12ncL\u0026#39; --query SecretString --output text | jq -r \u0026#39;.password\u0026#39;)\u0026#34; The PGPASSWORD environment variable is automatically picked up by psql when connecting to a PostgreSQL server. Connect using psql to localhost and port 1053, the local port opened by the port forwarding session:\n$ psql -h $RDS_HOST -p $RDS_PORT \u0026#34;dbname=$RDS_DB user=$RDS_USERNAME\u0026#34; psql-18 (18.3 (Homebrew), server 17.7) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql) Type \u0026#34;help\u0026#34; for help. 
mytestdb=\u0026gt; \\dt List of tables Schema | Name | Type | Owner --------+-------------+-------+---------- public | dummy_table | table | postgres (1 row) mytestdb=\u0026gt; The connection was established, and the describe table command shows the proper output.\nConclusion This post demonstrates how to use AWS Systems Manager Session Manager to establish a port forwarding session to a remote DB host. This is useful for scenarios where you need to connect to a remote DB instance from a local development machine for debugging or testing purposes. No ssh connection or public internet-facing endpoints are required for this scenario which makes it a practical solution with minimal overhead.\nReferences:\nhttps://aws.amazon.com/blogs/mt/use-port-forwarding-in-aws-systems-manager-session-manager-to-connect-to-remote-hosts/ ","date":"2026-04-30T00:00:00Z","image":"https://habibiops.com/p/aws-systems-manager-session-manage-port-forwarding/assets/ssm-cover.drawio.svg","permalink":"https://habibiops.com/p/aws-systems-manager-session-manage-port-forwarding/","title":"AWS Systems Manager Session Manager Port Forwarding"},{"content":"Introduction In the previous blog posts part 1 and part 2, we have Node Exporter and Prometheus set up. We can already monitor our VMs and get valuable information. However, running PromQL queries repeatedly is not practical for day-to-day monitoring or visualization.\nFor that, we will deploy Grafana and import a pre-built dashboard so that we can visualize the scraped metrics. 
By the end of this post, we will have:\nA reusable Grafana Ansible role Provisioning of the data source and dashboards Docker volume for persisting the Grafana data Prerequisites For the demo in this post to work, you need to have:\nOne or more target VMs Node Exporter must be preconfigured and set up Docker to be installed and available on the Grafana host community.docker to be installed An existing Prometheus instance Architecture Overview For simplicity, we are going to focus only on Grafana and its immediate components:\nA dedicated host for Grafana with an internal IP Grafana runs as a Docker container and exposes port 3000 A Docker volume mounted into the container for persisting Grafana-related data Once all is done, we will have the full architecture setup:\nGrafana Similar to Prometheus, Grafana can be installed directly on the host or deployed as a container. As with Prometheus, we will deploy Grafana as a container. The Ansible role will be broken down into two phases:\nPrepare: Create the directories, render the configuration template and create the Docker volume Deploy: Pull, run the image and mount the created volume onto the container Variables The variables for this role will be split into five categories:\nPaths and provisioning Prometheus integration Container integration Storage and volumes Endpoint --- grafana_provisioning_dir: \u0026#34;/opt/grafana/provisioning\u0026#34; grafana_datasources_dir: \u0026#34;{{ grafana_provisioning_dir }}/datasources\u0026#34; grafana_prometheus_datasource: \u0026#34;{{ grafana_datasources_dir }}/prometheus.yml\u0026#34; # prometheus variables grafana_prometheus_host: \u0026#34;prometheus\u0026#34; grafana_prometheus_port: 9090 grafana_dashboards_dir: \u0026#34;{{ grafana_provisioning_dir }}/dashboards\u0026#34; grafana_dashboard_provider_file: \u0026#34;{{ grafana_dashboards_dir 
}}/dashboard_provider.yml\u0026#34; grafana_node_exporter_dashboard_file: \u0026#34;{{ grafana_dashboards_dir }}/node_exporter_full.json\u0026#34; grafana_image_name: \u0026#34;grafana/grafana\u0026#34; grafana_image_tag: \u0026#34;13.0\u0026#34; grafana_image: \u0026#34;{{ grafana_image_name }}:{{ grafana_image_tag }}\u0026#34; grafana_container_name: \u0026#34;grafana-instance\u0026#34; grafana_listen_port: 3000 grafana_container_ports: - \u0026#34;{{ grafana_listen_port }}:{{ grafana_listen_port }}\u0026#34; grafana_volume_name: \u0026#34;grafana-data\u0026#34; grafana_container_volumes: - \u0026#34;{{ grafana_volume_name }}:/var/lib/grafana\u0026#34; - \u0026#34;{{ grafana_datasources_dir }}:/etc/grafana/provisioning/datasources:ro\u0026#34; - \u0026#34;{{ grafana_dashboards_dir }}:/etc/grafana/provisioning/dashboards:ro\u0026#34; grafana_prometheus_datasource_url: \u0026#34;http://{{ grafana_prometheus_host }}:{{ grafana_prometheus_port }}\u0026#34; grafana_prometheus_datasource_name: \u0026#34;Prometheus\u0026#34; grafana_endpoint_url: \u0026#34;http://{{ ansible_host }}:{{ grafana_listen_port }}\u0026#34; Preparation Our focus for this phase is:\nCreating config directories, which we will later bind-mount into the container Render the data source and dashboard provisioning templates Create the Docker volume, which will persist Grafana’s internal state (database, users, plugins). 
Dashboards and data sources will be provisioned via bind mounts Dashboard Template We define the dashboard provider template in templates/dashboards.yml.j2:\napiVersion: 1 providers: - name: \u0026#34;default\u0026#34; orgId: 1 folder: \u0026#34;Infrastructure\u0026#34; type: file disableDeletion: false updateIntervalSeconds: 30 options: path: /etc/grafana/provisioning/dashboards Prometheus Data Source Template We define the template in templates/prometheus_datasource.yml.j2 so that we can pre-include our Prometheus data source:\napiVersion: 1 datasources: - name: \u0026#34;{{ grafana_prometheus_datasource_name }}\u0026#34; type: prometheus access: proxy url: \u0026#34;{{ grafana_prometheus_datasource_url }}\u0026#34; isDefault: true editable: false Later in this series, we will further extend the template to include an Alert data source.\nDeployment The deployment phase consists of two steps:\nPulling the Docker image Running the container with the volume and bind-mounts included This can be done using these two tasks:\n--- - name: Pull Grafana Docker image community.docker.docker_image: name: \u0026#34;{{ grafana_image }}\u0026#34; source: pull register: grafana_docker_image_pull - name: Run Grafana container community.docker.docker_container: name: \u0026#34;{{ grafana_container_name }}\u0026#34; image: \u0026#34;{{ grafana_image }}\u0026#34; state: started restart_policy: always recreate: \u0026#34;{{ grafana_docker_image_pull.changed }}\u0026#34; ports: \u0026#34;{{ grafana_container_ports }}\u0026#34; volumes: \u0026#34;{{ grafana_container_volumes }}\u0026#34; Main.yml The role ties the phases together in tasks/main.yml:\n--- - name: Run the host-preparation tasks ansible.builtin.import_tasks: prepare.yml - name: Run the Docker setup ansible.builtin.import_tasks: deploy.yml This completes the Grafana role.\nExample Playbook The final step is to actually use our role 
and run it on our Grafana host. To do that, we will create playbooks/grafana.yml:\n--- - name: Set up Grafana hosts: grafana become: true roles: - grafana tags: - grafana Then we run our playbook using the command:\nansible-playbook -i inventory playbooks/grafana.yml --tags=grafana Localhost Deployment You can perform a safe, local deployment for testing purposes:\n- name: Install Grafana on localhost hosts: localhost become: false gather_facts: true roles: - grafana tags: - grafana Then we can run the playbook using the command:\nansible-playbook playbooks/grafana.yml --tags=grafana You can now access Grafana from the browser at 127.0.0.1:3000 with the default credentials:\nusername: admin password: admin Verification Once the playbook finishes execution, we can verify Grafana is up and running by:\nChecking running containers: docker ps Querying the endpoint: curl http://\u0026lt;ansible_host\u0026gt;:3000/api/health If all goes well, you should get:\n{ \u0026#34;database\u0026#34;: \u0026#34;ok\u0026#34;, \u0026#34;version\u0026#34;: \u0026#34;13.0.1\u0026#34;, \u0026#34;commit\u0026#34;: \u0026#34;\u0026lt;hash\u0026gt;\u0026#34; } We can also create a dedicated Ansible task for performing the verification for us, making it more suitable for a production environment.\nConclusion In this post, we built a reusable Ansible role for deploying Grafana as a Docker container while also provisioning the data sources and Node Exporter Full dashboard.\nThe complete source code for this post can be found on GitHub.\nSo far, we have a basic monitoring stack setup that can be easily deployed on demand. But we are still missing a few key components to transform it into a more production-level solution. 
One of these critical components is Alerts, and that will be covered in the next post.\n","date":"2026-04-18T00:00:00Z","image":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-3-grafana/assets/cover_hu_14894fc04654c623.png","permalink":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-3-grafana/","title":"Setting Up Monitoring with Prometheus and Grafana - Part 3 - Grafana"},{"content":"Introduction In the previous post, we have covered how to deploy Node Exporter using our custom-built Ansible role. Node Exporter is only the first piece of the puzzle in building a monitoring stack solution. The next critical component in the monitoring stack is Prometheus.\nBy default, Prometheus uses local storage for persisting metrics data. These metrics are stored in a built-in Time Series Database (TSDB). Prometheus also supports integration with various remote storage solutions. For this post, we will focus on local storage and use a Docker volume to persist metrics data even when the container is restarted.\nBy the end of this post, you will have:\nA reusable Prometheus Ansible role Persistent metrics storage using Docker volumes Automatic discovery of Node Exporter targets Prerequisites For the demo in this post to work, you need to have:\nOne or more target VMs Node Exporter must be preconfigured and set up Docker to be installed and available on the Prometheus server community.docker to be installed Architecture Overview We will first look at the architecture, then move on to implementation.\nAnsible host that runs the deployment Target VMs that have Node Exporter installed and running as a systemd service A Prometheus host configured to monitor the target VMs Prometheus follows a pull-based model, periodically scraping metrics from configured targets. The diagram below illustrates the setup:\nPrometheus Prometheus can be installed directly on the host or deployed as a container. 
By now, it\u0026rsquo;s recommended to use containerized deployments, and this is what we will be doing. Our Prometheus Ansible role will be broken down into two main phases:\nPrepare: Create the directories, render the Prometheus configuration template and create the Docker volume Deploy: Pull, run the image and mount the created volume onto the container Core Variables Just as with the Node Exporter role, we will utilize variables to make our role flexible and adaptable without changing the internal implementation. The main variables are:\nprometheus_provisioning_dir: \u0026#34;/opt/prometheus\u0026#34; prometheus_config_dir: \u0026#34;{{ prometheus_provisioning_dir }}/config\u0026#34; prometheus_config_file: \u0026#34;{{ prometheus_config_dir }}/prometheus.yml\u0026#34; prometheus_port: 9090 prometheus_image_name: \u0026#34;prom/prometheus\u0026#34; prometheus_image_tag: \u0026#34;v3\u0026#34; prometheus_image: \u0026#34;{{ prometheus_image_name }}:{{ prometheus_image_tag }}\u0026#34; prometheus_container_name: \u0026#34;prometheus-instance\u0026#34; prometheus_volume_name: \u0026#34;prometheus-data\u0026#34; prometheus_container_ports: - \u0026#34;{{ prometheus_port }}:{{ prometheus_port }}\u0026#34; prometheus_container_volumes: - \u0026#34;{{ prometheus_volume_name }}:/prometheus\u0026#34; - \u0026#34;{{ prometheus_config_dir }}:/etc/prometheus:ro\u0026#34; We mount the configuration as read-only (/etc/prometheus:ro) to prevent accidental modification from inside the container.\nPreparation Our focus for this phase is:\nCreating config directories, which we will later bind-mount into the container Rendering the Prometheus configuration template Creating the Docker volume, which will store Prometheus\u0026rsquo; TSDB --- - name: Create Prometheus provisioning directories ansible.builtin.file: path: \u0026#34;{{ item }}\u0026#34; state: directory mode: \u0026#34;0755\u0026#34; 
loop: - \u0026#34;{{ prometheus_provisioning_dir }}\u0026#34; - \u0026#34;{{ prometheus_config_dir }}\u0026#34; - name: Render Prometheus configuration ansible.builtin.template: src: prometheus.yml.j2 dest: \u0026#34;{{ prometheus_config_file }}\u0026#34; mode: \u0026#34;0644\u0026#34; register: prometheus_config - name: Create Prometheus data volume community.docker.docker_volume: name: \u0026#34;{{ prometheus_volume_name }}\u0026#34; Prometheus Config Jinja Template We will be using the following templates/prometheus.yml.j2 template:\n... scrape_configs: - job_name: \u0026#34;prometheus\u0026#34; metrics_path: \u0026#34;/metrics\u0026#34; scrape_interval: 15s static_configs: - targets: [\u0026#34;{{ ansible_host }}:{{ prometheus_port }}\u0026#34;] - job_name: \u0026#34;nodes\u0026#34; metrics_path: \u0026#34;/metrics\u0026#34; scrape_interval: 15s static_configs: - targets: {% for host in groups[\u0026#39;targets\u0026#39;] %} - \u0026#34;{{ hostvars[host][\u0026#39;ansible_host\u0026#39;] }}:{{ prometheus_node_exporter_port }}\u0026#34; {% endfor %} There are two scrape configs:\nPrometheus, allowing it to expose and monitor its own internal metrics nodes or the target VMs. The targets are dynamically generated as we loop over the inventory groups['targets'] and we set the metrics endpoint. The metrics endpoint has the format of \u0026lt;ansible_host\u0026gt;:\u0026lt;node_exporter_port\u0026gt;. Each target exposes its metrics via an HTTP endpoint (typically /metrics), which Prometheus periodically scrapes. 
This is defined in the configuration file via metrics_path: \u0026quot;/metrics\u0026quot;.\nBelow is an example of a rendered prometheus.yml template:\nglobal: scrape_interval: 15s evaluation_interval: 15s scrape_configs: - job_name: \u0026#34;prometheus\u0026#34; metrics_path: \u0026#34;/metrics\u0026#34; scrape_interval: 15s static_configs: - targets: [\u0026#34;prometheus:9090\u0026#34;] - job_name: \u0026#34;nodes\u0026#34; metrics_path: \u0026#34;/metrics\u0026#34; scrape_interval: 15s static_configs: - targets: - \u0026#34;vm-1:9100\u0026#34; - \u0026#34;vm-2:9100\u0026#34; Rendering of the hosts depends on how ansible_host is defined in your inventory. On localhost, this will be resolved to 127.0.0.1.\nDeployment The deployment phase consists of two steps:\nPulling the Docker image Running the container with the volume and bind-mounts included This can be done using these two tasks:\n--- - name: Pull Prometheus image community.docker.docker_image: name: \u0026#34;{{ prometheus_image }}\u0026#34; source: pull register: prometheus_image_pull - name: Run Prometheus container community.docker.docker_container: name: \u0026#34;{{ prometheus_container_name }}\u0026#34; image: \u0026#34;{{ prometheus_image }}\u0026#34; state: started restart_policy: always recreate: \u0026#34;{{ prometheus_config.changed or prometheus_image_pull.changed }}\u0026#34; ports: \u0026#34;{{ prometheus_container_ports }}\u0026#34; volumes: \u0026#34;{{ prometheus_container_volumes }}\u0026#34; Main.yml The role ties the phases together in tasks/main.yml:\n--- - name: Run Prepare tasks ansible.builtin.import_tasks: prepare.yml - name: Run Deploy tasks ansible.builtin.import_tasks: deploy.yml This completes our Prometheus role.\nExample Playbook The final step is to actually use our role and run it on our Prometheus host. 
To do that, we will create playbooks/prometheus.yml:\n--- - name: Set up Prometheus hosts: prometheus become: true roles: - prometheus tags: - prometheus Then we run our playbook using the command:\nansible-playbook -i inventory playbooks/prometheus.yml --tags=prometheus Localhost Deployment You can perform a safe, local deployment in case you do not have a remote host:\n- name: Install Prometheus on localhost hosts: localhost become: false gather_facts: true vars: prometheus_provisioning_dir: \u0026#34;./tmp/prometheus\u0026#34; ansible_host: \u0026#34;127.0.0.1\u0026#34; roles: - prometheus tags: - prometheus Since we\u0026rsquo;re running on localhost, we don\u0026rsquo;t need to explicitly specify an inventory file. So we can run our playbook directly:\nansible-playbook playbooks/prometheus.yml --tags=prometheus You can now access Prometheus from the browser at 127.0.0.1:9090\nVerification Once the playbook finishes execution, we can verify Prometheus is up and running by:\nChecking running containers: docker ps Querying the endpoint: curl http://\u0026lt;ansible_host\u0026gt;:9090/-/healthy If all goes well, you should get Prometheus Server is Healthy. as an output.\nWe can also create a dedicated Ansible task for performing the verification for us, making it more suitable for a production environment.\nWe can also access the dashboard from our browser and verify that our targets are being monitored:\nConclusion In this post, we built a reusable Ansible role for deploying Prometheus as a Docker container. By structuring the role into clear phases and driving it through variables, we ensure maintainability and consistency across environments. 
The role is also written to be easily adjustable by modifying variables without changing the internal implementation.\nThe complete source code for this post can be found on GitHub.\nBy now, we have Node Exporter installed on the target hosts and Prometheus pulling the metrics. The next step is to visualize and explore these metrics using Grafana.\n","date":"2026-04-05T00:00:00Z","image":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-2-prometheus/assets/cover_hu_ccdb4872f170a25f.png","permalink":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-2-prometheus/","title":"Setting Up Monitoring with Prometheus and Grafana - Part 2 - Prometheus"},{"content":"Introduction A monitoring stack is only useful when it is deployed consistently and remains easy to maintain over time. Manual installations and one-off configurations tend to drift quickly, which makes upgrades and troubleshooting harder than they need to be.\nIn this series, we build the stack component by component using Ansible roles. This first post focuses only on Prometheus Node Exporter, which is a lightweight collector that exports and exposes a wide variety of hardware- and kernel-related metrics. We will go over the installation process, running as a hardened systemd service, and exposing the metrics on port 9100 for later scraping by Prometheus.\nNode Exporter It is recommended to run Node Exporter directly on the host itself instead of a container since the exporter needs to access kernel-related metrics. Although it is possible to run in a container, the /proc, /sys and other directories have to be mounted for it to function properly. 
In this post, we will focus on a host-based setup for the exporter.\nThe role is broken down into four phases:\nPreparation: Create user, group and working directory Installation: Download and install the Node Exporter binary Configure: Create and manage systemd service for the exporter Cleanup: Remove temporary artifacts Core Variables Rather than hard-coding installation details into the role, we keep the behavior driven by defaults/main.yml. That way, we get a clean base role that can be reused across hosts and environments with only variable overrides.\nThe core variables for this role are defined as:\n--- # Core configuration (commonly overridden) node_exporter_version: \u0026#34;1.10.2\u0026#34; node_exporter_user: \u0026#34;node_exporter\u0026#34; node_exporter_group: \u0026#34;node_exporter\u0026#34; node_exporter_work_dir: \u0026#34;/tmp/node_exporter\u0026#34; node_exporter_arch: \u0026#34;linux-amd64\u0026#34; node_exporter_archive_name: \u0026#34;node_exporter-{{ node_exporter_version }}.{{ node_exporter_arch }}.tar.gz\u0026#34; node_exporter_port: 9100 node_exporter_listen_address: \u0026#34;{{ ansible_host }}:{{ node_exporter_port }}\u0026#34; Preparation It is worth setting up the environment before directly jumping into the installation. That way, we avoid cluttering our host and maintain a clean state. The environment will also help us in achieving:\nSecurity: The Node Exporter should not run as root or with escalated privileges. Instead, we will create a system user and group for the exporter. Isolation: All operations such as download, extractions, etc. will reside in a temporary work directory, keeping the host system clean and unaffected. 
--- - name: Create node_exporter group ansible.builtin.group: name: \u0026#34;{{ node_exporter_group }}\u0026#34; system: true - name: Create node_exporter user ansible.builtin.user: name: \u0026#34;{{ node_exporter_user }}\u0026#34; group: \u0026#34;{{ node_exporter_group }}\u0026#34; system: true shell: /usr/sbin/nologin create_home: false - name: Create Node Exporter work directory ansible.builtin.file: path: \u0026#34;{{ node_exporter_work_dir }}\u0026#34; state: directory mode: \u0026#34;0755\u0026#34; These tasks ensure Node Exporter runs as a non-privileged system user and the work directory provides the isolation layer by keeping all temporary files contained.\nInstallation Download We can use Ansible\u0026rsquo;s built-in get_url module for downloading the release archive:\n--- - name: Get Node Exporter release ansible.builtin.get_url: url: \u0026#34;{{ node_exporter_release_url }}\u0026#34; dest: \u0026#34;{{ node_exporter_archive_path }}\u0026#34; mode: \u0026#34;0644\u0026#34; As an added layer of safety, especially in production, add a checksum value to verify the downloaded archive.\nThe values for url and dest are derived variables, defined in the defaults/main.yml as:\nnode_exporter_release_url: \u0026#34;https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/{{ node_exporter_archive_name }}\u0026#34; node_exporter_archive_path: \u0026#34;{{ node_exporter_work_dir }}/{{ node_exporter_archive_name }}\u0026#34; Unarchiving The downloaded release is a compressed tar archive, so we cannot use it directly. 
Instead, we will be using Ansible\u0026rsquo;s built-in unarchive module:\n- name: Unarchive Node Exporter ansible.builtin.unarchive: src: \u0026#34;{{ node_exporter_archive_path }}\u0026#34; dest: \u0026#34;{{ node_exporter_work_dir }}\u0026#34; remote_src: true Setting remote_src: true tells Ansible that the archive already exists on the remote host, so the unarchiving process happens on the remote host itself.\nInstallation The final step in this phase is installing the exporter binary. We defined our helper derived variables in defaults/main.yml:\nnode_exporter_extracted_name: \u0026#34;node_exporter-{{ node_exporter_version }}.{{ node_exporter_arch }}\u0026#34; node_exporter_extracted_dir_path: \u0026#34;{{ node_exporter_work_dir }}/{{ node_exporter_extracted_name }}\u0026#34; node_exporter_extracted_bin_path: \u0026#34;{{ node_exporter_extracted_dir_path }}/node_exporter\u0026#34; node_exporter_bin_path: \u0026#34;/usr/local/bin/node_exporter\u0026#34; /usr/local/bin/ makes Node Exporter available system-wide and is the recommended installation path.\nWe copy the exporter from our temp work directory to the /usr/local/bin directory:\n- name: Install the Node Exporter binary ansible.builtin.copy: src: \u0026#34;{{ node_exporter_extracted_bin_path }}\u0026#34; dest: \u0026#34;{{ node_exporter_bin_path }}\u0026#34; remote_src: true mode: \u0026#34;0755\u0026#34; notify: Restart node_exporter Setting mode: \u0026quot;0755\u0026quot; ensures the binary is executable by all users while remaining writable only by the owner.\nNotice the notify part at the end of the task. Because handlers run only when the notifying task reports a change, the service is restarted only when the binary actually changes. 
The handler Restart node_exporter is implemented in handlers/main.yml:\n--- - name: Restart node_exporter ansible.builtin.systemd: name: \u0026#34;{{ node_exporter_service_name }}\u0026#34; state: restarted Configure At this point, we have downloaded, extracted and installed the exporter\u0026rsquo;s binary. The next step is to create a systemd unit for the exporter. Using systemd ensures the exporter is managed consistently, starts on boot, and can be controlled via standard system tooling.\nThis can be done through a two-step process:\nCreate the systemd unit file Enable and start the service Systemd Unit File The systemd unit file will be implemented as a Jinja template. The file templates/node_exporter.j2.service:\n[Unit] Description=Prometheus Node Exporter Wants=network-online.target After=network-online.target [Service] User={{ node_exporter_user }} Group={{ node_exporter_group }} Type=simple NoNewPrivileges=yes PrivateTmp=yes ProtectSystem=full ProtectHome=yes ExecStart={{ node_exporter_bin_path }} --web.listen-address={{ node_exporter_listen_address }} [Install] WantedBy=multi-user.target Additional systemd options such as NoNewPrivileges and ProtectSystem are used to provide basic service hardening and reduce the potential attack surface.\n--- - name: Create Node Exporter systemd service ansible.builtin.template: src: node_exporter.j2.service dest: \u0026#34;{{ node_exporter_unit_path }}\u0026#34; mode: \u0026#34;0644\u0026#34; notify: - Restart node_exporter - name: Enable and start {{ node_exporter_service_name }} ansible.builtin.systemd: name: \u0026#34;{{ node_exporter_service_name }}\u0026#34; enabled: true state: started daemon_reload: true The variables needed here can be defined as:\nnode_exporter_unit_path: \u0026#34;/etc/systemd/system/{{ node_exporter_service_name }}.service\u0026#34; node_exporter_service_name: \u0026#34;node_exporter\u0026#34; Cleanup The final 
stage for this role is to perform cleanup. Since all intermediate artifacts are stored in a single working directory, it can be removed entirely through a single task:\n--- - name: Remove Node Exporter work directory ansible.builtin.file: path: \u0026#34;{{ node_exporter_work_dir }}\u0026#34; state: absent Main.yml The role ties the phases together in tasks/main.yml:\n--- - import_tasks: prepare.yml - import_tasks: install.yml - import_tasks: configure.yml - import_tasks: cleanup.yml This now finally completes our Node Exporter role.\nExample Playbook The final step is to actually use our role and run it on our target hosts. To do that, we will create playbooks/node_exporter.yml:\n--- - name: Set up Node Exporter on target nodes hosts: target_hosts become: true roles: - node_exporter tags: - node_exporter Then we run our playbook using the command:\nansible-playbook -i inventory playbooks/node_exporter.yml --tags=node_exporter Installing a Different Version You can override the version, 1.10.2, without doing any modifications to the role. 
For example, if we want to use 1.10.0 instead, then we override the node_exporter_version and node_exporter_checksum values:\n- name: Install Node Exporter hosts: vm-1 become: true gather_facts: true vars: node_exporter_version: 1.10.0 node_exporter_checksum: \u0026#34;sha256:6c2a4d886fc82743f7f97b8e0a0cd80ab0199939edce3cf0b9b36dad8d48d9b4\u0026#34; roles: - node_exporter tags: - node_exporter Verification Once the playbook finishes execution, we can verify the exporter is successfully configured and set up by:\nChecking the service is running: sudo systemctl status node_exporter Confirming the exporter is listening on port 9100: ss -ltn | grep 9100 Querying the endpoint: curl http://\u0026lt;ansible_host\u0026gt;:9100/metrics We can also create a dedicated Ansible task for performing the verification for us, making it more suitable for a production environment.\nConclusion In this post, we built a reusable Ansible role for deploying Node Exporter. By structuring the role into clear phases and driving it through variables, we ensure maintainability and consistency across environments. The role is also written to be easily adjustable by modifying variables without changing the internal implementation.\nThe complete source code for this post can be found on GitHub.\nIn the next post, we will build on this foundation by deploying Prometheus and configuring it to scrape metrics from our exporters.\n","date":"2026-03-25T00:00:00Z","image":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-1-node-exporter/assets/cover.svg","permalink":"https://habibiops.com/p/setting-up-monitoring-with-prometheus-and-grafana-part-1-node-exporter/","title":"Setting Up Monitoring with Prometheus and Grafana - Part 1 - Node Exporter"},{"content":"Introduction Some enterprise customers do not allow SSH access to their AWS EC2 instances for security reasons. 
However, they still need to access their EC2 instances from their corporate network or other remote locations for debugging purposes. Setting up a bastion host inside a public subnet is unfortunately not an option in this case.\nOne approach is using AWS Systems Manager (SSM) Session Manager to establish a secure connection between the EC2 instance and the developer\u0026rsquo;s client. Another approach is using AWS Client VPN to establish a VPN tunnel between the private subnets where the instance is running and any developer machine. One advantage of using AWS Client VPN is that it allows you to control access to the instance based on the identity of the developer, for example by using Entra ID for authentication, including MFA. Another advantage is the target network association at the subnet level, combined with authorization rules that drop any traffic that does not match them.\nIn this post, we will see how to set up an AWS Client VPN endpoint using Terraform and how to manage access using Entra ID. We will also demonstrate how to use an internal bastion host, located inside the private subnets and without a public IP address, as a secure gateway to access other internal EC2 instances.\nPrerequisites We will be using Entra ID for authentication and user management. The first step is to create a new enterprise application inside the Azure Portal (Microsoft Entra admin center).\nIAM SAML Identity Provider The details are described in an official Microsoft Entra ID article here. 
Here is a high-level summary of the necessary steps:\nNew enterprise application (AWS ClientVPN) Set up Single Sign-On (SSO) with SAML Identifier (Entity ID): urn:amazon:webservices:clientvpn Reply URL (Assertion Consumer Service URL): http://127.0.0.1:35001 Attributes \u0026amp; Claims: FirstName: user.givenname LastName: user.surname memberOf: user.groups (important for authorization rules that use groups and their IDs) Unique User Identifier: user.userprincipalname SAML Signing Certificate / Signing Options: Sign SAML response and assertion (otherwise authentication will be rejected by AWS) AWS describes how to set up federated authentication here, including the details needed for the user attributes and claims. After creating the enterprise application, you will need to download the Federation Metadata XML file and then create a new identity provider in the AWS IAM section of the AWS console. The downloaded metadata XML file should be uploaded to the newly created identity provider.\nAWS only allows one SAML identity provider per Client VPN endpoint, and Entra ID only allows one enterprise application with the same Unique User Identifier per Azure tenant. This means you cannot create multiple SAML identity providers in the same Azure tenant if you have multiple AWS Client VPN endpoints. If multiple AWS Client VPN endpoints are planned, then you will need to create multiple Entra ID groups in the same enterprise application and use the group IDs in the authorization rules to have separate access to the different private subnets. More on this topic in the Authorization Rules section.\nServer Certificate The \u0026ldquo;Authentication Info\u0026rdquo; section requires a \u0026ldquo;Server certificate ARN\u0026rdquo; when creating the client VPN endpoint. The certificate could be a self-signed certificate for our use case since we will not be using certificate-based authentication but federated user-based authentication. 
Make sure the following required X509v3 extensions are present in the server certificate:\nX509v3 Extended Key Usage: TLS Web Server Authentication X509v3 Key Usage: Digital Signature, Key Encipherment Here is an example of certificate generation using easy-rsa:\n1 2 3 4 5 6 7 8 9 git clone https://github.com/OpenVPN/easy-rsa.git cd easy-rsa/easyrsa3/ ./easyrsa init-pki # create a new CA ./easyrsa build-ca nopass # create a new server certificate ./easyrsa build-server-full ec2vpn.myawesomecompany.internal nopass # will not be needed for this case but is mentioned here for completeness ./easyrsa build-client-full ec2vpnclient1.myawesomecompany.internal nopass This can now be added inside AWS Certificate Manager (ACM), and the ARN can be used as the Server certificate ARN.\nClient VPN Endpoint Setup Create a client VPN endpoint with the following main settings:\nAssociation type: VPC and then choose the VPC ID. client IPv4 CIDR block (The address range cannot overlap with the target network address range, the VPC address range, or any of the routes that will be associated with the client VPN endpoint): For example, use 172.20.0.0/22 if your VPC CIDR block is 10.0.0.0/16. Authentication / Server certificate ARN: previously created server certificate ARN. Authentication / Use user-based authentication / Federated authentication / SAML provider ARN: previously created inside the IAM identity provider. Connection protocol / Tunnel mode: Full-tunnel if all client traffic should be routed through the VPN tunnel or use Split-tunnel if only traffic matching client VPN endpoint routes should be routed through the VPN tunnel. Target Network Associations After creating the client VPN endpoint, you will need to associate it with one or more subnets that should be accessible from the client VPN endpoint. 
There is one limitation here, however: only one subnet association per availability zone is allowed per client VPN endpoint.\nIf you have multiple private subnets in the same availability zone, or separate private and database subnets in the same availability zone, for example, then you need to use a private subnet as a central bastion host to reach the other subnets or to connect to the database.\nAuthorization Rules Authorization rules act as firewall rules to grant specific clients access to the specified network. You should have an authorization rule for each network you want to grant access to. Since we have set up Entra ID for authentication and added the SAML identity provider, we can use the Entra ID group IDs as the access group IDs in the authorization rules.\nLet\u0026rsquo;s add an authorization rule to access the private subnet that hosts our EC2 instance:\nDestination network to enable access: 10.0.2.0/24 Grant access to: Allow access to users in a specific access group Access group ID: 07fc77bc-fa54-4b61-931e-2be27857671b Only users in the specified access group will be able to access the private subnet. All other traffic coming from other users connected to the client VPN endpoint will be dropped. This way, we can control access to the private subnets based on the identity of the user inside Entra ID.\nRoute Table Depending on whether you are using full-tunnel or split-tunnel, you will need to add an extra route to enable internet access on your local machine while connected to the client VPN endpoint. If you are using split-tunnel, then you can skip this step. If you are using full-tunnel, then you need to add an extra route where the destination is 0.0.0.0/0.\nCreate route:\nRoute destination: 0.0.0.0/0 Subnet ID for target network association: select any associated subnet that has outgoing internet connectivity using a NAT gateway Architecture Diagram We have explained all the steps above in detail. 
Here is a high-level architecture diagram of the proposed solution:\nVPN users are managed using Entra ID: users in the allowed access group will be able to access the private subnets. there is one bastion host inside the private subnets that can be reached when connected using the AWS Client VPN. the private subnets have outgoing internet connectivity using a NAT gateway. the bastion host can reach other internal EC2 instances such as the backend VM or the DB using the private IP address as long as the security group allows it. EC2 instances in the public subnet such as the load balancer VM do not need to be accessible from the internet via ssh. No ssh port is open to the public internet, and all traffic is routed through the VPN tunnel and then internally via the bastion host (jumphost). Automation with Terraform The GitHub repository for this blog post is available here. We will go over the main resources that were used in the module. The aws_ec2_client_vpn_endpoint resource is the main resource that we will use to create the client VPN endpoint. The network associations, authorization rules, and routes are created using the following resources:\naws_ec2_client_vpn_network_association aws_ec2_client_vpn_authorization_rule aws_ec2_client_vpn_route The examples folder contains an example of a new vpc with a new client VPN endpoint. The VPC contains three private subnets that are associated with the client VPN endpoint. The user has to provide a public ssh key, the server certificate ARN, and the SAML identity provider ARN. The Entra ID group ID is also used inside the authorization rules section.\nConclusion In this post, we have seen how to set up an AWS Client VPN endpoint and how to manage access using Entra ID. Federated user-based authentication is an important requirement in most enterprises for compliance reasons. 
We also discussed using an internal bastion host, located inside the private subnets and without a public IP address, that can be reached through the client VPN endpoint. From that point on, using the bastion host as a secure gateway to access other internal EC2 instances meets the requirements of most organizations.\nReferences:\nConfigure AWS ClientVPN for Single sign-on with Microsoft Entra ID Single sign-on — SAML 2.0-based federated authentication — in Client VPN How to Integrate AWS Client VPN with Azure Active Directory easy-rsa - Simple shell based CA utility AWS Client VPN authorization rules cloudposse / terraform-aws-ec2-client-vpn The Incredibly Pedantic AWS Client VPN Documentation \n","date":"2026-03-18T00:00:00Z","image":"https://habibiops.com/p/aws-client-vpn-endpoint-setup/assets/ec2vpn-cover.drawio.svg","permalink":"https://habibiops.com/p/aws-client-vpn-endpoint-setup/","title":"AWS Client VPN Endpoint Setup"},{"content":"Introduction OpenStack is a mature, open-source platform for building private and public clouds. In this post, we’ll take our first deep dive into provisioning OpenStack resources using Terraform. OpenStack automation resources are surprisingly scarce, especially in the context of modern DevOps workflows. This guide aims to fill that gap by walking through a practical hands-on example. We’ll cover several OpenStack components and their interactions. Specifically, you’ll learn how to:\nConfigure and authenticate Terraform with the OpenStack provider. Create a compute instance and manage associated resources. Understand the relationship between OpenStack services and Terraform. 
What We Will Build By the end of this guide, Terraform will provision:\na compute instance a security group allowing SSH access an SSH keypair reference a block storage volume attached to the instance The infrastructure will be organized using three Terraform modules:\nsecurity compute storage Setup You can follow along on your local machine or any virtual machine. The only prerequisites are:\nAn existing OpenStack deployment. Terraform installed on your system. Once installed, you’re ready to start building your infrastructure as code.\nConfiguring Terraform for OpenStack The first step in our project is creating a provider.tf file, which sets up the OpenStack provider and handles authentication with your private cloud. At a high level, there are two key components:\nTerraform block – defines required providers and versions. OpenStack provider block – handles authentication. Terraform Block Add the following to your provider.tf:\n1 2 3 4 5 6 7 8 terraform { required_providers { openstack = { source = \u0026#34;terraform-provider-openstack/openstack\u0026#34; version = \u0026#34;\u0026gt;= 3.4.0\u0026#34; } } } This ensures Terraform installs the OpenStack provider when you run terraform init. At the time of writing, version 3.4.0 is the latest stable release.\nInitialize your project with:\n1 terraform init You should see output similar to:\n1 2 3 Initializing the backend ... Terraform has been successfully initialized! Authentication Next, add the OpenStack provider block in provider.tf:\n1 2 3 4 5 6 7 8 provider \u0026#34;openstack\u0026#34; { tenant_name = \u0026#34;\u0026lt;project-name\u0026gt;\u0026#34; auth_url = \u0026#34;\u0026lt;auth-url\u0026gt;\u0026#34; region = \u0026#34;\u0026lt;region\u0026gt;\u0026#34; application_credential_id = \u0026#34;\u0026lt;cred-id\u0026gt;\u0026#34; application_credential_secret = \u0026#34;\u0026lt;cred-secret\u0026gt;\u0026#34; } We are using Application Credentials instead of a username and password. 
These credentials are project-scoped and linked to a user, which is why a username isn’t required.\nThe following diagram illustrates the authentication flow in OpenStack using application credentials:\nOpenStack Components for a VM Creating a single functioning VM in OpenStack requires several components to be in place. At a high level, three major resources are needed:\nSecurity: SSH: Assigns the SSH key to connect to the VM Security Group/s: firewall rules to control traffic (SSH, ICMP etc.). Block Storage: Additional persistent storage attached to the VM (optional). Compute Instance: OS Image: The operating system template for the VM. Flavor: Defines CPU, RAM, and disk configuration of the VM. Network: One or more network interfaces to attach to the VM Regarding the network, a default one already exists in the project. So there\u0026rsquo;s no need to provision/create it in Terraform for this example.\nProject Structure Each main component will be implemented as a separate module. We will be covering each of these components as we go.\nSecurity The security module has no dependencies on other modules, so we create it first. In OpenStack, the Security Groups consist of two resources:\nThe Security Group itself. One or more Security Rules. 
In Terraform, they would be translated to the following resources:\nopenstack_networking_secgroup_v2 openstack_networking_secgroup_rule_v2 Defining the Security Group Security Group is probably the simplest resource to create in Terraform, as it only takes two parameters:\nname description The code snippet below creates an external_access Security Group, which will later hold the rules to allow external SSH connections:\n1 2 3 4 resource \u0026#34;openstack_networking_secgroup_v2\u0026#34; \u0026#34;tf_external_access\u0026#34; { name = var.secgroup_tf_external_access_name description = var.secgroup_tf_external_access_description } All the variables referenced here are defined in the variables.tf file.\nDefining the Security Rules Now that we have our Security Group defined, we can proceed to create the rules. Every Security Rule is its own separate resource. We only need one rule in our example, so one resource is enough:\n1 2 3 4 5 6 7 8 9 resource \u0026#34;openstack_networking_secgroup_rule_v2\u0026#34; \u0026#34;ipv4_ingress_rule\u0026#34; { direction = \u0026#34;ingress\u0026#34; # allow incoming traffic ethertype = \u0026#34;IPv4\u0026#34; protocol = \u0026#34;tcp\u0026#34; port_range_min = 22 port_range_max = 22 remote_ip_prefix = \u0026#34;0.0.0.0/0\u0026#34; security_group_id = openstack_networking_secgroup_v2.tf_external_access.id } Notice the security_group_id parameter that links our rule to the previously created Security Group. 
This creates an implicit dependency, which lets Terraform know which resources depend on each other.\nSSH The SSH keypair is defined in the security/keypair.tf file as a resource:\nresource \u0026#34;openstack_compute_keypair_v2\u0026#34; \u0026#34;tf_demo_keypair\u0026#34; { name = var.ssh_keypair_name public_key = file(var.ssh_pubkey_filename) } The user has to provide the public key file name, and a new key pair will be created inside the project.\nCompute The openstack_compute_instance_v2 resource is responsible for creating one or more OpenStack compute instances. As a reminder, we need to define the following:\nFlavor OS image SSH keypair Security Group Network This can be done using the code snippet below:\nresource \u0026#34;openstack_compute_instance_v2\u0026#34; \u0026#34;tf-hello-compute\u0026#34; { name = var.compute_instance_name image_name = var.compute_image_name flavor_name = var.compute_flavor_name key_pair = var.keypair security_groups = var.security_groups network { name = var.network_name } } security_groups is defined as a variable. At this stage it does not yet have a value, as it will be provided by the security module. The modules are not aware of one another. This dependency will be resolved later when the modules are connected in the root configuration.\nThe diagram below illustrates how the compute resources fit into the overall architecture:\nNow onto our storage module.\nStorage The volume is created using openstack_blockstorage_volume_v3:\nresource \u0026#34;openstack_blockstorage_volume_v3\u0026#34; \u0026#34;volume-1\u0026#34; { size = var.volume_size volume_type = var.volume_type name = var.volume_name } We only need the above three variables. But there\u0026rsquo;s still something missing: the volume is not attached to anything. Right now, it just exists in a detached state.\nA block storage volume in OpenStack is independent of compute instances. 
The attachment is handled by the openstack_compute_volume_attach_v2 resource, which links a volume to a compute instance. This can be done as:\nresource \u0026#34;openstack_compute_volume_attach_v2\u0026#34; \u0026#34;va_1\u0026#34; { instance_id = var.instance_id volume_id = openstack_blockstorage_volume_v3.volume-1.id } The diagram below helps to visualize the storage module:\nThe attachment acts as the bridge between our compute instance and the block storage.\nModule Outputs Each module exposes outputs that can be consumed by other modules. This allows Terraform to automatically build the dependency graph. For example:\noutput \u0026#34;compute_instance\u0026#34; { value = { id = openstack_compute_instance_v2.tf-hello-compute.id name = openstack_compute_instance_v2.tf-hello-compute.name image_name = openstack_compute_instance_v2.tf-hello-compute.image_name flavor_name = openstack_compute_instance_v2.tf-hello-compute.flavor_name } } These outputs are later referenced in main.tf when connecting modules.\nAll outputs are purposely defined to be in JSON format. This makes it easier to consume them programmatically if we ever need to.\nLinking the Modules Right now, the modules exist independently and are unaware of one another. To piece everything together, we create a main.tf file at the root of the project. 
The main.tf file acts as the entrypoint and ties all modules together:\nmodule \u0026#34;security\u0026#34; { source = \u0026#34;./security\u0026#34; ssh_keypair_name = \u0026#34;tf-demo-keypair\u0026#34; ssh_pubkey_filename = \u0026#34;ssh_key.pub\u0026#34; } module \u0026#34;storage\u0026#34; { source = \u0026#34;./storage\u0026#34; instance_id = module.compute.compute_instance.id volume_name = \u0026#34;tf-volume\u0026#34; volume_type = \u0026#34;ceph\u0026#34; } module \u0026#34;compute\u0026#34; { source = \u0026#34;./compute\u0026#34; keypair = module.security.keypair security_groups = [module.security.tf_external_access.name] compute_flavor_name = \u0026#34;small\u0026#34; compute_image_name = \u0026#34;Ubuntu 22.04 LTS\u0026#34; network_name = \u0026#34;external_network\u0026#34; } The diagram below gives us a bird\u0026rsquo;s eye view of the entire architecture that we have so far, showing the relations and interactions between the components:\nTerraform Plan and Apply Now that we finally have everything set, we\u0026rsquo;re ready to continue with the provisioning and run our first terraform plan command. Terraform will output all the resources that will be created, modified, and/or deleted.\nSince this is a new project, we expect to have only additions.\nThe output is far too long to display here, so only the summary will be shown.\nYou should expect to have:\nPlan: 8 to add, 0 to change, 0 to destroy. Review the plan output, check each resource, and see if you can match each resource to its module. 
Terraform even tells you this info, for example:\n1 2 3 4 5 6 7 8 # module.storage.openstack_compute_volume_attach_v2.va_1 will be created + resource \u0026#34;openstack_compute_volume_attach_v2\u0026#34; \u0026#34;va_1\u0026#34; { + device = (known after apply) + id = (known after apply) + instance_id = (known after apply) + region = (known after apply) + volume_id = (known after apply) } Apply When all is set and done, we can go ahead and run:\n1 terraform apply You can review the plan again, and if all\u0026rsquo;s set, you can go ahead and type yes, and Terraform would start creating the resources one after the other. In the end, you should get an output similar to this one:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 infrastructure = { \u0026#34;compute\u0026#34; = { \u0026#34;compute_instance\u0026#34; = { \u0026#34;flavor_name\u0026#34; = \u0026#34;tiny\u0026#34; \u0026#34;id\u0026#34; = \u0026#34;\u0026lt;redacted\u0026gt;\u0026#34; \u0026#34;image_name\u0026#34; = \u0026#34;Ubuntu 22.04\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;tf-hello-compute\u0026#34; } } \u0026#34;security\u0026#34; = { \u0026#34;keypair\u0026#34; = \u0026#34;dk-aout\u0026#34; \u0026#34;tf_external_access\u0026#34; = { \u0026#34;id\u0026#34; = \u0026#34;\u0026lt;redacted\u0026gt;\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;tf_external_access\u0026#34; } } \u0026#34;storage\u0026#34; = { \u0026#34;blockstorage\u0026#34; = { \u0026#34;device\u0026#34; = \u0026#34;/dev/vdb\u0026#34; \u0026#34;id\u0026#34; = \u0026#34;\u0026lt;redacted\u0026gt;\u0026#34; \u0026#34;instance_id\u0026#34; = \u0026#34;\u0026lt;redacted\u0026gt;\u0026#34; \u0026#34;name\u0026#34; = \u0026#34;tf_ceph_volume\u0026#34; } } } Conclusion We\u0026rsquo;ve covered a lot in this post, from the basics of integrating Terraform with the OpenStack provider, to provisioning an OpenStack project and creating independent modules with their own output. 
There are still plenty of improvements to be made though. For example:\nCredentials are currently stored in plaintext in the provider.tf file. Terraform is being executed directly from a local or remote machine instead of a containerized environment. There is no CI/CD pipeline in place. State management and secret handling are not yet addressed. We will be covering these topics step by step in future posts, working towards a fully automated, secure and robust Terraform workflow for managing infrastructure in a private OpenStack cloud.\nNote: This GitHub repo contains the complete source code used in this post.\n","date":"2026-03-08T00:00:00Z","image":"https://habibiops.com/p/provisioning-openstack-project-with-terraform/assets/cover_hu_175303c73fe4eb0d.png","permalink":"https://habibiops.com/p/provisioning-openstack-project-with-terraform/","title":"Provisioning OpenStack Project with Terraform"},{"content":"Introduction In the previous post, we took a deep dive into the basics of fio and the different ways it can be fine-tuned to get a proper benchmark. That was also one of the main issues we faced before: there are plenty of combinations and it\u0026rsquo;s not feasible to test all of them. In this post, we\u0026rsquo;ll build on that knowledge and go over how we can automate this process. Mainly, we will go over:\nAuto-generating fio commands. Parsing all the JSON files and converting them to a condensed CSV summary. Plotting the CSV file to visualize and analyze the results. Auto-generating FIO Commands You might have noticed by now that there are many different options and combinations to test. What if we want to tinker with the number of jobs and/or iodepth, to see if our disk performance scales linearly and, if so, to what extent? How would a different blocksize impact our IO performance? 
Would the IO run perform consistently for longer periods of time?\nWe can keep on running one command after the other, but that quickly becomes painful, error-prone, and boring. For that, we will implement a Python script to generate the commands for us. One approach that I rely on is to have a JSON config file, specifying the different parameters and options that we\u0026rsquo;d like to test. The config file would look something like this:\n{ \u0026#34;benchmark_type\u0026#34;: \u0026#34;time\u0026#34;, \u0026#34;volumes\u0026#34;: [ \u0026#34;/dev/vdb\u0026#34; ], \u0026#34;ioengines\u0026#34;: [ \u0026#34;libaio\u0026#34; ], \u0026#34;blocksize\u0026#34;: [ \u0026#34;4k\u0026#34;, \u0026#34;8k\u0026#34;, \u0026#34;16k\u0026#34;, \u0026#34;32k\u0026#34;, \u0026#34;64k\u0026#34;, \u0026#34;128k\u0026#34;, \u0026#34;256k\u0026#34;, \u0026#34;512k\u0026#34;, \u0026#34;1024k\u0026#34;, \u0026#34;2048k\u0026#34;, \u0026#34;4096k\u0026#34; ], \u0026#34;numjobs\u0026#34;: [ 1, 2, 4, 8, 16, 32 ], \u0026#34;iodepths\u0026#34;: [ 1, 2, 4, 8, 16, 32 ], \u0026#34;direct\u0026#34;: [ 1 ], \u0026#34;ramp_time\u0026#34;: 5, \u0026#34;runtime\u0026#34;: [ 30 ], \u0026#34;rw\u0026#34;: [ \u0026#34;read\u0026#34;, \u0026#34;randread\u0026#34;, \u0026#34;write\u0026#34;, \u0026#34;randwrite\u0026#34;, \u0026#34;rw\u0026#34;, \u0026#34;randrw\u0026#34; ] } With this config, a total of 2376 different commands will be generated (1 volume × 1 ioengine × 11 block sizes × 6 numjobs values × 6 iodepths × 6 rw modes)! So be mindful when setting up the config and be reasonable about what to test. 
Otherwise, the test may take several days to finish.\nParsing the JSON Config File The method below loads the JSON file and returns its contents as a Python dict:\nimport json import itertools CONFIG_FILE = \u0026#34;config.json\u0026#34; def load_config(path): with open(path, \u0026#34;r\u0026#34;) as f: return json.load(f) Generate Commands from JSON Config Once the JSON config is loaded, we need to pass it to the generate_fio_commands method. The method can be divided into two parts:\nGenerating the combinations using itertools.product. Looping over the combinations and building the fio commands. Using itertools.product Python\u0026rsquo;s built-in itertools module provides product, which generates the Cartesian product of our fio parameters. We\u0026rsquo;ll be taking advantage of it and using it as follows:\ndef generate_fio_commands(cfg): combinations = itertools.product( cfg[\u0026#34;volumes\u0026#34;], cfg[\u0026#34;ioengines\u0026#34;], cfg[\u0026#34;blocksize\u0026#34;], cfg[\u0026#34;numjobs\u0026#34;], cfg[\u0026#34;iodepths\u0026#34;], cfg[\u0026#34;rw\u0026#34;], ) This returns an iterator that yields every combination as a tuple. This simplifies a lot of the work for us, as we just need to loop over it and start building our command.\nYou can check the official documentation for further info.\nBuilding the FIO Commands Lastly, we need to loop over the generated combinations and construct our commands:\ndef generate_fio_commands(cfg): ... 
commands = [] for volume, ioengine, blocksize, numjobs, iodepth, rw in combinations: name = f\u0026#34;{rw}_{blocksize}_{numjobs}_{iodepth}\u0026#34; cmd = ( \u0026#34;sudo fio \u0026#34; f\u0026#34;--name={name} \u0026#34; f\u0026#34;--filename={volume} \u0026#34; f\u0026#34;--ioengine={ioengine} \u0026#34; f\u0026#34;--bs={blocksize} \u0026#34; f\u0026#34;--numjobs={numjobs} \u0026#34; f\u0026#34;--iodepth={iodepth} \u0026#34; f\u0026#34;--rw={rw} \u0026#34; f\u0026#34;--direct={cfg[\u0026#39;direct\u0026#39;][0]} \u0026#34; f\u0026#34;--ramp_time={cfg[\u0026#39;ramp_time\u0026#39;]} \u0026#34; f\u0026#34;--runtime={cfg[\u0026#39;runtime\u0026#39;][0]} \u0026#34; f\u0026#34;--time_based \u0026#34; f\u0026#34;--output-format=json \u0026#34; f\u0026#34;\u0026gt; {name}.json\u0026#34; ) commands.append(cmd) return commands Implementing the main Method All we need now is to implement the main method and put everything together:\ndef main(): config = load_config(CONFIG_FILE) commands = generate_fio_commands(config) with open(\u0026#34;assets/fio_commands.sh\u0026#34;, \u0026#34;w\u0026#34;) as f: # add the shell header line f.write(\u0026#34;#!/bin/bash\\n\u0026#34;) for cmd in commands: f.write(f\u0026#34;{cmd}\\n\\n\u0026#34;) print(f\u0026#34;\\nGenerated {len(commands)} fio commands\u0026#34;) if __name__ == \u0026#34;__main__\u0026#34;: main() And now we\u0026rsquo;re done! The script will generate a total of 2376 fio commands and write them to a valid fio_commands.sh file. The only \u0026ldquo;manual\u0026rdquo; step that we have to do is to run the generated fio_commands.sh file. We could technically extend our Python script to run that bash script as well, but for now, this will suffice.\nParsing FIO Output The last piece of the puzzle is to parse the JSON output. Just as before, we\u0026rsquo;re going to break our script into different sections. What we mainly want to do is:\nLoad the JSON output (same as before, so will be skipped). 
Extract the key metrics. Print the results/summary. Extracting the Key Metrics The main values we care about from the JSON output are mainly:\niops latency bandwidth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 def parse_fio_json(path): # load JSON output data = load_output(path) results = [] for job in data.get(\u0026#34;jobs\u0026#34;, []): job_name = job.get(\u0026#34;jobname\u0026#34;, \u0026#34;unknown\u0026#34;) for io_op in (\u0026#34;read\u0026#34;, \u0026#34;write\u0026#34;): stats = job.get(io_op, {}) # ignore and move to the next op if not stats or stats.get(\u0026#34;io_bytes\u0026#34;, 0) == 0: continue # MB/s bandwidth_mbps = stats.get(\u0026#34;bw\u0026#34;, 0) / 1024 iops = stats.get(\u0026#34;iops\u0026#34;, 0) latency_ns = stats.get(\u0026#34;lat_ns\u0026#34;, {}).get(\u0026#34;mean\u0026#34;, 0) results.append({ \u0026#34;job\u0026#34;: job_name, \u0026#34;io_op\u0026#34;: io_op, \u0026#34;bandwidth_mbps\u0026#34;: bandwidth_mbps, \u0026#34;iops\u0026#34;: iops, \u0026#34;latency_ns\u0026#34;: latency_ns, }) return results Print the Results We implement our main method to display the summary of our fio output:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 def main(): results = parse_fio_json(OUTPUT_FILE) for r in results: print( f\u0026#34;Job: {r[\u0026#39;job\u0026#39;]}\\n\u0026#34; f\u0026#34;Mode: {r[\u0026#39;io_op\u0026#39;]}\\n\u0026#34; f\u0026#34;BW: {r[\u0026#39;bandwidth_mbps\u0026#39;]:\u0026gt;.2f} MB/s\\n\u0026#34; f\u0026#34;IOPS: {r[\u0026#39;iops\u0026#39;]:\u0026gt;.2f}\\n\u0026#34; f\u0026#34;Latency: {r[\u0026#39;latency_ns\u0026#39;] / 1e6:\u0026gt;.2f} ms\u0026#34; ) if __name__ == \u0026#34;__main__\u0026#34;: main() Results To keep things a bit simpler, I\u0026rsquo;ve generated only 3 commands and have plotted them. 
The code is still written generically enough to handle any number of commands, parse through the results, and plot them as well.\nHere\u0026rsquo;s the config.json that I\u0026rsquo;ve used:\n{ \u0026#34;benchmark_type\u0026#34;: \u0026#34;time\u0026#34;, \u0026#34;volumes\u0026#34;: [ \u0026#34;/dev/vdb\u0026#34; ], \u0026#34;ioengines\u0026#34;: [ \u0026#34;libaio\u0026#34; ], \u0026#34;blocksize\u0026#34;: [ \u0026#34;4k\u0026#34;, \u0026#34;64k\u0026#34;, \u0026#34;1024k\u0026#34; ], \u0026#34;numjobs\u0026#34;: [ 1 ], \u0026#34;iodepths\u0026#34;: [ 1 ], \u0026#34;direct\u0026#34;: [ 1 ], \u0026#34;ramp_time\u0026#34;: 5, \u0026#34;runtime\u0026#34;: [ 30 ], \u0026#34;rw\u0026#34;: [ \u0026#34;read\u0026#34; ] } This tests small, medium, and large block sizes to see how our I/O behaves accordingly. The plot below summarizes the results for us:\nAt first glance, 64k seems to be the sweet spot for the blocksize, achieving the highest throughput while also maintaining decent IOPS with minimal sacrifice in terms of latency.\nConclusion By now, you should be able to automate your I/O benchmarking with fio and even adapt the scripts to your needs. The config.json file provides plenty of flexibility to test and try out all sorts of combinations. The complete source code for this post can be found on GitHub.\nIt\u0026rsquo;s possible to generate nearly all tests and experiments for all the I/O operations (read, write, rw, etc.) and compare the performance of each configuration to better fine-tune and optimize for your use case. The work can be expanded even further to a full CI/CD pipeline, having each step mentioned in the post (generating commands, executing commands, parsing, etc.) as a separate stage. 
This concludes our I/O benchmarking, having covered the basics and advanced techniques for automating our tests.\n","date":"2026-02-22T00:00:00Z","image":"https://habibiops.com/p/io-benchmarking-with-fio-part-2-automation/assets/fio_perf_comparison_hu_1101ce12796577a5.png","permalink":"https://habibiops.com/p/io-benchmarking-with-fio-part-2-automation/","title":"I/O Benchmarks with FIO - Part 2 - Automation"},{"content":"Introduction Accessing AWS resources from outside AWS using only temporary credentials and without creating an IAM user with an access key is a problem that everyone who frequently works with AWS faces at one point. AWS IAM Roles Anywhere helps to solve this issue by enabling applications running on servers on premises to access AWS resources with AWS temporary credentials. It relies on public key infrastructure (PKI) for establishing trust between AWS and the workloads having certificates issued by your own private certificate authority (CA). You create a trust anchor by referencing your own certificate authority (CA) and then configure a role to trust IAM Roles Anywhere by mentioning the trust anchor in the trust policy. In addition, you place a condition in the trust policy to restrict the access to only a specific common name (CN) of a certificate.\nI recently had a use case where an application running inside a pod on an external Kubernetes cluster needed to access AWS resources. Something similar to IAM roles for service accounts (IRSA) would have been ideal, but IRSA cannot work on pods running on external Kubernetes clusters. A very straightforward solution to this problem is to use an IAM user with an access key, but this is not ideal and the security department will not be happy having static credentials being used by the application. 
Let\u0026rsquo;s see how to set up IAM Roles Anywhere to be used from the application running inside a Kubernetes pod.\nIAM Roles Anywhere Setup inside AWS We have to do two steps inside AWS to configure IAM Roles Anywhere:\nEstablish trust: create a trust anchor. Configure the IAM role: adjust the trust policy and permissions. In order to establish trust, we can set up a Certificate Authority (CA) using AWS Certificate Manager Private Certificate Authority or use an existing company CA to create a trust anchor. We had an existing CA, so we will be using it in the following steps.\nFirst step: create a trust anchor, choose external certificate bundle, and paste the PEM-encoded certificate bundle in the field.\nThe second step is to configure the IAM role properly. You can attach the necessary permissions to the role directly. The trust policy will dictate the conditions under which the role can be assumed. Here is an example trust policy that can be used for our use case:\n{ \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Principal\u0026#34;: { \u0026#34;Service\u0026#34;: \u0026#34;rolesanywhere.amazonaws.com\u0026#34; }, \u0026#34;Action\u0026#34;: [ \u0026#34;sts:AssumeRole\u0026#34;, \u0026#34;sts:TagSession\u0026#34;, \u0026#34;sts:SetSourceIdentity\u0026#34; ], \u0026#34;Condition\u0026#34;: { \u0026#34;StringEquals\u0026#34;: { \u0026#34;aws:PrincipalTag/x509Subject/CN\u0026#34;: \u0026#34;myAwesomePodApp\u0026#34;, \u0026#34;aws:PrincipalTag/x509Issuer/O\u0026#34;: \u0026#34;My Awesome Org\u0026#34;, \u0026#34;aws:PrincipalTag/x509SAN/DNS\u0026#34;: \u0026#34;myAwesomePodApp.com\u0026#34;, \u0026#34;aws:PrincipalTag/x509Subject/O\u0026#34;: \u0026#34;My Awesome Org\u0026#34;, \u0026#34;aws:PrincipalTag/x509Subject/ST\u0026#34;: 
\u0026#34;AwesomeState\u0026#34;, \u0026#34;aws:PrincipalTag/x509Subject/C\u0026#34;: \u0026#34;AwesomeCountry\u0026#34;, \u0026#34;aws:PrincipalTag/x509Issuer/ST\u0026#34;: \u0026#34;AwesomeState\u0026#34;, \u0026#34;aws:PrincipalTag/x509Issuer/C\u0026#34;: \u0026#34;AwesomeCountry\u0026#34;, \u0026#34;aws:PrincipalTag/x509Issuer/CN\u0026#34;: \u0026#34;awesome-internal-ca\u0026#34; }, \u0026#34;ArnEquals\u0026#34;: { \u0026#34;aws:SourceArn\u0026#34;: \u0026#34;arn:aws:rolesanywhere:eu-central-1:123456789012:trust-anchor/c3281693-76c9-48ea-8162-8d48d22c203f\u0026#34; } } } ] } Notice that the Condition part can list all the details of the certificate that the pod is using, such as the common name (CN), the organization (O), the DNS name (DNS), etc. You can decide which parts of the certificate are important for your use case, but as a minimum include the CN and DNS details.\nAfter setting up the trust anchor and the IAM role properly, the application holding a valid certificate signed by the CA can assume the role to access AWS resources if the certificate matches the conditions in the trust policy. For example, if the certificate has the common name myAwesomePodApp and the DNS name myAwesomePodApp.com, is signed by the CA awesome-internal-ca, and satisfies all the other conditions specified in the trust policy, the application can assume the role.\nIAM Roles Anywhere Setup on Kubernetes Pods To use IAM Roles Anywhere and get temporary credentials, we need to install the official credential helper tool. The helper tool manages the process of calling the endpoint to get session credentials and automatically refreshing them before they expire by using the certificate and associated private key. 
The helper tool has three commands that are interesting for us:\ncredential-process Retrieve AWS credentials in the appropriate format for external credential processes serve Serve AWS credentials through a local endpoint update Updates a profile in the AWS credentials file with new AWS credentials The first command, credential-process, retrieves the credentials using the provided certificate and private key. The trust anchor ARN and the profile ARN can be found by clicking on the respective details in the IAM Roles Anywhere console. For example, the command below retrieves the temporary credentials when run from the terminal:\n./aws_signing_helper credential-process \\ --certificate ./myAwesomePodApp.crt \\ --private-key ./myAwesomePodApp.key \\ --trust-anchor-arn arn:aws:rolesanywhere:eu-central-1:123456789012:trust-anchor/c3281693-76c9-48ea-8162-8d48d22c203f \\ --profile-arn arn:aws:rolesanywhere:eu-central-1:123456789012:profile/0b3ced02-1f68-4b63-892d-f9ab0117cbfd \\ --role-arn arn:aws:iam::123456789012:role/myAwesomePodAppIAMRole The update command obtains the temporary credentials and writes them directly to the AWS credentials file. The serve command is especially interesting for us because it runs a local endpoint that vends temporary security credentials from IAM Roles Anywhere, acting like a local token vending machine. This endpoint can be used by applications running inside the Kubernetes cluster or other VMs to obtain temporary credentials. 
Our legacy application can benefit from the serve command by consuming the temporary credentials using that endpoint.\nHere is an example of the serve command with identical parameters to the command above:\n1 2 3 4 5 6 ./aws_signing_helper serve \\ --certificate ./myAwesomePodApp.crt \\ --private-key ./myAwesomePodApp.key \\ --trust-anchor-arn arn:aws:rolesanywhere:eu-central-1:123456789012:trust-anchor/c3281693-76c9-48ea-8162-8d48d22c203f \\ --profile-arn arn:aws:rolesanywhere:eu-central-1:123456789012:profile/0b3ced02-1f68-4b63-892d-f9ab0117cbfd \\ --role-arn arn:aws:iam::123456789012:role/myAwesomePodAppIAMRole The serve command starts a local server that listens on 127.0.0.1 at the default port 9911. We can then use the local endpoint as follows with the aws cli:\n1 export AWS_EC2_METADATA_SERVICE_ENDPOINT=http://127.0.0.1:9911 After that, you can run the aws cli commands as usual. This is an example if the aws cli is not available and the aws sdk cannot be used by the application for some reason:\n1 2 3 4 export AWS_EC2_METADATA_SERVICE_ENDPOINT=http://127.0.0.1:9911 export IAM_ROLE=myAwesomePodAppIAMRole TOKEN=$(curl -X PUT \u0026#34;$AWS_EC2_METADATA_SERVICE_ENDPOINT/latest/api/token\u0026#34; -H \u0026#34;X-aws-ec2-metadata-token-ttl-seconds: 21600\u0026#34;) curl -H \u0026#34;X-aws-ec2-metadata-token: $TOKEN\u0026#34; $AWS_EC2_METADATA_SERVICE_ENDPOINT/latest/meta-data/iam/security-credentials/$IAM_ROLE Notice that it implements the same URIs and request headers as IMDSv2. The command will also automatically refresh the credentials five minutes before the previous ones expire.\nArchitecture Diagram Here is the architecture diagram of the entire solution:\nWe have two pods running in the external Kubernetes cluster outside AWS:\nThe application pod. It uses the temporary credentials that are fetched by the sidecar container. The AWS Signing Helper Pod running as a sidecar container. It uses the certificate and private key mounted as a secret. 
It uses the parameters needed to call the credential helper tool mounted as a config map. We discuss the Kubernetes manifest files below.\nAWS Signing Helper Pod The sidecar container for the credential helper tool is a simple container image that uses the official AWS CLI image as the base image:\n1 2 3 4 5 6 7 8 9 10 11 12 FROM public.ecr.aws/aws-cli/aws-cli:2.11.18 ARG IAMRA_VERSION=\u0026#34;1.7.3\u0026#34; ENV IAMRA_VERSION=$IAMRA_VERSION WORKDIR /usr/local/bin RUN curl -O \u0026#34;https://rolesanywhere.amazonaws.com/releases/${IAMRA_VERSION}/X86_64/Linux/aws_signing_helper\u0026#34; RUN chmod a+x aws_signing_helper USER 1000 For convenience, we will store the built image in ECR and call it iamra_aws_signing_helper.\nCertificate Secret Manifest File Here is the manifest file that stores the certificate and private key:\n1 2 3 4 5 6 7 8 9 10 11 12 --- apiVersion: v1 kind: Secret metadata: name: iamra-myapp-cert namespace: myapp type: kubernetes.io/tls data: tls.crt: | LS0tLS1... tls.key: | LS0tLS1... 
Config Map Manifest File Here is the manifest file that stores the parameters needed to call the credential helper tool:\n1 2 3 4 5 6 7 8 9 10 --- apiVersion: v1 kind: ConfigMap metadata: name: iamra-myapp-conf namespace: myapp data: IAMRA_ROLE_ARN: \u0026#34;arn:aws:iam::123456789012:role/myAwesomePodAppIAMRole\u0026#34; IAMRA_TRUST_ANCHOR_ARN: \u0026#34;arn:aws:rolesanywhere:eu-central-1:123456789012:trust-anchor/c3281693-76c9-48ea-8162-8d48d22c203f\u0026#34; IAMRA_PROFILE_ARN: \u0026#34;arn:aws:rolesanywhere:eu-central-1:123456789012:profile/0b3ced02-1f68-4b63-892d-f9ab0117cbfd\u0026#34; Application Pod Manifest File Here is the manifest file that runs the application pod and the sidecar container:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 --- apiVersion: v1 kind: Pod metadata: name: iamra-myapp-pod namespace: myapp spec: containers: - name: myAwesomeApp image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/my_awesome_legacy_app:2.1.34 command: [ \u0026#39;sh\u0026#39;, \u0026#39;-c\u0026#39; ] args: - aws sts get-caller-identity; aws s3 ls; # access whatever you want here ./my_app do_something_with_s3_data; # process the data aws s3 cp app_data s3://my-bucket/app_data; # upload to S3 imagePullPolicy: Always env: - name: AWS_EC2_METADATA_SERVICE_ENDPOINT value: \u0026#34;http://127.0.0.1:9911/\u0026#34; - name: AWS_REGION value: \u0026#34;eu-central-1\u0026#34; - name: iamra-aws-signing-helper image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/iamra_aws_signing_helper:1.7.3 command: [ \u0026#34;aws_signing_helper\u0026#34; ] args: - \u0026#34;serve\u0026#34; - \u0026#34;--certificate\u0026#34; - \u0026#34;/mnt/credentials/tls.crt\u0026#34; - \u0026#34;--private-key\u0026#34; - \u0026#34;/mnt/credentials/tls.key\u0026#34; - \u0026#34;--role-arn\u0026#34; - \u0026#34;$(ROLE_ARN)\u0026#34; - 
\u0026#34;--trust-anchor-arn\u0026#34; - \u0026#34;$(TA_ARN)\u0026#34; - \u0026#34;--profile-arn\u0026#34; - \u0026#34;$(PROFILE_ARN)\u0026#34; env: - name: ROLE_ARN valueFrom: configMapKeyRef: key: IAMRA_ROLE_ARN name: iamra-myapp-conf - name: TA_ARN valueFrom: configMapKeyRef: key: IAMRA_TRUST_ANCHOR_ARN name: iamra-myapp-conf - name: PROFILE_ARN valueFrom: configMapKeyRef: key: IAMRA_PROFILE_ARN name: iamra-myapp-conf volumeMounts: - mountPath: /mnt/credentials name: credentials resources: requests: memory: \u0026#34;256Mi\u0026#34; cpu: \u0026#34;512m\u0026#34; limits: memory: \u0026#34;512Mi\u0026#34; cpu: \u0026#34;1024m\u0026#34; imagePullPolicy: Always volumes: - name: credentials secret: secretName: iamra-myapp-cert The iamra-aws-signing-helper container runs the credential helper tool with the parameters from the config map. The application container myAwesomeApp sets the environment variable AWS_EC2_METADATA_SERVICE_ENDPOINT to the local endpoint and uses the aws cli to access S3 resources. The legacy application now can access AWS resources without having to use static credentials. The credential helper tool will automatically refresh the credentials five minutes before expiration time.\nConclusion In this post, we configured IAM Roles Anywhere and set up the credential helper tool as a sidecar container in a Kubernetes pod to allow a legacy application to access AWS resources without using static credentials from an IAM user. We used the application certificate signed by our own private CA to fetch the temporary credentials from IAM Roles Anywhere. 
The application could then consume these credentials and access the needed AWS resources.\nReferences:\naws-samples/aws-iam-ra-for-kubernetes Sidecar Containers Introducing fine-grained IAM roles for service accounts Diving into IAM Roles for Service Accounts ","date":"2026-02-13T00:00:00Z","image":"https://habibiops.com/p/aws-iamra-using-sidecar-kubernetes-containers/assets/iamra-k8s-cover_hu_432cdbbdd6eb9b4b.png","permalink":"https://habibiops.com/p/aws-iamra-using-sidecar-kubernetes-containers/","title":"AWS IAM Roles Anywhere Using Sidecar Kubernetes Containers"},{"content":"Introduction Performance is often an afterthought in software development and DevOps, much like security. No one seems to consider or care about it until suddenly everything\u0026rsquo;s too slow and taking forever to work.\nMore often than not, performance can be boiled down to I/O since storage is frequently the weakest link in the chain. Understanding how I/O works and the options for fine-tuning it can work wonders for your infrastructure. This is exactly what we\u0026rsquo;ll be discussing in this post today.\nWe\u0026rsquo;ll be going over the basics of I/O benchmarking and trying to emulate various workloads while using the open source tool fio, short for Flexible I/O tester.\nSetup Luckily we don\u0026rsquo;t need a fancy or complicated setup to do our benchmarking. All we need is:\nOne Linux VM with a distro of your choice. Personally I use Ubuntu 24.04 LTS. An HDD volume attached to the VM. fio for I/O benchmarking. You can run the commands below to set up and mount your volume:\nsudo mkdir /mnt/dev # check where your volume\u0026#39;s located lsblk # any filesystem would work sudo mkfs.ext4 /dev/vdb sudo mount /dev/vdb /mnt/dev Journaling and existing I/O workloads will affect the benchmark, so make sure to run on a clean system to get accurate measurements.\nInstallation FIO can be installed using the package manager. 
Check the documentation for your own distro. In my case, it\u0026rsquo;ll be Ubuntu. So the installation is done using the command:\nsudo apt install -y fio FIO Overview This section will focus mainly on high-level concept explanation. The technical, nitty-gritty details will come after.\nfio has an overwhelming number of parameters to simulate all sorts of workloads. We\u0026rsquo;ll be focusing on a handful of key ones that are enough to cover most scenarios:\nbs: block size size: File size to read/write from iodepth: Queue size for I/O submissions numjobs: Number of processes to create and perform the I/O operations rw: Type of the I/O operation to perform, such as read, write, rw, etc. Check the documentation for further options. runtime: The duration to perform the I/O test direct: Boolean flag; 1 enables direct I/O and 0 disables it. ioengine: Two most common engines are io_uring and libaio. output-format: Defaults to a human-readable text format. json is recommended for parsing and automated processing. fio runs in one of two modes. Size-based, until the size is reached. Time-based, until the runtime is exhausted. I\u0026rsquo;d recommend setting one, not both.\nnumjobs vs iodepth This is a common point of confusion and can be misleading, often leading to very different results. In short:\nnumjobs controls process-level parallelism: fio spawns that many dedicated processes to perform the I/O task. iodepth controls the queue depth of each job. The maximum number of in-flight I/O requests is numjobs * iodepth.\nBlock size Setting the correct and proper block size is tricky. It can mess up the entire benchmark and give a false impression. Block size can be split into three broad categories:\nSmall block size: Typically 4k up to 16k in size. Useful for random I/O and database/transactional workloads. 
AWS uses 4k block size by default for their EBS service Mid-range block size: Typically 16k up to 64k and is used for mixed workloads. MS SQL Server uses 64k block size by default Large block size: 128k and up to 4M. Recommended for sequential reads, backups and data warehousing. AWS Redshift uses 1M block size Each application has its own requirements and the I/O has to be fine-tuned for the workload itself. A good rule of thumb is to use 4k, 64k and 1M for small, middle and large block sizes respectively.\nioengine libaio, which stands for Library Asynchronous I/O, used to be the default I/O engine on Linux. It\u0026rsquo;s not completely deprecated, since it\u0026rsquo;s still the recommended I/O engine for testing HDD storage and legacy async I/O behavior. io_uring would be the go-to for SSD and NVMe type storage.\nFIO supports plenty of engines, including sync for synchronous I/O, mmap, windowsaio and plenty more. You can find the full list from the documentation.\nAlways read the documentation when choosing an engine, not all parameters are compatible with every engine.\ndirect vs non-direct I/O Direct I/O means that the I/O goes directly to the disk and bypasses any OS/page cache, hence the name. Non-direct I/O, on the other hand, can be considered as buffered-I/O. Meaning that I/O operations are collected together, in a buffer ( OS page cache), and then written (or read) at the same time. Buffered I/O will almost always be faster than non-buffered I/O, though it\u0026rsquo;s not a fair comparison since they perform different things and serve different purposes. The natural question then becomes, when to use which? 
Use Direct I/O to test the raw hardware storage performance and non-direct I/O for testing your RAM and caching system.\nTo sum things up, as a rule of thumb:\nUse direct=1 when benchmarking storage hardware Use direct=0 when benchmarking the cache system HDD vs SSD Considerations Different storage types behave very differently, so tuning fio parameters for HDDs and SSDs is important. HDDs, being mechanical, are latency-bound and benefit from low queue depths and single-job workloads ( numjobs=1). Random reads/writes are slow, so small blocks (4k) are used for testing database-like workloads, while sequential tests can use larger blocks (1M) and are useful for user applications/workloads.\nSSDs, on the other hand, are throughput-bound, handle high parallelism very well, and can achieve maximum performance with higher iodepth and multiple jobs. block sizes also vary depending on the workload that we\u0026rsquo;re trying to simulate.\nTuning numjobs, iodepth, and bs parameters according to the storage type is critical to ensure the benchmarks reflect the device’s true performance to avoid erroneous conclusions.\nA good rule of thumb:\nStorage Type Recommended numjobs Recommended iodepth Typical bs I/O Engine HDD 1 1–8 4k (small) / 1M (large) libaio SSD / NVMe 4–16+ 16–64+ 4k–128k (random) / 1M+ (sequential) io_uring Our First FIO Command We\u0026rsquo;re finally ready to test our first real FIO command after having built up a lot of background info and knowledge:\n1 2 3 4 5 6 7 8 9 sudo fio --name=hello-fio-read \\ --numjobs=1 \\ --iodepth=1 \\ --rw=read \\ --directory=/mnt/dev/ \\ --ioengine=libaio \\ --bs=1M \\ --direct=1 \\ --runtime=30 The command above basically spawns a single process to do a read I/O at /mnt/dev/ directory using the libaio engine with a block size set to 1M. The process will perform O_DIRECT I/O operations (--direct=1) to avoid the cache, and it will do so for a total of 30 seconds. 
Having specified the --directory, fio will create the file for us and perform the task. Finally, when the process finishes, we get a summary of the output:\n1 2 3 4 5 6 7 hello-fio: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=1 Jobs: 1 (f=1): [R(1)][100.0%][r=346MiB/s][r=346 IOPS][eta 00m:00s] read: IOPS=166, BW=166MiB/s (174MB/s)(4991MiB/30005msec) lat (usec): min=613, max=376663, avg=6008.79, stdev=16419.17 iops : min= 20, max= 370, avg=164.86, stdev=87.26, samples=59 Run status group 0 (all jobs): READ: bw=166MiB/s (174MB/s), 166MiB/s-166MiB/s (174MB/s-174MB/s), io=4991MiB (5233MB), run=30005-30005msec Some of the output has been removed for maintaining brevity.\nKey metrics to focus on are mainly:\nlat: Latency BW: Bandwidth iops: Number of I/O operations per second Parameter Comparison We\u0026rsquo;ve mentioned earlier the difference between direct and non-direct, big block size and small block size. So let\u0026rsquo;s see it in practice.\nAll commands and tests are done on the same machine to get accurate and consistent results\nSmall Block size vs Large Block size We\u0026rsquo;ll be using this base command:\n1 2 3 4 5 6 7 8 9 10 sudo fio --name=hello-fio \\ --numjobs=1 \\ --iodepth=1 \\ --direct=1 \\ --rw=write \\ --directory=/mnt/dev \\ --ioengine=libaio \\ --runtime=30 \\ --size=1G \\ --time_based Let\u0026rsquo;s start with bs=4k, we get:\n1 2 3 4 5 6 7 hello-fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 write: IOPS=895, BW=3582KiB/s (3668kB/s)(105MiB/30001msec); 0 zone resets lat (usec): min=358, max=29885, avg=1114.65, stdev=844.16 iops : min= 412, max= 1564, avg=893.86, stdev=304.19, samples=59 Run status group 0 (all jobs): WRITE: bw=3582KiB/s (3668kB/s), 3582KiB/s-3582KiB/s (3668kB/s-3668kB/s), io=105MiB (110MB), run=30001-30001msec And now with bs=1M, we get:\n1 2 3 4 5 6 write: IOPS=69, BW=69.8MiB/s 
(73.2MB/s)(2094MiB/30011msec); 0 zone resets lat (msec): min=6, max=222, avg=14.29, stdev=10.75 iops : min= 38, max= 84, avg=69.76, stdev= 8.34, samples=59 Run status group 0 (all jobs): WRITE: bw=69.8MiB/s (73.2MB/s), 69.8MiB/s-69.8MiB/s (73.2MB/s-73.2MB/s), io=2094MiB (2196MB), run=30011-30011msec We have the plot below (generated using Python\u0026rsquo;s matplotlib) to better summarize and visualize the results:\nWe can notice a pattern. The 4k block size has:\nLow throughput (an impressive 3.49 MiB/s) Low latency High IOPS While the 1M block size is the complete opposite:\nHigh throughput High latency Low IOPS Direct vs Non-Direct Now let\u0026rsquo;s see the difference with the direct parameter. Just as before, we\u0026rsquo;ll use this base command:\nsudo fio --name=hello-fio \\ --numjobs=1 \\ --iodepth=1 \\ --rw=write \\ --directory=/mnt/dev \\ --ioengine=libaio \\ --bs=4k \\ --runtime=30 \\ --size=1G \\ --time_based It\u0026rsquo;s important to run both tests with the SAME block size to get as accurate a comparison as possible.\nFirst, we\u0026rsquo;ll test direct=1 (same as the command in the earlier section):\nhello-fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 write: IOPS=895, BW=3582KiB/s (3668kB/s)(105MiB/30001msec); 0 zone resets lat (usec): min=358, max=29885, avg=1114.65, stdev=844.16 iops : min= 412, max= 1564, avg=893.86, stdev=304.19, samples=59 Run status group 0 (all jobs): WRITE: bw=3582KiB/s (3668kB/s), 3582KiB/s-3582KiB/s (3668kB/s-3668kB/s), io=105MiB (110MB), run=30001-30001msec Now running with direct=0, we get:\nwrite: IOPS=128k, BW=499MiB/s (523MB/s)(14.6GiB/30001msec); 0 zone resets lat (usec): min=2, max=2795, avg= 3.15, stdev= 3.20 bw ( KiB/s): min= 840, max=1247432, per=100.00%, avg=797612.32, stdev=385436.40, samples=37 iops : min= 210, max=311858, avg=199403.24, stdev=96359.24, samples=37 Run status group 0 (all jobs): 
WRITE: bw=499MiB/s (523MB/s), 499MiB/s-499MiB/s (523MB/s-523MB/s), io=14.6GiB (15.7GB), run=30001-30001msec Let\u0026rsquo;s plot the comparison just like before and see:\nWe have the following comparison:\nDirect I/O performs so badly that we can barely see the throughput. We could\u0026rsquo;ve done some normalization and log scaling, but the point was to show the real drastic difference between the two.\nThen again, seeing the results from the non-direct I/O, they are just too good to be true. This is the tricky part when it comes to benchmarking, knowing what results actually make sense, which ones are realistic, and which ones aren\u0026rsquo;t. Setting direct=0 will throw you off and you\u0026rsquo;d end up testing a completely different scenario (in this case, the RAM).\nTesting Other Operations Testing for other rw options is straightforward with FIO, we just need to change read to any of the supported options:\nwrite rw randread randwrite The base command remains the same. So if we want to test our write performance, the command would be adjusted as such:\n1 2 3 4 5 6 7 8 9 sudo fio --name=hello-fio-write \\ --numjobs=1 \\ --iodepth=1 \\ --rw=write \\ --directory=/mnt/dev/ \\ --ioengine=libaio \\ --bs=4k \\ --direct=1 \\ --runtime=30 And that\u0026rsquo;s it, we just had to change one parameter to test a completely different scenario. Which brings us to the next point, I/O testing can be, and is, very tricky because of that. A single parameter change would result in completely different outcomes and scenarios. So it\u0026rsquo;s important to carefully define what needs to be tested and which parameters to tinker with.\nConclusion We\u0026rsquo;ve covered plenty of concepts and topics in this post, providing a solid foundation for running and interpreting fio benchmarks.\nThe key takeaway is that running a benchmark is only half the work. Understanding what you’re measuring and why is just as important. 
Small parameter changes can lead to vastly different results, and some trial and error is needed when defining realistic workloads. In the end, benchmark results are only meaningful if the workload matches reality.\nIn the next post, we’ll take this a step further by automating fio runs and diving deeper into the JSON output, including how to parse and analyze results at scale.\n","date":"2026-02-08T00:00:00Z","image":"https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/block_size_4k_1M_comparison_hu_69e5d9f5ea42eabc.png","permalink":"https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/","title":"I/O Benchmarking with FIO - Part 1 - Basics"},{"content":"Introduction We previously talked about OIDC integration with AWS using GitHub Actions. We will now see how to do a similar integration with BitBucket Pipelines. Namely, we want to push a container image to AWS ECR and then use that image in future pipelines.\nAWS IAM Identity Providers We have to configure the Bitbucket OpenID Connect provider in AWS IAM. 
Use the following to create a new OpenID Connect provider:\nProvider URL: api.bitbucket.org/2.0/workspaces/\u0026lt;YOUR-WORKSPACE-NAME\u0026gt;/pipelines-config/identity/oidc Audience: ari:cloud:bitbucket::workspace/\u0026lt;YOUR-WORKSPACE-ID\u0026gt; Now that the provider is configured, we can set up the IAM role for the Bitbucket OIDC provider and use it in our pipelines.\nAWS OIDC IAM Role The IAM role requires ECR push permissions and the GetAuthorizationToken action in order to be able to log into ECR.\nExample permissions:\n{ \u0026#34;Statement\u0026#34;: [ { \u0026#34;Action\u0026#34;: \u0026#34;ecr:*\u0026#34;, \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Resource\u0026#34;: \u0026#34;arn:aws:ecr:eu-central-1:123456789012:repository/devops-terraform\u0026#34;, \u0026#34;Sid\u0026#34;: \u0026#34;AllowAllEcrActions\u0026#34; }, { \u0026#34;Action\u0026#34;: \u0026#34;ecr:GetAuthorizationToken\u0026#34;, \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Resource\u0026#34;: \u0026#34;*\u0026#34;, \u0026#34;Sid\u0026#34;: \u0026#34;EcrGetAuthorizationToken\u0026#34; } ], \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34; } In the trust relationships section, we need to add the ARN of the previously created Bitbucket OIDC provider and allow the AssumeRoleWithWebIdentity action for our specific Bitbucket repository. 
We can also add the IP addresses of the Bitbucket servers that will be using the OIDC provider in case you are using dedicated self-hosted Bitbucket runners.\nExample trust relationships:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 { \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Sid\u0026#34;: \u0026#34;\u0026#34;, \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Principal\u0026#34;: { \u0026#34;Federated\u0026#34;: \u0026#34;arn:aws:iam::123456789012:oidc-provider/api.bitbucket.org/2.0/workspaces/stulz_bitbucket/pipelines-config/identity/oidc\u0026#34; }, \u0026#34;Action\u0026#34;: \u0026#34;sts:AssumeRoleWithWebIdentity\u0026#34;, \u0026#34;Condition\u0026#34;: { \u0026#34;StringLike\u0026#34;: { \u0026#34;api.bitbucket.org/2.0/workspaces/stulz_bitbucket/pipelines-config/identity/oidc:sub\u0026#34;: \u0026#34;{YOUR-BITBUCKET-REPO-ID}*\u0026#34; }, \u0026#34;IpAddress\u0026#34;: { \u0026#34;aws:SourceIp\u0026#34;: [ \u0026#34;64.x.y.z/32\u0026#34;, \u0026#34;32.x.y.z/32\u0026#34; ] } } } ] } ECR Push Pipeline After setting up the IAM role, we can use it in our pipelines as follows:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 pipelines: custom: dockerEcrBuildPush: - variables: - name: IMAGE_NAME default: cicd-tf - name: IMAGE_TAG default: 1.12.0-6.5.0 - name: AWS_DEFAULT_REGION default: eu-central-1 - step: name: Build \u0026amp; Push Docker Image oidc: true runs-on: - \u0026#39;self.hosted\u0026#39; - \u0026#39;linux\u0026#39; script: - docker build -t $IMAGE_NAME:$IMAGE_TAG -t $IMAGE_NAME:latest . 
- pipe: atlassian/aws-ecr-push-image:2.6.0 variables: AWS_DEFAULT_REGION: \u0026#39;$AWS_DEFAULT_REGION\u0026#39; AWS_OIDC_ROLE_ARN: \u0026#39;arn:aws:iam::123456789012:role/devops-terraform-oidc\u0026#39; IMAGE_NAME: \u0026#39;$IMAGE_NAME\u0026#39; TAGS: \u0026#39;$IMAGE_TAG latest\u0026#39; The above pipeline will build the docker image and push it to ECR using the IAM role we created and using the official Bitbucket Pipelines Pipe: AWS ECR push image atlassian/aws-ecr-push-image.\nIf we want to use this image as a base image in our future pipelines, we can reference it in the image name section and specify the OIDC IAM role that can be used to pull it from ECR.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 steps: - step: name: TF Plan image: name: \u0026#39;123456789012.dkr.ecr.eu-central-1.amazonaws.com/devops-terraform:1.12.0-6.5.0\u0026#39; aws: oidc-role: \u0026#39;arn:aws:iam::123456789012:role/devops-terraform-oidc\u0026#39; oidc: true script: - terraform init - terraform validate - terraform plan -out=plan.tfplan runs-on: - \u0026#39;self.hosted\u0026#39; - \u0026#39;linux\u0026#39; This way, the terraform plan step will run inside the designated container image with a pre-installed terraform binary pinned to a specific version (1.12.0-6.5.0).\nConclusion In this post, we have seen how to use Bitbucket OIDC integration with AWS ECR. We have also seen how to use the official Bitbucket Pipelines Pipe: AWS ECR push image to build and push a docker image to ECR. 
Hopefully, this will help you in your future projects.\nReferences:\nAtlassian / Bitbucket Pipelines Pipes / aws-ecr-push-image Use AWS ECR images in Pipelines with OpenID Connect ","date":"2026-02-05T00:00:00Z","image":"https://habibiops.com/p/bitbucket-oidc-integration-with-aws-ecr/assets/bitbucket-oidc-ecr.drawio.svg","permalink":"https://habibiops.com/p/bitbucket-oidc-integration-with-aws-ecr/","title":"Bitbucket OIDC Integration with AWS ECR"},{"content":"Introduction 1 2 3 4 5 6 - name: Configure AWS credentials uses: aws-actions/configure-aws-credentials@v6 with: aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} aws-region: ${{ env.AWS_REGION }} What\u0026rsquo;s wrong with this step inside the GitHub Actions workflow? Yes, it\u0026rsquo;s not the best way to authenticate to AWS because AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY were used to configure the credentials. This means having to manage an IAM user with valid access keys. This means that these keys have to be rotated regularly. This means that someone has to copy and paste those static credentials into the secrets and make sure they are not leaked or compromised somewhere else.\nWhat\u0026rsquo;s the best practice to authenticate to AWS in a pipeline you may ask? The answer is OpenID Connect (OIDC). OpenID Connect allows your workflows to exchange short-lived tokens directly from your cloud provider. Let\u0026rsquo;s see how to configure this in GitHub Actions and deploy to AWS using this method.\nAWS IAM Identity Providers Before allowing GitHub Actions to authenticate to AWS, we need to configure an OpenID Connect provider in AWS IAM. Inside the AWS IAM console, go to Identity Providers and create a new OpenID Connect provider. 
Use the following details for the configuration:\nProvider URL: https://token.actions.githubusercontent.com Audience: sts.amazonaws.com After setting up the provider, we can create an IAM role for GitHub Actions to assume.\nAWS IAM Role and Trust Relationship Policy Create a new IAM role with the following trust policy that allows the OIDC provider to assume the role with web identity:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 { \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Principal\u0026#34;: { \u0026#34;Federated\u0026#34;: \u0026#34;arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com\u0026#34; }, \u0026#34;Action\u0026#34;: \u0026#34;sts:AssumeRoleWithWebIdentity\u0026#34;, \u0026#34;Condition\u0026#34;: { \u0026#34;StringEquals\u0026#34;: { \u0026#34;token.actions.githubusercontent.com:aud\u0026#34;: \u0026#34;sts.amazonaws.com\u0026#34; }, \u0026#34;StringLike\u0026#34;: { \u0026#34;token.actions.githubusercontent.com:sub\u0026#34;: [ \u0026#34;repo:GitHubOrg/GitHubRepo:ref:refs/heads/*\u0026#34;, \u0026#34;repo:GitHubOrg/GitHubRepo:ref:refs/tags/*\u0026#34; ] } } } ] } Let\u0026rsquo;s break down the trust relationship policy:\nFederated Principal: the OIDC provider ARN StringEquals Condition: the OIDC token audience (AWS STS) StringLike Condition: the GitHub repository and ref patterns that the role can be assumed from. Here we allow the role to be assumed from any branch or tag in the GitHubRepo repository. We can use wildcards in the repository and branch patterns to allow multiple repositories and branches to assume the role. We can also restrict the role to specific branches or tags. 
For example:\nAllow only the main branch:\n1 2 3 \u0026#34;StringLike\u0026#34;: { \u0026#34;token.actions.githubusercontent.com:sub\u0026#34;: \u0026#34;repo:GitHubOrg/GitHubRepo1:ref:refs/heads/main\u0026#34; } Allow any branch, tag, or pull request in the repository:\n1 2 3 \u0026#34;StringLike\u0026#34;: { \u0026#34;token.actions.githubusercontent.com:sub\u0026#34;: \u0026#34;repo:GitHubOrg/GitHubRepo1:*\u0026#34; } Allow only tags from multiple repositories:\n1 2 3 4 5 6 7 \u0026#34;StringLike\u0026#34;: { \u0026#34;token.actions.githubusercontent.com:sub\u0026#34;: [ \u0026#34;repo:GitHubOrg/GitHubRepo1:ref:refs/tags/*\u0026#34;, \u0026#34;repo:GitHubOrg/GitHubRepo2:ref:refs/tags/*\u0026#34;, \u0026#34;repo:GitHubOrg/GitHubRepo3:ref:refs/tags/*\u0026#34; ] } These fine-grained conditions should be used to restrict who can assume the role. After creating the role with the trust policy, we can attach an IAM policy that grants the permissions the workflow needs in AWS, such as writing to S3 buckets or updating ECS tasks.\nGitHub Actions Workflow After setting up the IAM role and trust relationship policy, we can configure the GitHub Actions workflow to assume the role and deploy to AWS. 
Here is an example workflow that deploys a Hugo site to S3:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 name: \u0026#39;Deploy to S3\u0026#39; on: workflow_dispatch: permissions: id-token: write contents: read defaults: run: shell: bash env: AWS_REGION: \u0026#34;eu-central-1\u0026#34; AWS_DEFAULT_REGION: \u0026#34;eu-central-1\u0026#34; AWS_ROLE_ARN: \u0026#34;arn:aws:iam::123456789012:role/gha-oidc-rolename\u0026#34; AWS_ROLE_SESSION_NAME: \u0026#34;GHA-OIDC-${{ github.run_id }}-${{ github.run_number }}-${{ github.run_attempt }}\u0026#34; AWS_S3_BUCKET_NAME: \u0026#34;my-awesome-hugo-website\u0026#34; jobs: deploy: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v5 - name: Configure AWS Credentials uses: aws-actions/configure-aws-credentials@v5 with: role-to-assume: ${{ env.AWS_ROLE_ARN }} role-session-name: ${{ env.AWS_ROLE_SESSION_NAME }} aws-region: ${{ env.AWS_REGION }} - name: Setup Hugo run: sudo apt-get update \u0026amp;\u0026amp; sudo apt-get install hugo -y - name: Build Hugo site run: hugo --gc --minify working-directory: ./my-awesome-hugo-theme - name: Deploy to S3 run: aws s3 sync --delete ./public/ s3://${{ env.AWS_S3_BUCKET_NAME }}/public/ working-directory: ./my-awesome-hugo-theme Let\u0026rsquo;s break down the workflow:\nThe workflow is triggered manually through workflow_dispatch, which does not accept a branches filter. The id-token permission is used in combination with OpenID Connect. Setting it to write is required in order to request an OpenID Connect JWT Token as described in the docs. The configure-aws-credentials action is used to configure the AWS credentials using the OpenID Connect JWT Token. Notice that we are not using the aws-access-key-id and aws-secret-access-key inputs. We only specified the role ARN and session name. The Setup Hugo step installs Hugo on the runner. 
After building the Hugo site, the Deploy to S3 step deploys the site to S3 by copying over the generated public folder using the AWS CLI: aws s3 sync .... Here\u0026rsquo;s a diagram of the OIDC workflow:\nConclusion In this post, we saw how to configure GitHub Actions to authenticate to AWS using OpenID Connect. An example Hugo site deployment to S3 was used to demonstrate the workflow. I hope this post was helpful and you learned something new. Let\u0026rsquo;s stop using IAM users with static credentials inside CI/CD pipelines!\nReferences:\nGitHub Actions/Concepts/Security/OpenID Connect Integrating with GitHub Actions – CI/CD pipeline to deploy a Web App to Amazon EC2 ","date":"2026-01-19T00:00:00Z","image":"https://habibiops.com/p/github-actions-aws-deploy-using-oidc/assets/gha-oidc-cover_hu_cf47057c74601e69.png","permalink":"https://habibiops.com/p/github-actions-aws-deploy-using-oidc/","title":"GitHub Actions AWS Deploy using OIDC"},{"content":"Introduction Terraform is a great tool for managing infrastructure and can be integrated with CI/CD pipelines for automated deployments. In order to achieve idempotency and avoid unexpected changes from version updates, terraform should be run in a container image and be pinned to a specific version.\nIn addition to having a pinned terraform version inside the container image, it can be a wise choice to also include a pinned version of the main provider being used for your infrastructure. For example, if you are using AWS, you can pin the terraform aws provider to your specific version.\nTerraform and Provider Version Pinning Your terraform configuration file should include the terraform and provider versions you want to use. 
For example:\n1 2 3 4 5 6 7 8 9 10 terraform { required_version = \u0026#34;1.12.0\u0026#34; required_providers { aws = { source = \u0026#34;hashicorp/aws\u0026#34; version = \u0026#34;6.5.0\u0026#34; } } } I place the above in a separate file called versions.tf and include it in any directory used by terraform.\nTerraform Container Image The main characteristics of a good terraform container image are:\nsmall size: use Alpine Linux as a base image pinned terraform version pinned provider versions arguments to specify the terraform and provider versions environment variables reflecting the used versions This link provides an example on how to install terraform on Alpine Linux. Here is an example Dockerfile for our terraform container image:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 FROM alpine:3.22.0 ARG TERRAFORM_VERSION=\u0026#34;1.12.0\u0026#34; ARG TERRAFORM_AWS_PROVIDER_VERSION=\u0026#34;6.5.0\u0026#34; ENV TF_IN_AUTOMATION=true ENV TF_INPUT=false ENV TERRAFORM_VERSION=\u0026#34;${TERRAFORM_VERSION}\u0026#34; ENV TERRAFORM_AWS_PROVIDER_VERSION=\u0026#34;${TERRAFORM_AWS_PROVIDER_VERSION}\u0026#34; RUN apk add --update --no-cache \\ aws-cli \\ curl \\ git \\ unzip # terraform RUN curl -o tf.zip \u0026#34;https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_amd64.zip\u0026#34; \\ \u0026amp;\u0026amp; unzip tf.zip -d /usr/local/bin/ \\ \u0026amp;\u0026amp; rm tf.zip # terraform hashicorp/aws provider RUN mkdir -p \u0026#34;/root/.terraform.d/plugins/registry.terraform.io/hashicorp/aws/${TERRAFORM_AWS_PROVIDER_VERSION}/linux_amd64\u0026#34; \\ \u0026amp;\u0026amp; curl -o tf_aws.zip \u0026#34;https://releases.hashicorp.com/terraform-provider-aws/${TERRAFORM_AWS_PROVIDER_VERSION}/terraform-provider-aws_${TERRAFORM_AWS_PROVIDER_VERSION}_linux_amd64.zip\u0026#34; \\ \u0026amp;\u0026amp; unzip tf_aws.zip -d 
\u0026#34;/root/.terraform.d/plugins/registry.terraform.io/hashicorp/aws/${TERRAFORM_AWS_PROVIDER_VERSION}/linux_amd64/\u0026#34; \\ \u0026amp;\u0026amp; rm tf_aws.zip Some advantages of pinning the terraform and provider versions are:\nPipeline runs are reproducible terraform init will not download the latest provider version and will fail in case someone tries to use a different version in the pipeline Pipeline runs are faster since the provider does not need to be downloaded on every terraform init Image Tagging There are different ways to tag a container image. Some prefer to use the git commit hash as the tag, while others prefer to use semantic versioning. In this case, I prefer to use the explicit terraform and provider versions as the tag.\nThe image is therefore tagged accordingly: $IMAGE:$TERRAFORM_VERSION-$TERRAFORM_AWS_PROVIDER_VERSION. For example, an image with terraform version 1.12.0 installed and aws provider version 6.5.0 would have the tag 1.12.0-6.5.0.\nConclusion This post summarized the best practices for creating terraform container images to be used in CI/CD pipelines.\nA sequel post will cover how to configure pipeline builds and deployments using the above container image.\n","date":"2026-01-11T00:00:00Z","permalink":"https://habibiops.com/p/terraform-container-image-for-cicd-pipelines/","title":"Terraform Container Image for CI/CD Pipelines"},{"content":"Introduction LocalStack is a local cloud development solution that emulates AWS services in a local environment. This enables developers to build and test various AWS-based solutions on a small scale without requiring an AWS account.\nIn this post, LocalStack is used to test the deployment of a static website to an S3 bucket. Hugo, a popular static site generator, is used to build the site. 
The following topics are covered in more detail throughout this post:\nCreating a containerized environment that includes: LocalStack Hugo S3 bucket creation and configuration for static website hosting Building and deploying the static site to the S3 bucket Automating the deployment process The entire source code and setup is available on GitHub.\nThis post does not cover the inner workings of Hugo or how to set up a Hugo static site. Instead, it focuses solely on the deployment aspect. Hugo themes can be found here.\nSetup of the Containerized Environment LocalStack provides a pre-built Docker image that\u0026rsquo;s available on Docker Hub, offering a fully functional local AWS cloud stack in a containerized environment. However, the image doesn\u0026rsquo;t include any Hugo-related dependencies to build a static website. For that, we will create a new image based on the LocalStack image and include the additional dependencies.\nThe Dockerfile below implements the initial basic setup:\n1 2 3 4 5 6 7 8 9 FROM localstack/localstack RUN apt-get update \u0026amp;\u0026amp; apt-get install -y --no-install-recommends \\ hugo \\ ca-certificates \\ git \\ golang EXPOSE 4566 Details for installing Hugo can be found from the official documentation\nWe can then build the image once the Dockerfile is set up by using the command:\n1 podman build . 
-t localstack-gohugo Finally, we can run the container by using the command:\n1 2 3 4 5 podman run \\ -it \\ -p 4566:4566 \\ -v ./hugo-blog:/opt/code/localstack/hugo-blog \\ --name localstack-gohugo localstack-gohugo ./hugo-blog can be an empty directory, in which Hugo can be set up within the container, or it can be a directory that already contains a Hugo website.\nAccessing the container can be done by opening a new terminal and executing the following command:\n1 podman exec -it localstack-gohugo bash At this point, we have a running localstack-gohugo container that has the following built-in dependencies set up:\nLocalStack Hugo and its dependencies However, we do not yet have the actual final build of the website or any deployable, site-related output. In the next sections, we will walk through the deployment process within the container to better understand how the manual process works. Afterward, we will demonstrate a better and enhanced automated way to perform the deployment.\nS3 Bucket Setup Two simple steps are needed to set up the S3 bucket:\nCreate the bucket Apply a bucket policy to allow public access Bucket creation in LocalStack is done using the command below:\n1 awslocal s3 mb s3://testwebsite awslocal is used instead of aws for all AWS-related APIs when using LocalStack.\nOnce the bucket has been created, a policy has to be set and applied to the bucket so that the website can be publicly accessible. 
Therefore, the following bucket policy will be set:\n1 2 3 4 5 6 7 8 9 10 11 12 { \u0026#34;Version\u0026#34;: \u0026#34;2012-10-17\u0026#34;, \u0026#34;Statement\u0026#34;: [ { \u0026#34;Sid\u0026#34;: \u0026#34;PublicReadGetObject\u0026#34;, \u0026#34;Effect\u0026#34;: \u0026#34;Allow\u0026#34;, \u0026#34;Principal\u0026#34;: \u0026#34;*\u0026#34;, \u0026#34;Action\u0026#34;: \u0026#34;s3:GetObject\u0026#34;, \u0026#34;Resource\u0026#34;: \u0026#34;arn:aws:s3:::testwebsite/*\u0026#34; } ] } The bucket IAM policy allows anyone (\u0026quot;Principal\u0026quot;: \u0026quot;*\u0026quot;) to get read-only access (\u0026quot;Action\u0026quot;: \u0026quot;s3:GetObject\u0026quot;) to the website contents on the S3 bucket (\u0026quot;Resource\u0026quot;: \u0026quot;arn:aws:s3:::testwebsite/*\u0026quot;).\nThe official AWS Documentation contains more thorough details regarding the attributes.\nLastly, the policy is applied to the bucket by using the following command:\n1 awslocal s3api put-bucket-policy --bucket testwebsite --policy file://bucket_policy.json Deploying to S3 Several steps are required before we can deploy to our S3 bucket, namely:\nProperly set up the hugo.toml configuration file Build the static site using Hugo: hugo --gc --minify Deploy the site to the S3-bucket: awslocal s3 sync ./ s3://testwebsite Configuring hugo.toml The hugo.toml config file is already properly set when building the image; however, this section explains how to set the configuration file.\nThe configuration file is human-readable, and we only need to set four parameters to get started. With LocalStack, the S3 website endpoint accepts several formats; the most common is the virtual-hosted style: http://\u0026lt;BUCKET_NAME\u0026gt;.s3-website.localhost.localstack.cloud:4566. 
Hence, the configuration file looks as follows:\n1 2 3 4 baseURL = \u0026#39;http://testwebsite.s3-website.localhost.localstack.cloud:4566\u0026#39; languageCode = \u0026#39;en-us\u0026#39; title = \u0026#39;My Hugo Site\u0026#39; theme = \u0026#39;mixedpaper\u0026#39; It\u0026rsquo;s possible to set a section for deployments; however, it doesn\u0026rsquo;t seem to be well integrated with LocalStack, so this will be skipped for the time being.\nBuilding the Website A single command is needed to build the website:\n1 hugo Hugo will process all the Markdown content files and templates, generating a complete, hostable website in the public directory. The source files (Markdown, partials, etc.) will be converted into HTML/CSS/JS, which will be stored in the public directory. For better visualization, the structure of the public folder is presented below:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 tree -L 1 . ├── 404.html ├── archives ├── categories ├── favicon.png ├── index.html ├── index.xml ├── links ├── p ├── page ├── post ├── scss ├── search ├── sitemap.xml ├── tags └── ts For additional details, check Hugo\u0026rsquo;s official documentation.\nOnce the website has been built and all the content is generated, the index page has to be set using the command:\n1 2 # inside the generated public directory awslocal s3 website s3://testwebsite/ --index-document index.html Deploying to S3 Bucket Deploying to the S3 bucket simply means uploading the public directory to the bucket. This can be achieved using the s3 sync command as follows:\n1 awslocal s3 sync ./ s3://testwebsite The site is now hosted on the S3 bucket testwebsite. To test, simply open the browser and paste the base URL: http://testwebsite.s3-website.localhost.localstack.cloud:4566.\nIt might not work in some browsers, such as Chrome or Firefox. 
Edit the /etc/hosts file on your machine and add the line 127.0.0.1 testwebsite.s3-website.localhost.localstack.cloud for it to work. For more info, check the resolved GitHub issue.\nFurther Automation To reduce the setup, a helper script was implemented and the Dockerfile was adjusted as follows:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 FROM localstack/localstack RUN apt-get update \\ \u0026amp;\u0026amp; apt-get install -y --no-install-recommends \\ hugo \\ ca-certificates \\ git \\ golang WORKDIR /opt/code/localstack COPY . . RUN chmod +x ./setup.sh \\ \u0026amp;\u0026amp; mkdir -p /etc/localstack/init/ready.d \\ \u0026amp;\u0026amp; mv ./setup.sh /etc/localstack/init/ready.d/init-aws.sh EXPOSE 4566 Below is a brief overview of the updated Dockerfile:\nCopy the entire working directory into the container so that we can avoid using the -v mount option. Set the script to executable and move it to /etc/localstack/init/ready.d/init-aws.sh so that it is automatically executed by LocalStack once the container is running. Check the LocalStack initialization hooks for more info regarding the lifecycle.\nWith the newly enhanced and improved Dockerfile, the container is ready to use out of the box, without the need to perform any manual tasks.\nSetup Script Below is the implemented script for automating the setup within the container:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 #!/bin/bash # Define variables BASE_DIR=\u0026#34;/opt/code/localstack\u0026#34; SITE_DIR=\u0026#34;$BASE_DIR/hugo-mixedpaper\u0026#34; BUCKET=\u0026#34;testwebsite\u0026#34; POLICY_FILE=\u0026#34;bucket_policy.json\u0026#34; # Set up the bucket awslocal s3 mb s3://$BUCKET awslocal s3api put-bucket-policy --bucket $BUCKET --policy file://$POLICY_FILE # Build the site cd $SITE_DIR \u0026amp;\u0026amp; hugo # Deploy to S3 cd \u0026#34;$SITE_DIR/public\u0026#34; \u0026amp;\u0026amp; awslocal s3 sync . 
s3://$BUCKET # Set the index page awslocal s3 website s3://$BUCKET/ --index-document index.html The script above handles:\nCreation of the bucket. Applying the bucket_policy.json file. Building the website using hugo and generating the public folder. Syncing the public folder to the S3 bucket. Setting the index-document for the bucket. Once the image is built and the container is running, we can directly access the website at the following endpoint: http://testwebsite.s3-website.localhost.localstack.cloud:4566\nDon\u0026rsquo;t forget to update the /etc/hosts file to add the line 127.0.0.1 testwebsite.s3-website.localhost.localstack.cloud for it to work.\nConclusion The setup explored in this post is mainly for local development and emulating a production-like environment. In short, we\u0026rsquo;ve managed to:\nExtend the LocalStack container image to integrate Hugo and other dependencies. Implement a script to automate the setup inside the container by performing the following tasks: S3 bucket creation and configuration Building and deploying the site to the created S3 bucket The complete source code used here can be found here on GitHub.\n","date":"2026-01-10T00:00:00Z","image":"https://habibiops.com/p/static-site-deployment-on-s3-using-containerized-localstack/assets/cover_hu_6aca43407e884201.png","permalink":"https://habibiops.com/p/static-site-deployment-on-s3-using-containerized-localstack/","title":"Static Site Deployment on S3 using Containerized LocalStack"}]