<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Performance on HabibiOps</title><link>https://habibiops.com/tags/performance/</link><description>Recent content in Performance on HabibiOps</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sun, 08 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://habibiops.com/tags/performance/index.xml" rel="self" type="application/rss+xml"/><item><title>I/O Benchmarking with FIO - Part 1 - Basics</title><link>https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/</link><pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate><guid>https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/</guid><description>&lt;img src="https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/block_size_4k_1M_comparison.png" alt="Featured image of post I/O Benchmarking with FIO - Part 1 - Basics" /&gt;&lt;h2 id="introduction"&gt;Introduction
&lt;/h2&gt;&lt;p&gt;Performance is often an afterthought in software development and DevOps, much like security.
No one seems to care about it until suddenly everything&amp;rsquo;s too slow and taking forever to work.&lt;/p&gt;
&lt;p&gt;More often than not, performance problems boil down to I/O, since storage is frequently the weakest link in the chain.
Understanding how I/O works and which options are available to tune it can work wonders for your infrastructure. This is
exactly what we&amp;rsquo;ll be discussing in this post.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll go over the basics of I/O benchmarking and try to emulate various workloads using the open-source
tool &lt;a class="link" href="https://fio.readthedocs.io/en/latest/fio_doc.html" target="_blank" rel="noopener"
&gt;fio&lt;/a&gt;, short for Flexible I/O tester.&lt;/p&gt;
&lt;h2 id="setup"&gt;Setup
&lt;/h2&gt;&lt;p&gt;Luckily we don&amp;rsquo;t need a fancy or complicated setup to do our benchmarking. All we need is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One Linux VM with a distro of your choice. Personally I use Ubuntu 24.04 LTS.&lt;/li&gt;
&lt;li&gt;An HDD volume attached to the VM.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fio&lt;/code&gt; for I/O benchmarking.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can run the commands below to set up and mount your volume:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo mkdir /mnt/dev
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# check where your volume&amp;#39;s located &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;lsblk
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# any filesystem would work; mkfs erases the device, so double-check the name&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo mkfs.ext4 /dev/vdb
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo mount /dev/vdb /mnt/dev
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;Journaling and other I/O workloads running at the same time will affect the benchmark, so run it on an otherwise idle system to get accurate
measurements.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="installation"&gt;Installation
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;fio&lt;/code&gt; can be installed with your distro&amp;rsquo;s package manager. Check
the &lt;a class="link" href="https://github.com/axboe/fio?tab=readme-ov-file" target="_blank" rel="noopener"
&gt;documentation&lt;/a&gt; for your own distro. On Ubuntu,
the installation is a single command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo apt install -y fio
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id="fio-overview"&gt;FIO Overview
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;This section will focus mainly on high-level concept explanation. The technical, nitty-gritty details will come after.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;code&gt;fio&lt;/code&gt; has an overwhelming number of parameters to simulate all sorts of workloads. We&amp;rsquo;ll be focusing on a handful of key ones that are enough to cover most scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bs&lt;/code&gt;: Block size of each I/O request&lt;/li&gt;
&lt;li&gt;&lt;code&gt;size&lt;/code&gt;: Size of the file each job reads from or writes to&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iodepth&lt;/code&gt;: Queue depth, i.e. how many I/O requests each job keeps in flight&lt;/li&gt;
&lt;li&gt;&lt;code&gt;numjobs&lt;/code&gt;: Number of processes to create and perform the I/O operations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rw&lt;/code&gt;: Type of I/O operation to perform, such as &lt;code&gt;read&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;rw&lt;/code&gt; etc. Check the documentation for further options.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;runtime&lt;/code&gt;: How long to run the I/O test (in seconds by default)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;direct&lt;/code&gt;: Boolean. &lt;code&gt;1&lt;/code&gt; enables direct (non-buffered) I/O; &lt;code&gt;0&lt;/code&gt;, the default, uses buffered I/O.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ioengine&lt;/code&gt;: How I/O is issued to the file. The two most common engines on Linux are &lt;code&gt;io_uring&lt;/code&gt; and &lt;code&gt;libaio&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;output-format&lt;/code&gt;: Defaults to the human-readable &lt;code&gt;normal&lt;/code&gt; format. &lt;code&gt;json&lt;/code&gt; is recommended for parsing and automated processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
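&lt;p&gt;fio runs in one of two modes. Size-based: the job ends once &lt;code&gt;size&lt;/code&gt; bytes have been transferred. Time-based: the job ends once &lt;code&gt;runtime&lt;/code&gt; has elapsed.
I&amp;rsquo;d recommend setting one, not both. If you do set both, the job stops at whichever limit is hit first, unless &lt;code&gt;time_based&lt;/code&gt; is also given, which makes fio keep looping over the file until &lt;code&gt;runtime&lt;/code&gt; is up.&lt;/p&gt;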
&lt;/blockquote&gt;
&lt;h3 id="numjobs-vs-iodepth"&gt;&lt;code&gt;numjobs&lt;/code&gt; vs &lt;code&gt;iodepth&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;These two parameters are easy to confuse, and mixing them up leads to very different results. In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;numjobs&lt;/code&gt; controls &lt;strong&gt;process-level&lt;/strong&gt; parallelism: &lt;code&gt;fio&lt;/code&gt; spawns that many dedicated
processes, each performing the same I/O task.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iodepth&lt;/code&gt; controls the queue depth at the &lt;strong&gt;job level&lt;/strong&gt;: how many I/O requests each job may keep in flight at once.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The maximum number of &lt;strong&gt;in-flight&lt;/strong&gt; I/O requests is therefore &lt;code&gt;numjobs * iodepth&lt;/code&gt;.&lt;/p&gt;
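&lt;p&gt;To make the arithmetic concrete, here is a minimal shell sketch. The values 4 and 8 are made-up examples rather than recommendations, and the variable names are purely illustrative:&lt;/p&gt;

```shell
# Hypothetical example values, chosen only to illustrate the formula.
numjobs=4    # processes fio would spawn
iodepth=8    # requests each job may keep in flight

# Upper bound on concurrent requests across all jobs.
max_inflight=$((numjobs * iodepth))
echo "max in-flight requests: ${max_inflight}"
```

&lt;p&gt;Keep in mind that this is an upper bound. With a synchronous engine, each job only ever has one request in flight, whatever &lt;code&gt;iodepth&lt;/code&gt; is set to.&lt;/p&gt;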
&lt;h3 id="block-size"&gt;Block size
&lt;/h3&gt;&lt;p&gt;Choosing the right block size is tricky. The wrong one can skew the entire benchmark and give a false impression. Block
sizes can be split into three broad categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small block size: Typically 4k up to 16k in size. Useful for random I/O and database/transactional workloads. AWS uses 4k block size by default for
their &lt;a class="link" href="https://docs.aws.amazon.com/ebs/latest/userguide/volume_constraints.html" target="_blank" rel="noopener"
&gt;EBS service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mid-range block size: Typically 16k up to 64k and is used for mixed
workloads. &lt;a class="link" href="https://learn.microsoft.com/en-us/sql/relational-databases/pages-and-extents-architecture-guide?view=sql-server-ver17&amp;amp;utm_source=chatgpt.com#extents" target="_blank" rel="noopener"
&gt;MS SQL Server&lt;/a&gt;
uses 64k block size by default&lt;/li&gt;
&lt;li&gt;Large block size: 128k and up to 4M. Recommended for sequential reads, backups and data
warehousing. &lt;a class="link" href="https://docs.aws.amazon.com/redshift/latest/dg/r_STV_BLOCKLIST.html?utm_source=chatgpt.com" target="_blank" rel="noopener"
&gt;AWS Redshift&lt;/a&gt;
uses 1M block size&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each application has its own requirements, and the I/O has to be tuned for the workload
itself. A good rule of thumb is to use &lt;code&gt;4k&lt;/code&gt;, &lt;code&gt;64k&lt;/code&gt; and &lt;code&gt;1M&lt;/code&gt; for small, mid-range and large block sizes respectively.&lt;/p&gt;
&lt;h3 id="ioengine"&gt;&lt;code&gt;ioengine&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;libaio&lt;/code&gt;, the Linux-native asynchronous I/O interface, was long the standard choice for asynchronous I/O on Linux. It&amp;rsquo;s not
deprecated, and it&amp;rsquo;s still a good fit for testing HDD storage and legacy async I/O behavior.
&lt;code&gt;io_uring&lt;/code&gt;, the newer kernel interface, would be the go-to for SSD and NVMe type storage.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fio&lt;/code&gt; supports many other engines, including &lt;code&gt;sync&lt;/code&gt; for synchronous I/O, &lt;code&gt;mmap&lt;/code&gt; and &lt;code&gt;windowsaio&lt;/code&gt;. You can find
the full list in the &lt;a class="link" href="https://fio.readthedocs.io/en/latest/fio_doc.html#i-o-engine" target="_blank" rel="noopener"
&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Always read the documentation when choosing an engine; not all parameters are compatible with every engine.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="direct-vs-non-direct-io"&gt;direct vs non-direct I/O
&lt;/h3&gt;&lt;p&gt;Direct I/O means that the I/O goes directly to the disk and bypasses the OS page cache, hence the name. Non-direct I/O,
on the other hand, is buffered I/O: reads and writes go through the OS page cache, so writes can be acknowledged
as soon as they land in memory and flushed to disk later, and repeated reads can be served from RAM. Buffered I/O will almost always be faster than
non-buffered I/O, though it&amp;rsquo;s not a fair comparison since they do different things and serve different purposes.
The natural question then becomes: when to use which? Use direct I/O to test the raw storage hardware performance, and
non-direct I/O to test your RAM and caching system.&lt;/p&gt;
&lt;p&gt;To sum things up, as a rule of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;direct=1&lt;/code&gt; when benchmarking storage hardware&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;direct=0&lt;/code&gt; when benchmarking the cache system&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="hdd-vs-ssd-considerations"&gt;HDD vs SSD Considerations
&lt;/h3&gt;&lt;p&gt;Different storage types behave very differently, so tuning fio parameters for HDDs and SSDs is important. HDDs, being
mechanical, are latency-bound and benefit from low queue depths and single-job workloads (&lt;code&gt;numjobs=1&lt;/code&gt;).
Random reads and writes are slow because each one requires a head seek, so small blocks (4k) are used to test database-like workloads, while
sequential tests can use larger blocks (1M) and better represent streaming workloads such as backups and large file copies.&lt;/p&gt;
&lt;p&gt;SSDs, on the other hand, are throughput-bound, handle high parallelism very well,
and reach maximum performance with a higher &lt;code&gt;iodepth&lt;/code&gt; and multiple jobs. Block sizes again depend on the
workload that we&amp;rsquo;re trying to simulate.&lt;/p&gt;
&lt;p&gt;Tuning the &lt;code&gt;numjobs&lt;/code&gt;, &lt;code&gt;iodepth&lt;/code&gt;, and &lt;code&gt;bs&lt;/code&gt; parameters according to the storage type is critical: it ensures the benchmarks
reflect the device&amp;rsquo;s true performance and helps avoid erroneous conclusions.&lt;/p&gt;
&lt;p&gt;A good rule of thumb:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Type&lt;/th&gt;
&lt;th&gt;Recommended &lt;code&gt;numjobs&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Recommended &lt;code&gt;iodepth&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;Typical &lt;code&gt;bs&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;I/O Engine&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HDD&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1–8&lt;/td&gt;
&lt;td&gt;4k (small) / 1M (large)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;libaio&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSD / NVMe&lt;/td&gt;
&lt;td&gt;4–16+&lt;/td&gt;
&lt;td&gt;16–64+&lt;/td&gt;
&lt;td&gt;4k–128k (random) / 1M+ (sequential)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;io_uring&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="our-first-fio-command"&gt;Our First FIO Command
&lt;/h2&gt;&lt;p&gt;With that background in place, we&amp;rsquo;re finally ready to run our first real &lt;code&gt;fio&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo fio --name&lt;span class="o"&gt;=&lt;/span&gt;hello-fio-read &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --numjobs&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --iodepth&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --rw&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --directory&lt;span class="o"&gt;=&lt;/span&gt;/mnt/dev/ &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --ioengine&lt;span class="o"&gt;=&lt;/span&gt;libaio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --bs&lt;span class="o"&gt;=&lt;/span&gt;1M &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --direct&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --runtime&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --size&lt;span class="o"&gt;=&lt;/span&gt;1G &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --time_based
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The command above spawns a single process that performs sequential &lt;code&gt;read&lt;/code&gt; I/O in the &lt;code&gt;/mnt/dev/&lt;/code&gt; directory using the &lt;code&gt;libaio&lt;/code&gt;
engine with a block size of &lt;code&gt;1M&lt;/code&gt;. The process performs &lt;code&gt;O_DIRECT&lt;/code&gt; I/O operations (&lt;code&gt;--direct=1&lt;/code&gt;) to bypass the page
cache, and the run is limited to 30 seconds (&lt;code&gt;--runtime=30&lt;/code&gt;). Since we only specified a &lt;code&gt;--directory&lt;/code&gt;, &lt;code&gt;fio&lt;/code&gt; creates
the test file there for us. When the job finishes, we get a summary of the results:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;hello-fio: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;g&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;rw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;read, &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;R&lt;span class="o"&gt;)&lt;/span&gt; 1024KiB-1024KiB, &lt;span class="o"&gt;(&lt;/span&gt;W&lt;span class="o"&gt;)&lt;/span&gt; 1024KiB-1024KiB, &lt;span class="o"&gt;(&lt;/span&gt;T&lt;span class="o"&gt;)&lt;/span&gt; 1024KiB-1024KiB, &lt;span class="nv"&gt;ioengine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;libaio, &lt;span class="nv"&gt;iodepth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Jobs: &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;f&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;R&lt;span class="o"&gt;(&lt;/span&gt;1&lt;span class="o"&gt;)][&lt;/span&gt;100.0%&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="nv"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;346MiB/s&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="nv"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;346&lt;/span&gt; IOPS&lt;span class="o"&gt;][&lt;/span&gt;eta 00m:00s&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; read: &lt;span class="nv"&gt;IOPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;166, &lt;span class="nv"&gt;BW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;166MiB/s &lt;span class="o"&gt;(&lt;/span&gt;174MB/s&lt;span class="o"&gt;)(&lt;/span&gt;4991MiB/30005msec&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lat &lt;span class="o"&gt;(&lt;/span&gt;usec&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;613, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;376663, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6008.79, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;16419.17
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; iops : &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 20, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 370, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;164.86, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;87.26, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Run status group &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;all &lt;span class="nb"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; READ: &lt;span class="nv"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;166MiB/s &lt;span class="o"&gt;(&lt;/span&gt;174MB/s&lt;span class="o"&gt;)&lt;/span&gt;, 166MiB/s-166MiB/s &lt;span class="o"&gt;(&lt;/span&gt;174MB/s-174MB/s&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;io&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4991MiB &lt;span class="o"&gt;(&lt;/span&gt;5233MB&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30005-30005msec
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;Some of the output has been removed for brevity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The key metrics to focus on are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;lat&lt;/code&gt;: Latency, the time it takes to complete a single I/O request&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BW&lt;/code&gt;: Bandwidth (throughput), the amount of data transferred per second&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iops&lt;/code&gt;: Number of I/O operations per second&lt;/li&gt;
&lt;/ul&gt;
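&lt;p&gt;These metrics are tied together: bandwidth is roughly IOPS multiplied by the block size. Here is a quick sanity check in shell, plugging in the figures from the summary above. The variable names are illustrative only:&lt;/p&gt;

```shell
# Figures taken from the fio summary above: 166 IOPS at bs=1M.
iops=166
bs_kib=1024   # 1M block size expressed in KiB

# Bandwidth is approximately IOPS x block size.
bw_kib=$((iops * bs_kib))
bw_mib=$((bw_kib / 1024))
echo "approx. bandwidth: ${bw_mib} MiB/s"
```

&lt;p&gt;The result matches the &lt;code&gt;BW=166MiB/s&lt;/code&gt; that fio reported for this run.&lt;/p&gt;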
&lt;h2 id="parameter-comparison"&gt;Parameter Comparison
&lt;/h2&gt;&lt;p&gt;Earlier we covered the difference between direct and non-direct I/O, and between large and small block sizes. Let&amp;rsquo;s see
how they play out in practice.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All commands and tests are run on the same machine to get accurate and consistent results.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="small-block-size-vs-large-block-size"&gt;Small Block size vs Large Block size
&lt;/h3&gt;&lt;p&gt;We&amp;rsquo;ll be using this base command, adding &lt;code&gt;--bs=4k&lt;/code&gt; or &lt;code&gt;--bs=1M&lt;/code&gt; for each run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo fio --name&lt;span class="o"&gt;=&lt;/span&gt;hello-fio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --numjobs&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --iodepth&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --direct&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --rw&lt;span class="o"&gt;=&lt;/span&gt;write &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --directory&lt;span class="o"&gt;=&lt;/span&gt;/mnt/dev &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --ioengine&lt;span class="o"&gt;=&lt;/span&gt;libaio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --runtime&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --size&lt;span class="o"&gt;=&lt;/span&gt;1G &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --time_based
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Starting with &lt;code&gt;bs=4k&lt;/code&gt;, we get:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;hello-fio: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;g&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;rw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;write, &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;R&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="o"&gt;(&lt;/span&gt;W&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="o"&gt;(&lt;/span&gt;T&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="nv"&gt;ioengine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;libaio, &lt;span class="nv"&gt;iodepth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; write: &lt;span class="nv"&gt;IOPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;895, &lt;span class="nv"&gt;BW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s&lt;span class="o"&gt;)(&lt;/span&gt;105MiB/30001msec&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; zone resets
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lat &lt;span class="o"&gt;(&lt;/span&gt;usec&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;358, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;29885, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1114.65, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;844.16
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; iops : &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 412, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 1564, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;893.86, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;304.19, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Run status group &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;all &lt;span class="nb"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; WRITE: &lt;span class="nv"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s&lt;span class="o"&gt;)&lt;/span&gt;, 3582KiB/s-3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s-3668kB/s&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;io&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;105MiB &lt;span class="o"&gt;(&lt;/span&gt;110MB&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30001-30001msec
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;And now with &lt;code&gt;bs=1M&lt;/code&gt;, we get:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; write: &lt;span class="nv"&gt;IOPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;69, &lt;span class="nv"&gt;BW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;69.8MiB/s &lt;span class="o"&gt;(&lt;/span&gt;73.2MB/s&lt;span class="o"&gt;)(&lt;/span&gt;2094MiB/30011msec&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; zone resets
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lat &lt;span class="o"&gt;(&lt;/span&gt;msec&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;222, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;14.29, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10.75
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; iops : &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 38, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 84, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;69.76, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 8.34, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Run status group &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;all &lt;span class="nb"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; WRITE: &lt;span class="nv"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;69.8MiB/s &lt;span class="o"&gt;(&lt;/span&gt;73.2MB/s&lt;span class="o"&gt;)&lt;/span&gt;, 69.8MiB/s-69.8MiB/s &lt;span class="o"&gt;(&lt;/span&gt;73.2MB/s-73.2MB/s&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;io&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2094MiB &lt;span class="o"&gt;(&lt;/span&gt;2196MB&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30011-30011msec
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The plot below (generated using Python&amp;rsquo;s &lt;code&gt;matplotlib&lt;/code&gt;) summarizes and visualizes the results:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/block_size_4k_1M_comparison.png"
width="3000"
height="1800"
srcset="https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/block_size_4k_1M_comparison_hu_cf8e0e111f215014.png 480w, https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/block_size_4k_1M_comparison_hu_439c4eb6869c051b.png 1024w"
loading="lazy"
alt="4k vs 1M Block Size Comparison"
class="gallery-image"
data-flex-grow="166"
data-flex-basis="400px"
&gt;&lt;/p&gt;
&lt;p&gt;We can notice a pattern. The 4k block size has:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Low throughput (an impressive 3.49 MB/s)&lt;/li&gt;
&lt;li&gt;Low latency&lt;/li&gt;
&lt;li&gt;High IOPS&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While the 1M block size is the complete opposite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High throughput&lt;/li&gt;
&lt;li&gt;High latency&lt;/li&gt;
&lt;li&gt;Low IOPS&lt;/li&gt;
&lt;/ul&gt;
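&lt;p&gt;The two lists are tied together by simple arithmetic: throughput is roughly IOPS multiplied by block size. Here is a minimal sketch that checks this against the numbers fio reported (895 IOPS for the 4k run, an average of 69.76 IOPS for the 1M run). The helper name &lt;code&gt;throughput_kib&lt;/code&gt; is made up for illustration and is not part of fio, and the sketch assumes &lt;code&gt;awk&lt;/code&gt; is available:&lt;/p&gt;

```shell
#!/bin/sh
# Throughput is approximately IOPS multiplied by block size.
# throughput_kib is a hypothetical helper, not part of fio.
throughput_kib() {
    # $1 = IOPS, $2 = block size in KiB; prints throughput in KiB/s
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.0f\n", iops * bs }'
}

# 4k run: 895 IOPS at 4 KiB per request
echo "4k: $(throughput_kib 895 4) KiB/s"
# 1M run: 69.76 IOPS on average at 1024 KiB per request
echo "1M: $(throughput_kib 69.76 1024) KiB/s"
```

&lt;p&gt;This prints 3580 KiB/s and 71434 KiB/s (about 69.8 MiB/s), which lines up with the 3582KiB/s and 69.8MiB/s that fio reported. Small differences come from fio rounding the IOPS figure.&lt;/p&gt;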
&lt;h3 id="direct-vs-non-direct"&gt;Direct vs Non-Direct
&lt;/h3&gt;&lt;p&gt;Now let&amp;rsquo;s see the difference with the direct parameter. Just as before, we&amp;rsquo;ll use this base command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo fio --name&lt;span class="o"&gt;=&lt;/span&gt;hello-fio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --numjobs&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --iodepth&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --rw&lt;span class="o"&gt;=&lt;/span&gt;write &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --directory&lt;span class="o"&gt;=&lt;/span&gt;/mnt/dev &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --ioengine&lt;span class="o"&gt;=&lt;/span&gt;libaio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --bs&lt;span class="o"&gt;=&lt;/span&gt;4k &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --runtime&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --size&lt;span class="o"&gt;=&lt;/span&gt;1G &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --time_based
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;It&amp;rsquo;s important to run both tests with the &lt;strong&gt;SAME&lt;/strong&gt; block size to get as accurate a comparison as possible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;First, we&amp;rsquo;ll test &lt;code&gt;direct=1&lt;/code&gt; (same as the command in the earlier section):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;hello-fio: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;g&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;rw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;write, &lt;span class="nv"&gt;bs&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;R&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="o"&gt;(&lt;/span&gt;W&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="o"&gt;(&lt;/span&gt;T&lt;span class="o"&gt;)&lt;/span&gt; 4096B-4096B, &lt;span class="nv"&gt;ioengine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;libaio, &lt;span class="nv"&gt;iodepth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; write: &lt;span class="nv"&gt;IOPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;895, &lt;span class="nv"&gt;BW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s&lt;span class="o"&gt;)(&lt;/span&gt;105MiB/30001msec&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; zone resets
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lat &lt;span class="o"&gt;(&lt;/span&gt;usec&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;358, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;29885, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1114.65, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;844.16
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; iops : &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 412, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 1564, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;893.86, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;304.19, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Run status group &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;all &lt;span class="nb"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; WRITE: &lt;span class="nv"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s&lt;span class="o"&gt;)&lt;/span&gt;, 3582KiB/s-3582KiB/s &lt;span class="o"&gt;(&lt;/span&gt;3668kB/s-3668kB/s&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;io&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;105MiB &lt;span class="o"&gt;(&lt;/span&gt;110MB&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30001-30001msec
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Now, running with &lt;code&gt;direct=0&lt;/code&gt;, we get:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; write: &lt;span class="nv"&gt;IOPS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;128k, &lt;span class="nv"&gt;BW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;499MiB/s &lt;span class="o"&gt;(&lt;/span&gt;523MB/s&lt;span class="o"&gt;)(&lt;/span&gt;14.6GiB/30001msec&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; zone resets
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; lat &lt;span class="o"&gt;(&lt;/span&gt;usec&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2795, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 3.15, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 3.20
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; bw &lt;span class="o"&gt;(&lt;/span&gt; KiB/s&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 840, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1247432, &lt;span class="nv"&gt;per&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100.00%, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;797612.32, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;385436.40, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; iops : &lt;span class="nv"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; 210, &lt;span class="nv"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;311858, &lt;span class="nv"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;199403.24, &lt;span class="nv"&gt;stdev&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;96359.24, &lt;span class="nv"&gt;samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Run status group &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;all &lt;span class="nb"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; WRITE: &lt;span class="nv"&gt;bw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;499MiB/s &lt;span class="o"&gt;(&lt;/span&gt;523MB/s&lt;span class="o"&gt;)&lt;/span&gt;, 499MiB/s-499MiB/s &lt;span class="o"&gt;(&lt;/span&gt;523MB/s-523MB/s&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;io&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;14.6GiB &lt;span class="o"&gt;(&lt;/span&gt;15.7GB&lt;span class="o"&gt;)&lt;/span&gt;, &lt;span class="nv"&gt;run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30001-30001msec
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Let&amp;rsquo;s plot the comparison just like before:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/direct_non_direct_comparison.png"
width="3000"
height="1800"
srcset="https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/direct_non_direct_comparison_hu_e7c83c1f86573f63.png 480w, https://habibiops.com/p/io-benchmarking-with-fio-part-1-basics/assets/direct_non_direct_comparison_hu_7f52038974a4cfcb.png 1024w"
loading="lazy"
alt="Direct vs Non-Direct Comparison"
class="gallery-image"
data-flex-grow="166"
data-flex-basis="400px"
&gt;&lt;/p&gt;
&lt;p&gt;Direct I/O performs so badly that its throughput is barely visible on the plot. We could&amp;rsquo;ve done some normalization and log scaling,
but the point was to show the truly drastic difference between the two.&lt;/p&gt;
&lt;p&gt;Then again, the results from the non-direct I/O are just too good to be true. This is the tricky part when
it comes to benchmarking: knowing which results actually make sense, which ones are
realistic, and which ones aren&amp;rsquo;t. With &lt;code&gt;direct=0&lt;/code&gt;, writes land in the kernel&amp;rsquo;s page cache first, so you end up testing a completely different
scenario (in this case, the RAM) rather than the disk.&lt;/p&gt;
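&lt;p&gt;To put numbers on the gap between the two runs above, here is a small sketch that computes the ratios from the reported figures. The helper name &lt;code&gt;ratio&lt;/code&gt; is hypothetical, and the sketch assumes &lt;code&gt;awk&lt;/code&gt; is available:&lt;/p&gt;

```shell
#!/bin/sh
# ratio is a hypothetical helper: prints how many times larger $1 is than $2.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

# Throughput: 499 MiB/s with direct=0 (converted to KiB/s) vs 3582 KiB/s with direct=1
echo "throughput: $(ratio $((499 * 1024)) 3582)x higher with direct=0"
# Average latency: 1114.65 usec with direct=1 vs 3.15 usec with direct=0
echo "latency: $(ratio 1114.65 3.15)x lower with direct=0"
```

&lt;p&gt;An average write latency of about 3 microseconds, on the same disk that needed over a millisecond per write with &lt;code&gt;direct=1&lt;/code&gt;, is a strong hint that the writes were absorbed by memory and not by the device.&lt;/p&gt;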
&lt;h3 id="testing-other-operations"&gt;Testing Other Operations
&lt;/h3&gt;&lt;p&gt;Testing other &lt;code&gt;rw&lt;/code&gt; options is straightforward with fio: we just need to set &lt;code&gt;--rw&lt;/code&gt; to any of the supported
options, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;write&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rw&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;randread&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;randwrite&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;strong&gt;base&lt;/strong&gt; command remains the same. So if we want to test our &lt;code&gt;write&lt;/code&gt; performance, the command would be adjusted as
such:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;span class="lnt"&gt;4
&lt;/span&gt;&lt;span class="lnt"&gt;5
&lt;/span&gt;&lt;span class="lnt"&gt;6
&lt;/span&gt;&lt;span class="lnt"&gt;7
&lt;/span&gt;&lt;span class="lnt"&gt;8
&lt;/span&gt;&lt;span class="lnt"&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo fio --name&lt;span class="o"&gt;=&lt;/span&gt;hello-fio-write &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --numjobs&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --iodepth&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --rw&lt;span class="o"&gt;=&lt;/span&gt;write &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --directory&lt;span class="o"&gt;=&lt;/span&gt;/mnt/dev/ &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --ioengine&lt;span class="o"&gt;=&lt;/span&gt;libaio &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --bs&lt;span class="o"&gt;=&lt;/span&gt;4k &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --direct&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; --runtime&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;And that&amp;rsquo;s it. We just had to change one parameter to test a completely different scenario, which brings us to the next
point: I/O testing is tricky precisely because of that. A single parameter change can result in completely
different outcomes and scenarios, so it&amp;rsquo;s important to carefully define what needs to be tested and which parameters to
tinker with.&lt;/p&gt;
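&lt;p&gt;Since only one parameter changes between these tests, the variations are easy to generate in a loop. The sketch below only prints the commands and does not run fio. The helper name &lt;code&gt;fio_cmd&lt;/code&gt; and the &lt;code&gt;hello-fio-MODE&lt;/code&gt; job naming are assumptions made for illustration, and the remaining flags are copied from the command above:&lt;/p&gt;

```shell
#!/bin/sh
# fio_cmd is a hypothetical helper that prints the fio command for one rw mode.
# It only prints the command; it does not run fio.
fio_cmd() {
    echo "sudo fio --name=hello-fio-$1 --numjobs=1 --iodepth=1 --rw=$1" \
         "--directory=/mnt/dev/ --ioengine=libaio --bs=4k --direct=1 --runtime=30"
}

# Print one command per rw mode so each can be reviewed before running it.
for mode in read write rw randread randwrite; do
    fio_cmd "$mode"
done
```

&lt;p&gt;Printing first makes it easy to confirm that each job differs only in its &lt;code&gt;--rw&lt;/code&gt; value (and name) before anything touches the disk.&lt;/p&gt;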
&lt;h2 id="conclusion"&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;We&amp;rsquo;ve covered plenty of concepts and topics in this post, providing a solid foundation for running and interpreting fio
benchmarks.&lt;/p&gt;
&lt;p&gt;The key takeaway is that running a benchmark is only half the work. Understanding &lt;strong&gt;what&lt;/strong&gt; you’re measuring and &lt;strong&gt;why&lt;/strong&gt;
is just as important. Small parameter changes can lead to vastly different results, and some trial and error is needed
when defining realistic workloads. In the end, benchmark results are only meaningful if the workload matches reality.&lt;/p&gt;
&lt;p&gt;In the next post, we’ll take this a step further by automating fio runs and diving deeper into the JSON output,
including how to parse and analyze results at scale.&lt;/p&gt;</description></item></channel></rss>