A few weeks prior to our first Microsoft SQL deployment, I was trying to benchmark a Virtual Database running MS SQL on the Delphix Appliance. I picked SQLIO as a starting point. SQLIO is an IO benchmark that Microsoft recommends to assess storage systems behind SQL installations. SQLIO creates a file on the target disk and measures its capacity/speed using reads/writes. It can perform IO with various block sizes, sequential or random etc.
One problem I discovered right away is that, SQLIO relies on zero blocks to populate its datasets (it writes 0x0). This is generally fine for a naive storage system. But most modern storage systems often identify this pattern and end up gaming the benchmark, unintentionally. ZFS for example, detects zero blocks and compresses them, leaving only meta data writes to IO. So the actual storage devices only see a fraction of the traffic that SQLIO generates. SQLIO reported numbers which were too good to be true. The screenshot below shows output from one of the runs. An 8K sequential write test resulted in ~21,000 IOPS, at 170 MB/sec of aggregate write bandwidth. The majority of the IOs were serviced in < 2ms ! None of the caches in my stack were large enough to capture this traffic and account for the latency. ZFS was compressing the zero blocks to nothing, doing only metadata writes to the actual storage. It was gaming the benchmark unintentionally, thereby producing phenomenal IOPS and bandwidth/latency characteristics. Add to that, ZFS does not take credit for the phenomenal storage savings achieved The compression ratio does not account for the zero blocks eliminated.
ZFS is not the only storage system that exposes this problem with SQLIO. There are other storage processors and SSDs that will do zero block detection as a way to improve storage utilization and performance. Using SQLIO to characterize such IO systems will result in gross miss-characterization of the underlying storage. Note that this problem exists regardless of whether you are running a read or a write test.
SQLIOSIM is another benchmark Microsoft recommends to evaluate storage systems. SQLIOSIM performs parallel IO operations to multiple files on the target system to mimic traffic generated during normal operation of an SQL database. This benchmark also suffers from the same problem, uses zero blocks to populate some of its datasets.
There are other alternatives for benchmarking IO that do not suffer from this problem. I found FIO to be the most useful among them. First, it uses random data to populate the datasets, so it is agnostic to zero block detection. It has all of the capabilities of SQLIO and more. It is available for multiple environments -you can compare results in a heterogeneous environment. It is open source, so you can customize it for your environment. If you are trying to benchmark your storage system, FIO is a good benchmark to use.
If you need to evaluate your storage under database load, we have developed a wrapper for FIO that can be used. This wrapper will generate random reads, sequential reads and sequential writes, typical of database systems. More details at Kyle’s git repository here. This could be a good replacement for something like SQLIOSIM.
There are other nuances that you should be wary of, when evaluating IO systems, especially those backing databases. I will talk about a couple of them in my next post. The best thing to do is understand the limitations of your benchmark and question everything that does not look right. If the results look too good to be true, they probably are.