Materials

Method

As with the sequential read/write ZFS benchmarks, the bash script first creates ZFS vdevs of a certain size, ashift, volblocksize and strategy (raidz1, raidz2, raidz3, mirror and so on). Second, it starts fio with a number of parameters and repeats each run 5 times. Third, the results are collected in a .csv file for analysis.
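As a rough illustration of these three steps, a stripped-down version of such a script could look like the sketch below. The pool name, disk names, zvol size and fio options are assumptions for the sketch, not the exact values used for these results.

    #!/bin/bash
    # Minimal sketch, not the actual benchmark script: pool name, disk
    # names, zvol size and fio options are assumptions.
    POOL=benchmark
    DISKS="sda sdb sdc sdd"          # pool "strategy": plain stripe here
    CSV=results.csv

    # Step 1: create the pool and a zvol with the chosen ashift/volblocksize.
    zpool create -o ashift=12 "$POOL" $DISKS
    zfs create -V 100G -o volblocksize=64k "$POOL/vol"

    # Step 2: run fio 5 times against the zvol.
    for run in 1 2 3 4 5; do
        fio --name=randread-run$run --filename=/dev/zvol/$POOL/vol \
            --rw=randread --bs=64k --ioengine=libaio --direct=1 \
            --iodepth=32 --runtime=60 --time_based --group_reporting \
            --output=run$run.log
    done

    # Step 3: collect the results into a .csv file for analysis
    # (the real parsing of the fio output is left out of this sketch).
    grep -i "iops" run*.log >> "$CSV"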

Because every pool first needs to be written almost entirely, which is very time-consuming, a single acceptable volblocksize and io blocksize is determined up front. This saves a lot of time: all further IOPS benchmarks are carried out with that volblocksize and io blocksize only.
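The expensive part is that fill step: the zvol has to be written once from start to end so that later random reads hit real, allocated data. A sketch of that pre-fill, again with assumed names:

    # Pre-fill (assumed command): write the whole zvol sequentially before
    # any random read/write IOPS runs are started.
    fio --name=prefill --filename=/dev/zvol/benchmark/vol \
        --rw=write --bs=1M --ioengine=libaio --direct=1 --iodepth=4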

In Open Office, each set of 5 measurements was averaged. Because of jitter, some charts also show the standard deviation of the write results as error bars.
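Purely as an illustration of that aggregation step (Open Office was used for the real charts), the average and standard deviation of one set of 5 IOPS numbers could also be computed like this, assuming a plain one-number-per-line file:

    # iops_one_set.txt: the 5 IOPS measurements of one run, one per line.
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END { mean = sum / n;
               sd   = sqrt((sumsq - sum * sum / n) / (n - 1));
               printf "mean=%.1f stddev=%.1f\n", mean, sd }' iops_one_set.txt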

The results are adjusted for the number of data disks, so for example a raidz1 of 5 disks appears as 4 data disks in the charts.

The y-axis shows scaling instead of raw IOPS, to make the results less machine-specific.

Raw data

The resulting .csv data can be found here. The Open Office version with the aggregated results can be found there.

This is the data file for gnuplot, generated from the .ods file, and that is the file you can load into gnuplot to produce the scaling graphs in PNG format.
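As a hint of how the PNG output is generated (the exact plot commands live in the downloadable gnuplot file; the file names and columns below are assumptions):

    # Assumed file names and columns; the real plot script is the download above.
    gnuplot -e "set terminal png size 800,600; set output 'scaling.png'; \
                set xlabel 'data disks'; set ylabel 'scaling'; \
                plot 'iops.dat' using 1:2 with linespoints title 'random read'"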


Which volblocksize to use as a shortcut?

Benchmarking every combination of array strategy and volblocksize over a range of disk counts is just too time-consuming: for realistic random (read) IOPS a zvol needs to be written entirely first. Have a month to spare? So is there a single volblocksize we can use for all benchmarks? If so, this would save writing terabytes of data for each combination. The results below are for an 8-disk ZFS stripe with various volblocksizes and io blocksizes.

For an 8-disk stripe set, 32k appears to be the best volblocksize for random IOPS at any io blocksize. However, given that a 64k volblocksize performs better for sequential throughput, 64k might be the better real-world choice for the other benchmarks. Obviously, this generalizes the findings to other pool strategies and other pool sizes, which may or may not hold up; to be certain, a month of benchmarking time, or more, would be needed.
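For reference, a sweep like the one described above might look roughly like this; the value lists, disk names and sizes are assumptions, not the exact ones used:

    # Assumed sweep over volblocksize and io blocksize on an 8-disk stripe.
    for vbs in 8k 16k 32k 64k 128k; do
        zpool create -o ashift=12 benchmark sda sdb sdc sdd sde sdf sdg sdh
        zfs create -V 100G -o volblocksize=$vbs benchmark/vol
        # The expensive part: fill the zvol completely before random reads.
        fio --name=prefill --filename=/dev/zvol/benchmark/vol \
            --rw=write --bs=1M --ioengine=libaio --direct=1
        for bs in 8k 16k 32k 64k 128k; do
            fio --name=randread-$vbs-$bs --filename=/dev/zvol/benchmark/vol \
                --rw=randread --bs=$bs --ioengine=libaio --direct=1 \
                --iodepth=32 --runtime=60 --time_based
        done
        zpool destroy benchmark
    done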

Random read/write IOPS scaling comparison

"Hey what's wrong with the random writes?!"

The reason the random write IOPS are disastrous is the way this benchmark was set up. To get a real-life figure for the random reads, the volume was pre-written entirely. That doesn't leave ZFS any room to serialize random writes. So basically, this is what happens to your random write performance on a 100% full volume.