I started using sar -d to look at disk performance on a Linux system this week and had to look up what some of the returned numbers meant. I’ve used sar -d on HP-UX, but the format there is different.
Here is an edited output from a Linux VM that we are copying files to:
$ sar -d 30 1
Linux 2.6.32-504.3.3.el6.x86_64 (myhostname)    04/01/2015    _x86_64_    (4 CPU)

05:26:55 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
05:27:25 PM  dev253-9   7669.55      2.44  61353.93      8.00     35.39      4.61      0.03     19.80
I edited out the real host name and I removed all the lines with devices except the one busy device, dev253-9.
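If you want to work with these numbers in a script rather than by eye, here is a minimal Python sketch that turns the header line and the dev253-9 line into a dictionary. The function name parse_sar_d is my own invention, and it assumes the timestamp takes the same number of whitespace-separated tokens in both lines (two here, because of the AM/PM field), which may not hold in other locales.

```python
# Header and data line copied from the sar -d sample for dev253-9.
header = "05:26:55 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util"
row = "05:27:25 PM dev253-9 7669.55 2.44 61353.93 8.00 35.39 4.61 0.03 19.80"


def parse_sar_d(header_line, data_line):
    """Map sar -d column names to values (hypothetical helper).

    Assumes the timestamp occupies the same number of tokens in the
    header line and the data line.
    """
    names = header_line.split()
    values = data_line.split()
    dev_idx = names.index("DEV")  # everything before DEV is the timestamp
    stats = {"DEV": values[dev_idx]}
    for name, value in zip(names[dev_idx + 1:], values[dev_idx + 1:]):
        stats[name] = float(value)
    return stats


stats = parse_sar_d(header, row)
print(stats["DEV"], stats["tps"], stats["wr_sec/s"])
```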
Earlier today I got confused and thought that rd_sec/s meant read I/O requests per second, but it does not. Here is how the Linux man page describes rd_sec/s:
Number of sectors read from the device. The size of a sector is 512
bytes.
In the example above all the activity is writing so if you look at wr_sec/s it is the same kind of measure of activity:
Number of sectors written to the device. The size of a sector is 512
bytes.
So in the example you have 61353.93 512-byte sectors written per second. Divide by 2 to get kilobytes: about 30677 KB/sec. Divide that by 1024 and round up to get 30 megabytes per second.
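The same arithmetic as a small Python sketch, using the wr_sec/s value from the sample line. The function names are just illustrative, and I am treating a kilobyte as 1024 bytes and a megabyte as 1024 kilobytes.

```python
SECTOR_BYTES = 512  # sector size stated in the sar man page


def sectors_to_kb(sectors_per_s):
    """Convert 512-byte sectors per second to kilobytes (1024 bytes) per second."""
    return sectors_per_s * SECTOR_BYTES / 1024


def sectors_to_mb(sectors_per_s):
    """Convert 512-byte sectors per second to megabytes (1024 KB) per second."""
    return sectors_to_kb(sectors_per_s) / 1024


wr_sec_per_s = 61353.93  # wr_sec/s for dev253-9
print(f"{sectors_to_kb(wr_sec_per_s):.0f} KB/s")
print(f"{sectors_to_mb(wr_sec_per_s):.2f} MB/s")
```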
But how many write I/O operations per second does this translate to? It looks like you can’t tell directly from this listing. You can get the overall I/O operations per second, including both reads and writes, from the tps value, which the man page defines as:
Total number of transfers per second that were issued to physical
devices. A transfer is an I/O request to a physical device. Multiple
logical requests can be combined into a single I/O request to the
device. A transfer is of indeterminate size.
There are almost no read requests, so we can assume nearly all the transfers are writes, which makes about 7669.55 write IOPS. You can also find the average I/O size by dividing rd_sec/s + wr_sec/s by tps. This comes out to just about 8, which is the same as avgrq-sz, which the man page defines as:
The average size (in sectors) of the requests that were issued to the
device.
So, avgrq-sz is kind of superfluous since I can calculate it from the other values but it means that our average I/O is 8 * 512 bytes = 4 kilobytes. This seems like a small I/O size considering that we are copying large data files over NFS. Hmmm.
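Here is that calculation as a quick Python sketch with the numbers from the sample line. The function name is mine, and the zero check is only there so an idle device doesn't cause a divide-by-zero.

```python
SECTOR_BYTES = 512  # sector size stated in the sar man page


def avg_request_sectors(rd_sec_per_s, wr_sec_per_s, tps):
    """Average request size in sectors: total sectors moved per second / transfers per second."""
    if tps == 0:
        return 0.0
    return (rd_sec_per_s + wr_sec_per_s) / tps


# rd_sec/s, wr_sec/s and tps for dev253-9
sectors = avg_request_sectors(2.44, 61353.93, 7669.55)
print(f"{sectors:.2f} sectors per request")
print(f"{sectors * SECTOR_BYTES / 1024:.2f} KB per request")
```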
Also, the disk device is queuing the I/O requests, but the device is only in use about 20% of the time (%util is 19.80). Maybe there are bursts of 4K writes which queue up and then gaps in activity? Here are the definitions for the remaining items.
avgqu-sz
The average queue length of the requests that were issued to the
device.
await
The average time (in milliseconds) for I/O requests issued to the
device to be served. This includes the time spent by the requests in
queue and the time spent servicing them.
svctm
The average service time (in milliseconds) for I/O requests that were
issued to the device.
%util
Percentage of CPU time during which I/O requests were issued to the
device (bandwidth utilization for the device). Device saturation
occurs when this value is close to 100%.
The service time is good, only 0.03 milliseconds, so I assume that the I/Os are writing to a memory cache. But the total time (await) is higher at 4.61 milliseconds, so most of it is time spent waiting in the queue. The average queue length of 35.39 makes sense given that I/Os spend so much time waiting in the queue. But it’s weird that utilization isn’t close to 100%. That’s what makes me wonder if we are having bursts of activity.
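One way to sanity check these numbers against each other is a rough Python sketch like the one below. It is my own consistency check, not something from the man page. It assumes Little's law holds for the averages over the 30-second interval (average queue length = request rate times average time per request) and that utilization is roughly request rate times service time.

```python
# Values copied from the sample sar -d line for dev253-9.
tps = 7669.55
await_ms = 4.61
svctm_ms = 0.03
avgqu_sz = 35.39
util_pct = 19.80


def littles_law_queue(tps, await_ms):
    """Average requests outstanding = requests per second * seconds per request."""
    return tps * await_ms / 1000.0


def implied_util_pct(tps, svctm_ms):
    """Percentage of each second spent servicing requests, assuming one at a time."""
    return tps * svctm_ms / 1000.0 * 100.0


print(f"queue length from tps * await: {littles_law_queue(tps, await_ms):.2f} (reported {avgqu_sz})")
print(f"%util from tps * svctm: {implied_util_pct(tps, svctm_ms):.1f} (reported {util_pct})")
```

The first figure lands within a few hundredths of the reported avgqu-sz. The second comes out a few points above the reported %util, but svctm is only shown to two decimal places, and a true value anywhere between 0.025 and 0.035 ms would give roughly 19% to 27%. So the reported numbers look at least internally consistent. That doesn't by itself confirm or rule out the burst idea.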
Anyway, I have more to learn but I thought I would pass along my thoughts on Linux’s version of sar -d.
– Bobby
P.S. Here is the output on HP-UX that I am used to:
HP-UX myhostname B.11.31 U ia64    04/02/15

11:27:14   device   %busy   avque   r+w/s  blks/s  avwait  avserv
11:27:44    disk1    1.60    0.50       3      95    0.00   10.27
            disk6    0.03    0.50       1       6    0.00    0.64
           disk15    0.00    0.50       0       0    0.00    3.52
           disk16  100.00    0.50     337    5398    0.00    5.52
r+w/s on HP-UX sar -d seems to be the equivalent of tps on Linux. blks/s on HP-UX appears to be the same as rd_sec/s + wr_sec/s on Linux. The other weird difference is that in HP-UX avwait is just the time spent in the queue which I believe is equal to await – svctm on Linux. I am more accustomed to the HP-UX tool so I needed to get up to speed on the differences.
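To make the mapping concrete, here is a small Python sketch that derives HP-UX-style values from the Linux dev253-9 line, using only the three equivalences I described above. These are my best guesses rather than anything documented, and the function name is made up for illustration.

```python
def linux_to_hpux_style(row):
    """Derive HP-UX-like sar -d fields from a Linux sar -d row.

    The mappings are assumptions based on comparing the two outputs:
      r+w/s  ~ tps
      blks/s ~ rd_sec/s + wr_sec/s
      avwait ~ await - svctm
    """
    return {
        "r+w/s": row["tps"],
        "blks/s": row["rd_sec/s"] + row["wr_sec/s"],
        "avwait": row["await"] - row["svctm"],
    }


# Values from the Linux sample line for dev253-9.
linux_row = {
    "tps": 7669.55,
    "rd_sec/s": 2.44,
    "wr_sec/s": 61353.93,
    "await": 4.61,
    "svctm": 0.03,
}

hpux_style = linux_to_hpux_style(linux_row)
for name, value in hpux_style.items():
    print(f"{name}: {value:.2f}")
```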
Very good one.
Thanks!
good review, thanks
Thanks! FYI, I updated the post with the HP-UX version for comparison.