Recently we’ve been using VMware’s vSCSI trace tool to capture traces of the IO generated by Windows guest VMs during user login. The output of vSCSI trace can be transformed into a simple CSV file (normally this is done for replay in ioblazer). Once the trace is in CSV format, it’s easy to visualize with gnuplot.
Here is an example of the trace file:

#Vscsi Cmd Trace. (Trace Format Version 1)
#Serial Number,IO Data Length,Num SG Entries,Command Type,LBN,Time Stamp (microseconds)
2147483692,4096,1,write,24048792,2756778716647
2147483769,4096,1,write,6322464,2756783857269
2147483732,4096,1,write,6234880,2756783857712
2147483711,4096,1,write,6234872,2756783858110
2147483658,4608,2,write,23749656,2756797029103
2147483715,512,1,write,6422064,2756797702520
2147483713,4096,1,write,6322472,2756802107264
2147483766,4096,1,write,6234888,2756802107736
Using a conditional (ternary) expression in the plot command, I can separate the reads from the writes and plot each as its own series with its own colour. The conditional syntax is best shown in context. Basically I say plot "vscsi.txt", and for the y-axis use the value in column five ($5) if column four contains the word "write", which is tested with stringcolumn(4) eq "write". Otherwise plot nothing: 0/0 evaluates to an undefined value, and gnuplot skips undefined points. Use line colour 2 (lc 2), which is green, and call this series "WRITES". Then I do the same thing for the reads, using lc 3 (blue) and the title "READS". Because the trace is comma-separated, this relies on gnuplot splitting fields on commas (set datafile separator ",").
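To make the selection rule concrete outside gnuplot, here is a minimal Python sketch of the same logic. It is only an illustration: the function names select_lbn and series are made up for this example, the sample rows are copied from the trace excerpt above, and NaN stands in for gnuplot's undefined 0/0 value.

```python
import csv
import io
import math

# A few rows copied from the trace excerpt above (all of them are writes).
SAMPLE = """\
#Vscsi Cmd Trace. (Trace Format Version 1)
#Serial Number,IO Data Length,Num SG Entries,Command Type,LBN,Time Stamp (microseconds)
2147483692,4096,1,write,24048792,2756778716647
2147483769,4096,1,write,6322464,2756783857269
2147483658,4608,2,write,23749656,2756797029103
"""


def select_lbn(row, wanted):
    """Hypothetical helper mirroring (stringcolumn(4) eq wanted ? $5 : 0/0).

    Returns the LBN (5th field) when the command type (4th field) matches,
    otherwise NaN, which plays the role of gnuplot's undefined value.
    """
    return float(row[4]) if row[3] == wanted else math.nan


def series(text, wanted):
    """Build one y-series for the given command type, one entry per data row."""
    data_lines = (ln for ln in io.StringIO(text) if not ln.startswith("#"))
    return [select_lbn(row, wanted) for row in csv.reader(data_lines) if row]


writes = series(SAMPLE, "write")
reads = series(SAMPLE, "read")

if __name__ == "__main__":
    print("write series:", writes)
    print("read series: ", reads)
```

Each series has one entry per trace row, so rows of the other command type become gaps rather than being dropped. That is the same effect the ternary has in the gnuplot command.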
The resulting gnuplot plot has a point for each read or write in the trace file. The y-axis represents the LBN (block address) of the read or write, and the colour shows the type of operation: green for writes, blue for reads. Time is effectively on the x-axis. What’s especially nice is that we can see that the accesses are not totally random: there are clear “bands” of block addresses where a lot of accesses occur. This is much harder to recognize by just reading the trace file.
gnuplot> set datafile separator ","
gnuplot> plot "vscsi.txt" u :(stringcolumn(4) eq "write" ? $5 : 0/0) lc 2 t "WRITES" w points, "" u :(stringcolumn(4) eq "read" ? $5 : 0/0) lc 3 t "READS" w points
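The banding visible in the plot can also be cross-checked numerically. The sketch below is one possible approach, not part of the original workflow: it buckets each LBN into fixed-width ranges and counts accesses per range. The band width of 2^21 blocks and the function name band_histogram are assumptions chosen for illustration (with 512-byte blocks, which is also an assumption, that width would be 1 GiB). The rows are the ones from the trace excerpt above.

```python
import csv
import io
from collections import Counter

# Assumed band width in blocks; purely illustrative, tune it to the disk size.
BAND_BLOCKS = 1 << 21

# Rows copied from the trace excerpt above.
SAMPLE = """\
#Vscsi Cmd Trace. (Trace Format Version 1)
#Serial Number,IO Data Length,Num SG Entries,Command Type,LBN,Time Stamp (microseconds)
2147483692,4096,1,write,24048792,2756778716647
2147483769,4096,1,write,6322464,2756783857269
2147483732,4096,1,write,6234880,2756783857712
2147483711,4096,1,write,6234872,2756783858110
2147483658,4608,2,write,23749656,2756797029103
2147483715,512,1,write,6422064,2756797702520
2147483713,4096,1,write,6322472,2756802107264
2147483766,4096,1,write,6234888,2756802107736
"""


def band_histogram(text, band=BAND_BLOCKS):
    """Count accesses per (command type, band index), skipping '#' header lines."""
    counts = Counter()
    data_lines = (
        ln for ln in io.StringIO(text)
        if ln.strip() and not ln.startswith("#")
    )
    for row in csv.reader(data_lines):
        op, lbn = row[3], int(row[4])
        counts[(op, lbn // band)] += 1
    return counts


hist = band_histogram(SAMPLE)

if __name__ == "__main__":
    # Print the busiest bands first.
    for (op, band_index), n in hist.most_common():
        start = band_index * BAND_BLOCKS
        print(f"{op:5s} LBN {start:>10d}+ : {n} accesses")
```

On a full login trace, a handful of bands with much higher counts than the rest would correspond to the horizontal stripes in the plot.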

It’s so refreshing that a modern tool provides readable output in a format like CSV that can be handled by any Unix tool, in this case gnuplot. So often we end up with the festering bloated carcass of “text” that is XML.