dotplan

troubleshooting & performance analysis

Conditional plotting in R


Once again plotting vSCSI trace files, this time in “R”.

Read the file “trace-2xx.csv” into a variable called vscsi_trace using “read.csv”, which parses the file as CSV and splits each line into columns.

> vscsi_trace <- read.csv("/var/tmp/trace-2xx.csv")

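Create a "writes" object that contains only the rows whose "WRITE_REQUEST" column holds the value "WRITE_REQUEST". (By default read.csv treats the first line of the file as the header, so the column name "WRITE_REQUEST" presumably comes from the first record of a trace that has no header line.)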

> writes <- subset(vscsi_trace,vscsi_trace["WRITE_REQUEST"] == "WRITE_REQUEST")

Create a "reads" object that contains only the rows in the trace whose "WRITE_REQUEST" column holds the value "READ_REQUEST".

> reads <- subset(vscsi_trace,vscsi_trace["WRITE_REQUEST"] == "READ_REQUEST")
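The two subset calls can be tried without the original trace. The following is a minimal sketch, assuming a made-up four-record trace with no header line; the column names (cmd, serial, length, lbn) and all values are hypothetical and are not taken from the real file.

```r
# Hypothetical trace records; names and values are invented for illustration.
csv_text <- c(
  "WRITE_REQUEST,1,4096,2048",
  "READ_REQUEST,2,4096,4096",
  "WRITE_REQUEST,3,512,2056",
  "READ_REQUEST,4,4096,8192"
)

# header = FALSE stops read.csv from using the first record as column names.
trace <- read.csv(text = csv_text, header = FALSE,
                  col.names = c("cmd", "serial", "length", "lbn"),
                  stringsAsFactors = FALSE)

# Keep only the rows whose command column matches each request type.
writes <- subset(trace, cmd == "WRITE_REQUEST")
reads  <- subset(trace, cmd == "READ_REQUEST")

print(nrow(writes))
print(nrow(reads))
```

Passing header = FALSE together with explicit col.names avoids depending on whatever the first record of the file happens to contain.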

Now create a plot with the "reads". In the trace file, I want to plot column 4.

> plot(reads[,4],cex=.1,col="green")

Finally, add in the writes. Notice that we use the "points" command rather than "plot": "plot" would start a new plot, whereas "points" draws onto the existing one.

> points(writes[,4],cex=.1,col="red")
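One thing to watch for: plot sets the axis ranges from the reads alone, so any write that falls outside those ranges is silently left off the picture. A minimal sketch of one way around this, using invented values in place of column 4 and a hypothetical output file name, computes the limits from both series first and writes the result to a PDF:

```r
# Invented values standing in for column 4 of the reads and writes subsets.
reads_col4  <- c(100, 250, 400, 300)
writes_col4 <- c(50, 900, 120)

# Axis limits that cover both series, so no point falls outside the plot region.
y_limits <- range(c(reads_col4, writes_col4))
x_limits <- c(1, max(length(reads_col4), length(writes_col4)))

out_file <- file.path(tempdir(), "vscsi_plot.pdf")  # hypothetical output path
pdf(out_file)                                       # open a PDF graphics device
plot(reads_col4, cex = 0.1, col = "green",
     xlim = x_limits, ylim = y_limits,
     xlab = "Index", ylab = "Column 4")
points(writes_col4, cex = 0.1, col = "red")         # draw onto the existing plot
dev.off()                                           # close the device and write the file
```

With larger traces the same pattern applies unchanged; only the two input vectors differ.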

Here is the plot as a PDF (r_plot_vscsi_trace).

Plot vSCSI trace using conditional argument in gnuplot


Recently we’ve been using VMware’s vSCSI trace tool to capture traces of the IO generated by Windows guest VMs during user login. The output of vSCSI trace can be transformed into a simple CSV file (normally this is done for replay in ioblazer). Once in this CSV format it’s nice and easy to visualize with gnuplot.

Here is an example of the trace file:

#Vscsi Cmd Trace. (Trace Format Version 1)
#Serial Number,IO Data Length,Num SG Entries,Command Type,LBN,Time Stamp (microseconds)
2147483692,4096,1,write,24048792,2756778716647
2147483769,4096,1,write,6322464,2756783857269
2147483732,4096,1,write,6234880,2756783857712
2147483711,4096,1,write,6234872,2756783858110
2147483658,4608,2,write,23749656,2756797029103
2147483715,512,1,write,6422064,2756797702520
2147483713,4096,1,write,6322472,2756802107264
2147483766,4096,1,write,6234888,2756802107736

Using the conditional plot argument, I can separate out the reads and writes and plot each with its own colour. The conditional syntax is best shown in context. Basically I say plot “vscsi.txt”, and for the y-axis use the value in column five ($5) if column 4 contains the word “write” (stringcolumn(4) eq “write”). Otherwise plot nothing: the 0/0 after the colon is an undefined value, which gnuplot skips. Use line colour 2, which is green, and call this series “WRITES”. Then I do the same thing for the reads, using line colour 3 and the title “READS”.

What we end up with is a plot which has a point for each read or write in the trace file. The y-axis represents the LBA (block address) of the read or write. And the color (Green or Blue) represents whether the operation is a read or a write. Time is effectively on the x-axis. What’s especially nice is that we can see that the accesses are not totally random, and there are clear “bands” where a lot of accesses occur. This is much harder to recognize by just reading the trace file.


gnuplot> set datafile separator ","
gnuplot> plot "vscsi.txt" u :(stringcolumn(4) eq "write" ? $5:0/0) lc 2 t "WRITES" w points, "" u :(stringcolumn(4) eq "read" ? $5:0/0) lc 3 t "READS" w points
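For comparison, the same conditional idea can be sketched in R, the tool used in the other post on this page. This is a rough analogue rather than a translation of the gnuplot command: NA takes the place of 0/0, points are placed by their row position in the trace, the column names are my own, and the single read record is invented because the sample above contains only writes.

```r
# First rows of the sample trace, plus one invented read record at the end.
trace_lines <- c(
  "#Vscsi Cmd Trace. (Trace Format Version 1)",
  "#Serial Number,IO Data Length,Num SG Entries,Command Type,LBN,Time Stamp (microseconds)",
  "2147483692,4096,1,write,24048792,2756778716647",
  "2147483769,4096,1,write,6322464,2756783857269",
  "2147483732,4096,1,write,6234880,2756783857712",
  "2147483711,4096,1,write,6234872,2756783858110",
  "2147483700,4096,1,read,12000000,2756783859000"   # invented for illustration
)

# comment.char = "#" skips the two header lines; column names are hypothetical.
trace <- read.csv(text = trace_lines, header = FALSE, comment.char = "#",
                  col.names = c("serial", "length", "sg", "cmd", "lbn", "time_us"),
                  stringsAsFactors = FALSE)

# Keep the LBN where the command matches, otherwise NA, which plot() skips.
write_lbn <- ifelse(trace$cmd == "write", trace$lbn, NA)
read_lbn  <- ifelse(trace$cmd == "read",  trace$lbn, NA)

out_file <- file.path(tempdir(), "vscsi_conditional.pdf")  # hypothetical path
pdf(out_file)
plot(write_lbn, col = "green", ylim = range(trace$lbn),
     xlab = "Row in trace", ylab = "LBN")
points(read_lbn, col = "blue")
dev.off()
```

Because both vectors keep the full length of the trace, reads and writes stay aligned on the same row axis instead of being renumbered within each subset.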

It’s so refreshing that a modern tool provides readable output in a format like csv that can be handled by any unix tool, in this case gnuplot. So often we end up with the festering bloated carcass of “text” that is XML.

© 2009 dotplan. All Rights Reserved.
