dotplan

troubleshooting & performance analysis

  • Author: gary
  • Published: Jan 26th, 2012
  • Category: how-to
  • Comments: None

Plotting NFS Reads and Writes from packet trace.

Tags: , , , ,

lovebox-4:[~/data/traces] $ tshark -r trace.trc -R nfs > trace.out

lovebox-4:[~/data/traces] $ gnuplot

gnuplot> plot "trace.out" u 2:(stringcolumn(8) eq "WRITE" ? $1:0/0 ) lc 2 t "WRITES" w points, "" u 2:(stringcolumn(8) eq "READ" ? $1:0/0) lc 3 t "READS" w dots
gnuplot> set term png
gnuplot> set output "trace.png"
gnuplot> replot

  • Author: gary
  • Published: Jan 25th, 2012
  • Category: how-to
  • Comments: None

Using read filters with tshark and NFS.

Tags: , ,

Typically, I use the GUI version of wireshark to see how to specify the read filter, then use tshark at the command line to make use of all the CLI goodness of Unix.

Display NFS_LOOKUP Calls

 tshark -R "nfs.procedure_v3 == 3" -r sometracefile.trc

Display NFS_GETATTR Calls

 tshark -R "nfs.procedure_v3 == 1" -r sometracefile.trc

Display NFS_SETATTR Calls

 tshark -R "nfs.procedure_v3 == 2" -r sometracefile.trc

Display NFS_ACCESS Calls

 tshark -R "nfs.procedure_v3 == 4" -r sometracefile.trc


Display NFS_LINK Calls

 tshark -R "nfs.procedure_v3 == 15" -r sometracefile.trc
  • Author: gary
  • Published: Jan 19th, 2010
  • Category: how-to
  • Comments: None

Collecting NFS trace data for performance tasks.

Tags: , ,

When collecting network traces to solve performance problems, we need to balance the need to capture enough data to re-construct the upper protocols with the need to limit the amount of trace data collected overall. When capturing NFS traffic, the packet size can grow to be quite large, since the NFS packets will contain the “payload” data i.e. the data that the userland application is reading or writing.

Most of the tools used to collect network trace data support the ‘snaplen’ option which controls how much data to capture per packet. For NFS traces a snaplen of 256 bytes seems to work well. With a snaplen of 256 the trace analysers will have enough data to reconstruct the filehandles, offset and IO size. Tools like tshark will be able to match the NFS response to the request and so you can see the response time from the NFS server.

In this trace below, the total response time is from the first NFS READ Call (packet 13) until the final RPC Continuation in the sequence. In NFS the NFS V3 READ Reply (Packet 14) is just the beginning of the NFS server response. The bulk of the payload is in the “RPC Continuation” packets. The number of Continuation packets will depend on the size of the NFS request and the size of the Ethernet frames. In the trace below, the IO size is 32K, the Ethernet segment size is 1500 and the IO stream is single threaded.

I use the tshark option “-R rpc” to show the “NFS READ calls” as well as the “RPC Continuation” responses from the NFS server.

The command line.

lovebox:test_snaplen gjl$ tshark -r snap256_from_open -R rpc|head -35

The trace.

 13   3.529598 192.168.10.3 -> 192.168.10.4 NFS V3 READ Call, FH:0x44281a4d Offset:0 Len:32768
 14   3.529844 192.168.10.4 -> 192.168.10.3 NFS V3 READ Reply (Call In 13) Len:32768[Unreassembled Packet]
 15   3.529856 192.168.10.4 -> 192.168.10.3 RPC Continuation
 17   3.529872 192.168.10.4 -> 192.168.10.3 RPC Continuation
 18   3.529882 192.168.10.4 -> 192.168.10.3 RPC Continuation
 20   3.529896 192.168.10.4 -> 192.168.10.3 RPC Continuation
 21   3.529905 192.168.10.4 -> 192.168.10.3 RPC Continuation
 23   3.529921 192.168.10.4 -> 192.168.10.3 RPC Continuation
 24   3.529930 192.168.10.4 -> 192.168.10.3 RPC Continuation
 26   3.529944 192.168.10.4 -> 192.168.10.3 RPC Continuation
 27   3.529955 192.168.10.4 -> 192.168.10.3 RPC Continuation
 29   3.529970 192.168.10.4 -> 192.168.10.3 RPC Continuation
 30   3.529980 192.168.10.4 -> 192.168.10.3 RPC Continuation
 32   3.529997 192.168.10.4 -> 192.168.10.3 RPC Continuation
 33   3.530005 192.168.10.4 -> 192.168.10.3 RPC Continuation
 35   3.530019 192.168.10.4 -> 192.168.10.3 RPC Continuation
 36   3.530029 192.168.10.4 -> 192.168.10.3 RPC Continuation
 38   3.530044 192.168.10.4 -> 192.168.10.3 RPC Continuation
 39   3.530053 192.168.10.4 -> 192.168.10.3 RPC Continuation
 41   3.530068 192.168.10.4 -> 192.168.10.3 RPC Continuation
 42   3.530077 192.168.10.4 -> 192.168.10.3 RPC Continuation
 44   3.530092 192.168.10.4 -> 192.168.10.3 RPC Continuation
 45   3.530100 192.168.10.4 -> 192.168.10.3 RPC Continuation
 47   3.530115 192.168.10.4 -> 192.168.10.3 RPC Continuation
 48   3.530197 192.168.10.3 -> 192.168.10.4 NFS V3 READ Call, FH:0x44281a4d Offset:32768 Len:32768

If you are processing someone elses trace data, one clue that the snaplen was too short is that the NFS reads do not contain the offset or Len values. In this example the snaplen was 192. Notice that packet 27 does not show the offset of the read, compare to packet 13 in the above trace, where the snaplen was 256.

 17  30.646984 192.168.10.3 -> 192.168.10.4 NFS V3 ACCESS Call, FH:0x34d96998
 18  30.647233 192.168.10.4 -> 192.168.10.3 NFS V3 ACCESS Reply (Call In 17)
 20  30.647279 192.168.10.3 -> 192.168.10.4 NFS V3 GETATTR Call, FH:0x44281a4d
 21  30.647472 192.168.10.4 -> 192.168.10.3 NFS V3 GETATTR Reply (Call In 20)  Regular File mode:0666 uid:0 gid:0
 22  30.647501 192.168.10.3 -> 192.168.10.4 NFS V3 ACCESS Call, FH:0x44281a4d
 23  30.647727 192.168.10.4 -> 192.168.10.3 NFS V3 ACCESS Reply (Call In 22)
 27  30.648254 192.168.10.3 -> 192.168.10.4 NFS V3 READ Call[Packet size limited during capture]
 28  30.648539 192.168.10.4 -> 192.168.10.3 NFS V3 READ Reply (Call In 27) Len:32768[Packet size limited during capture]

In wireshark, there are a number of settings which control how wireshark/tshark attempts to re-assemble rpc/tcp/nfs packets. Enabling “more aggressive” RPC decoding seemed to mess up the ability to show the RPC continuation packets. Here are the RPC settings from the wireshark preferences used to generate this data. These prefs also affect the way tshark works.

lovebox:test_snaplen gjl$ grep rpc ~/.wireshark/preferences
dcerpc.desegment_dcerpc: TRUE
dcerpc.reassemble_dcerpc: TRUE
rpc.desegment_rpc_over_tcp: TRUE
rpc.defragment_rpc_over_tcp: TRUE
rpc.max_tcp_pdu_size: 4194304
rpc.dissect_unknown_programs: FALSE
rpc.find_fragment_start: FALSE
rpcap.desegment_pdus: TRUE
rpcap.decode_content: TRUE
rpcap.linktype: 0

Detailed description of an NFS Read

Tags:

Here’s a blog post by a guy at Sun which goes into very deep detail on what happens when a read() is done against a file on an NFS filesystem. Amongst other things it explains, why we see GETATTR’s turn up at the NFS server, even when the attributes should be cached locally (the answer is that a GETATTR is performed at file open() time regardless of the timeouts on the attribute cache) There is a 40 page PDF document linked to the post.

http://blogs.sun.com/vallishguru/entry/solaris_kernel_nfs_client_s

© 2009 dotplan. All Rights Reserved.

This blog is powered by Wordpress and Magatheme by Bryan Helmig.