dotplan

troubleshooting & performance analysis

  • Author: gary
  • Published: Jan 26th, 2012
  • Category: how-to
  • Comments: None

Plotting NFS Reads and Writes from packet trace.


lovebox-4:[~/data/traces] $ tshark -r trace.trc -R nfs > trace.out

lovebox-4:[~/data/traces] $ gnuplot

gnuplot> plot "trace.out" u 2:(stringcolumn(8) eq "WRITE" ? $1:0/0 ) lc 2 t "WRITES" w points, "" u 2:(stringcolumn(8) eq "READ" ? $1:0/0) lc 3 t "READS" w dots
gnuplot> set term png
gnuplot> set output "trace.png"
gnuplot> replot
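
In the plot command, 0/0 is a gnuplot idiom for an undefined value, so rows whose eighth column does not match are skipped rather than plotted. Before plotting, it can be handy to check how many READ and WRITE rows the trace contains. A minimal sketch, assuming the same tshark text layout the plot relies on (frame number in column 1, time in column 2, operation in column 8); the function name count_nfs_ops is made up for illustration:

```shell
# Count READ and WRITE rows in tshark text output.
# Assumes the operation name is in column 8, as in the gnuplot command above.
count_nfs_ops() {
    awk '$8 == "READ"  { reads++ }
         $8 == "WRITE" { writes++ }
         END { printf "READ %d WRITE %d\n", reads, writes }' "$@"
}
```

Usage would be something like `count_nfs_ops trace.out`. Calls and replies are both counted, just as both are plotted.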

  • Author: gary
  • Published: Jan 25th, 2012
  • Category: how-to
  • Comments: None

Using read filters with tshark and NFS.


Typically, I use the GUI version of wireshark to see how to specify the read filter, then use tshark at the command line to make use of all the CLI goodness of Unix.

Display NFS_LOOKUP Calls

 tshark -R "nfs.procedure_v3 == 3" -r sometracefile.trc

Display NFS_GETATTR Calls

 tshark -R "nfs.procedure_v3 == 1" -r sometracefile.trc

Display NFS_SETATTR Calls

 tshark -R "nfs.procedure_v3 == 2" -r sometracefile.trc

Display NFS_ACCESS Calls

 tshark -R "nfs.procedure_v3 == 4" -r sometracefile.trc


Display NFS_LINK Calls

 tshark -R "nfs.procedure_v3 == 15" -r sometracefile.trc
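
To avoid remembering the numbers, the filters above can be wrapped in a small lookup. This is only a sketch covering the five procedures listed in this post; the function name nfs3_proc_filter is hypothetical:

```shell
# Map an NFSv3 procedure name to the tshark read filter used above.
# Only the procedures listed in this post are included.
nfs3_proc_filter() {
    case "$1" in
        GETATTR) n=1 ;;
        SETATTR) n=2 ;;
        LOOKUP)  n=3 ;;
        ACCESS)  n=4 ;;
        LINK)    n=15 ;;
        *) echo "unknown procedure: $1" >&2; return 1 ;;
    esac
    echo "nfs.procedure_v3 == $n"
}
```

It could then be used as `tshark -R "$(nfs3_proc_filter LOOKUP)" -r sometracefile.trc`.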

Unix limits in OS X

TAGS: None

When running a lot of Unix commands via scripts, it’s quite easy to hit the maximum allowable processes per user, or the maximum number of open files per user.

My Mac running OS X 10.6.8 has the following limits defined.


lovebox-4:[~] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 266
virtual memory          (kbytes, -v) unlimited

A user can increase these limits directly from the shell using the ulimit command.

ulimit -n

Changes the maximum allowable number of concurrently open files.

ulimit -u

Changes the maximum allowable concurrent processes.

These limits are in place to stop one user from hurting the overall system for other users. Typically an OS X system has only one user, so it’s quite OK to increase the limits. Note that a user can only raise a limit up to the maximum allowed by the system. That system maximum can also be changed by the system administrator (root), who on most Macs will be the same user. So how do we do this?

First, let’s see what the system limits are using the sysctl command.

lovebox-4:[~] $ sysctl -a | egrep '(maxfiles|maxproc)'

kern.maxproc = 532
kern.maxfiles = 12288
kern.maxfilesperproc = 10240
kern.maxprocperuid = 266
kern.maxprocperuid = 266
kern.maxproc: 532
kern.maxfiles: 12288
kern.maxfilesperproc: 20480
kern.maxprocperuid: 512
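
To use one of these values in a script, the key can be picked out of the sysctl output. A small sketch, assuming the output uses either the "key = value" or the "key: value" form shown above; sysctl_value is a hypothetical helper name, not a system command:

```shell
# Print the value of one key from sysctl-style lines read on stdin.
# Accepts both "key = value" and "key: value"; prints the first match only.
sysctl_value() {
    awk -v key="$1" -F'[=:]' '
        { k = $1; v = $2; gsub(/[ \t]/, "", k); gsub(/[ \t]/, "", v) }
        k == key { print v; exit }'
}
```

For example: `sysctl -a | sysctl_value kern.maxfilesperproc`.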

Now, I want to change those system maximums, so that I can change the user maximums too.

sudo sysctl -w kern.maxfilesperproc=20480
kern.maxfilesperproc: 10240 -> 20480
sudo sysctl -w kern.maxprocperuid=512
kern.maxprocperuid: 266 -> 512

Now that I have higher system limits, I need to tell OS X that I want to use them. Again, the system lets users protect themselves by restricting themselves to a lower limit.

Next, open .bashrc and put the new limits in there:

ulimit -n 1024
ulimit -u 512

You may not be able to set the new limits directly in an existing shell, even though the kernel maximums were changed. However, a newly started shell should have the new limits.

e.g. In an existing shell

lovebox-4:[~] $ ulimit -n 1024
-bash: ulimit: open files: cannot modify limit: Operation not permitted

In a newly created shell window

lovebox-4:[~] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 512
virtual memory          (kbytes, -v) unlimited

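So, here’s something non-intuitive: ulimit without -S or -H sets both the soft and the hard limit, so once a limit is lowered the shell cannot raise it again.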

lovebox-4:[~] $ ulimit -n 1024  <--- Works because we set 1024 in bashrc
lovebox-4:[~] $ ulimit -n 512  <---- Lower the limit
lovebox-4:[~] $ ulimit -n 1024  <--- Now it cannot be raised in this shell, or any descendants.
-bash: ulimit: open files: cannot modify limit: Operation not permitted
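
An unprivileged shell cannot raise a hard limit once it has been lowered, which is why the last command fails. A sketch of the alternative, changing only the soft limit with -S so that it can be raised back afterwards. The function name and the value 64 are arbitrary choices for illustration, and it assumes the hard limit is at least 64:

```shell
# Lower only the soft open-files limit, then restore it.
# Runs in a subshell so the calling shell's limits are untouched.
soft_nofile_roundtrip() {
    (
        orig=$(ulimit -S -n)        # current soft limit
        ulimit -S -n 64             # lower the soft limit only
        lowered=$(ulimit -S -n)
        ulimit -S -n "$orig"        # raising works: the hard limit was never lowered
        echo "$lowered $(ulimit -S -n)"
    )
}
```

Calling soft_nofile_roundtrip prints the lowered value followed by the restored one.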

Now, that's sorted out our shells. To do a similar thing for processes launched from the Finder (a web browser, for example), we need to mess with launchd...

lovebox-4:[~] $ sudo launchctl limit
	cpu         unlimited      unlimited
	filesize    unlimited      unlimited
	data        unlimited      unlimited
	stack       8388608        67104768
	core        0              unlimited
	rss         unlimited      unlimited
	memlock     unlimited      unlimited
	maxproc     266            532
	maxfiles    256            unlimited

lovebox-4:[~] $ sudo launchctl limit maxproc 1024
lovebox-4:[~] $ sudo launchctl limit
	cpu         unlimited      unlimited
	filesize    unlimited      unlimited
	data        unlimited      unlimited
	stack       8388608        67104768
	core        0              unlimited
	rss         unlimited      unlimited
	memlock     unlimited      unlimited
	maxproc     1024           1024
	maxfiles    256            unlimited

Note: this may not be persistent, and it will only be picked up when new processes are launched. In other words, if an already executing process has reached its maximum number of open files, doing the above will not help until the process is restarted.

Cannot see NetApp LUNs from Linux?


After some connectivity swap-a-roos in the lab, I could no longer see my LUNs from the Linux host attached to my filer.

In this case I am using a QLogic HBA, and I am not using any of the NetApp host-side tools other than sanlun.

Using the SANsurfer menu (/opt/QLogic_Corporation/SANsurferCLI) I can tell that this Linux host can see the filer's LUNs over FC. But there are no SCSI /dev/sdX devices for them, and so Linux cannot use them…

Here’s how I checked to see that there was FC connectivity – which also confirms that the FC protocol is working.

	SANsurfer FC/CNA HBA CLI

	v1.7.2 Build 7

    Main Menu

    1:	General Information  <---- Option 1
    2:	HBA Information
    3:	HBA Parameters
    4:	Target/LUN List
    5:	iiDMA Settings
...

    General Information Menu

    1:	Host Information
    2:	Host Topology
    3:	Report     <---- Option 3..
    4:	Refresh
    5:	Return to Previous Menu

	Note: 0 to return to Main Menu
	Enter Selection: 3

   Report Menu

    HBA Model QLE2462
      1: Port   1: WWPN: 21-00-00-E0-8B-9B-C5-36 Online
      2: Port   2: WWPN: 21-01-00-E0-8B-BB-C5-36 Online
      3: All HBAs  <---- Option 3
      4: Return to Previous Menu

	Note: 0 to return to Main Menu
	Enter Selection: 3

I could see that there was connectivity from the Linux host to the filer:

---------------------------------------
LUN 1
---------------------------------------
Product Vendor                    : NETAPP
Product ID                        : LUN
Product Revision                  : 811a
LUN                               : 1
Size                              : 17.93 GB
Type                              : SBC-2 Direct access block device
			           (e.g., magnetic disk)
WWULN                             : 4E-45-54-41-50-50-20-20-20-4C-55-4E-20-32-46-68
			           72-53-3F-2D-68-4F-79-6C-33-00-00-00-00-00-00-00
OS LUN Name                       :

From the filer side, I could see that the host's FC adapters had connected to the filer and were in the right igroup.

filer1*> igroup show
    filer1 (FCP) (ostype: linux):
        21:00:00:e0:8b:9b:c5:36 (logged in on: 0a)
        21:01:00:e0:8b:bb:c5:36 (logged in on: 0b)

The only thing that was missing was that there were no 'sd' devices created in Linux for these devices.

The sanlun utility was not helpful; it just told me that there were no LUNs mapped.

The solution was to issue this very odd looking command:

linuxhost:[/sys/class/scsi_host] $ echo "- - -" > host0/scan

This caused the sd devices to be created, representing the NetApp LUNs which I knew could already be seen over FC. Since I have both ports on the same HBA attached to the filer, host0/scan created my /dev/sdc* devices, and host1/scan created my /dev/sdd* devices.
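
The three dashes act as wildcards for channel, target and LUN. To trigger the scan on every host in one go, the command can be put in a loop. This is a rough sketch rather than a tested tool: the function name rescan_scsi_hosts and its optional directory argument are made up for illustration, and on a real system it needs root.

```shell
# Ask every SCSI host under the given sysfs directory to rescan.
# "- - -" is a wildcard for channel, target and LUN.
# The optional directory argument exists only so the loop can be tried
# against a scratch directory; it defaults to the real sysfs path.
rescan_scsi_hosts() {
    root=${1:-/sys/class/scsi_host}
    for scan in "$root"/host*/scan; do
        [ -e "$scan" ] || continue      # no hosts matched the glob
        echo "- - -" > "$scan"
    done
}
```

Since this runs the same scan on every host, it deserves at least as much care as the single command.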

The shell 'hung' for the duration of the command, and I expect that Linux was off in kernel land for some time, so I would NOT recommend issuing the command on a production server.

I'm still puzzled why the Linux host did not see the LUNs even after a reboot, though.

David Patterson : 1988 RAID Paper


David Patterson's seminal paper on RAID

FastFS Paper 1984 (Joy/McKusick)


Seminal NetApp / storage papers.


I am mentoring a new starter at NetApp and so I found a couple of papers which discuss at a high level some of the problems that WAFL and RAID set out to solve. Both papers are quite old, but are interesting in that they discuss the big picture of the original problem.

A Storage Networking Appliance
This is a great paper which discusses the principles of the “filer” concept.

File System Design for an NFS File Server Appliance

AWK needs -f when running as interpreter


Just remember that when writing self-contained awk scripts, the interpreter line (i.e. the #! line) needs to have -f at the end. Otherwise you will get all sorts of divide-by-zero errors and other similarly unhelpful messages.

#!/bin/awk -f

#!/opt/local/bin/gawk -f
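
With -f in place, the script ends up being run as "awk -f /path/to/script", so awk reads the file as its program. Without it, the script's path is handed to awk as the program text itself, which is presumably where the odd errors come from. A quick way to try this out, as a sketch only: the function name awk_shebang_demo is made up, and the awk path is looked up with `command -v` rather than hard-coded.

```shell
# Write a self-contained awk script whose #! line ends in -f, then run it.
# The script sums the first column of its input.
awk_shebang_demo() {
    script=$(mktemp)
    printf '#!%s -f\n{ sum += $1 }\nEND { print sum }\n' "$(command -v awk)" > "$script"
    chmod +x "$script"
    printf '1\n2\n3\n' | "$script"      # prints 6
    rm -f "$script"
}
```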

NetApp performance related documentation.


Just recently an ex-colleague from Sun asked me to send him some background info on NetApp filer performance. Normally the filer can just be plugged in and ought to be ready to go without much tweaking. The following documents give some background on how to set up a filer in various environments.

Performance (Bug) Advocacy

TAGS: None

Courtesy of Cem Kaner. If you are employed as a performance analyst, then the following probably applies to you too. I consider performance defects to be a class of bug, which needs to be fixed by someone else (maybe a programmer, an architect or a sysadmin). This means you’re about to create work for some other person, who might prefer to be watching Star Wars, drinking coffee or spending time with their family.

1. The point of testing is to find bugs.

2. Bug reports are your primary work product. This is what people outside of the testing group will most notice and most remember of your work.

3. The best tester isn’t the one who finds the most bugs or who embarrasses the most programmers. The best tester is the one who gets the most bugs fixed.

4. Programmers operate under time constraints and competing priorities. For example, outside of the 8-hour workday, some programmers prefer sleeping and watching Star Wars to fixing bugs.

A bug report is a tool that you use to sell the programmer on the idea of spending her time and energy to fix a bug.

Go here for the full text.
———————————————————————–

© 2009 dotplan. All Rights Reserved.
