Analysing NetApp sysstat PT1: The CP columns
Thursday, 4 January 2007
The CP type is displayed in the CP ty column of the sysstat output. The column contains two pieces of data: the first character is the cause of the CP (the CP type), and the second character is the ‘phase’. In the output below, the first row shows a CP type of “T” and a phase of “f”. The second row shows a CP type of “:”, which just means that the same CP was still ongoing when sysstat next sampled the internal CP counters. We get a bit more insight into this process from the CP time column, which represents the fraction of the sample interval that was spent in the CP. So, when we are sampling every second, the CP took about half a second (54%) of the first sample, and then continued into the second sample for 17% of one second. We can therefore assume that the CP started roughly halfway through the first sample period and continued a little into the next sample.
filer*> sysstat -x 1
  CP   CP  Disk   FCP iSCSI     FCP kB/s    iSCSI kB/s
time   ty  util                in      out    in   out
 54%   Tf   92%   183     0     4   145228     0     0
 17%    :   96%   201     0     5   149234     0     0
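The arithmetic on the two sample rows above can be sketched in a few lines of Python. This is only an illustration: the function name `cp_durations` and the tuple layout are made up for this example, and it assumes that a CP ty starting with “:” or “#” always continues the CP from the previous sample.

```python
# Marks that mean "same CP as the previous sample" (see the CP types table).
CONTINUATION_MARKS = (":", "#")


def cp_durations(samples, interval_s=1.0):
    """Merge consecutive sysstat samples into per-CP durations.

    samples    -- list of (cp_time_percent, cp_ty) tuples, oldest first
    interval_s -- the sysstat sampling interval in seconds
    """
    checkpoints = []
    for cp_time_pct, cp_ty in samples:
        # CP time is the share of the sample interval spent in the CP.
        seconds = cp_time_pct / 100.0 * interval_s
        if cp_ty[:1] in CONTINUATION_MARKS and checkpoints:
            # Still the same CP: add this sample's share to it.
            checkpoints[-1]["seconds"] += seconds
        else:
            checkpoints.append({"ty": cp_ty, "seconds": seconds})
    return checkpoints


# The two rows from the sample output, sampled once per second.
for cp in cp_durations([(54, "Tf"), (17, ":")], interval_s=1.0):
    print(cp["ty"], round(cp["seconds"], 2))  # prints: Tf 0.71
```

So the timer CP in the example lasted roughly 0.7 seconds in total, spread across two sample intervals.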
The table below describes the CP types as per the sysstat manpage. On a system with little or no incoming data, CPs are generated artificially by a timer that fires once every 10 seconds; this is CP type “T”.
When there is a high incoming data rate, the filer tries to free up resources before they become exhausted, because exhaustion would show up to the user as high write latency. When everything is working well, as is normally the case, the incoming write latency is the time it takes to write the user's data into NVRAM (plus the network round-trip time). This is how the filer is able to achieve extremely high write rates, even for a random IO pattern. If the filer is not able to keep up with the incoming workload, it will sometimes show CP type “B”, which can mean that the filer is continually in a state of CP, and the user workload can see higher latencies as a result.
The log full CP (type “F”) literally means that the NVRAM log is 50% full, so the filer must write this dirty data out to disk. Whilst the CP is happening, the other 50% of the NVRAM is used to accept more incoming data, so the clients should see the same low latency as when there is no CP ongoing.
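As a rough back-of-the-envelope model of the behaviour described above, the sketch below estimates which trigger fires first for a given write rate, and whether the next CP is likely to be back to back. It is a simplification: the NVRAM size and write rates are hypothetical numbers, the function names are invented for this example, and it ignores the high water mark (“H”) trigger and any logging overhead in NVRAM.

```python
TIMER_INTERVAL_S = 10.0  # from the text: the timer fires once every 10 seconds


def seconds_to_half_full(nvram_mb, write_rate_mb_s):
    """Time for incoming writes to fill one half of the NVRAM log."""
    if write_rate_mb_s <= 0:
        return float("inf")  # no incoming data: the log never fills
    return (nvram_mb / 2.0) / write_rate_mb_s


def expected_trigger(nvram_mb, write_rate_mb_s):
    """Return "F" if the log half fills before the timer fires, else "T"."""
    fill_s = seconds_to_half_full(nvram_mb, write_rate_mb_s)
    return "F" if fill_s < TIMER_INTERVAL_S else "T"


def likely_back_to_back(nvram_mb, write_rate_mb_s, cp_duration_s):
    """True if the second half would fill while the current CP is still running."""
    return seconds_to_half_full(nvram_mb, write_rate_mb_s) < cp_duration_s


# Hypothetical filer with 512 MB of NVRAM.
print(expected_trigger(512, 10))          # 25.6 s to fill a half -> "T"
print(expected_trigger(512, 100))         # 2.56 s to fill a half -> "F"
print(likely_back_to_back(512, 100, 4))   # a 4 s CP is outrun -> True
```

In this model a CP only becomes a problem for clients once the second half of the log fills before the current CP has finished, which is the situation the “B” type describes.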
A CP type of “H” indicates that the filer has a large number of dirty buffers in the system, and even though the NVRAM is not yet 50% full, the filer issues a CP in order to free up RAM. This CP type is sometimes seen on filers with a small amount of RAM and a large incoming write rate of small IOs to random offsets.
CP type “Z” is often caused by snapshot creation or deletion. Remember that snapshot deletion can be triggered by several conditions besides a user typing “snap delete” at the console. Examples of ‘automatic’ deletion are:
- Snapshot deletion to recover space on the aggregate
- Snapshot deletion, to maintain a specific snap schedule
|CP Types|CP Phases|
|---|---|
|B – Back to back CPs (CP generated CP)|0 – Initializing|
|b – Deferred back to back CPs (CP generated CP)|n – Processing normal files|
|F – CP caused by full NVLog|s – Processing special files|
|H – CP caused by high water mark|f – Flushing modified data to disk|
|L – CP caused by low water mark|v – Flushing modified superblock to disk|
|S – CP caused by snapshot operation| |
|T – CP caused by timer| |
|U – CP caused by flush| |
|Z – CP caused by internal sync| |
|: – Continuation of CP from previous interval| |
|# – Continuation of CP from previous interval, and the NVLog for the next CP is now full, so that the next CP will be of type B| |
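When working through a lot of sysstat output, the table above can be turned into a small lookup. The sketch below is one way to do it in Python. The descriptions are copied from the table, while the function name `decode_cp_ty` and the returned dictionary layout are just choices made for this example.

```python
CP_TYPES = {
    "B": "Back to back CPs (CP generated CP)",
    "b": "Deferred back to back CPs (CP generated CP)",
    "F": "CP caused by full NVLog",
    "H": "CP caused by high water mark",
    "L": "CP caused by low water mark",
    "S": "CP caused by snapshot operation",
    "T": "CP caused by timer",
    "U": "CP caused by flush",
    "Z": "CP caused by internal sync",
    ":": "Continuation of CP from previous interval",
    "#": "Continuation of CP from previous interval, next CP will be type B",
}

CP_PHASES = {
    "0": "Initializing",
    "n": "Processing normal files",
    "s": "Processing special files",
    "f": "Flushing modified data to disk",
    "v": "Flushing modified superblock to disk",
}


def decode_cp_ty(field):
    """Split a CP ty value such as "Tf" into its type and phase descriptions."""
    field = field.strip()
    if not field:
        return None  # nothing to decode
    cp_type = field[0]
    phase = field[1] if len(field) > 1 else None
    return {
        "type": CP_TYPES.get(cp_type, "unknown type " + cp_type),
        "phase": CP_PHASES.get(phase) if phase is not None else None,
    }


print(decode_cp_ty("Tf"))
print(decode_cp_ty(":"))
```

Characters that are not in the table are reported as unknown rather than guessed at, since other Data ONTAP releases may list additional types.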