dotplan

troubleshooting & performance analysis

Quick search & replace in enormous files using sed


I have a very large file (around 500 MB) full of directory listings. I want to do a simple replace, changing “vol2” to “vol4”. Using vi or vim would require loading the entire file into memory (and is more difficult to script).

The file is called “vol2_first_40k.out”, and the format of the file looks like this

[root@unix tmp]# head vol2_first_40k.out
/vol2/dirA1/dirA11/file-bCOST@0paeRg`FIbONu]Tn_`n2BNsE\N5HuCaa7cmXw0H[K_n
/vol2/dirA1/dirA11/file-Vzi5oCec@]7cw2bQmBATD4Sf;4[\;EdPWMvO>JX5hvhY58dC@
/vol2/dirA1/dirA11/file-V?4LB<^SrhdT^CfGSod5d0eC8vvPf[QMa1wfcUz]:5O_TIett
/vol2/dirA1/dirA11/file-[tM_^[]dykcavgBtuD6Eo@_>8veEJNsKF5nYJ@^uo4:vXz6dA
/vol2/dirA1/dirA11/file-]cbNkxo_]tS2bTOC;TzxIv^>kk6?mazmN7BPE[jYeYD3<=YQX
/vol2/dirA1/dirA11/file-XyA5eT;KV8^wGsk1eCnDG\@4jPS0?OtP3IPAa[TfXU@jT`cF?
/vol2/dirA1/dirA11/file-O@YLYLL6sDqYXG4L=Sg477CTndn_;wF\GB0MmafH\Na^KEuzW
/vol2/dirA1/dirA11/file-xm<9mzj:MVmKE`IGCzs:lU8]7bWlmPxIkrVUDPPNddGdXCk5S
/vol2/dirA1/dirA11/file-agaj]\8_yMz5V@hLuVFm94e6JD0QYSslQgcHgx^oKLXVovCEt
/vol2/dirA1/dirA11/file-3`PC;tshre0ZzgJAb;xoyDszAHuR76dDAuvp5w@95Y7nFYtWf

Using sed, I can do the search/replace and send the output to a new file, e.g. vol4_first_40k.out. I use the simplest sed command, ‘s’, which just means substitute. Here I am substituting the first instance of the string “vol2” on each line with the string “vol4”. The paths shown above all begin with /vol2/, so that is enough here; adding the g flag (s/vol2/vol4/g) would replace every instance on a line. Often, a very simple search/replace is all that’s needed. Anything more complex is often easier to do (for me at least) in something like Python.

[root@unix tmp]# sed "s/vol2/vol4/" vol2_first_40k.out  > vol4_first_40k.out

Or to test the replacement, before operating on the entire file


[root@unix tmp]# sed "s/vol2/vol4/" vol2_first_40k.out |head
/vol4/dirA1/dirA11/file-bCOST@0paeRg`FIbONu]Tn_`n2BNsE\N5HuCaa7cmXw0H[K_n
/vol4/dirA1/dirA11/file-Vzi5oCec@]7cw2bQmBATD4Sf;4[\;EdPWMvO>JX5hvhY58dC@
/vol4/dirA1/dirA11/file-V?4LB<^SrhdT^CfGSod5d0eC8vvPf[QMa1wfcUz]:5O_TIett
/vol4/dirA1/dirA11/file-[tM_^[]dykcavgBtuD6Eo@_>8veEJNsKF5nYJ@^uo4:vXz6dA
/vol4/dirA1/dirA11/file-]cbNkxo_]tS2bTOC;TzxIv^>kk6?mazmN7BPE[jYeYD3<=YQX
/vol4/dirA1/dirA11/file-XyA5eT;KV8^wGsk1eCnDG\@4jPS0?OtP3IPAa[TfXU@jT`cF?
/vol4/dirA1/dirA11/file-O@YLYLL6sDqYXG4L=Sg477CTndn_;wF\GB0MmafH\Na^KEuzW
/vol4/dirA1/dirA11/file-xm<9mzj:MVmKE`IGCzs:lU8]7bWlmPxIkrVUDPPNddGdXCk5S
/vol4/dirA1/dirA11/file-agaj]\8_yMz5V@hLuVFm94e6JD0QYSslQgcHgx^oKLXVovCEt
/vol4/dirA1/dirA11/file-3`PC;tshre0ZzgJAb;xoyDszAHuR76dDAuvp5w@95Y7nFYtWf
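
Since the generated file names are random, “vol2” could in principle turn up somewhere other than the leading directory. A minimal sketch of a more defensive variant, assuming every path should only change in its first component; the sample file and its contents here are made up for illustration:

```shell
# Hypothetical sample data standing in for the real 500 MB listing.
tmpdir=$(mktemp -d)
printf '%s\n' '/vol2/dirA1/file-a' '/vol2/dirA1/vol2-backup' > "$tmpdir/sample.out"

# Anchor the match to the start of the line so only the leading "/vol2/"
# changes; "vol2" appearing later in a name is left alone.
# Using | as the delimiter avoids having to escape the slashes.
sed 's|^/vol2/|/vol4/|' "$tmpdir/sample.out" > "$tmpdir/sample_vol4.out"

# Sanity checks: line counts should match, and no line should still
# start with /vol2/ (grep -c exits non-zero on zero matches, hence || true).
wc -l < "$tmpdir/sample.out"
wc -l < "$tmpdir/sample_vol4.out"
grep -c '^/vol2/' "$tmpdir/sample_vol4.out" || true
```

Because sed streams the file line by line, the same command should behave the same way on the full-size file.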

Unix limits in OS X

TAGS: None

When running a lot of unix commands via scripts, it’s quite easy to hit the maximum allowable processes per user, or the maximum number of open files per user.

My Mac, running OS X 10.6.8, has the following limits defined.


lovebox-4:[~] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 266
virtual memory          (kbytes, -v) unlimited

A user can change these limits directly from the shell using the ulimit command, passing the new value as an argument (with no value, the current limit is printed).

ulimit -n

Changes the maximum allowable number of concurrently open files.

ulimit -u

Changes the maximum allowable concurrent processes.

These limits are in place to stop one user from hurting the overall system for other users. Typically an OS X system has only one user, so it’s quite OK to increase the limits. Another thing to know is that a user can only raise a limit up to the maximum allowed by the system. This system maximum can also be changed by the system administrator (root), who on most Macs is the same person. So how do we do this?

First, let’s see what the system limits are using the sysctl command

lovebox-4:[~] $ sysctl -a | egrep '(maxfiles|maxproc)'

kern.maxproc = 532
kern.maxfiles = 12288
kern.maxfilesperproc = 10240
kern.maxprocperuid = 266
kern.maxproc: 532
kern.maxfiles: 12288
kern.maxfilesperproc: 20480
kern.maxprocperuid: 512

Now, I want to change those system maximums, so that I can change the user maximums too.

sudo sysctl -w kern.maxfilesperproc=20480
kern.maxfilesperproc: 10240 -> 20480
sudo sysctl -w kern.maxprocperuid=512
kern.maxprocperuid: 266 -> 512

Now that I have higher system limits, I need to tell OS X that I want to use them. Again, the system allows users to protect themselves by restricting themselves to a lower limit.

Next, open .bashrc and put the new limits in there

ulimit -n 1024
ulimit -u 512

You may not be able to set the new limits directly in an existing shell, even though the kernel maximums were changed. However, a newly started shell should have the new limits.

E.g. in an existing shell

lovebox-4:[~] $ ulimit -n 1024
-bash: ulimit: open files: cannot modify limit: Operation not permitted

In a newly created shell window

lovebox-4:[~] $ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 512
virtual memory          (kbytes, -v) unlimited

So, here’s something non-intuitive

lovebox-4:[~] $ ulimit -n 1024  <--- Works because we set 1024 in bashrc
lovebox-4:[~] $ ulimit -n 512  <---- Lower the limit
lovebox-4:[~] $ ulimit -n 1024  <--- Now it cannot be raised in this shell, or any descendants.
-bash: ulimit: open files: cannot modify limit: Operation not permitted
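
A likely explanation, going by the bash documentation: when neither -S nor -H is given, ulimit sets both the soft and the hard limit, and an ordinary user cannot raise a hard limit once it has been lowered. A minimal sketch, assuming a POSIX-style shell and an arbitrary test value of 64, showing that changing only the soft limit leaves room to raise it again:

```shell
# Record the current soft limit for open files.
before=$(ulimit -S -n)

# The command substitution runs in a subshell, so nothing here
# changes the limits of the calling shell.
after=$(
    ulimit -S -n 64 &&          # lower only the soft limit (64 is arbitrary)
    ulimit -S -n "$before" &&   # raising it back works: the hard limit was never touched
    ulimit -S -n                # print the resulting soft limit
)

echo "before=$before after=$after"
```

Running ulimit -H -n shows the hard limit, which is the ceiling that a plain “ulimit -n 512” would have lowered as well.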

Now, that's sorted out our shells. To do a similar thing for processes launched from the Finder (e.g. a web browser), we need to mess with launchd...

lovebox-4:[~] $ sudo launchctl limit
	cpu         unlimited      unlimited
	filesize    unlimited      unlimited
	data        unlimited      unlimited
	stack       8388608        67104768
	core        0              unlimited
	rss         unlimited      unlimited
	memlock     unlimited      unlimited
	maxproc     266            532
	maxfiles    256            unlimited

lovebox-4:[~] $ sudo launchctl limit maxproc 1024
lovebox-4:[~] $ sudo launchctl limit
	cpu         unlimited      unlimited
	filesize    unlimited      unlimited
	data        unlimited      unlimited
	stack       8388608        67104768
	core        0              unlimited
	rss         unlimited      unlimited
	memlock     unlimited      unlimited
	maxproc     1024           1024
	maxfiles    256            unlimited

This may not be persistent, and will only be picked up when new processes are launched. In other words, if an already running process has reached the maximum number of open files, doing the above will not help until the process is restarted.

Performance (Bug) Advocacy

TAGS: None

Courtesy of Cem Kaner. If you are employed as a performance analyst, then the following probably applies to you too. I consider performance defects to be a class of bug, which needs to be fixed by some other entity (maybe a programmer, an architect or a sysadmin). This means you’re about to create work for some other person, who might prefer to be watching Star Wars, drinking coffee or spending time with their family.

1. The point of testing is to find bugs.

2. Bug reports are your primary work product. This is what people outside of the testing group will most notice and most remember of your work.

3. The best tester isn’t the one who finds the most bugs or who embarrasses the most programmers. The best tester is the one who gets the most bugs fixed.

4. Programmers operate under time constraints and competing priorities. For example, outside of the 8-hour workday, some programmers prefer sleeping and watching Star Wars to fixing bugs.

A bug report is a tool that you use to sell the programmer on the idea of spending her time and energy to fix a bug.

Go here for the full text.
———————————————————————–

Issues with Intel PRO/1000 and tcpdump / windump.


Today I was trying to run a packet trace to capture iSCSI traffic from a Windows box to one of our filers. The trace would only show iSCSI reads and no iSCSI writes, even though I knew that there was write traffic. A packet trace taken from the filer using pktt showed the reads and writes I expected, which indicated something strange at the Windows end.

It turned out that the cause was TCP/IP offloading. The Intel NIC was configured to do the offloading, and evidently ‘hid’ some of the network activity from windump (a tcpdump implementation on Windows). Once I turned off the offloading from inside Windows, using the “Advanced” tab of the NIC Properties window, I was able to see the iSCSI reads and writes at the Windows end.

My best guess is that the issue has to do with where in the network stack the WinPcap module is inserted. It seems that some traffic was routed ‘around’ wherever in the stack WinPcap was listening.

Munge columnar data into one datafile (for processing in a spreadsheet, gnuplot or whatever)


Often we’ll use awk or cut to spit out a single column of data from a bunch of different files, or maybe from the same file. Once we have these files with one column of data each, sometimes we’ll want to splice them together so we can look at the values on each line (typically a stat of some sort) next to each other. It’s also very convenient if we want to import the data into a spreadsheet.
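
For reference, a rough sketch of that first step, pulling one column out of a multi-column file. The input file, its layout (time, ops, idle) and its values are made up for illustration; only the awk and cut usage is the point:

```shell
# Hypothetical whitespace-separated stats file: time, ops, idle.
tmpdir=$(mktemp -d)
printf '%s\n' '12:00 100 17.6' '12:01 110 17.1' '12:02 95 16.2' > "$tmpdir/stats.txt"

# awk splits each line on runs of whitespace; $3 is the third field.
awk '{ print $3 }' "$tmpdir/stats.txt" > "$tmpdir/idle.out.txt"

# cut needs an explicit single-character delimiter (-d) and a field number (-f).
cut -d ' ' -f 2 "$tmpdir/stats.txt" > "$tmpdir/ops.out.txt"

cat "$tmpdir/idle.out.txt"
```

awk tends to be the more forgiving of the two when columns are padded with a variable number of spaces, since cut treats every single space as a separator.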

Anyhow, let’s say I have three files: iscsi_080626.txt, greads.out.txt and idle.out.txt. I can put them together into one file using the ‘pr’ command like so.

iscsi file

bash-3.00$ head iscsi_080626.txt
32782319.86
32793022.93
32734064.23
32719233.74
32652552.75
32815570.70

gread file

bash-3.00$ head greads.out.txt
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

idle file

bash-3.00$ head idle.out.txt
248.722142
17.662915
17.195024
16.208969
15.856432
17.064519
16.874948
17.122956
17.191305
16.920703

bash-3.00$ pr -m -t iscsi_080626.txt greads.out.txt idle.out.txt > iscsi_greads_idle.txt

And the output file looks like this

bash-3.00$ head iscsi_greads_idle.txt
32782319.86             0.00                    248.722142
32793022.93             0.00                    17.662915
32734064.23             0.00                    17.195024
32719233.74             0.00                    16.208969
32652552.75             0.00                    15.856432
32815570.70             0.00                    17.064519
32801494.63             0.00                    16.874948
32696942.10             0.00                    17.122956
32715608.64             0.00                    17.191305
32835538.38             0.00                    16.920703
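
An alternative worth knowing is paste, which joins corresponding lines with a single delimiter instead of padding to fixed-width columns; that output tends to import into a spreadsheet more directly. A small sketch, using made-up file names and values rather than the real data above:

```shell
# Hypothetical single-column files standing in for the real stats files.
tmpdir=$(mktemp -d)
printf '%s\n' 10.5 11.0 12.25 > "$tmpdir/a.txt"
printf '%s\n' 0.00 0.00 0.00  > "$tmpdir/b.txt"
printf '%s\n' 17.6 17.1 16.2  > "$tmpdir/c.txt"

# paste joins line N of each file, separated by a tab by default.
paste "$tmpdir/a.txt" "$tmpdir/b.txt" "$tmpdir/c.txt" > "$tmpdir/merged.tsv"

# The same thing comma-separated, for tools that prefer CSV.
paste -d, "$tmpdir/a.txt" "$tmpdir/b.txt" "$tmpdir/c.txt" > "$tmpdir/merged.csv"

cat "$tmpdir/merged.csv"
```

pr -m is still the nicer choice when the goal is to eyeball aligned columns in a terminal.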

Dave’s place.

TAGS: None

Directions to Dave’s place in Cary.
[googlemaps http://maps.google.com/maps?f=d&hl=en&geocode=&saddr=919+N+Columbia+St,+Chapel+Hill,+NC+27516&daddr=104+Franklin+Chase+Court&sll=35.921065,-79.057145&sspn=0.011399,0.014441&ie=UTF8&ll=35.85695,-78.89953&spn=0.2256,0.32374&output=embed&s=AARTsJp-HhRu7qC6b33hz8g9GT3vMBNNOQ&w=425&h=350]

Editing long lines in vi

TAGS: None

:set nowrap

That’s it.


FAST ’07 Technical Sessions

TAGS: None

Super Furry Animals at Cat’s Cradle

TAGS: None

We had the pleasure of seeing Super Furry Animals in Chapel Hill this week. Pete Brooker was over from the UK, so we had a small crowd of neighbors and NetAppers with us.


Welcome to Hotel Retro.

TAGS: None

I’ve been thinking for a while that I should wean myself off expensive hotels, especially since I am now paying for them myself more often than not. So it was, um, ‘handy’ that our lovely travel agents have me checked into what can only be described as hotel-retro. Boasting a pre-bubble-era design aesthetic, it is… the ‘anti-W’. Instead of pseudo-groovy deep house in the lobby, we have at hotel-retro a pleasing blend of take-away food and industrial cleaner (Mmmm, clean-but-greasy). In keeping with the ’80s chic, the Quality Inn Sunnyvale (Persian Dr) provides the pampered traveller with the nostalgic feel of 56K modems via the “World’s Slowest Broadband”.

© 2009 dotplan. All Rights Reserved.
