dotplan

troubleshooting & performance analysis

Quick search & replace in enormous files using sed

I have a very large file, around 500 MB, full of directory listings. I want to do a simple replace of “vol2” with “vol4”. Using vi or vim would require loading the entire file into memory (and is more difficult to script).

The file is called “vol2_first_40k.out”, and its contents look like this:

[root@unix tmp]# head vol2_first_40k.out
/vol2/dirA1/dirA11/file-bCOST@0paeRg`FIbONu]Tn_`n2BNsE\N5HuCaa7cmXw0H[K_n
/vol2/dirA1/dirA11/file-Vzi5oCec@]7cw2bQmBATD4Sf;4[\;EdPWMvO>JX5hvhY58dC@
/vol2/dirA1/dirA11/file-V?4LB<^SrhdT^CfGSod5d0eC8vvPf[QMa1wfcUz]:5O_TIett
/vol2/dirA1/dirA11/file-[tM_^[]dykcavgBtuD6Eo@_>8veEJNsKF5nYJ@^uo4:vXz6dA
/vol2/dirA1/dirA11/file-]cbNkxo_]tS2bTOC;TzxIv^>kk6?mazmN7BPE[jYeYD3<=YQX
/vol2/dirA1/dirA11/file-XyA5eT;KV8^wGsk1eCnDG\@4jPS0?OtP3IPAa[TfXU@jT`cF?
/vol2/dirA1/dirA11/file-O@YLYLL6sDqYXG4L=Sg477CTndn_;wF\GB0MmafH\Na^KEuzW
/vol2/dirA1/dirA11/file-xm<9mzj:MVmKE`IGCzs:lU8]7bWlmPxIkrVUDPPNddGdXCk5S
/vol2/dirA1/dirA11/file-agaj]\8_yMz5V@hLuVFm94e6JD0QYSslQgcHgx^oKLXVovCEt
/vol2/dirA1/dirA11/file-3`PC;tshre0ZzgJAb;xoyDszAHuR76dDAuvp5w@95Y7nFYtWf

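Using sed, I can do the search/replace and send the output to a new file, e.g. vol4_first_40k.out. sed reads the file as a stream, one line at a time, so memory use stays small. I use the simplest sed command, ‘s’, which just means substitute. Here it replaces the first occurrence of the string “vol2” on each line with the string “vol4”, which is what I want, since the volume name is the first thing on each line. (Adding the g flag, as in s/vol2/vol4/g, would replace every occurrence on a line.) Often a very simple search/replace is all that’s needed. Anything more complex is often easier to do (for me at least) in something like Python.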

[root@unix tmp]# sed "s/vol2/vol4/" vol2_first_40k.out  > vol4_first_40k.out

Or, to test the replacement before operating on the entire file:

[root@unix tmp]# sed "s/vol2/vol4/" vol2_first_40k.out |head
/vol4/dirA1/dirA11/file-bCOST@0paeRg`FIbONu]Tn_`n2BNsE\N5HuCaa7cmXw0H[K_n
/vol4/dirA1/dirA11/file-Vzi5oCec@]7cw2bQmBATD4Sf;4[\;EdPWMvO>JX5hvhY58dC@
/vol4/dirA1/dirA11/file-V?4LB<^SrhdT^CfGSod5d0eC8vvPf[QMa1wfcUz]:5O_TIett
/vol4/dirA1/dirA11/file-[tM_^[]dykcavgBtuD6Eo@_>8veEJNsKF5nYJ@^uo4:vXz6dA
/vol4/dirA1/dirA11/file-]cbNkxo_]tS2bTOC;TzxIv^>kk6?mazmN7BPE[jYeYD3<=YQX
/vol4/dirA1/dirA11/file-XyA5eT;KV8^wGsk1eCnDG\@4jPS0?OtP3IPAa[TfXU@jT`cF?
/vol4/dirA1/dirA11/file-O@YLYLL6sDqYXG4L=Sg477CTndn_;wF\GB0MmafH\Na^KEuzW
/vol4/dirA1/dirA11/file-xm<9mzj:MVmKE`IGCzs:lU8]7bWlmPxIkrVUDPPNddGdXCk5S
/vol4/dirA1/dirA11/file-agaj]\8_yMz5V@hLuVFm94e6JD0QYSslQgcHgx^oKLXVovCEt
/vol4/dirA1/dirA11/file-3`PC;tshre0ZzgJAb;xoyDszAHuR76dDAuvp5w@95Y7nFYtWf
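Because the file names in this listing are random strings, it is worth knowing exactly what each form of the substitution touches. The following is a minimal sketch, not taken from the real file: the sample line is hypothetical and deliberately contains “vol2” twice, so the difference between the plain command, the g flag, and an anchored pattern is visible.

```shell
# Hypothetical sample line in which "vol2" appears twice (not from the real file).
sample='/vol2/dirA1/copy-of-vol2-file'

# Plain s: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$sample" | sed 's/vol2/vol4/')

# g flag: every match on the line is replaced.
all_matches=$(printf '%s\n' "$sample" | sed 's/vol2/vol4/g')

# Anchored, with | as the delimiter: only a leading /vol2/ component is replaced.
anchored=$(printf '%s\n' "$sample" | sed 's|^/vol2/|/vol4/|')

printf '%s\n' "$first_only" "$all_matches" "$anchored"
```

For a listing like this one, the anchored form is probably the safest choice, since it can never alter a file name that happens to contain the string “vol2”.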

Invalidate Linux page cache

Issue sync first, to flush dirty pages back to the backing store:

sync

Then, as root, issue this command to invalidate the cache:

echo 3 > /proc/sys/vm/drop_caches

This kernel.org page has an interesting list of the various settable values for the Linux page cache, including the entry quoted below: http://www.kernel.org/doc/Documentation/sysctl/vm.txt

drop_caches

Writing to this will cause the kernel to drop clean caches, dentries and
inodes from memory, causing that memory to become free.

To free pagecache:
	echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
	echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
	echo 3 > /proc/sys/vm/drop_caches
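The two steps above (sync, then write to drop_caches) can be wrapped in a small shell function. This is only a sketch: the function name drop_caches and its optional second argument are my own inventions, and the second argument exists purely so the function can be pointed at a scratch file for a dry run instead of the real /proc entry.

```shell
# drop_caches MODE [TARGET]
# MODE: 1 = pagecache, 2 = dentries and inodes, 3 = both (values from vm.txt).
# TARGET is a hypothetical override for dry runs; it defaults to the real file.
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 2 ;;
    esac
    sync                       # flush dirty pages first; only clean pages are dropped
    echo "$mode" > "$target"   # needs root when target is the real /proc file
}

# Example (as root): drop_caches 3
```

Note that sudo echo 3 > /proc/sys/vm/drop_caches does not work from an unprivileged shell, because the redirection is performed by the calling shell rather than by sudo. Something like echo 3 | sudo tee /proc/sys/vm/drop_caches avoids that.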

Attach MIME from command line with mutt

I need to send myself a MIME-encoded email with a PNG file attached. Uuencoded files no longer play well with Outlook 2011 on Mac (no idea why). Thankfully we can replace mailx with mutt, which does the MIME encoding for us:

diskperf-3650-2:[/tmp] $ mutt -s "From mutt" -a /u/little/charts/spc1_csc_cam.png little@netapp.com < /dev/null

Mutt requires the < /dev/null to avoid going into interactive mode. Presumably it checks whether its input is coming from a terminal and goes interactive if it is. The redirection could instead point to a file of boilerplate text, such as “This is the chart you requested”.
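As a variation on the same idea, the message body can be piped in instead of redirected from a file. The sketch below is an assumption-laden illustration, not the command I actually ran: the function name send_attachment and the MUTT variable are hypothetical (MUTT only exists so a stand-in command can be substituted when trying this out), and the recipient in the example is a placeholder.

```shell
# send_attachment SUBJECT FILE RECIPIENT [BODY]
# MUTT is a hypothetical override so a stand-in command can be used for testing.
send_attachment() {
    subject=$1
    file=$2
    recipient=$3
    body=${4:-}
    # The body arrives on stdin, so mutt does not drop into interactive mode.
    # "--" ends the attachment list; newer mutt versions require it before
    # the recipient address when -a is used.
    printf '%s\n' "$body" | "${MUTT:-mutt}" -s "$subject" -a "$file" -- "$recipient"
}

# Example:
#   send_attachment "From mutt" /u/little/charts/spc1_csc_cam.png \
#       user@example.com "This is the chart you requested"
```

Leaving the fourth argument off sends a single empty line as the body, which is close to what redirecting from /dev/null does.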

  • Author: gary
  • Published: Feb 2nd, 2012
  • Category: emacs
  • Comments: None

How to get rid of subscript annoyance in org-mode.

By default, a string like hello_world will be translated to hello{subscript}world when org exports to HTML. This can be annoying. Luckily we can turn the behavior off by using the #+OPTIONS “macro”:

#+OPTIONS:   H:3 num:t toc:t \n:nil @:t ::t |:t -:t f:t *:t <:t ^:{}

The actual magic to turn off subscripts is

^:{}

Which says: interpret "hello_{world}" as a subscript directive, but not "hello_world".

© 2009 dotplan. All Rights Reserved.
