Word Count
Monday, February 8, 2010
Unix (or the GNU utils, for that matter) is built on the premise: do one thing, and do it well. wc is a very nice tool for counting; in practice, however, there are others that might be even more important when dealing with text, numbers and reporting.
For instance, I want to count the number of times each variable in a file is called, nicely sorted and all:
cat test.php | grep -o -E '\$[A-Za-z0-9]+' | sort -r | uniq -c | sort -r -n
This reads test.php and pipes it to grep, which prints only the strings that match a $ followed by letters or digits; basically, that is a variable reference in PHP. The output is then sorted, which uniq needs because it only collapses duplicates that sit next to each other (it stops counting as soon as it reaches another value). Next, the sorted output goes through uniq -c, which counts how many times each value repeats and prints that count in front of the value. Finally, everything goes through sort once more, this time in descending order by numeric value (-n compares the counts as numbers, so 10 ranks above 2 instead of being compared character by character, where 10 would land between 1 and 2).
At the end, it will display a list like:
23 $this
12 $a
4 $key
4 $val
3 $i
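To try the pipeline without a real project at hand, here is a minimal sketch. The count_vars function name and the sample PHP snippet are made up for illustration, and the pattern is a slight variation on the one above: it also accepts underscores and reads the file directly instead of going through cat.

```shell
# Hypothetical helper: count PHP-style variable references in one file.
count_vars() {
    # -o prints only the matched text, -E enables extended regular expressions.
    # Assumption: underscores are allowed here, unlike in the original one-liner.
    grep -o -E '\$[A-Za-z_][A-Za-z0-9_]*' "$1" | sort | uniq -c | sort -r -n
}

# Made-up sample input, only for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?php
$total = 0;
foreach ($items as $item) {
    $total = $total + $item;
}
echo $total;
EOF

count_vars "$sample"
rm -f "$sample"
```

With this sample, $total should come out on top with a count of 4, followed by $item with 2 and $items with 1.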
Although there are other ways of achieving the same output, it shows you how much you can do with basically 3 "simple" GNU commands.
My advice anyway: if you are going to work on a CLI, learn your commands! (Tip: look at find and xargs. They won’t prepare your dinner, but they can do pretty much anything else.)
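As a small taste of find and xargs working together, the sketch below totals the lines of every PHP file under a directory. The php_line_total name is hypothetical, and this is just one way to combine the two tools.

```shell
# Hypothetical helper: total line count of all *.php files under a directory.
php_line_total() {
    # -print0 and -0 pass file names NUL-separated,
    # so names containing spaces are handled correctly.
    find "$1" -type f -name '*.php' -print0 | xargs -0 cat | wc -l
}

# Example call on the current directory.
php_line_total .
```

The same shape works for most "do X to every file matching Y" jobs: find selects the files, xargs hands them to whatever command does the work.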
@Lorna If you don’t want to remember "word count" for doing a "line count", you can do your own alias:
alias lc='wc -l'
I use lots of aliases to speed things up :)
@Joshua nice one line trick!
Joshua: that’s a blog post in itself, thanks for contributing so nicely to my blog :)
minterior: I tend not to use aliases, because I use lots of different machines and they never have the alias I’m looking for – great tip though, thanks!
The grep utility actually has line counting built in, so instead of doing:
shell> grep -R TODO * | wc -l
You could do the following instead (note that with -R this prints one count per file rather than a single total):
shell> grep -cR TODO *
Although for some reason I still naturally do the former. :)
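To see how the two forms differ in practice, here is a small sketch using a throwaway directory. The file names, their contents and the two function names are made up for illustration.

```shell
# Total number of matching lines across all files under a directory.
total_matches() {
    grep -R TODO "$1" | wc -l
}

# One "file:count" line per file, including files with a count of 0.
per_file_counts() {
    grep -cR TODO "$1"
}

# Throwaway directory with made-up files, only for illustration.
demo=$(mktemp -d)
printf 'TODO one\nTODO two\n' > "$demo/a.txt"
printf 'nothing here\nTODO three\n' > "$demo/b.txt"

total_matches "$demo"
per_file_counts "$demo"
rm -rf "$demo"
```

The first call prints a single number (3 for these files), while the second prints one line per file, so the piped version is still the one to reach for when a grand total is wanted.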


