2009-11-30 22:26

Which Linux applications are named after dictionary words?

Every now and then I find my mind gets caught on some seemingly trivial observation, and I end up following a chain of thought tangential to the one I was originally on, until I arrive at somewhere quite unexpected. Whereas people in former times may have been unable to travel too far down these intellectual rabbit holes, we now live in a world where Google and Wikipedia have made us seemingly omniscient, and hypertext in particular allows us to jump from one idea to the next, wherever our curiosity takes us. The secondary limit, I suppose, would be the ability to process all of this information that we amass while browsing the Web. As a programmer, though, there are certain options for information processing which are open to me but would not be readily available to non-programmers, and even if what I do with the processed information isn’t particularly ground-breaking, it can at least be the subject of a new blog post.

As the title of this post suggests, my most recent such endeavour involved looking at Linux application names, and dictionary words, and below I explain what I found and how I found it.

Data source

Debian can be thought of as a meta-program, in that it is a collection of programs where each one is a unit of data in the meta-program, and this data can be manipulated by functions in the meta-program, which are themselves programs. Just as Iceweasel has an update mechanism, so does the Debian meta-program, and just as Apache has configuration files, so does the Debian meta-program. Most relevantly, just as Kate can list plugins which are available to extend its functionality, so can Debian list “plugins” (in fact packages) which can extend its functionality. As Debian is a multi-platform program, one way to list all these plugins is on the command line, using the command:

apt-cache dump | grep "^Package" | sort
The output of that command starts something like:

Package: 2vcard
Package: 3dchess
Package: 4g8
Package: 6tunnel
Package: 7z
Package: 915resolution
Package: 9base
Package: 9fonts
Package: 9menu
Package: 9mount

but when you get to a section like this:

Package: an
Package: anacron
Package: anagramarama
Package: analog
Package: anarchism
Package: and
Package: angband
Package: angband-doc
Package: angrydd
Package: animals

you might start to wonder just how keen software developers are to name their projects after English words. I did, at least, which is why I wrote the following one-liner.

One-liner

The one-liner I actually ended up using was this:

apt-cache dump | grep "^Package" | sort | cut -c 10- | sed 'a/etc/dictionaries-common/words' | xargs -n 2 grep -i -x -m 1
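To try the pipeline without a Debian system to hand, here is a minimal sketch that runs the same stages on a few sample lines. It assumes GNU sed (the one-line form of the a command is a GNU extension) and uses a temporary file as a hypothetical stand-in for /etc/dictionaries-common/words; the function names sample_packages and dictionary_packages are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/dictionaries-common/words, one word per line.
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' analog Anarchism and animals less > "$words"

# Stand-in for the sorted output of: apt-cache dump | grep "^Package"
sample_packages() {
    printf 'Package: %s\n' 2vcard an anacron analog anarchism and angband
}

# The same stages as the one-liner, with the word list path passed as $1.
# Assumes GNU sed, which accepts the text right after the "a" command.
dictionary_packages() {
    cut -c 10- | sed "a$1" | xargs -n 2 grep -i -x -m 1
}

# xargs exits non-zero whenever one of the greps finds nothing,
# so the exit status is deliberately ignored here.
matches=$(sample_packages | dictionary_packages "$words" || true)
printf '%s\n' "$matches"
```

With this sample data the sketch prints analog, Anarchism and and, one per line. What grep prints is the matching line from the word list rather than the package name, which is why “Anarchism” comes out capitalised.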
In case you don’t know or can’t guess what the cut command does, I’ll tell you that it removes (in this example) the first 9 characters of each line, that is, the “Package: ” prefix, and keeps everything from character 10 onwards, leaving just the name of each package, one per line. To find out which of these are in the dictionary, though, a command like this is needed for each package name:

grep -i -x -m 1 test /etc/dictionaries-common/words
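Both steps can be checked in isolation. The following is a minimal sketch, assuming a throwaway word list created with mktemp in place of /etc/dictionaries-common/words; the sample words and the variable names are illustrative only. It finishes with the same grep for “test”.

```shell
#!/bin/sh
# "Package: " is 9 characters long, so the package name starts at character 10.
prefix='Package: '
prefix_len=${#prefix}

# cut -c 10- keeps character 10 onwards, i.e. it drops the 9-character prefix.
name=$(printf 'Package: anacron\n' | cut -c 10-)
printf '%s (after dropping %s characters)\n' "$name" "$prefix_len"

# Hypothetical miniature word list standing in for /etc/dictionaries-common/words.
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' attest Test testing test > "$words"

# -x rejects "attest" and "testing", -i lets "Test" match, -m 1 stops after it.
found=$(grep -i -x -m 1 test "$words")
printf '%s\n' "$found"
```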
This command looks in the word list file for the first (-m 1) line that consists solely of (-x) “test”, matched case-insensitively (-i). If it finds one, it outputs it, and if not, it outputs nothing. The job of the sed in the one-liner, then, is to build a list of argument pairs for greps like this, which it does by taking the list of package names and appending (‘a’) “/etc/dictionaries-common/words” as a new line after each one. As it happens, this is perfectly acceptable input to xargs, which can be instructed (-n 2) to take arguments from stdin two at a time, so the first two arguments (lines) are 2vcard and /etc/dictionaries-common/words, which builds the grep:

grep -i -x -m 1 2vcard /etc/dictionaries-common/words
as required.

Conclusion

So, it turns out there are actually 881 applications with a name that is in the word list. If you don’t run the one-liner with | wc -l at the end to count the number of lines, you will probably see a long pause in the output at about “less”, but do not be alarmed: it’s just that the list of packages has reached “lib*”, and there are a lot of libraries that aren’t named after real words.

With so many package names that are real words, though, the temptation must be to write a sentence using some of them. Here’s the best collection of them I’ve come up with so far:

Hello Kitty! Why visit Istanbul (via Babel) when Switzerland links alpine memories? An apt birthday gift (nice modest shoes) stops greed. Man contacts God: prayer, God contacts man: grace. Open the gate, since most leave. Make love and smile.

Can you do better? Leave a comment.