Namazu is a full-text search engine that can be used on the web or as a personal search engine for e-mail or other data. It is also available for Windows.

Namazu has two main components. The indexer, mknmz, is written in Perl and uses filters to extract search terms from specialized file formats. Filters are supplied for plain text as well as for Excel, Word, HTML, MP3, PDF, and more, although some of these require additional software to work. The filter specification lets us write new filters.

The search engine proper is written in C and is called namazu. Usage from the command line is easy enough: namazu term searches for term, which can be a case-insensitive plain word, a regular expression enclosed in forward slashes (/), a negation with not, or even a phrase or a group. The possibilities are documented with some examples. My favorite is namazu's ability to search in fields of messages, with a query such as +from:user@someplace.com.

Namazu supplies a CGI program, namazu.cgi, which can be set up to provide a web-based search engine for your index. Its interface is simple, but it does supply results blindingly fast.

After installation, a

$ mknmz -a -O indexdir $HOME

will crunch through all files below $HOME and build an index in the specified directory. This step is tedious and can take time, depending on the number and size of the files, but there are some tips for speeding up the process. mknmz's behavior can be influenced with configuration files that limit the types and sizes of documents to be processed. When finished, statistics are printed to stdout:

[Base]
Date:                Mon Aug 20 23:59:57 2007
Added Documents:     8,401
Size (bytes):        460,773,652
Total Documents:     8,401
Added Keywords:      2,642,835
Total Keywords:      2,642,835
Time (sec):          2,008
File/Sec:            4.18
System:              linux
Perl:                5.008005
Namazu:              2.0.17
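
As a quick sanity check on those numbers, here is a minimal Python sketch that parses a few lines of the summary and recomputes the throughput. The helper names (parse_stats, to_number) are made up for this illustration and are not part of Namazu; the sample values are copied from the output above.

```python
# Hypothetical helpers for illustration only; not part of Namazu.
SAMPLE = """\
Added Documents:     8,401
Size (bytes):        460,773,652
Time (sec):          2,008
File/Sec:            4.18
"""


def parse_stats(text):
    """Map each 'Key: value' line of the mknmz summary to its value string."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


def to_number(value):
    """Convert a figure such as '8,401' to a float, dropping the commas."""
    return float(value.replace(",", ""))


stats = parse_stats(SAMPLE)
docs = to_number(stats["Added Documents"])
secs = to_number(stats["Time (sec)"])
size = to_number(stats["Size (bytes)"])

# Documents per second, which should agree with the reported File/Sec value.
print(f"files/sec: {docs / secs:.2f}")
# Raw indexing throughput in megabytes per second.
print(f"MB/sec:    {size / secs / 1e6:.2f}")
```

So the File/Sec figure is simply the number of added documents divided by the elapsed time: 8,401 documents in 2,008 seconds, or roughly 33 minutes for about 460 MB of data.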

Documentation is a bit sparse and difficult to read at times, but it should be sufficient to get us started.

Johannes Hofmann wrote a nifty little tool called nmzmail, which uses mknmz to index Maildir-type folders. The beauty of this tool is its integration into mail readers such as Mutt: it produces a very fast index that can be searched from within the MUA.
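
To give a feel for how a script might drive Namazu in this way, here is a rough Python sketch. It is not nmzmail's actual code. The function names are invented for the example, and I am assuming that namazu's --list option prints one matching file per line, so check namazu --help on your version. The query uses the +from: field syntax mentioned earlier, and indexdir is the directory from the mknmz example.

```python
import shutil
import subprocess


def from_query(address):
    """Build a field query that matches messages from the given sender."""
    return f"+from:{address}"


def build_search_command(query, index_dir, binary="namazu"):
    """Return the argument list for one search run.

    Assumption: --list makes namazu print only the matching file names.
    """
    return [binary, "--list", query, index_dir]


def search(query, index_dir):
    """Run the search and return matching paths, or [] if namazu is missing."""
    cmd = build_search_command(query, index_dir)
    if shutil.which(cmd[0]) is None:
        return []  # namazu is not installed on this machine
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for path in search(from_query("user@someplace.com"), "indexdir"):
        print(path)
```

Passing the query as a single list element avoids shell quoting trouble with phrases or regular expressions, since nothing is interpreted by a shell on the way to namazu.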

Mail, Software, Linux, Database, MacOSX, and CLI :: 20 Aug 2007 :: e-mail
