Add parser definition commands. Each command takes three arguments, in the following format:
Mime <from_mime> <to_mime> <command line>
For example, the following command defines a parser for man pages:
# Use deroff for parsing man pages ( *.man )
Mime application/x-troff-man text/plain deroff
This parser will take data from STDIN and output results to STDOUT.
Some parsers cannot operate on STDIN and require a file to read from. In this case indexer can create a temporary file in /tmp and remove it when the parser is done. Use the $1 macro in the parser command line to substitute the temporary file name. For example, the Mime command for the catdoc MS-Word-to-text converter can look like this:
Mime application/msword text/plain "/usr/bin/catdoc -a $1"
If your parser writes its result to an output file, use the $2 macro. indexer will replace $2 with the name of a temporary output file, start the parser, read the result from this file, and then delete it. For example:
Mime application/msword text/plain "/usr/bin/catdoc -a $1 >$2"
The parser above will read data from the first temporary file and write results to the second file. Both temporary files will be deleted after the parser results have been read. Note that this command is effectively the same as the previous example. The two differ only in the execution method used by indexer: file-to-STDOUT versus file-to-file.
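The three execution methods described above (STDIN-to-STDOUT, file-to-STDOUT and file-to-file) can be illustrated with a short Python sketch. This is not indexer's actual implementation. The function name run_parser is hypothetical, the macro handling is a simplified assumption based only on the description above, and the sketch assumes a POSIX shell. The standard cat and tr utilities stand in for real parsers such as deroff or catdoc.

```python
import os
import subprocess
import tempfile


def run_parser(command: str, data: bytes) -> bytes:
    """Run a parser command line and return the converted document.

    Hypothetical helper: the execution mode is chosen from the macros
    found in the command line, as the text above describes.
    """
    uses_input_file = "$1" in command
    uses_output_file = "$2" in command
    in_path = out_path = None
    try:
        if uses_input_file:
            # The parser cannot read STDIN: write the document to a
            # temporary file and substitute its name for $1.
            fd, in_path = tempfile.mkstemp()
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            command = command.replace("$1", in_path)
        if uses_output_file:
            # The parser writes to a file: reserve a temporary output
            # file and substitute its name for $2.
            fd, out_path = tempfile.mkstemp()
            os.close(fd)
            command = command.replace("$2", out_path)
        result = subprocess.run(
            command,
            shell=True,
            # Feed the document on STDIN only when no $1 macro is used.
            input=b"" if uses_input_file else data,
            stdout=subprocess.PIPE,
            check=True,
        )
        if uses_output_file:
            with open(out_path, "rb") as f:
                return f.read()
        return result.stdout
    finally:
        # Remove both temporary files once the result has been read.
        for path in (in_path, out_path):
            if path is not None and os.path.exists(path):
                os.remove(path)
```

With this sketch, run_parser("cat $1", doc) and run_parser("cat $1 >$2", doc) return the same bytes, which mirrors the point that the two catdoc commands differ only in how the result travels back to the caller.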