Description
Mime enables parsing of documents whose MIME types are
other than text/plain,
text/html or text/xml, the types that
have built-in parsers.
Documents with other MIME types can be processed
with the help of
external parsers:
external programs that convert documents of arbitrary types
to one of the types listed above, which mnoGoSearch supports natively.
The from_mime and
to_mime parameters are standard MIME types.
to_mime must be one of the natively supported types (listed above)
and can optionally include a charset= part.
If the charset= part is omitted,
the parser output is considered to be in
LocalCharset.
By default, when executing a parser, indexer sends data
to its STDIN and reads results from its STDOUT.
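As a minimal sketch of a parser that follows this STDIN/STDOUT convention, the Python code below flattens CSV input into plain text. The choice of CSV and the names csv_to_plain_text and run_parser are illustrative assumptions, not part of mnoGoSearch. The sketch writes UTF-8, so the to_mime given for such a parser would need a charset=utf-8 part unless LocalCharset is already UTF-8.

```python
import csv
import io
import sys


def csv_to_plain_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Flatten CSV input into plain text, one row per line."""
    text = raw.decode(encoding, errors="replace")
    reader = csv.reader(io.StringIO(text, newline=""))
    # Join the cells of each row with single spaces.
    return "".join(" ".join(cell.strip() for cell in row) + "\n" for row in reader)


def run_parser(stdin=None, stdout=None) -> None:
    """Read the whole document from STDIN and write the converted text to STDOUT."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer
    stdout.write(csv_to_plain_text(stdin.read()).encode("utf-8"))
```

Calling run_parser() with no arguments at the end of the script makes it read from the real STDIN and write to the real STDOUT, which is how indexer would talk to it by default.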
Some parsers cannot operate on STDIN and need a file.
The command line parameter can contain a $1
reference, which stands for a temporary file name.
If $1 is specified, indexer creates a temporary
file, writes the input data to it, and substitutes the temporary
file name for the $1 reference in the parser command line.
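The $1 handling described above can be approximated as follows. This is only a sketch of the described behaviour, not indexer's actual code: the function name run_file_parser is hypothetical, and running the command without a shell is an assumption made to keep the example simple.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def run_file_parser(command_line: str, data: bytes) -> bytes:
    """Run a parser whose command line refers to its input file as $1."""
    fd, path = tempfile.mkstemp()
    try:
        # Write the document to the temporary file.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Substitute the temporary file name for every $1 reference.
        argv = [arg.replace("$1", path) for arg in shlex.split(command_line)]
        # The parser output is still collected from its STDOUT.
        result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
        return result.stdout
    finally:
        # Remove the temporary file whether or not the parser succeeded.
        os.unlink(path)
```

The parser still reports its result on STDOUT; only the input side changes from a stream to a file name.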
The command line can also use variables,
for example ${URL} or ${Content-Type}.
See the list of all available variables in indexer -v6 output,
in the lines having the "Response." prefix.
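The ${Name} notation can be pictured as simple template substitution. The sketch below is an illustration under assumptions, not mnoGoSearch code: the function name expand_variables, the demo-parser command, the sample values, and the choice to expand unknown names to an empty string are all hypothetical.

```python
import re

# Matches references of the form ${Name}.
_VAR = re.compile(r"\$\{([^}]+)\}")


def expand_variables(template: str, variables: dict) -> str:
    """Replace each ${Name} with its value.

    Unknown names expand to an empty string; that choice is an assumption
    of this sketch, not documented indexer behaviour.
    """
    return _VAR.sub(lambda match: variables.get(match.group(1), ""), template)


# Hypothetical values for one document.
sample = {"URL": "http://example.com/a.bin", "Content-Type": "application/x-demo"}
command = expand_variables("demo-parser --url ${URL} --type ${Content-Type}", sample)
```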
The fourth parameter, source, is optional.
It specifies what kind of data is sent to the parser.
By default, indexer sends raw document content.
With the help of the source parameter, you
can mix document content with other kinds of data,
for example its URL or an HTTP header,
using the same variable notation as in the command line parameter.
Raw content is available as ${HTTP.Content}.
Note:
To make ${HTTP.Content} available, use the Section HTTP.Content 0 0
command.
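On the parser side, mixed input has to be taken apart again. As a hedged sketch, assume a hypothetical source value of ${URL} ${HTTP.Content}, so that the parser receives the URL, a single space, and then the raw content. The function name split_source and this input layout are assumptions chosen for illustration only.

```python
def split_source(data: bytes):
    """Split input of the assumed form b'<URL> <raw content>' into its two parts."""
    # Everything before the first space is treated as the URL;
    # the remainder is the raw document content, left untouched.
    url, _, content = data.partition(b" ")
    return url.decode("ascii", errors="replace"), content
```

Splitting on the first space only keeps any later spaces inside the raw content intact.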