Configuring mnoGoSearch Web Configurator
To make the configuration options available, you have to start the service first. Click the Start service link at the bottom of
the mnoGoSearch Web Configurator page.
This section covers the main database configuration, including Database Type, Data source and Word storage mode.
To configure the database, click the Edit link in the Database section.
On the Enter Database Settings page, select the DSN you created to store indexer information and select the Database type you wish to use.
Select the Database mode (i.e. the database word storage mode; for details see Chapter 17., Storage modes). Enter the User name and Password.
Now press the OK button.
Configure more detailed mnoGoSearch parameters here. By clicking Edit link you can configure the following options.
Document size limit: maximum size of a document that can be indexed in bytes. Any document larger than this limit will be ignored by the indexer.
The default value is 1048576 bytes, i.e. 1 MB.
Local charset: the local character set used by your server.
Force 1251: this option is useful for users who deal with Cyrillic content and broken (or misconfigured) Microsoft IIS web servers,
which tend not to report the charset correctly. When this option is turned on, it is assumed that all servers which report themselves as 'Microsoft'
or 'IIS' have content in the Windows-1251 charset.
Word length: specify the range of word lengths mnoGoSearch will index. Words shorter or longer than this range will not be indexed.
Include numbers in index: specify whether to index numbers or not.
Include num/char sequences in index: specify whether to index words containing both letters and numbers or not.
Get ISpell data from: specify whether to get ISpell data from text files or from a database. See Chapter 15., Ispell for details.
Use crosswords: this feature assigns the words found between <a href="xxx"> and </a> to the document the link leads to as well. It works in SQL
database mode and is not supported in built-in database and Cache modes.
Use phrases: enable phrase search support.
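The Word length, Include numbers in index and Include num/char sequences in index options above can be pictured as a filter applied to each word before it is stored. The following Python sketch is only an illustration: the function name, the parameter names and the default range are assumptions, not mnoGoSearch settings or its actual implementation.

```python
def should_index(word, min_len=1, max_len=32,
                 index_numbers=True, index_alnum=True):
    """Decide whether a word passes the (hypothetical) indexer filters."""
    # Word length: words outside the configured range are skipped.
    if not (min_len <= len(word) <= max_len):
        return False
    has_digit = any(ch.isdigit() for ch in word)
    has_alpha = any(ch.isalpha() for ch in word)
    # "Include numbers in index": pure numbers such as "2001".
    if has_digit and not has_alpha:
        return index_numbers
    # "Include num/char sequences in index": mixed words such as "mp3".
    if has_digit and has_alpha:
        return index_alnum
    return True
```

With both number options switched off, a word like "search" is still indexed, while "2001" and "mp3" are dropped.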
Click Edit link to change indexer settings.
Delete URL from table if it does not match server rules: delete from the database those URLs that are not listed in "Servers".
Detect clones: detect documents that are identical but are stored in various locations and index only one of them.
User agent: the User-Agent string is the text that programs use to identify themselves to HTTP, mail and news servers, for usage tracking
and other purposes. You can specify any string you wish.
Store valid words: store words that are found in the dictionary.
Store invalid words: store words that are not found in the dictionary.
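To illustrate the Detect clones option above, the sketch below groups documents by a hash of their content and reports every later copy as a clone of the first one seen. It is a minimal Python illustration under assumptions: the function name and the use of SHA-256 are choices made for this example, and this section does not say which checksum mnoGoSearch itself uses.

```python
import hashlib

def find_clones(documents):
    """Map each clone URL to the first URL seen with identical content."""
    first_seen = {}   # content hash -> URL of the original
    clone_of = {}     # clone URL -> original URL
    for url, body in documents:
        digest = hashlib.sha256(body).digest()
        if digest in first_seen:
            clone_of[url] = first_seen[digest]
        else:
            first_seen[digest] = url
    return clone_of

docs = [
    ("http://example.com/a.html", b"<html>same page</html>"),
    ("http://mirror.example.com/a.html", b"<html>same page</html>"),
    ("http://example.com/b.html", b"<html>another page</html>"),
]
clones = find_clones(docs)
```

Here only the mirror copy of a.html ends up in `clones`, so only one of the two identical documents would be indexed.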
You can specify text files containing stop words. These are the words that are not taken into account during indexing and
later searching. Click Add link to specify a stop words file and Clear link to remove all files from the list.
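The effect of a stop words file can be sketched as follows. This Python example assumes a file with one word per line and case-insensitive comparison; both are assumptions made for illustration, and the function names are hypothetical.

```python
def load_stopwords(lines):
    """Build a stop word set from lines of a one-word-per-line file."""
    return {line.strip().lower() for line in lines if line.strip()}

def remove_stopwords(words, stopwords):
    """Drop every word that appears in the stop word set."""
    return [w for w in words if w.lower() not in stopwords]

stopwords = load_stopwords(["the\n", "and\n", "\n", "of\n"])
query = "the history of search engines".split()
kept = remove_stopwords(query, stopwords)
```

The same filtering applies at indexing time and at search time, which is why stop words never contribute to a match.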
You can specify custom HTTP header that mnoGoSearch sends with its request to HTTP server. E.g. you can request server to
return only pages in specific language if possible.
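As an illustration of such a header, the standard HTTP Accept-Language header asks a server for pages in preferred languages. The Python sketch below only builds a request object with that header and sends nothing over the network; the helper name, the URL and the header value are assumptions chosen for the example.

```python
from urllib.request import Request

def build_request(url, extra_headers):
    """Create an HTTP request object carrying additional headers."""
    return Request(url, headers=extra_headers)

# Prefer Russian pages, fall back to English.
req = build_request("http://example.com/", {"Accept-Language": "ru, en;q=0.5"})
```

Whether the server honours the header depends on the server; it may still return its default page.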
In this page you can specify which documents to index and which to ignore.
Please note that the higher a rule is in the list the more priority it is given.
Press Append button to add a new rule, i.e. assign additional file types to be indexed.
In Edit Document page select Command type: Disallow to exclude certain file types from indexing, Allow to include certain file types.
Use Check only to specify files that are to be checked only for existence and not downloaded. It is useful for zip, exe, arj and other binary files.
Use Href only to scan an HTML page for "href" tags without indexing the contents of pages whose URLs match (or do not match) the given argument.
When indexing large mailing list archives, for example, the index and thread index pages (like mail.10.html, thread.21.html, etc.) should be scanned for links but should not be indexed.
In Expression type use Reg.expr to choose regular expression comparison and String to choose string comparison with wildcards. Wildcards are '*' for any number of characters and '?' for one character. Note
that '?' and '*' have special meaning in the "String" match type. Please use "Reg.expr" to describe documents with '?' and '*'
signs in the URL.
"String" match is much faster than "Reg.expr". Use "String" wherever possible.
Specify whether documents are to Match or do Not Match given arguments.
Use Characters Case to select case insensitive or case sensitive comparison.
In Command string enter masks for files you wish to Allow/Disallow.
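The wildcard rules of the "String" expression type can be sketched in Python by translating a mask into a regular expression. This is an illustration only: the function names are hypothetical, and matching the mask against the whole URL is an assumption rather than documented mnoGoSearch behaviour.

```python
import re

def wildcard_to_regex(mask):
    """Translate a '*'/'?' mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")          # any number of characters
        elif ch == "?":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch)) # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def string_match(mask, url):
    """True if the whole URL matches the wildcard mask."""
    return wildcard_to_regex(mask).fullmatch(url) is not None
```

Because '?' stands for any single character, a mask such as `*page?id=*` also matches `pageXid=5`. To match a literal '?' in a URL, a regular expression like `\?id=` is needed, which is why "Reg.expr" is recommended for such URLs.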
If you wish to insert a command before a specific item rather than at the top of the list, use the Ins link instead of the Append
button. Click the ED link to edit a specific command, the DL link to delete it and the Clear button to delete all the entries from the list.
With the UP and DN links move the selected item up and down the list. You can always revert to the default list by pressing the Default button.
mnoGoSearch automatically adds one Allow String * command after reading the config file. This means that everything that is not disallowed is allowed.
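The priority of rules and the implicit final Allow can be sketched as a first-match loop. The Python example below is a simplified illustration: the rule list and function name are hypothetical, and `fnmatchcase` from the standard library is used as a stand-in for the "String" match type (it also gives special meaning to '[', which the wildcards described here do not).

```python
from fnmatch import fnmatchcase

# (command, mask) pairs in list order; rules higher in the list win.
RULES = [
    ("Disallow", "*/private/*"),
    ("Check only", "*.zip"),
    ("Href only", "*/thread.*.html"),
]

def decide(url, rules=RULES):
    """Return the command of the first rule whose mask matches the URL."""
    for command, mask in rules:
        if fnmatchcase(url, mask):
            return command
    # Implicit final "Allow String *": anything not matched above is allowed.
    return "Allow"
```

With this list, `http://example.com/private/a.zip` is disallowed because the Disallow rule comes first. Moving the `*.zip` rule above it would make the same URL "Check only" instead.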
This command associates file name extensions (for services that don't automatically include them) with their mime types.
Use the optional first two parameters to choose the comparison type. You may use the '?' and '*' wildcards for one and several characters, respectively.
You may also use quotes in the mime type definition, e.g. to specify a charset. Russian webmasters, for instance, often use the *.htm extension
for windows-1251 documents and *.html for unix koi8-r documents: "text/html; charset=koi8-r" for *.html and "text/html; charset=windows-1251" for *.htm.
Default unknown type for other extensions: application/unknown *.*
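The association between file masks and mime types can be pictured as an ordered lookup table. The Python sketch below reuses the charset example above; the table layout, the function name and the case-insensitive comparison are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# (mime type, file mask) pairs, checked in order.
ADD_TYPES = [
    ("text/html; charset=koi8-r", "*.html"),
    ("text/html; charset=windows-1251", "*.htm"),
    ("application/unknown", "*.*"),   # default for other extensions
]

def guess_type(filename, add_types=ADD_TYPES):
    """Return the mime type of the first mask that matches the file name."""
    name = filename.lower()  # assumption: case-insensitive comparison
    for mime, mask in add_types:
        if fnmatchcase(name, mask):
            return mime
    return None  # no mask matched, e.g. a name without any extension
```

The catch-all `*.*` entry is placed last so that the more specific masks are tried first.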
With this page you can specify files containing dictionaries or affixes to be used when mnoGoSearch is run with ISpell support.
To add a file, press the Append button. On the Edit Ispell page select the type of file you wish to add (Spell for a spelling dictionary, or Affix for an affix table). Select the language and the file itself. For additional information on ISpell support in mnoGoSearch please refer
to Chapter 15., Ispell.
mnoGoSearch can use external parsers to index different file types (mime types).
Parser is any executable program which converts one of the mime types to text/plain or text/html. For example, if you have
postscript files, you can use ps2ascii parser (filter), which reads postscript file and produces ascii.
mnoGoSearch supports the type of parsers which can read data from a file and send result to a file.
To add a new parser to the list, press the Append button and enter Source type, e.g. application/pdf for Adobe's Portable Document Format, then select destination format with the Result type menu. In Command box enter command
line to use with the parser, e.g. pdftotext.exe $1 > $2. Press OK button to finish.
See Chapter 16., External parsers for more examples.
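The file-to-file parser convention can be sketched in Python as follows. The example assumes that `$1` in the command line stands for the input file and `$2` for the output file, as the pdftotext example suggests; the function names and the temporary file names are hypothetical, and this is not mnoGoSearch's own code.

```python
import os
import subprocess
import tempfile

def build_command(template, src, dst):
    """Substitute the input ($1) and output ($2) file names into the command."""
    return template.replace("$1", src).replace("$2", dst)

def run_parser(template, data, suffix=""):
    """Write data to a temporary file, run the parser, return its output bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input" + suffix)
        dst = os.path.join(tmp, "output.txt")
        with open(src, "wb") as f:
            f.write(data)
        # The command goes through the shell so that redirection such as
        # "> $2" works. Paths containing spaces would need quoting.
        subprocess.run(build_command(template, src, dst), shell=True, check=True)
        with open(dst, "rb") as f:
            return f.read()
```

For instance, `build_command("pdftotext.exe $1 > $2", "in.pdf", "out.txt")` yields the command line `pdftotext.exe in.pdf > out.txt`.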
On this page you can configure the starting page of the server you wish to index. It is useful if its main page (e.g.
index.html) does not contain enough links and there is another page on the server with most of the links (e.g. main.html). Press the Append button and enter the URL of the starting page.
This is the main page to configure servers you wish to index and server-specific indexing options. To add a new server press
Access parameters in Server settings
In Name field enter description for the server.
URL: enter your server URL.
Alias: enter your server alias, see Chapter 10., URL aliases for details.
In Type menu select match type for your server or alias.
Select follow mode: Page to index only one document, Path to index documents stored in the specified path only, Site to index all the documents belonging to current site and World to make indexer follow all external links.
Use case menu to specify case sensitive or case insensitive comparison.
If the server access is restricted, enter Login and Password to access the server.
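The four follow modes can be sketched as a decision on whether a link found in a document may be followed. The Python example below is one plausible reading of the descriptions above, not the indexer's actual logic; the function name and the lowercase mode names are assumptions.

```python
from urllib.parse import urlsplit

def may_follow(mode, start_url, link):
    """Decide whether a link may be followed under the given follow mode."""
    start, target = urlsplit(start_url), urlsplit(link)
    same_site = (start.scheme, start.netloc) == (target.scheme, target.netloc)
    if mode == "world":
        return True                      # follow all external links
    if mode == "site":
        return same_site                 # stay on the current site
    if mode == "path":
        # Directory part of the start URL, including the trailing slash.
        base = start.path[: start.path.rfind("/") + 1]
        return same_site and target.path.startswith(base)
    if mode == "page":
        return link == start_url         # only the one document itself
    raise ValueError("unknown follow mode: " + mode)
```

For a server entered as `http://example.com/docs/index.html`, Path mode would follow `http://example.com/docs/a/b.html` but not `http://example.com/other/x.html`.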
Proxy parameters in Server settings
If you use a proxy server on your network, enter Proxy Address, Port, Proxy Login and Proxy Password.
Options parameters in Server settings
Configure server-specific indexing options here.
In Tag field you may assign a specific tag for the server, see Chapter 18., Tags for details.
Specify a category of the server by filling the Category field, see Chapter 19., Categories for details.
In Period field enter expiration period for the documents stored on the server in seconds (the default value is 604800 which stands for 1 week).
Limit number of Hops indexer may use to access the server.
Select Charset of the documents stored on the server.
Choose Default language of the server.
Specify Timeout for connection to the server in seconds, limit number of network errors (Net errors limit) and specify delay
in seconds after network error occurs (Net errors delay).
If you want indexer to index ID3 information found in mp3 files, i.e. song title, artist name, etc., check the Check mp3 tag
flag. Note that you have to allow indexing mp3 files first in the section called “Documents tab”.
Check Check only mp3 tag flag to index only data found in ID3 tags of mp3 files, and scan other documents for links only.
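The Period field described above can be read as a simple time comparison: a document becomes due for reindexing once the period has elapsed since it was last indexed. The Python sketch below illustrates this reading; the function and parameter names are hypothetical.

```python
import time

WEEK = 7 * 24 * 60 * 60   # 604800 seconds, the default period

def is_expired(last_indexed, period=WEEK, now=None):
    """True if the period has elapsed since the document was last indexed."""
    if now is None:
        now = time.time()
    return now - last_indexed >= period
```

A shorter period, such as 3600 for one hour, suits servers whose content changes often, at the cost of more frequent downloads.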
Modes parameters in Server settings
Index: check this box to index server, uncheck to grab links only.
Delete bad URLs: delete URLs with no corresponding document from database.
Robots: check robots.txt file on server.
Weights parameters in Server settings
You can specify what parts of a document are to be indexed and taken into account in weight calculation. Body, title, keywords,
description are supported.
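One way to picture section weights is as multipliers applied to the number of times a search word occurs in each part of a document. The Python sketch below is purely illustrative: the weight values, the function name and the scoring formula are assumptions, and this section does not describe the formula mnoGoSearch actually uses.

```python
def score(hits, weights):
    """Sum per-section hit counts multiplied by the section weight."""
    return sum(weights.get(section, 0) * count
               for section, count in hits.items())

weights = {"body": 1, "title": 2, "keywords": 2, "description": 2}
hits = {"title": 1, "body": 3}   # the word occurs once in the title, 3 times in the body
```

In this model, a weight of 0 for a section means that words found there do not contribute to the score at all.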
Use this page to configure indexing options.
By pressing Edit link you can configure the following options.
If you leave Add config URL's unchecked, indexer will process only those links that are already in the database from previous indexing. Adding
or removing Servers will then not affect indexing. To index new URLs you may have added to the Servers list, check this option.
Check Expired first to start indexing with expired documents.
Check Force reindex to index all documents regardless of whether they are expired or not.
You may specify Delay in seconds between indexing each document.
Select the number of Threads to use when indexing documents. More threads may require higher bandwidth.
Check Use log file and specify Log file name to store indexer messages in a text file.
In Log mode menu select type of messages to be written in log file.
Use these options to limit indexing according to any of the given parameters.
Filtering by status means limiting indexing to URLs with specific HTTP status, i.e. the HTTP code returned by the server.
0 is for "not indexed", for other codes see Chapter 14., HTTP codes.
You may limit indexing by URL. To do so, enter the URL you wish to limit indexing to.
To limit indexing by Tag and by Category, enter corresponding tag or category in the appropriate fields. Tag and Category
can be assigned to a server, see the section called “Options parameters in Server settings” for details.
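The limits above combine as filters: a URL is processed only if it satisfies every limit that has been set. The Python sketch below illustrates the idea on a small in-memory list. The record layout, the function name and the treatment of the URL limit as a prefix are assumptions made for this example.

```python
def select_urls(records, status=None, url_prefix=None, tag=None, category=None):
    """Return the records matching every limit that is set (None = no limit)."""
    selected = []
    for rec in records:
        if status is not None and rec["status"] != status:
            continue
        if url_prefix is not None and not rec["url"].startswith(url_prefix):
            continue
        if tag is not None and rec["tag"] != tag:
            continue
        if category is not None and rec["category"] != category:
            continue
        selected.append(rec)
    return selected

records = [
    {"url": "http://example.com/a.html", "status": 200, "tag": "docs", "category": "01"},
    {"url": "http://example.com/b.html", "status": 0, "tag": "docs", "category": "01"},
    {"url": "http://example.org/c.html", "status": 404, "tag": "news", "category": "02"},
]
```

Calling `select_urls(records, status=0)` keeps only the URL that has not been indexed yet, matching the meaning of status 0 given above.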
Here you can perform a test search.
Simply enter keywords in the Search for field and press Search! button. You may select number of results to display on one page (Results per page menu), Output format, Search mode (Match, Search for, through, in menus). The results are displayed below.
You may select different front-end to use in searching. To do so, select the desired front-end in Path to search.exe field.
It is possible to use several templates to display search results. In case multiple templates are used, every one of them
is assigned an ID number. Press Templates button to add templates and assign unique ID number to each template.
Select ID of template to display search results with the Template drop-down menu.