WordType(3) FreeBSD Library Functions Manual WordType(3)
WordType - defines a word in terms of allowed characters, length, etc.
Only called through WordContext::Initialize()
WordType defines an indexed word and operations to validate a word to be
indexed. All words inserted into the mifluz index are Normalized before
insertion. The configuration options give some control over the
definition of a word.
For more information on the configuration attributes and a complete list of
attributes, see the mifluz(3) manual page.
- wordlist_locale <locale> (default C)
- Set the locale of the program to locale. See setlocale(3) for more
information.
- wordlist_allow_numbers {true|false} <number> (default
false)
- A digit is considered a valid character within a word if this
configuration parameter is set to true; otherwise it is an error to
insert a word containing digits. See the Normalize method for more
information.
- wordlist_minimum_word_length <number> (default 3)
- The minimum length of a word. See the Normalize method for more
information.
- wordlist_maximum_word_length <number> (default 25)
- The maximum length of a word. See the Normalize method for more
information.
- wordlist_truncate {true|false} <number> (default true)
- If a word is longer than the value of wordlist_maximum_word_length,
it is truncated if this configuration parameter is true; otherwise
it is considered an invalid word.
- wordlist_lowercase {true|false} <number> (default true)
- If a word contains uppercase letters, it is converted to lowercase if this
configuration parameter is true; otherwise it is left untouched.
- wordlist_valid_punctuation [characters] (default none)
- A list of punctuation characters that may appear in a word. These
characters will be removed from the word before insertion in the
index.
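
The interaction of the attributes above can be illustrated with a small,
self-contained C++ sketch. It is not the mifluz implementation: the
WordOptions struct and the apply_options function are hypothetical names
introduced here, the order of the steps is an assumption, and the locale,
bad word list and control character handling are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the wordlist_* attributes; not the mifluz API.
// The default values mirror the defaults listed above.
struct WordOptions {
    bool allow_numbers = false;        // wordlist_allow_numbers
    std::size_t minimum_length = 3;    // wordlist_minimum_word_length
    std::size_t maximum_length = 25;   // wordlist_maximum_word_length
    bool truncate = true;              // wordlist_truncate
    bool lowercase = true;             // wordlist_lowercase
    std::string valid_punctuation;     // wordlist_valid_punctuation (none)
};

// Returns true if the word is acceptable under opt. On success the word is
// rewritten in place; on failure it is left unchanged.
// Assumption: punctuation is stripped before the length checks are made.
bool apply_options(const WordOptions &opt, std::string &word) {
    std::string out;
    for (unsigned char c : word) {
        // Characters listed as valid punctuation are dropped from the word.
        if (opt.valid_punctuation.find(static_cast<char>(c)) != std::string::npos)
            continue;
        // Digits are rejected unless explicitly allowed.
        if (std::isdigit(c) && !opt.allow_numbers)
            return false;
        out += opt.lowercase ? static_cast<char>(std::tolower(c))
                             : static_cast<char>(c);
    }
    if (out.size() > opt.maximum_length) {
        if (!opt.truncate)
            return false;              // too long and truncation disabled
        out.resize(opt.maximum_length);
    }
    if (out.size() < opt.minimum_length)
        return false;                  // too short
    word = out;
    return true;
}
```

With the default options, apply_options turns "Hello" into "hello" and
rejects both "abc123" (digits) and "ab" (too short). Setting
valid_punctuation to "-'" makes "Don't" acceptable and stores it as "dont".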
- int Normalize(String &s) const
- Normalize a word according to configuration specifications and built-in
transformations. Every word inserted in the inverted index goes
through this function. If a word is rejected (return value has
WORD_NORMALIZE_NOTOK bit set) it will not be inserted in the index. If a
word is accepted (return value has WORD_NORMALIZE_OK bit set) it will be
inserted in the index. In addition to these two bits, informational bits
are set that describe the processing done on the word. The
bit field values and their meanings are as follows:
- WORD_NORMALIZE_TOOLONG
- the word length exceeds the value of
the wordlist_maximum_word_length configuration parameter.
- WORD_NORMALIZE_TOOSHORT
- the word length is smaller than the value of
the wordlist_minimum_word_length configuration parameter.
- WORD_NORMALIZE_CAPITAL
- the word contained capital letters and has been converted
to lowercase. This bit is only set
if the wordlist_lowercase configuration parameter
is true.
- WORD_NORMALIZE_NUMBER
- the word contains digits and the configuration
parameter wordlist_allow_numbers is set to false.
- WORD_NORMALIZE_CONTROL
- the word contains control characters.
- WORD_NORMALIZE_BAD
- the word is listed in the file pointed to by
the wordlist_bad_word_list configuration parameter.
- WORD_NORMALIZE_NULL
- the word is a zero length string.
- WORD_NORMALIZE_PUNCTUATION
- at least one character listed in
the wordlist_valid_punctuation attribute was removed
from the word.
- WORD_NORMALIZE_NOALPHA
- the word does not contain any alphanumeric character.
- static String NormalizeStatus(int flags)
- Returns a string explaining the return flags of the Normalize method.
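
Because the return value of Normalize is a bit field, several conditions can
be reported at once and each one is tested with a bitwise AND. The following
C++ sketch shows that pattern together with a status string builder in the
spirit of NormalizeStatus. Everything in it is an assumption made for
illustration: the SK_* constants, their numeric values, the describe_flags
function and its output format are hypothetical, and the real
WORD_NORMALIZE_* values are defined by the mifluz headers.

```cpp
#include <string>

// Hypothetical bit values for illustration only; the real WORD_NORMALIZE_*
// constants come from the mifluz headers and may differ.
enum SketchFlag : unsigned {
    SK_TOOLONG     = 1u << 0,
    SK_TOOSHORT    = 1u << 1,
    SK_CAPITAL     = 1u << 2,
    SK_NUMBER      = 1u << 3,
    SK_CONTROL     = 1u << 4,
    SK_BAD         = 1u << 5,
    SK_NULL        = 1u << 6,
    SK_PUNCTUATION = 1u << 7,
    SK_NOALPHA     = 1u << 8
};

// Builds a space-separated list naming every bit set in flags.
// Returns "(none)" when no bit is set. The format is invented for this
// sketch and is not the output of the real NormalizeStatus.
std::string describe_flags(unsigned flags) {
    static const struct {
        unsigned bit;
        const char *name;
    } table[] = {
        {SK_TOOLONG, "TOOLONG"},         {SK_TOOSHORT, "TOOSHORT"},
        {SK_CAPITAL, "CAPITAL"},         {SK_NUMBER, "NUMBER"},
        {SK_CONTROL, "CONTROL"},         {SK_BAD, "BAD"},
        {SK_NULL, "NULL"},               {SK_PUNCTUATION, "PUNCTUATION"},
        {SK_NOALPHA, "NOALPHA"},
    };
    std::string out;
    for (const auto &entry : table) {
        if (flags & entry.bit) {         // test one bit at a time
            if (!out.empty())
                out += ' ';
            out += entry.name;
        }
    }
    return out.empty() ? "(none)" : out;
}
```

For example, describe_flags(SK_CAPITAL | SK_PUNCTUATION) yields
"CAPITAL PUNCTUATION", which corresponds to a word that was lowercased and
had punctuation removed. A caller of the real Normalize would test the
returned value against the WORD_NORMALIZE_* constants in the same way.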
Loic Dachary loic@gnu.org
The Ht://Dig group http://dev.htdig.org/
htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
mifluzsearch(1), mifluzdict(1), WordContext(3), WordList(3), WordDict(3),
WordListOne(3), WordKey(3), WordKeyInfo(3), WordDBInfo(3), WordRecordInfo(3),
WordRecord(3), WordReference(3), WordCursor(3), WordCursorOne(3),
WordMonitor(3), Configuration(3), mifluz(3)