Index(3)  User Contributed Perl Documentation  Index(3)
Search::OpenFTS::Index - Provides functions for indexing
my $fts=Search::OpenFTS::Index->new( DBI );
my $fts=Search::OpenFTS::Index->new( DBI, prefix );
my $fts=Search::OpenFTS::Index->init(
dbi=>DBI,
txttid=>NAME_TXT_ID,
dict=>[DICT1, DICT2, ...],
parser=>PARSER,
map=>'{IDTYPELEXEM1=>[IDDICT1, ...], ...}',
tsvector_field=>FIELD_NAME,
ignore_id_index=>"IDTYPELEXEM1 [IDTYPELEXEM2 [...]]",
ignore_headline=>"IDTYPELEXEM1 [IDTYPELEXEM2 [...]]",
prefix=>PREFIX );
This is the initialization function. It is called only once, at
the creation of a new search index, to create the configuration and indexing
tables.
- txttid
- The table where the documents are stored, together with its primary key
column, written as table.column (e.g. messages.msg_id)
- dict
- List of available dictionaries. A dictionary should support four methods:
lemms, is_stoplexem, drop and init. init initializes the dictionary. lemms
returns an array of lexemes for a given word, and is_stoplexem reports
whether the given lexeme is a stop word. drop clears the dictionary's
tables (if any) when an OpenFTS instance is dropped. The is_stoplexem,
drop and init methods are optional.
- parser
- The full name of the parser in use. The parser should have the same
interface as the Search::OpenFTS::Parser module.
- map
- A mapping from lexeme types to dictionaries. This helps optimize the
search engine and is also useful for indexing multilingual or exotic-text
documents.
- tsvector_field
- The name of the field that holds the text index of integers for each
document. This field must be of type tsvector (from contrib/tsearch).
- ignore_id_index
- Type IDs of lexemes to ignore while indexing documents.
- ignore_headline
- Type IDs of lexemes to ignore while constructing headlines of the search
results.
- prefix
- If more than one content table requires indexing and searching
functionality, the user can pass a special parameter named prefix, which
is a single character from a-z. The given prefix is used, as a naming
convention, to create separate instances of the configuration and
indexing tables.
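The parameters above can be assembled along the following lines. This is only a sketch: build_map_string and valid_prefix are hypothetical helpers, and every value (lexeme type IDs, dictionary IDs, the fts_index field name) is a placeholder chosen for illustration, not something OpenFTS prescribes. The dbi, dict and parser entries are left out so that the snippet runs without a database.

```perl
use strict;
use warnings;

# Hypothetical helper: render { lexeme type ID => [dictionary IDs] }
# in the string form used for the map parameter in the synopsis.
sub build_map_string {
    my ($map) = @_;
    my @pairs = map { "$_=>[" . join(', ', @{ $map->{$_} }) . ']' }
                sort { $a <=> $b } keys %$map;
    return '{' . join(', ', @pairs) . '}';
}

# Hypothetical helper: the prefix is documented as one character from a-z.
sub valid_prefix {
    my ($prefix) = @_;
    return (defined($prefix) && $prefix =~ /\A[a-z]\z/) ? 1 : 0;
}

# Placeholder values only; pick IDs that match your parser and dictionaries.
my %init_args = (
    txttid          => 'messages.msg_id',
    tsvector_field  => 'fts_index',
    map             => build_map_string({ 1 => [0], 2 => [0, 1] }),
    ignore_id_index => join(' ', 12, 13),
    prefix          => 'a',
);

die "bad prefix\n" unless valid_prefix($init_args{prefix});
print "$_ => $init_args{$_}\n" for sort keys %init_args;
```

With a live DBI handle, the resulting hash (plus dbi, dict and parser) would be passed to Search::OpenFTS::Index->init.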
To specify a dictionary that requires parameters (the Snowball
stemmer, for example), use the following syntax:
dict=>[
    # example of how to use the Snowball stemmer
    { mod=>'Search::OpenFTS::Dict::Snowball', param=>'{lang=>"english"}' },
    'Search::OpenFTS::Dict::UnknownDict',
]
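As an illustration of the dictionary interface described above, here is a minimal sketch of a custom dictionary. The package name My::Dict::Lowercase and its stop-word list are hypothetical, and the calling conventions (init used as a constructor, lemms returning a plain list) are assumptions; consult Search::OpenFTS::Dict::UnknownDict for the conventions OpenFTS actually uses.

```perl
use strict;
use warnings;

{
    package My::Dict::Lowercase;    # hypothetical module name

    my %STOP = map { $_ => 1 } qw(a an the of and);

    # Optional method: set up the dictionary (assumed to act as a constructor).
    sub init {
        my ($class, %opt) = @_;
        return bless {%opt}, $class;
    }

    # Required method: return the lexemes for a word (here, its lowercase form).
    sub lemms {
        my ($self, $word) = @_;
        return (lc $word);
    }

    # Optional method: report whether a lexeme is a stop word.
    sub is_stoplexem {
        my ($self, $lexeme) = @_;
        return exists $STOP{$lexeme} ? 1 : 0;
    }

    # Optional method: nothing to clean up, this dictionary has no tables.
    sub drop { return 1 }
}

package main;

my $dict = My::Dict::Lowercase->init;
for my $word (qw(The Indexing)) {
    for my $lexeme ($dict->lemms($word)) {
        my $note = $dict->is_stoplexem($lexeme) ? ' (stop word)' : '';
        print "$word -> $lexeme$note\n";
    }
}
```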
- index( $txt_id, [ $FH | $text | $reftext ] );
- index( $txt_id, [ $FH | $text | $reftext ], $title );
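- Indexes the text of the document $txt_id. The text can be passed as a
filehandle, a string or a reference to a string; an optional title can be
given as the third argument.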
- delete ( $txt_id )
- Deletes all records for the given document identifier.
- create_index
- create_index(1);
- Creates indices for fast searching. A non-zero argument enables verbose
mode.
- drop_index()
- Removes all indices on the tables corresponding to the current instance of
OpenFTS. Errors are not fatal; they are only reported as warnings. This
method is the opposite of create_index and is useful for bulk uploading.
- drop()
- Removes all tables corresponding to the current instance of OpenFTS. Errors
are not fatal; they are only reported as warnings.
- start_index( $tid )
- Opens a session for indexing.
Use:
my $idx = Search::OpenFTS::Index->new( ... );
my $idx_chunk = $idx->start_index( ID );
# add every HTML file in the current directory as one chunk
foreach my $f ( glob('*.html') )
{
    $idx_chunk->index_chunk( IO::File->new( $f ) );
}
$idx_chunk->flush;
- fix_permissions($user)
- Grants read-only access on the indexes and search tables to the user
$user, or to PUBLIC if $user is not specified.
Returns TRUE on success or an error message on failure. Because an error
message is also a true value, check the return value explicitly for '1'.
Calls fix_permissions for each dictionary that supports it.
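The explicit check for '1' can be wrapped in a small helper, sketched below. The name permissions_error is hypothetical, and the sample return values stand in for what a real call to $idx->fix_permissions($user) would return.

```perl
use strict;
use warnings;

# Hypothetical helper: return nothing on success ('1'), otherwise the
# error text. A plain boolean test would treat an error message as success.
sub permissions_error {
    my ($ret) = @_;
    return if defined($ret) && $ret eq '1';
    return (defined($ret) && length($ret)) ? $ret : 'unknown error';
}

# Stand-in return values; the second one is an invented error message.
for my $ret ('1', 'permission denied') {
    my $err = permissions_error($ret);
    print defined($err) ? "failed: $err\n" : "ok\n";
}
```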
- index_chunk( [FH|REFTXT|TXT], direction=>[1|-1] )
- index_chunk( [FH|REFTXT|TXT], wclass=>[A|B|C|D] )
- index_chunk( FH, direction=>[1|-1], offset=>$offset, length=>$length );
- index_chunk( FH, wclass=>[A|B|C|D], offset=>$offset, length=>$length );
- Adds a part to an index. The 'direction' option is kept for compatibility
with old versions of OpenFTS. The wclass option defaults to 'D'.
- flush
- Writes the index accumulated in the session to the database.
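The bulk-upload pattern mentioned under drop_index (drop the indices, load the documents, rebuild the indices once) might look like the sketch below. bulk_load is a hypothetical helper, and Local::RecordingIndex is a stand-in that only records method calls so the snippet runs without a database; in real use $idx would be a Search::OpenFTS::Index object.

```perl
use strict;
use warnings;

# Hypothetical helper: $idx is any object that offers the drop_index,
# index and create_index methods described above.
sub bulk_load {
    my ($idx, $docs) = @_;
    $idx->drop_index;                       # no index maintenance while loading
    for my $id (sort { $a <=> $b } keys %$docs) {
        $idx->index($id, $docs->{$id});     # index( $txt_id, $text )
    }
    $idx->create_index;                     # rebuild the indices once
    return scalar keys %$docs;
}

{
    # Stand-in for a real index object: it only records which methods ran.
    package Local::RecordingIndex;
    sub new          { return bless { calls => [] }, shift }
    sub drop_index   { push @{ $_[0]{calls} }, 'drop_index';   return 1 }
    sub index        { push @{ $_[0]{calls} }, "index($_[1])"; return 1 }
    sub create_index { push @{ $_[0]{calls} }, 'create_index'; return 1 }
}

my $idx = Local::RecordingIndex->new;
bulk_load($idx, { 1 => 'first document', 2 => 'second document' });
print join(' -> ', @{ $idx->{calls} }), "\n";
# prints: drop_index -> index(1) -> index(2) -> create_index
```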
The OpenFTS Primer ( see doc/ subdirectory )
The Crash-course to OpenFTS ( in examples/ subdirectory )
perldoc Search::OpenFTS::Search
perldoc Search::OpenFTS::Parser
perldoc Search::OpenFTS::Dict::PorterEng
perldoc Search::OpenFTS::Dict::Snowball
perldoc Search::OpenFTS::Dict::UnknownDict
perldoc Search::OpenFTS::Morph::ISpell