- "new({ -database => $database_object [,'-search_cache_size' =>
1000, -search_cache_dir => '/var/tmp/search_cache', -stringifier =>
['Storable','Data::Dumper'], ] });"
- Provides the interface for obtaining a new Search::InvertedIndex object
for manipulating an inverted database.
Example 1:
my $database = Search::InvertedIndex::DB::DB_File_SplitHash->new({
-map_name => '/www/databases/test-map_names/test',
-multi => 4,
-file_mode => 0644,
-lock_mode => 'EX',
-lock_timeout => 30,
-blocking_locks => 0,
-cachesize => 1000000,
-write_through => 0,
-read_write_mode => 'RDONLY',
});
my $inv_map = Search::InvertedIndex->new({
'-database' => $database,
'-search_cache_size' => 1000,
'-search_cache_dir' => '/var/tmp/search_cache',
-stringifier => ['Storable','Data::Dumper'],
});
Parameter explanations:
-database - A database interface object. Defined database interfaces
are currently Search::InvertedIndex::DB::DB_File_SplitHash
and Search::InvertedIndex::DB::Mysql. (Required)
-stringifier - Declares the stringifier used to store information in the
underlying database. Currently defined stringifiers are
'Storable' and 'Data::Dumper'. The default is to use
'Storable' with fallback to 'Data::Dumper'. (Optional)
-search_cache_size - Sets the number of cached searches to hold in the search cache. (Optional)
-search_cache_dir - Sets the directory to be used for the search cache
(Required if search_cache_size is set to something other than 0)
The -database parameter is required and must be a
'Search::InvertedIndex::DB::...' type database object. The
-search_cache_dir and -search_cache_size parameters are optional and
define the location and size of the search cache. If omitted, no search
caching will be done.
The optional '-stringifier' parameter can be used to override
the default use of 'Storable' (with fallback to 'Data::Dumper') as the
stringifier used for storing data by the module. Specifying
-stringifier => 'Data::Dumper' would use 'Data::Dumper'
(only) as the stringifier, while specifying -stringifier =>
['Data::Dumper','Storable'] would use Data::Dumper by
preference (but fall back to 'Storable' if Data::Dumper was not
available). If a database was created using a particular serializer, it
will automatically detect it and attempt to use the correct one.
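The preference-with-fallback behaviour described above can be pictured with a small sketch. This is not the module's own code: 'pick_stringifier' is a hypothetical helper, written only to illustrate the idea of walking a preference list and taking the first module that loads.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::InvertedIndex): return the first
# module name in the preference list that can actually be loaded.
sub pick_stringifier {
    my @preferences = @_;
    for my $module (@preferences) {
        # Accept only plain package names before handing them to string eval.
        next unless $module =~ /\A\w+(?:::\w+)*\z/;
        return $module if eval "require $module; 1";
    }
    return;    # nothing in the list could be loaded
}

# Prefer Data::Dumper, fall back to Storable.
my $chosen = pick_stringifier('Data::Dumper', 'Storable');
print "Using $chosen\n" if defined $chosen;
```

Passing the list in a different order changes only which module is tried first, which mirrors how the -stringifier array is described.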
- "lock({ -lock_mode => 'EX|SH|UN' [, -lock_timeout => 30] [,
-blocking_locks => 0] });"
- Changes a lock on the underlying database.
Forces 'sync' if the state is changed from 'EX' to a lower lock
state (i.e. 'SH' or 'UN'). Croaks on errors.
Example:
$inv->lock({ -lock_mode => 'EX', -lock_timeout => 30, -blocking_locks => 0 });
The only _required_ parameter is the -lock_mode. The other
parameters can be inherited from the object state. If the other
parameters are used, they change the object state to match the new
settings.
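A common pattern is to hold the exclusive lock only for the duration of one piece of work. The sketch below assumes nothing beyond the 'lock' call documented above; 'with_exclusive_lock' is a hypothetical helper name, and the 30 second timeout is an arbitrary illustrative value.

```perl
use strict;
use warnings;

# Hypothetical helper: hold an 'EX' lock while $work runs, then drop to 'UN'.
# Per the description above, leaving 'EX' also forces a sync.
sub with_exclusive_lock {
    my ($inv, $work) = @_;
    $inv->lock({ -lock_mode => 'EX', -lock_timeout => 30 });
    my $ok  = eval { $work->($inv); 1 };
    my $err = $@;
    # Release the lock whether or not the work succeeded.
    $inv->lock({ -lock_mode => 'UN' });
    die $err unless $ok;
    return 1;
}

# Usage, assuming $inv is an open Search::InvertedIndex object:
#   with_exclusive_lock($inv, sub { $_[0]->clear_cache });
```

Because 'lock' croaks on errors, a failure to obtain the lock propagates to the caller before any work is attempted.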
- "status(-open|-lock_mode);"
- Returns the requested status line for the database. Allowed requests are
'-open' and '-lock_mode'.
Example 1:
my $status = $inv_map->status(-open); # Returns either '1' or '0'
Example 2:
my $status = $inv_map->status(-lock_mode); # Returns 'UN', 'SH' or 'EX'
- "update({ -update => $update });"
- Performs an update on the map. This is designed for
adding/changing/deleting a bunch of related information in a single block
update. It takes a Search::InvertedIndex::Update object as input. It
assumes that you wish to remove all references to the specified index and
replace them with a new list of references. It can also update the
-data for the -index. If -data is passed and the -index does not already
exist, a new index record will be created. It is a fatal error to pass a
nonexistent index without a -data parameter to initialize it. It is also a
fatal error to pass an update for a nonexistent -group.
Passing an empty -keys has the effect of deleting the index
from the group (but not from the system).
Example:
my $update = Search::InvertedIndex::Update->new(...);
$inv_map->update({ -update => $update });
It is much faster to update an index using the update method
than the add_entry_to_group method in most cases because the batching of
changes allows for efficiency optimizations when there is more than one
key.
- "preload_update({ -update => $update });"
- 'preload_update' places the passed 'update' object data into a pending
queue which is not reflected in the searchable database until the
'update_group' method has been called. This allows the loading process to
be streamlined for maximum performance on large full updates. This method
is not appropriate for incremental updates, as the 'update_group' method
destroys the previous searchable data set on execution.
It also places the database effectively offline during the
update, so this is not a suitable method for updating an 'online'
database. Updates should happen on an 'offline' copy that is then
swapped into place with the 'online' database.
Example:
my $update = Search::InvertedIndex::Update->new(...);
$inv_map->preload_update({ -update => $update });
.
.
.
$inv_map->update_group({ -group => 'test' });
- "clear_preload_update_for_group({ -group => $group });"
- This clears all the data from the preload area for the specified
group.
- "update_group({ -group => $group[, -block_size => 65536]
});"
- This clears the specified group and loads all preloaded data (updates batch
loaded through the 'preload_update' method pending finalization).
This is by far the fastest way to load a large set of data
into the search system - but it is an 'all or nothing' approach. No
'incremental' updating is possible via this interface - the update_group
completely erases all previously searchable data from the group and
replaces it with the pending 'preload'ed data.
Examples:
$inv_map->update_group({ -group => 'test' });
$inv_map->update_group({ -group => 'test', -block_size => 65536 });
-block_size determines the 'chunking factor' used to limit the
amount of memory the update uses (it corresponds roughly to the number
of line entry items to be processed in memory at one time). Higher
'-block_size's will improve performance until you run out of real
memory. The default is 65536.
Since an exclusive lock should be held during the entire
process, the database is essentially inaccessible until the update is
complete. It is probably inadvisable to use this method of updating
without keeping an 'online' and a separate 'offline' database, and copying
the 'offline' database over the 'online' one after the mass update on
the 'offline' database.
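The full preload cycle (clear pending data, queue updates, finalize) can be wrapped in one routine. The following is a sketch only: 'reload_group' is a hypothetical helper, and it assumes every update object in the list belongs to the named group and that the caller is working on the 'offline' copy.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the searchable contents of $group with the
# data carried by the Search::InvertedIndex::Update objects in @$updates.
sub reload_group {
    my ($inv_map, $group, $updates, $block_size) = @_;

    # Start from an empty preload area so stale pending data is not mixed in.
    $inv_map->clear_preload_update_for_group({ -group => $group });

    # Queue every update; nothing is searchable yet.
    $inv_map->preload_update({ -update => $_ }) for @$updates;

    # Finalize: this erases the old group data and loads the queued data.
    my %args = (-group => $group);
    $args{'-block_size'} = $block_size if defined $block_size;
    $inv_map->update_group(\%args);

    return scalar @$updates;
}

# Usage, assuming $inv_map is the 'offline' map and @updates holds
# Update objects for the group 'test':
#   reload_group($inv_map, 'test', \@updates);
```

Leaving out the fourth argument lets 'update_group' use its documented default block size.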
- "search({ -query => $query [,-cache => 1] });"
- Performs a query on the map and returns the results as a
Search::InvertedIndex::Result object containing the keys and rankings.
Example:
my $query = Search::InvertedIndex::Query->new(...);
my $result = $inv_map->search({ -query => $query });
Performs a complex multi-key match search with boolean logic
and optional search term weighting.
The search request is formatted as follows:
my $result = $inv_map->search({ -query => $query });
where '$query' is a Search::InvertedIndex::Query object.
Each node can either be a specific search term with an
optional weighting term (a Search::InvertedIndex::Query::Leaf object) or
a logic term with its own sub-branches (a Search::InvertedIndex::Query
object).
The weightings are applied to the returned matches for each
search term by multiplication of their base ranking before combination
with the other logic terms.
This allows recursive use of search to resolve arbitrarily
complex boolean searches and weight different search terms.
The optional -cache parameter instructs the database to cache
(if the -search_cache_dir and -search_cache_size initialization
parameters are configured for use) the search and results for
performance on repeat searches. '1' means use the cache, '0' means do
not.
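The weighting rule above (multiply each term's base rankings by its weight, then combine) can be illustrated with plain hashes, independent of the module. The multiplication step follows the description; the way sets are combined here (summing rankings for an OR node) is an assumption made for the sake of the example, since the text does not spell out the combination rule.

```perl
use strict;
use warnings;

# Multiply every base ranking in a match set (index => ranking) by a weight.
sub apply_weight {
    my ($matches, $weight) = @_;
    my %weighted = map { $_ => $matches->{$_} * $weight } keys %$matches;
    return \%weighted;
}

# ASSUMPTION: an OR node combines its branches by summing their rankings.
sub combine_or {
    my @sets = @_;
    my %combined;
    for my $set (@sets) {
        $combined{$_} += $set->{$_} for keys %$set;
    }
    return \%combined;
}

# Two illustrative search terms with made-up rankings and weights.
my $first_term  = apply_weight({ doc1 => 2, doc2 => 4 }, 0.5);
my $second_term = apply_weight({ doc2 => 1, doc3 => 3 }, 2);
my $result      = combine_or($first_term, $second_term);
print "$_ => $result->{$_}\n" for sort keys %$result;
```

Because each branch yields the same kind of index-to-ranking set, the output of one combination can be weighted and combined again, which is what makes the recursive evaluation possible.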
- "data_for_index({ -index => $index });"
- Returns the data record for the passed -index. Returns undef if no
matching -index is in the system.
Example:
my $data = $inv_map->data_for_index({ -index => $index });
- "clear_all;"
- Completely clears the contents of the database and the search cache.
- "clear_cache;"
- Completely clears the contents of the search cache.
- "close;"
- Closes the currently open -map and flushes all associated buffers.
- "number_of_groups;"
- Returns the raw number of groups in the system.
Example: my $n = $inv_map->number_of_groups;
- "number_of_indexes;"
- Returns the raw number of indexes in the system.
Example: my $n = $inv_map->number_of_indexes;
- "number_of_keys;"
- Returns the raw number of keys in the system.
Example: my $n = $inv_map->number_of_keys;
- "number_of_indexes_in_group({ -group => $group });"
- Returns the raw number of indexes in a specific group.
Example: my $n = $inv_map->number_of_indexes_in_group({ -group => $group });
- "number_of_keys_in_group({ -group => $group });"
- Returns the raw number of keys in a specific group.
Example: my $n = $inv_map->number_of_keys_in_group({ -group => $group });
- "add_group({ -group => $group });"
- Adds a new '-group' to the map. There is normally no need to call this
method from outside the module. The addition of new -groups is done
automatically when adding new entries.
Example: $inv_map->add_group({ -group => $group });
Croaks if unable to successfully create the group for some
reason.
It silently ignores attempts to create an existing group.
- "add_index({ -index => $index, -data => $data });"
- Adds an index entry to the system.
Example: $inv_map->add_index({ -index => $index, -data => $data });
If the 'index' is the same as an existing index, the '-data'
for that index will be updated.
-data can be pretty much any scalar. Strings and object/hash/array
references are OK. They will be transparently serialized using Storable
(preferred) or Data::Dumper.
This method should be called to set the '-data' record
returned by searches to something useful. If you do not, you will have
to maintain the information you want to show to users separately from
the main search engine core.
The method returns the index_enum of the index.
- "add_index_to_group({ -group => $group, -index => $index[,
-data => $data] });"
- Adds an index entry to a group. If the index does not already exist in the
system, adds it to the system as well.
Examples:
$inv_map->add_index_to_group({ -group => $group, '-index' => $index});
$inv_map->add_index_to_group({ -group => $group, '-index' => $index, -data => $data});
Returns the 'index_enum' for the index record.
If the 'index' is the same as an existing index, the
'index_enum' of the existing index will be returned.
There is normally no need to call this method directly.
Addition of indexes to groups is handled automatically during addition of
new entries.
It cannot be used to add an index to nonexistent groups. This is
a feature, not a bug.
The -data parameter is optional.
- "add_key_to_group({ -group => $group, -key => $key
});"
- Adds a key entry to a group.
Example: $inv_map->add_key_to_group({ -group => $group, -key => $key });
Returns the 'key_enum' for the key record.
If the 'key' is the same as an existing key, the 'key_enum' of
the existing key will be returned.
There is normally no need to call this method directly.
Addition of keys to groups is handled automatically during addition of
new entries.
It cannot be used to add keys to nonexistent groups. This is
a feature, not a bug.
- "add_entry_to_group({ -group => $group, -key => $key, -index
=> $index, -ranking => $ranking });"
- Adds a reference to a particular index for a key with a ranking to a
specific group.
Example:
$inv_map->add_entry_to_group({ -group => $group, -key => $key,
-index => $index, -ranking => $ranking });
This method cannot be used to create new -indexes or -groups.
This is a feature, not a bug. It *will* create new -keys as needed.
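Since 'add_entry_to_group' will not create groups or indexes, the group and index have to exist first. The sketch below strings the documented calls together in that order; 'index_document' and the layout of its '$doc' hash are hypothetical and chosen only for illustration. For more than a handful of keys, the batched 'update' method is described above as the faster route.

```perl
use strict;
use warnings;

# Hypothetical helper. $doc is an illustrative structure:
#   { group => ..., index => ..., data => ..., keys => { key => ranking } }
sub index_document {
    my ($inv_map, $doc) = @_;

    # add_group silently ignores a group that already exists.
    $inv_map->add_group({ -group => $doc->{group} });

    # Attach the index to the group, creating it in the system if needed.
    my %index_args = (-group => $doc->{group}, -index => $doc->{index});
    $index_args{'-data'} = $doc->{data} if exists $doc->{data};
    $inv_map->add_index_to_group(\%index_args);

    # One entry per key; keys are created on demand by add_entry_to_group.
    my $keys = $doc->{keys};
    for my $key (sort keys %$keys) {
        $inv_map->add_entry_to_group({
            -group   => $doc->{group},
            -key     => $key,
            -index   => $doc->{index},
            -ranking => $keys->{$key},
        });
    }
    return scalar keys %$keys;
}
```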
- "remove_group({ -group => $group });"
- Remove all entries for a group from the map.
Example: $inv_map->remove_group({ -group => $group });
This removes all key and key/index entries for the group and
all other group specific data from the map.
Use this method when you wish to completely delete a
searchable 'group' from the map without disturbing other existing
groups.
- "remove_entry_from_group({ -group => $group, -key => $key,
-index => $index });"
- Remove a specific key<->index entry from the map for a group.
Example:
$inv_map->remove_entry_from_group({ -group => $group, -key => $key,
-index => $index });
Does not remove the -key or -index from the database or the
group - only the entries mapping the two to each other.
- "remove_index_from_group ({ -group => $group, -index => $index
});"
- Remove all references to a specific index for all keys for a group.
Example:
$inv_map->remove_index_from_group({ -group => $group, -index => $index });
Note: This *does not* remove the index from the _system_ -
just a specific group.
It is a null operation to remove an undeclared index or to
remove a declared index from a group where it is not used.
- "remove_index_from_all ({ -index => $index });"
- Remove all references to a specific index from the system.
Example:
$inv_map->remove_index_from_all({ -index => $index });
This *completely* removes it from all groups and the master
system entries.
It is a null operation to remove an undefined index.
- "remove_key_from_group({ -group => $group, -key => $key
});"
- Remove all references to a specific key for all indexes for a group.
Example: $inv_map->remove_key_from_group({ -group => $group, -key => $key });
Returns undef if the specified key was not in the database.
Returns '1' if the specified key was in the database and has been
successfully deleted.
Croaks on errors.
- "list_all_keys_in_group({ -group => $group });"
- Returns an anonymous array containing a list of all defined keys in the
specified group.
Example:
$keys = $inv_map->list_all_keys_in_group({ -group => $group });
Note: This can result in *HUGE* returned lists. If you have a
lot of records in the group, you are better off using the iteration
support ('first_key_in_group', 'next_key_in_group').
- "first_key_in_group({ -group => $group_name });"
- Returns the 'first' key in the -group based on hash ordering.
Returns 'undef' if there are no keys in the group.
Example: my $first_key = $inv_map->first_key_in_group({ -group => $group });
- "next_key_in_group({ -group => $group, -key => $key
});"
- Returns the 'next' key in the group based on hash ordering.
Returns 'undef' when there are no more keys in the group or if
the passed -key is not in the group map.
Example: my $next_key = $inv_map->next_key_in_group({ -group => $group, -key => $key });
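The two iteration calls combine into a simple loop that visits every key without building the full list in memory. This is a sketch: 'for_each_key_in_group' is a hypothetical helper, and it assumes the group is not modified while the walk is in progress.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per key in $group, in hash order.
# Returns the number of keys visited.
sub for_each_key_in_group {
    my ($inv_map, $group, $callback) = @_;
    my $count = 0;
    my $key   = $inv_map->first_key_in_group({ -group => $group });
    # Both calls return undef when there is nothing (more) to return.
    while (defined $key) {
        $callback->($key);
        $count++;
        $key = $inv_map->next_key_in_group({ -group => $group, -key => $key });
    }
    return $count;
}

# Usage, assuming $inv_map is an open map with a group named 'test':
#   for_each_key_in_group($inv_map, 'test', sub { print "$_[0]\n" });
```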
- "list_all_indexes_in_group({ -group => $group });"
- Returns an anonymous array containing a list of all defined indexes in the
group.
Example: $indexes = $inv_map->list_all_indexes_in_group({ -group => $group });
Note: This can result in *HUGE* returned lists. If you have a
lot of records in the group, you are better off using the iteration
support ('first_index_in_group', 'next_index_in_group').
- "first_index_in_group({ -group => $group });"
- Returns the 'first' index in the -group based on hash ordering. Returns
'undef' if there are no indexes in the group.
Example: my $first_index = $inv_map->first_index_in_group({ -group => $group });
- "next_index_in_group({ -group => $group, -index => $index });"
- Returns the 'next' index in the -group based on hash ordering. Returns
'undef' if there are no more indexes.
Example: my $next_index = $inv_map->next_index_in_group({ -group => $group, -index => $index });
- "list_all_indexes;"
- Returns an anonymous array containing a list of all defined indexes in the
map.
Example: $indexes = $inv_map->list_all_indexes;
Note: This can result in *HUGE* returned lists. If you have a
lot of records in the map or do not have a lot of memory, you are better
off using the iteration support ('first_index', 'next_index').
- "first_index;"
- Returns the 'first' index in the system based on hash ordering. Returns
'undef' if there are no indexes.
Example: my $first_index = $inv_map->first_index;
- "next_index({-index => $index});"
- Returns the 'next' index in the system based on hash ordering. Returns
'undef' if there are no more indexes.
Example: my $next_index = $inv_map->next_index({ -index => $index });
- "list_all_groups;"
- Returns an anonymous array containing a list of all defined groups in the
map.
Example: $groups = $inv_map->list_all_groups;
If you have a lot of groups in the map or do not have a lot of
memory, you are better off using the iteration support ('first_group',
'next_group').
- "first_group;"
- Returns the 'first' group in the system based on hash ordering. Returns
'undef' if there are no groups.
Example: my $first_group = $inv_map->first_group;
- "next_group ({-group => $group });"
- Returns the 'next' group in the system based on hash ordering. Returns
'undef' if there are no more groups.
Example: my $next_group = $inv_map->next_group({ -group => $group });
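Group iteration pairs naturally with the per-group counters documented earlier. As a sketch under the assumption that groups are not added or removed during the walk, the hypothetical helper 'indexes_per_group' below builds a small summary of how many indexes each group holds.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each group name to
# the number of indexes it contains.
sub indexes_per_group {
    my ($inv_map) = @_;
    my %count;
    my $group = $inv_map->first_group;
    # first_group and next_group return undef when no groups remain.
    while (defined $group) {
        $count{$group} = $inv_map->number_of_indexes_in_group({ -group => $group });
        $group = $inv_map->next_group({ -group => $group });
    }
    return \%count;
}

# Usage, assuming $inv_map is an open Search::InvertedIndex object:
#   my $summary = indexes_per_group($inv_map);
#   print "$_: $summary->{$_}\n" for sort keys %$summary;
```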