|
|
| |
Genezzo::Index::bt2(3) |
User Contributed Perl Documentation |
Genezzo::Index::bt2(3) |
construct comparison/equality callbacks
my $cmp1 = sub
{
my ($k1, $k2) = @_;
# NOTE: use "spaceship" (-1,0,1) comparison with
# short-circuit OR (which returns 0 or VALUE, not 0 or 1)
# to perform multi-column key comparison
# a la Schwartzian Transform
return (
( ($k1->[0] <=> $k2->[0])
|| ($k1->[1] <=> $k2->[1])) == -1
);
};
my $eq1 = sub
{
my ($k1, $k2) = @_;
return (($k1->[0] == $k2->[0])
&& ($k1->[1] == $k2->[1])
);
};
Genezzo::Index::bt2 - basic btree
A btree built of row directory blocks.
use Genezzo::Index::bt?;
my $tt = Genezzo::Index::btree->new();
$tt->insert(1, "hi");
$tt->insert(7, "there");
This btree algorithm is a bottom-up implementation based upon ideas from Chapter
16 of "Algorithms in C++ (third edition)", by Robert Sedgewick, 1998
and Chapter 15, "Access Paths", of "Transaction Processing:
Concepts and Techniques" by Jim Gray and Andreas Reuter, 1993. The
pedagogical examples use a fixed number of entries per node, or fixed-size
keys in each block, but this implementation has significant extensions to
support variable numbers of variably-sized keys in fixed-size disk blocks,
with the associated error handling, plus support for reverse scans.
This package supports a constructor "new", plus standard b-tree
methods like insert, delete, search.
The "new" constructor takes many arguments, but they are all optional.
If none are specified, the constructor will allocate 100 blocks of the default
size for a b-tree. The default assumption is to support scalar string keys
with a scalar string values. The tree will have a maximum of 50 entries per
node.
- maxsize (default 50)
- The maximum number of entries in a node. If set to zero, the insert will
pack as many entries as space allows in each node
- numblocks (default 100)
- The constructor will allocate a private buffer cache for the b-tree of up
to the number of blocks specified. If numblocks=0, no cache is created. In
this case, the user must create a subclass to overload the make_new_block
and getarr methods.
- blocksize (default DEFBLOCKSIZE)
- The size of each block in the b-tree
- key_type (null by default)
- The key type is either a single scalar "c" (for char) or
"n" (for number), or a ref to an array of "c" and/or
"n" values. If key_type is specified, bt2 finds or constructs
the appropriate compare/equals and pack/unpack functions, overriding any
user-supplied arguments. If key type is not specified, bt2 processes the
insert keys as a scalar strings.
- compare, equal (default string comparison -- ignored if key_type argument
specified)
- Supply methods to compare your key. This package contains special
comparison methods for numeric and multi-column keys, and their associated
packing functions.
- pack_fn/unpack_fn (default single scalar key and value -- ignored if
key_type specified)
- "Packing" functions convert key/value pairs to and from a byte
representation which gets stored in the nodes of the b-tree. The b-tree
package supports scalar keys and values by default. It also contains
methods for multi-column keys with a single value.
- use_IOT (default off)
- special flag for Index Organized Tables, which means the "value"
can be an array, not a scalar. This approach requires a couple extra
checks in the branch nodes, since branches contain (key, nodeid) pairs,
and leaves contain (key, array of values). Normally, indexes only have a
scalar value: a nodeid or a rid.
- unique_key (default off)
- Enforce uniqueness (no duplicates) at insertion time
- use_keycount (default off)
- Special case for building non-unique indexes where the "value"
is null because it is already part of the key vector. In this usage, we
construct a unique index (unique_key=1) where the key vector is the key
columns *plus* the table rid, and the value is null. The key columns might
be duplicates, but the addition of the rid guarantees uniqueness. The
fetch is asymmetric: the table rid is returned as both the last key column
and the value.
Q: Why not just have a non-unique index and store the rids as
regular values? A: This approach clusters related rids, so index scans
are more efficient and deletes are easier. Note that the basic index row
physical storage is unaffected. Only the unpack function needs an extra
argument to describe the number of key columns. Q: But doesn't the extra
comparison for the rid column make inserts more expensive? A: Yes, but
we're trading off insert performance against index scan performance. The
workload of most database applications is typically dominated by
selects, not inserts.
- insert
- delete
- search
- btCLEAR
- hash_key/array_offset iterators: FIRSTKEY, NEXTKEY, FETCH, plus reverse
iterators LASTKEY, PREVKEY.
- DBI-style search interface: SQLPrepare, Execute, Fetch
- hkey/offset functions: should be able to convert between different
"place" formats (Array and Hash prefixes), like the common fetch
routine, or ASSERT that prefix matches.
- add reverse scan to search/SQLFetch
- support multicol keys, non-unique keys (via combo of key + rid as
unique)
- support transaction unique constraints -- probably via treat key+rid as
unique, then turn on true unique key, and scan for duplicates?
- find out why can't do pctfree=0
- Work on RDBlk_NN support.
- search with startkey/stopkey support, vs supplying compare/equal methods.
restricting the search api to straight "=","<"
comparisons means can try the estimation function
- need to handle partial startkey/stopkey comparison in searchR/SQLFetch for
multi-col keys
- semantics of nulls in multi-col keys -- sort low?
- simplify _pack_row with splice and a supplied split position, something
like -1 for normal indexes (n-1 key cols, 1 val col, so pop the val) or
"N=?" for index-organized tables (N key cols, M val cols, so
splice N)
- reorganize along the lines of "GiST" Generalized Search Trees
(Paul Aoki, J. Hellerstein, UCB)
- ecount support?
Jeffrey I. Cohen, jcohen@genezzo.com
perl(1).
Copyright (c) 2003, 2004 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: jcohen@genezzo.com
For more information, please visit the Genezzo homepage at
<http://www.genezzo.com>
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc. |