NAME

Search::Estraier - pure Perl module to use the Hyper Estraier search engine

SYNOPSIS

Simple indexer

    use Search::Estraier;

    # create and configure node
    my $node = new Search::Estraier::Node(
        url            => 'http://localhost:1978/node/test',
        user           => 'admin',
        passwd         => 'admin',
        create         => 1,
        label          => 'Label for node',
        croak_on_error => 1,
    );

    # create document
    my $doc = new Search::Estraier::Document;

    # add attributes
    $doc->add_attr('@uri', "http://estraier.gov/example.txt");
    $doc->add_attr('@title', "Over the Rainbow");

    # add body text to document
    $doc->add_text("Somewhere over the rainbow. Way up high.");
    $doc->add_text("There's a land that I heard of once in a lullaby.");

    die "error: ", $node->status, "\n" unless (eval { $node->put_doc($doc) });

Simple searcher

    use Search::Estraier;

    # create and configure node
    my $node = new Search::Estraier::Node(
        url            => 'http://localhost:1978/node/test',
        user           => 'admin',
        passwd         => 'admin',
        croak_on_error => 1,
    );

    # create condition
    my $cond = new Search::Estraier::Condition;

    # set search phrase
    $cond->set_phrase("rainbow AND lullaby");

    my $nres = $node->search($cond, 0);

    if (defined($nres)) {
        print "Got ", $nres->hits, " results\n";

        # for each document in results
        for my $i ( 0 .. $nres->doc_num - 1 ) {
            # get result document
            my $rdoc = $nres->get_doc($i);
            # display attributes
            print "URI: ", $rdoc->attr('@uri'), "\n";
            print "Title: ", $rdoc->attr('@title'), "\n";
            print $rdoc->snippet, "\n";
        }
    } else {
        die "error: ", $node->status, "\n";
    }

DESCRIPTION

This module is an implementation of the node API of Hyper Estraier. Since it is a Perl-only module that depends only on standard Perl modules, it will run on all platforms on which Perl runs. It doesn't require compilation or the Hyper Estraier development files on the target machine.

It is implemented as multiple packages which closely resemble the Ruby implementation. It also includes methods to manage nodes.

There are a few examples in the "scripts" directory of this distribution.

Inheritable common methods

These methods should really move somewhere else.

_s

Remove multiple whitespace characters from a string, as well as whitespace at the beginning or end.

    my $text = $self->_s(" this  is a text  ");
    # $text is now 'this is a text'

Search::Estraier::Document

This class implements Document, which is a single item in Hyper Estraier.

It is a collection of:

new

Create a new document, empty or from a draft.

    my $doc  = new Search::Estraier::Document;
    my $doc2 = new Search::Estraier::Document( $draft );

add_attr

Add an attribute.

    $doc->add_attr( name => 'value' );

Delete an attribute using

    $doc->add_attr( name => undef );

add_text

Add a sentence of text.

    $doc->add_text('this is example text to display');

add_hidden_text

Add a hidden sentence.

    $doc->add_hidden_text('this is example text just for search');

add_vectors

Add vectors.

    $doc->add_vectors(
        'vector_name' => 42,
        'another'     => 12345,
    );

set_score

Set the substitute score.

    $doc->set_score(12345);

score

Get the substitute score.

id

Get the ID number of the document. If the object has never been registered, "-1" is returned.

    print $doc->id;

attr_names

Returns an array with the attribute names from the document object.

    my @attrs = $doc->attr_names;

attr

Returns the value of an attribute.

    my $value = $doc->attr( 'attribute' );

texts

Returns an array with the text sentences.

    my @texts = $doc->texts;

cat_texts

Returns the whole text as a single scalar.

    my $text = $doc->cat_texts;

dump_draft

Dump draft data from the document object.

    print $doc->dump_draft;

delete

Empty the document object.

    $doc->delete;

This function is an addition to the original Ruby API. Since it was included in the C wrappers, it is here as a convenience. Document objects which go out of scope will be destroyed automatically.

Search::Estraier::Condition

new

    my $cond = new Search::Estraier::Condition;

set_phrase

    $cond->set_phrase('search phrase');

add_attr

    $cond->add_attr('@URI STRINC /~dpavlin/');

set_order

    $cond->set_order('@mdate NUMD');

set_max

    $cond->set_max(42);

set_options

    $cond->set_options( 'SURE' );
    $cond->set_options( qw/AGITO NOIDF SIMPLE/ );

Possible options are:
Skipping N-grams will speed up the search, but reduce accuracy. Every call to "set_options" resets the previous options. This option changed in version 0.04 of this module. It is backwards compatible.

phrase

Return the search phrase.

    print $cond->phrase;

order

Return the search result order.

    print $cond->order;

attrs

Return the search result attrs.

    my @cond_attrs = $cond->attrs;

max

Return the maximum number of results.

    print $cond->max;

"-1" is returned for an uninitialized value, 0 is unlimited.

options

Return the options for this condition.

    print $cond->options;

Options are returned in numerical form.

set_skip

Set the number of documents skipped from the beginning of the results.

    $cond->set_skip(42);

Similar to "offset" in an RDBMS.

skip

Return the skip for this condition.

    print $cond->skip;

set_distinct

    $cond->set_distinct('@author');

distinct

Return the distinct attribute.

    print $cond->distinct;

set_mask

Filter out some links when searching.

The argument is an array of link numbers, starting with 0 (current node).

    $cond->set_mask(qw/0 1 4/);

Search::Estraier::ResultDocument

new

    my $rdoc = new Search::Estraier::ResultDocument(
        uri      => 'http://localhost/document/uri/42',
        attrs    => {
            foo => 1,
            bar => 2,
        },
        snippet  => 'this is a text of snippet',
        keywords => "this\tare\tkeywords",
    );

uri

Return the URI of the result document.

    print $rdoc->uri;

attr_names

Returns an array with the attribute names from the result document object.

    my @attrs = $rdoc->attr_names;

attr

Returns the value of an attribute.

    my $value = $rdoc->attr( 'attribute' );

snippet

Return the snippet from the result document.

    print $rdoc->snippet;

keywords

Return the keywords from the result document.

    print $rdoc->keywords;

Search::Estraier::NodeResult

new

    my $res = new Search::Estraier::NodeResult(
        docs => \@array_of_rdocs,
        hits => \%hash_with_hints,
    );

doc_num

Return the number of documents.

    print $res->doc_num;

This returns the real number of documents (limited by "max"). If you want the total number of hits, see "hits".

get_doc

Return a single document.

    my $doc = $res->get_doc( 42 );

Returns undef if the document doesn't exist.
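
The relation between "set_skip", "set_max", "doc_num" and "hits" can be illustrated with a small paging helper. The sketch below is plain Perl and does not talk to a node. The function names page_window and page_count and their arguments are hypothetical and are not part of Search::Estraier.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Estraier): compute the values
# one might pass to $cond->set_skip and $cond->set_max for a 1-based page.
sub page_window {
    my ($page, $per_page) = @_;
    die "page must be >= 1\n"     if $page < 1;
    die "per_page must be >= 1\n" if $per_page < 1;
    my $skip = ($page - 1) * $per_page;
    return ($skip, $per_page);
}

# Hypothetical helper: number of pages needed for a total hit count,
# such as the value reported by $nres->hits.
sub page_count {
    my ($hits, $per_page) = @_;
    return int(($hits + $per_page - 1) / $per_page);
}

my ($skip, $max) = page_window(3, 10);
print "skip=$skip max=$max\n";
print "pages=", page_count(42, 10), "\n";
```

With a real condition object, the two values would be applied as $cond->set_skip($skip) and $cond->set_max($max) before calling $node->search.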

hint

Return a specific hint from the results.

    print $res->hint( 'VERSION' );

Possible hints are: "VERSION", "NODE", "HIT", "HINT#n", "DOCNUM", "WORDNUM", "TIME", "LINK#n", "VIEW".

hints

A more perlish version of "hint". This one returns a hash.

    my %hints = $res->hints;

hits

Syntactic sugar for the total number of hits for this query.

    print $res->hits;

It is the same as

    print $res->hint('HIT');

but shorter.

Search::Estraier::Node

new

    my $node = new Search::Estraier::Node;

or optionally with "url" as a parameter

    my $node = new Search::Estraier::Node( 'http://localhost:1978/node/test' );

or in a more verbose form

    my $node = new Search::Estraier::Node(
        url            => 'http://localhost:1978/node/test',
        user           => 'admin',
        passwd         => 'admin',
        create         => 1,
        label          => 'optional node label',
        debug          => 1,
        croak_on_error => 1,
    );

with the following arguments:

set_url

Specify the URL of the node server.

    $node->set_url('http://localhost:1978');

set_proxy

Specify the proxy server used to connect to the node server.

    $node->set_proxy('proxy.example.com', 8080);

set_timeout

Specify the connection timeout in seconds.

    $node->set_timeout( 15 );

set_auth

Specify the name and password for authentication to the node server.

    $node->set_auth('clint', 'eastwood');

status

Return the status code of the last request.

    print $node->status;

"-1" means connection failure.

put_doc

Add a document.

    $node->put_doc( $document_draft ) or die "can't add document";

Returns true on success or false on failure.

out_doc

Remove a document.

    $node->out_doc( $document_id ) or die "can't remove document";

Returns true on success or false on failure.

out_doc_by_uri

Remove a registered document using its URI.

    $node->out_doc_by_uri( 'file:///document/uri/42' ) or die "can't remove document";

Returns true on success or false on failure.

edit_doc

Edit the attributes of a document.

    $node->edit_doc( $document_draft ) or die "can't edit document";

Returns true on success or false on failure.

get_doc

Retrieve a document.

    my $doc = $node->get_doc( $document_id ) or die "can't get document";

Returns true on success or false on failure.

get_doc_by_uri

Retrieve a document by its URI.

    my $doc = $node->get_doc_by_uri( 'file:///document/uri/42' ) or die "can't get document";

Returns true on success or false on failure.

get_doc_attr

Retrieve the value of an attribute of a document.

    my $val = $node->get_doc_attr( $document_id, 'attribute_name' ) or die "can't get document attribute";

get_doc_attr_by_uri

Retrieve the value of an attribute of a document specified by URI.

    my $val = $node->get_doc_attr_by_uri( 'file:///document/uri/42', 'attribute_name' ) or die "can't get document attribute";

etch_doc

Extract document keywords.

    my $keywords = $node->etch_doc( $document_id ) or die "can't etch document";

etch_doc_by_uri

Extract keywords of a document specified by URI.

    my $keywords = $node->etch_doc_by_uri( 'file:///document/uri/42' ) or die "can't etch document";

Returns true on success or false on failure.
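
Most of the methods above return false on failure and leave the reason in "status". A minimal sketch of a wrapper that turns a false return into an exception carrying the status code follows. The checked function and the FakeNode class are hypothetical. FakeNode, including the status values it sets, exists only so the sketch runs without a node server; a real Search::Estraier::Node object would be passed instead.

```perl
use strict;
use warnings;

# Stand-in for Search::Estraier::Node, for illustration only.
# Its status values (200, 400, -1) are assumptions made for this sketch.
package FakeNode;
sub new    { my ($class, %args) = @_; return bless { status => -1, %args }, $class }
sub status { $_[0]{status} }
sub put_doc {
    my ($self, $doc) = @_;
    $self->{status} = $doc ? 200 : 400;
    return $doc ? 1 : 0;
}

package main;

# Hypothetical helper: call $method on $node and die with the status code
# if the call returns false or throws (as it may with croak_on_error).
sub checked {
    my ($node, $method, @args) = @_;
    my $rv = eval { $node->$method(@args) };
    die "$method failed: status ", $node->status, "\n" unless $rv;
    return $rv;
}

my $node = FakeNode->new;
checked($node, 'put_doc', 'draft text');
print "status after put_doc: ", $node->status, "\n";

# An empty draft makes the stand-in report failure.
eval { checked($node, 'put_doc', '') };
print "caught: $@" if $@;
```

The same wrapper shape would apply to "out_doc", "edit_doc" and the other methods that report failure through a false return value.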

uri_to_id

Get the ID of a document specified by URI.

    my $id = $node->uri_to_id( 'file:///document/uri/42' );

This method won't croak, even when "croak_on_error" is set.

_fetch_doc

Private function used to implement "get_doc", "get_doc_by_uri", "etch_doc" and "etch_doc_by_uri".

    # this will decode received draft into Search::Estraier::Document object
    my $doc = $node->_fetch_doc( id => 42 );
    my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42' );

    # to extract keywords, add etch
    my $doc = $node->_fetch_doc( id => 42, etch => 1 );
    my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', etch => 1 );

    # to get document attribute add attr
    my $doc = $node->_fetch_doc( id => 42, attr => '@mdate' );
    my $doc = $node->_fetch_doc( uri => 'file:///document/uri/42', attr => '@mdate' );

    # more general form which allows implementation of
    # uri_to_id
    my $id = $node->_fetch_doc(
        uri           => 'file:///document/uri/42',
        path          => '/uri_to_id',
        chomp_resbody => 1,
    );

name

    my $node_name = $node->name;

label

    my $node_label = $node->label;

doc_num

    my $documents_in_node = $node->doc_num;

word_num

    my $words_in_node = $node->word_num;

size

    my $node_size = $node->size;

search

Search for documents which match a condition.

    my $nres = $node->search( $cond, $depth );

$cond is a "Search::Estraier::Condition" object, while $depth specifies the depth for meta search.

The function returns a "Search::Estraier::NodeResult" object.

cond_to_query

Return a URI-encoded string generated from a Search::Estraier::Condition.

    my $args = $node->cond_to_query( $cond, $depth );

shuttle_url

This is the method which uses "LWP::UserAgent" to communicate with the Hyper Estraier node master.

    my $rv = shuttle_url( $url, $content_type, $req_body, \$resbody );

The $resheads and $resbody booleans control whether response headers and/or the response body will be saved within the object.

set_snippet_width

Set the width of snippets in results.

    $node->set_snippet_width( $wwidth, $hwidth, $awidth );

$wwidth specifies the whole width of the snippet. It is 480 by default.
If it is 0, the snippet is not sent with results. If it is negative, the whole document text is sent instead of the snippet.

$hwidth specifies the width of strings from the beginning of the string. The default value is 96. A negative or zero value keeps the previous value.

$awidth specifies the width of strings around each highlighted word. It is 96 by default. If a negative or zero value is provided, the previous value is kept unchanged.

set_user

Manage users of the node.

    $node->set_user( 'name', $mode );

$mode can be one of:
Returns true on success, otherwise false.

set_link

Manage node links.

    $node->set_link('http://localhost:1978/node/another', 'another node label', $credit);

If $credit is negative, the link is removed.

admins

    my @admins = @{ $node->admins };

Return an array of users with admin rights on the node.

guests

    my @guests = @{ $node->guests };

Return an array of users with guest rights on the node.

links

    my @links = @{ $node->links };

Return an array of links for this node.

cacheusage

Return the cache usage for a node.

    my $cache = $node->cacheusage;

master

Set actions on the Hyper Estraier node master ("estmaster" process).

    $node->master( action => 'sync' );

All available actions are documented in <http://hyperestraier.sourceforge.net/nguide-en.html#protocol>

PRIVATE METHODS

You could call those directly, but you don't have to. I hope.

_set_info

Set information for the node.

    $node->_set_info;

_clear_info

Clear information for the node.

    $node->_clear_info;

On the next call to "name", "label", "doc_num", "word_num" or "size", the node info will be fetched again from Hyper Estraier.

EXPORT

Nothing.

SEE ALSO

<http://hyperestraier.sourceforge.net/>

Hyper Estraier Ruby interface on which this module is based.

Hyper Estraier now also has a pure-Perl binding included in its distribution. It is a faster way to access databases directly if you are not running the "estmaster" P2P server.

AUTHOR

Dobrica Pavlinusic, <dpavlin@rot13.org>

Robert Klep <robert@klep.name> contributed refactored search code

COPYRIGHT AND LICENSE

Copyright (C) 2005-2006 by Dobrica Pavlinusic

This library is free software; you can redistribute it and/or modify it under the GPL v2 or later.