Net::OAI::Record::NamespaceFilter - general filter class based on namespace URIs
$plug = Net::OAI::Record::NamespaceFilter->new(); # Noop
$multihandler = Net::OAI::Record::NamespaceFilter->new(
'http://www.openarchives.org/OAI/2.0/oai_dc/' => 'Net::OAI::Record::OAI_DC',
'http://www.openarchives.org/OAI/2.0/provenance' => 'MySAX::ProvenanceHandler'
);
$saxfilter = SOME_SAX_Filter->new;
...
$filter = Net::OAI::Record::NamespaceFilter->new(
'*' => $saxfilter, # '*' for any namespace
);
$filter = Net::OAI::Record::NamespaceFilter->new(
'*' => sub { my $x = "";
return XML::SAX::Writer->new(Output => \$x);
},
);
This filter forwards any element belonging to a namespace from this list,
together with all of the element's children (regardless of their respective
namespaces), to the associated SAX filter. It can be used either as a
"metadataHandler" or as a "recordHandler".
This SAX filter takes a hashref "namespaces" as argument, with namespace
URIs for keys ('*' for "any namespace"). Each value is one of the following:
- undef
- Matching elements and their subelements are suppressed.
If the list of namespaces is empty or "undefined" is connected to the
filter, it effectively acts as a plug to Net::OAI::Harvester. This might
come in handy if you are planning to get to the raw result by other means,
e.g. by tapping the user agent or accessing the result's xml() method:
$plug = Net::OAI::Record::NamespaceFilter->new();
$harvester = Net::OAI::Harvester->new(
baseURL => ...,
);
$tapped_by_ua = "";
open ($TAP, ">", \$tapped_by_ua);
$harvester->userAgent()->add_handler(response_data => sub {
my($response, $ua, $h, $data) = @_;
print $TAP $data;
});
$list = $harvester->listRecords(
metadataPrefix => 'a_strange_one',
recordHandler => $plug,
);
print $tapped_by_ua; # complete OAI response
print $list->xml(); # should be exactly the same
Comment: This is quite an efficient way of not processing the
XML content of OAI records received.
- a class name of a SAX filter
- As usual, a new instance is created for every record element of the OAI
response.
# end_document() of instances of MyWriter returns something meaningful...
$consumer = Net::OAI::Record::NamespaceFilter->new('*'=> 'MyWriter');
$filter = Net::OAI::Record::NamespaceFilter->new(
'*' => $consumer
);
$list = $harvester->listAllRecords(
metadataPrefix => 'oai_dc',
recordHandler => $filter,
);
while( $r = $list->next() ) {
next if $r->status() eq "deleted";
$xmlstringref = $r->recorddata()->result('*');
...
};
Note: The handlers are instantiated for each single OAI record in the
response and will see one start_document() and one end_document() event in
any case. This behavior differs from that of handler class names specified
directly as "metadataHandler" or "recordHandler" for a request: instances
created that way never see such events.
- a code reference for a constructor
- Must return a SAX filter ready to accept a new document.
The following example returns a string serialization for each
single record:
# end_document() events will return \$x
$constructor = sub { my $x = "";
return XML::SAX::Writer->new(Output => \$x);
};
$filter = Net::OAI::Record::NamespaceFilter->new(
'*' => $constructor
);
$list = $harvester->listRecords(
metadataPrefix => 'oai_dc',
recordHandler => $filter,
);
while( $r = $list->next() ) {
$xmlstringref = $r->recorddata()->result('*');
...
};
Comment: This example shows an approach to isolate the "true contents" of
individual response records without having to provide a SAX handler class
of one's own (the only additional prerequisite is XML::SAX::Writer). But
what you get is a serialized XML document which then has to be parsed for
further processing ...
- an already instantiated SAX filter
- As usual in this case, no "start_document()" and "end_document()" events
are forwarded to the filter.
open $fh, ">", $some_file;
$builder = XML::SAX::Writer->new(Output => $fh);
$builder->start_document();
$rootEL = { Name => 'collection',
LocalName => 'collection',
NamespaceURI => "http://www.loc.gov/MARC21/slim",
Prefix => "",
Attributes => {}
};
$builder->start_element( $rootEL );
# filter for the MARC21 namespace in records: forward all matching elements
$filter = Net::OAI::Record::NamespaceFilter->new(
'http://www.loc.gov/MARC21/slim' => $builder);
$list = $harvester->listRecords(
metadataPrefix => 'a_strange_one',
metadataHandler => $filter,
);
# handle resumption tokens if more than the first
# chunk shall be stored into $fh ....
$builder->end_element( $rootEL );
$builder->end_document();
close($fh);
# ... process contents of $some_file
In this example calling the "result()" method for individual records in the
response will probably not be of much use.
Caution: Depending on the namespaces specified, even handlers which are
freshly instantiated for each OAI record might be fed with more than one
top-level XML element.
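For the "class name of a SAX filter" variant, a handler class whose end_document() returns something meaningful might look like the following minimal sketch. The class name "MyTitleCollector" is hypothetical, and the events are fired by hand so the example runs without a harvester; whether the filter requires anything beyond these Perl SAX 2 callbacks is not verified here.

```perl
use strict;
use warnings;

package MyTitleCollector;    # hypothetical handler class

sub new {
    my ($class) = @_;
    return bless { titles => [], buf => undef }, $class;
}

# A fresh document starts with an empty title list.
sub start_document { my ($self) = @_; $self->{titles} = []; return }

# Start buffering text whenever a "title" element opens.
sub start_element {
    my ($self, $el) = @_;
    $self->{buf} = '' if $el->{LocalName} eq 'title';
    return;
}

# Collect character data only while inside a "title" element.
sub characters {
    my ($self, $chars) = @_;
    $self->{buf} .= $chars->{Data} if defined $self->{buf};
    return;
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{LocalName} eq 'title' && defined $self->{buf}) {
        push @{ $self->{titles} }, $self->{buf};
        $self->{buf} = undef;
    }
    return;
}

# The return value is what a later result() call would hand back.
sub end_document { my ($self) = @_; return $self->{titles} }

package main;

# Drive the handler manually with Perl SAX 2 style event hashes.
my $title_el = {
    Name         => 'dc:title',
    LocalName    => 'title',
    NamespaceURI => 'http://purl.org/dc/elements/1.1/',
    Prefix       => 'dc',
    Attributes   => {},
};
my $handler = MyTitleCollector->new;
$handler->start_document({});
$handler->start_element($title_el);
$handler->characters({ Data => 'An example title' });
$handler->end_element($title_el);
my $result = $handler->end_document({});
print "$_\n" for @$result;
```

Because more than one top-level element may arrive per record, the sketch accumulates titles in a list rather than assuming a single root element.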
Creates a handler suitable as recordHandler or metadataHandler.
%namespaces has namespace URIs for keys and values of one of the four
types described above.
If "result()" is called with a namespace, it returns the result of the
handler, i.e. what "end_document()" returned for the record in question.
Otherwise it returns a hashref for all the results with the corresponding
namespaces as keys.
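When result() is called without arguments, the returned hashref can be post-processed generically. The sketch below uses stand-in data instead of a live harvest; "stringify_results" is a hypothetical helper, and it assumes that handlers built on XML::SAX::Writer with Output => \$x yield a scalar reference, as in the constructor example.

```perl
use strict;
use warnings;

# Hypothetical helper: map namespace => handler result to
# namespace => plain string, dereferencing scalar references and
# skipping namespaces whose handler returned nothing.
sub stringify_results {
    my ($results) = @_;
    my %strings;
    for my $ns (keys %$results) {
        my $res = $results->{$ns};
        next unless defined $res;
        $strings{$ns} = ref($res) eq 'SCALAR' ? $$res : "$res";
    }
    return \%strings;
}

# Stand-in data; in real use this would come from
# $r->recorddata()->result() called without arguments.
my $xml     = '<title>An example title</title>';
my $strings = stringify_results(
    { '*' => \$xml, 'urn:example:unused' => undef } );
print "$_ => $strings->{$_}\n" for sort keys %$strings;
```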
Thomas Berger <ThB@gymel.com>