NAME

XML::RSS::Parser - A liberal object-oriented parser for RSS feeds.

SYNOPSIS

    #!/usr/bin/perl -w
    use strict;
    use XML::RSS::Parser;
    use FileHandle;

    my $p  = XML::RSS::Parser->new;
    # stop early if the feed file cannot be opened
    my $fh = FileHandle->new('/path/to/some/rss/file')
        or die "Cannot open feed file: $!";
    my $feed = $p->parse_file($fh);

    # output some values
    my $feed_title = $feed->query('/channel/title');
    print $feed_title->text_content;
    my $count = $feed->item_count;
    print " ($count)\n";
    foreach my $i ( $feed->query('//item') ) {
        my $node = $i->query('title');
        print '  ' . $node->text_content;
        print "\n";
    }

DESCRIPTION

XML::RSS::Parser is a lightweight liberal parser of RSS feeds. This parser is "liberal" in that it does not demand compliance with a specific RSS version and will attempt to gracefully handle tags it does not expect or understand. The parser's only requirement is that the file is well-formed XML and remotely resembles RSS. Roughly speaking, well-formed XML with a "channel" element as a direct sibling or the root tag and "item" elements etc.

There are a number of advantages to using this module rather than just using a standard parser-tree combination.

* There are a number of different RSS formats in use today. In very subtle ways these formats are not entirely compatible with one another. XML::RSS::Parser makes a couple of assumptions to "normalize" the parse tree into a more consistent form. For instance, it forces "channel" and "item" into a parent-child relationship. For more detail see "SPECIAL PROCESSING NOTES".

* This module is leaner than XML::RSS; most of that module's code is for generating RSS files.

* It also provides an XPath-esque interface to the feed's tree.

While XML::RSS::Parser creates a normalized parse tree, it still leaves the mapping of overlapping and alternate tags common in the RSS format space to the developer. For this, look at the XML::RAI (RSS Abstraction Interface) package, which provides an object-oriented layer on top of XML::RSS::Parser trees that transparently maps these various tags to one common interface.
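The "channel"/"item" normalization mentioned in the description can be pictured with a small standalone sketch. It does not use XML::RSS::Parser and is not the module's implementation. The nested-hash tree and the normalize_items helper are hypothetical, chosen only to show what moving sibling "item" elements under "channel" (as an RSS 1.0 feed would need) might look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed RSS 1.0 document: each node is a hash
# with a name and a list of children. In RSS 1.0 the <item> elements are
# siblings of <channel> under the root rather than its children.
my $root = {
    name     => 'rdf:RDF',
    children => [
        { name => 'channel', children => [ { name => 'title', children => [] } ] },
        { name => 'item',    children => [ { name => 'title', children => [] } ] },
        { name => 'item',    children => [ { name => 'title', children => [] } ] },
    ],
};

# Move every top-level <item> under the first <channel>, so callers can
# always expect items to be children of the channel.
sub normalize_items {
    my ($tree) = @_;
    my ($channel) = grep { $_->{name} eq 'channel' } @{ $tree->{children} };
    return $tree unless $channel;    # nothing to normalize without a channel

    my ( @items, @rest );
    for my $child ( @{ $tree->{children} } ) {
        if ( $child->{name} eq 'item' ) { push @items, $child }
        else                            { push @rest,  $child }
    }
    push @{ $channel->{children} }, @items;
    $tree->{children} = \@rest;
    return $tree;
}

normalize_items($root);

# Count the items that now sit under the channel.
my ($channel)  = grep { $_->{name} eq 'channel' } @{ $root->{children} };
my $item_count = grep { $_->{name} eq 'item' } @{ $channel->{children} };
print "items under channel: $item_count\n";
```

After normalization, the same lookup ("items are children of the channel") works whether the source feed nested its items or listed them beside the channel. That is the kind of consistency the parser aims for.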
XML::RSS::Parser is based on XML::Elemental, a SAX-based package for easily parsing XML documents into a more native and mostly object-oriented Perl form.

SPECIAL PROCESSING NOTES

There are a number of different RSS formats in use today. In very subtle ways these formats are not entirely compatible with one another. What's worse is that there are unlabeled versions within the standard, in addition to tags with overlapping purposes and vague definitions. (See Mark Pilgrim's "The myth of RSS compatibility" <http://diveintomark.org/archives/2004/02/04/incompatible-rss> for just a sampling of what I mean.) To ease working with RSS data in different formats, the parser does not create the feed's parse tree verbatim. Instead it makes a few assumptions to "normalize" the parse tree into a more consistent form.

With the refactoring of this module and the switch to a true tree structure, the normalization process has been simplified. Some of the version 2.x normalization proved to be problematic with more advanced and complex feeds.
Two significant changes were made with the release of version 4.0.
NAMESPACE PREFIXES

The following prefix and namespace combinations are recognized by default. Use "register_ns_prefix" to add more as needed.

    admin            http://webns.net/mvcb/
    ag               http://purl.org/rss/1.0/modules/aggregation/
    annotate         http://purl.org/rss/1.0/modules/annotate/
    atom             http://www.w3.org/2005/Atom
    audio            http://media.tangent.org/rss/1.0/
    cc               http://web.resource.org/cc/
    company          http://purl.org/rss/1.0/modules/company
    content          http://purl.org/rss/1.0/modules/content/
    cp               http://my.theinfo.org/changed/1.0/rss/
    dc               http://purl.org/dc/elements/1.1/
    dcterms          http://purl.org/dc/terms/
    email            http://purl.org/rss/1.0/modules/email/
    ev               http://purl.org/rss/1.0/modules/event/
    feedburner       http://rssnamespace.org/feedburner/ext/1.0
    foaf             http://xmlns.com/foaf/0.1/
    image            http://purl.org/rss/1.0/modules/image/
    itunes           http://www.itunes.com/DTDs/Podcast-1.0.dtd
    l                http://purl.org/rss/1.0/modules/link/
    openSearch       http://a9.com/-/spec/opensearchrss/1.0/
    rdf              http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs             http://www.w3.org/2000/01/rdf-schema#
    ref              http://purl.org/rss/1.0/modules/reference/
    reqv             http://purl.org/rss/1.0/modules/richequiv/
    rss091           http://purl.org/rss/1.0/modules/rss091#
    search           http://purl.org/rss/1.0/modules/search/
    slash            http://purl.org/rss/1.0/modules/slash/
    ss               http://purl.org/rss/1.0/modules/servicestatus/
    str              http://hacks.benhammersley.com/rss/streaming/
    sub              http://purl.org/rss/1.0/modules/subscription/
    sy               http://purl.org/rss/1.0/modules/syndication/
    tapi             http://api.technorati.com/dtd/tapi-001.xml#
    taxo             http://purl.org/rss/1.0/modules/taxonomy/
    thr              http://purl.org/rss/1.0/modules/threading/
    trackback        http://madskills.com/public/xml/rss/module/trackback/
    wiki             http://purl.org/rss/1.0/modules/wiki/
    xhtml            http://www.w3.org/1999/xhtml
    xml              http://www.w3.org/XML/1998/namespace/
    creativeCommons  http://backend.userland.com/creativeCommonsRssModule

METHODS

The following objects and methods are provided in this package.
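As a rough illustration of the prefix table described under NAMESPACE PREFIXES, the sketch below models a two-way prefix/namespace registry in plain Perl. It is only a model of the idea, not the module's code. The hashes, the register_prefix and expand_name helpers, and the "ex" prefix with its URI are all made up for the example. In real use, "register_ns_prefix" is the documented way to extend the table, and its exact calling convention is not shown in this section.

```perl
use strict;
use warnings;

# Hypothetical model of a prefix registry, seeded with a small subset of
# the default prefixes listed under NAMESPACE PREFIXES.
my %ns_for_prefix = (
    dc      => 'http://purl.org/dc/elements/1.1/',
    content => 'http://purl.org/rss/1.0/modules/content/',
);
my %prefix_for_ns = reverse %ns_for_prefix;

# Register a prefix so it can be looked up in both directions.
sub register_prefix {
    my ( $prefix, $uri ) = @_;
    $ns_for_prefix{$prefix} = $uri;
    $prefix_for_ns{$uri}    = $prefix;
    return;
}

# Expand a prefixed name such as "dc:creator" into its namespace URI and
# local name. Returns an empty list for unprefixed or unknown names.
sub expand_name {
    my ($qname) = @_;
    my ( $prefix, $local ) = split /:/, $qname, 2;
    return unless defined $local && exists $ns_for_prefix{$prefix};
    return ( $ns_for_prefix{$prefix}, $local );
}

# Made-up prefix and URI, used only for illustration.
register_prefix( 'ex', 'http://example.org/ns/' );

my ( $uri, $local ) = expand_name('ex:rating');
print "$uri $local\n";
```

A prefixed query step can only be resolved once its prefix is in the table, which is why extension elements outside the default list need their prefix registered before they are queried.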
DEPENDENCIES

XML::SAX, XML::Elemental, Class::ErrorHandler, Class::XPath 1.4*

* Versions up to 1.4 have a design flaw that would cause it to choke on feeds with the / character in an attribute value, for example the Yahoo! feeds.

SEE ALSO

XML::RAI

The Feed Validator <http://www.feedvalidator.org/>

What is RSS? <http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html>

Raising the Bar on RSS Feed Quality <http://www.oreillynet.com/pub/a/webservices/2002/11/19/rssfeedquality.html>

The myth of RSS compatibility <http://diveintomark.org/archives/2004/02/04/incompatible-rss>

AUTHOR & COPYRIGHT

Except where otherwise noted, XML::RSS::Parser is Copyright 2003-2005, Timothy Appnel, cpan@timaoutloud.org. All rights reserved.