URI::Fetch(3) |
User Contributed Perl Documentation |
URI::Fetch - Smart URI fetching/caching
    use URI::Fetch;
    use Cache::File;    # needed only for the cache example below

    ## Simple fetch.
    my $res = URI::Fetch->fetch('http://example.com/atom.xml')
        or die URI::Fetch->errstr;
    do_something($res->content) if $res->is_success;

    ## Fetch using specified ETag and Last-Modified headers.
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
        ETag         => '123-ABC',
        LastModified => time - 3600,
    )
        or die URI::Fetch->errstr;

    ## Fetch using an on-disk cache that URI::Fetch manages for you.
    my $cache = Cache::File->new( cache_root => '/tmp/cache' );
    $res = URI::Fetch->fetch('http://example.com/atom.xml',
        Cache => $cache
    )
        or die URI::Fetch->errstr;
URI::Fetch is a smart client for fetching HTTP pages, notably syndication
feeds (RSS, Atom, and others), in an intelligent, bandwidth- and time-saving
way. That means:
- GZIP support
If you have Compress::Zlib installed, URI::Fetch
will automatically try to download a compressed version of the content,
saving bandwidth (and time).
- Last-Modified and ETag support
If you use a local cache (see the Cache parameter to
fetch), URI::Fetch will keep track of the
Last-Modified and ETag headers from the server, allowing
you to only download pages that have been modified since the last time
you checked.
- Proper understanding of HTTP error codes
Certain HTTP error codes are special, particularly when
fetching syndication feeds, and well-written clients should pay special
attention to them. URI::Fetch can only do so much for you in this
regard, but it gives you the tools to be a well-written client.
The response from fetch gives you the raw HTTP response
code, along with special handling of four codes:
- 200 (OK)
Signals that the content of a page/feed was retrieved
successfully.
- 301 (Moved Permanently)
Signals that a page/feed has moved permanently, and that your
database of feeds should be updated to reflect the new URI.
- 304 (Not Modified)
Signals that a page/feed has not changed since it was last
fetched.
- 410 (Gone)
Signals that a page/feed is gone and will never be coming
back, so you should stop trying to fetch it.
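As an illustration of how a feed client might act on these four codes, here is a minimal sketch. The helper name action_for_status and the action names it returns are hypothetical and not part of URI::Fetch; the sketch only maps a raw HTTP response code to the behaviour described in the list above.

```perl
use strict;
use warnings;

# Hypothetical helper: map a raw HTTP status code to the action a feed
# client might take. The action names are illustrative only.
sub action_for_status {
    my ($code) = @_;
    my %action = (
        200 => 'process_content',      # content retrieved successfully
        301 => 'update_stored_uri',    # feed moved permanently
        304 => 'keep_existing',        # nothing changed since last fetch
        410 => 'stop_fetching',        # feed is gone for good
    );
    # Any other code is treated as an error in this sketch.
    return $action{$code} // 'report_error';
}
```

A client could pass the raw HTTP response code from the fetch response to this helper and branch on the result.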
If you make a request using a cache and the server responds with 304 (Not
Modified), and the content was returned from the cache, then
"is_success()" will return true and
"$response->content" will contain the
cached content.
I think this is the right behaviour, given the philosophy of
"URI::Fetch", but please let me (NEILB)
know if you disagree.
URI::Fetch->fetch($uri, %param) fetches a page identified by the URI $uri.
On success, returns a URI::Fetch::Response object; on
failure, returns "undef".
%param can contain:
- LastModified
- ETag
LastModified and ETag can be supplied to ask
the server to return the full page only if it has changed since the last
request. If you're writing your own feed client, this is recommended
practice, because it limits both your bandwidth use and the
server's.
If you'd rather not have to store the LastModified time
and ETag yourself, see the Cache parameter below (and the
SYNOPSIS above).
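If you do store these values yourself, a small helper can turn a saved record into the parameters for the next request. This is a sketch only: the helper name conditional_params and the record keys etag and last_modified are assumptions made for illustration, not part of URI::Fetch.

```perl
use strict;
use warnings;

# Hypothetical record layout: { etag => ..., last_modified => ... },
# saved by your own code after a previous successful fetch.
sub conditional_params {
    my ($record) = @_;
    my %param;
    # Only pass along the values we actually have.
    $param{ETag}         = $record->{etag}
        if defined $record->{etag};
    $param{LastModified} = $record->{last_modified}
        if defined $record->{last_modified};
    return %param;
}

# Intended use (not executed here):
#   my $res = URI::Fetch->fetch($uri, conditional_params($record));
```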
- Cache
If you'd like URI::Fetch to cache responses between
requests, provide the Cache parameter with an object supporting
the Cache API (e.g. Cache::File, Cache::Memory).
Specifically, an object that supports
"$cache->get($key)" and
"$cache->set($key, $value,
$expires)".
If supplied, URI::Fetch will store the page content,
ETag, and last-modified time of the response in the cache, and will pull
the content from the cache on subsequent requests if the page returns a
Not-Modified response.
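To make the get/set requirement concrete, here is a minimal in-memory object with the two methods named above. The package name My::MemoryCache is hypothetical, and treating $expires as a number of seconds from now is an assumption of this sketch; real Cache API classes such as Cache::File accept richer expiry specifications.

```perl
use strict;
use warnings;

package My::MemoryCache;    # hypothetical name, not a published module

sub new { return bless { store => {} }, shift }

# get($key): return the stored value, or undef if absent or expired.
sub get {
    my ($self, $key) = @_;
    my $entry = $self->{store}{$key} or return undef;
    if (defined $entry->{expires_at} && time >= $entry->{expires_at}) {
        delete $self->{store}{$key};
        return undef;
    }
    return $entry->{value};
}

# set($key, $value, $expires): $expires is treated here as seconds from
# now; undef means "never expire". This interpretation is an assumption.
sub set {
    my ($self, $key, $value, $expires) = @_;
    $self->{store}{$key} = {
        value      => $value,
        expires_at => defined $expires ? time + $expires : undef,
    };
    return 1;
}

package main;
```

An instance created with My::MemoryCache->new could then be passed as the Cache parameter; it lives only as long as the process, so it suits tests better than production use.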
- UserAgent
Optional. You may provide your own LWP::UserAgent instance.
Look into LWPx::ParanoidUserAgent if you're fetching URLs given to you
by possibly malicious parties.
- NoNetwork
Optional. Controls the interaction between the cache and HTTP
requests with If-Modified-Since/If-None-Match headers. Possible
behaviors are:
- false (default)
If a page is in the cache, the origin HTTP server is always checked for a
fresher copy with an If-Modified-Since and/or If-None-Match header.
- 1
The origin HTTP server is never contacted, regardless of whether the page
is in the cache. If the page is missing from the cache, the fetch method
returns undef. If the page is in the cache, that page is returned, no
matter how old it is. Note that with this setting the
URI::Fetch::Response object will never have the http_response member set.
- N, where N > 1
The origin HTTP server is not contacted if the page is in the cache and
the cached page was inserted in the last N seconds. If the cached copy is
older than N seconds, a normal HTTP request (full or cache check) is done.
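The three NoNetwork behaviors can be summarized as a small decision function. This is a sketch of the rules as described above, not the module's actual code; the function name nonetwork_decision and its return values are hypothetical, and treating "in the last N seconds" as an age strictly less than N is an assumption.

```perl
use strict;
use warnings;

# Returns 'cache', 'network', or 'fail' for a given NoNetwork setting.
#   $no_network : the NoNetwork value (false, 1, or N > 1)
#   $cached_at  : epoch time the page was cached, or undef if not cached
#   $now        : current epoch time
sub nonetwork_decision {
    my ($no_network, $cached_at, $now) = @_;

    # Default: the origin server is always consulted.
    return 'network' unless $no_network;

    # 1: the origin server is never contacted.
    if ($no_network == 1) {
        return defined $cached_at ? 'cache' : 'fail';
    }

    # N > 1: use the cache only if the entry is younger than N seconds.
    return 'cache'
        if defined $cached_at && $now - $cached_at < $no_network;
    return 'network';
}
```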
- ContentAlterHook
Optional. A subref that gets called with a scalar reference to
your content so you can modify the content before it's returned and
before it's put in cache.
For instance, you may want to only cache the <head>
section of an HTML document, or you may want to take a feed URL and
cache only a pre-parsed version of it. If you modify the scalarref given
to your hook and change it into a hashref, scalarref, or some blessed
object, that same value will be returned to you later on not-modified
responses.
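As a sketch of the first case, the following subref-style function keeps only the <head> section of an HTML document. The name keep_head_only is hypothetical, and the regular expression is a deliberately naive match rather than a real HTML parser.

```perl
use strict;
use warnings;

# Receives a scalar reference to the content and modifies it in place.
# If no <head>...</head> section is found, the content is left unchanged.
sub keep_head_only {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
}
```

It could be supplied as ContentAlterHook => \&keep_head_only.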
- CacheEntryGrep
Optional. A subref that gets called with the
URI::Fetch::Response object about to be cached (with the contents
already possibly transformed by your
"ContentAlterHook"). If your subref
returns true, the page goes into the cache. If false, it doesn't.
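For example, a filter might refuse to cache very large pages. The sketch below assumes only that the object passed in has a content method; the function name small_enough_to_cache and the one-megabyte limit are arbitrary choices made for illustration.

```perl
use strict;
use warnings;

# Return true if the response should be cached, false otherwise.
sub small_enough_to_cache {
    my ($response) = @_;
    my $max_bytes = 1_000_000;    # arbitrary limit for this sketch

    my $content = $response->content;
    return 0 unless defined $content;

    # Content already turned into a reference by a ContentAlterHook is
    # accepted as-is, since its size cannot be measured with length().
    return 1 if ref $content;

    return length($content) <= $max_bytes ? 1 : 0;
}
```

It could be supplied as CacheEntryGrep => \&small_enough_to_cache.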
- Freeze
- Thaw
Optional. Subrefs that get called to serialize and
deserialize, respectively, the data that will be cached. The cached data
should be assumed to be an arbitrary Perl data structure, containing
(potentially) references to arrays, hashes, etc.
Freeze should serialize the structure into a scalar; Thaw
should deserialize the scalar into a data structure.
By default, Storable will be used for freezing and
thawing the cached data structure.
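As an illustration of a custom pair, the sketch below serializes with the core JSON::PP module instead of Storable. The function names freeze_json and thaw_json are hypothetical. JSON cannot represent blessed objects or scalar references, so this works only while the cached structure consists of plain hashes, arrays, and scalars.

```perl
use strict;
use warnings;
use JSON::PP ();

# utf8 makes encode() return a byte string suitable for storage;
# canonical sorts hash keys so the output is stable.
my $json = JSON::PP->new->utf8->canonical;

# Serialize a hash or array reference into a scalar.
sub freeze_json {
    my ($data) = @_;
    return $json->encode($data);
}

# Deserialize the scalar back into a data structure.
sub thaw_json {
    my ($blob) = @_;
    return $json->decode($blob);
}
```

These could be supplied as Freeze => \&freeze_json and Thaw => \&thaw_json.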
- ForceResponse
Optional. A boolean that indicates a
URI::Fetch::Response should be returned regardless of the HTTP
status. By default "undef" is returned
when a response is neither a "success" (a 2xx code) nor one of the
recognized HTTP status codes listed above. The HTTP status message can
then be retrieved using the "errstr"
method on the class.
<https://github.com/neilbowers/URI-Fetch>
URI::Fetch is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.
Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin
Trott, ben+cpan@stupidfool.org. All rights reserved.
Currently maintained by Neil Bowers.
- Tim Appnel
- Mario Domgoergen
- Karen Etheridge
- Brad Fitzpatrick
- Jason Hall
- Naoya Ito
- Tatsuhiko Miyagawa