NAME

    WWW::RobotRules::Parser - Just Parse robots.txt

SYNOPSIS

    use WWW::RobotRules::Parser;

    my $p = WWW::RobotRules::Parser->new;
    $p->parse($robots_txt_uri, $text);

    $p->parse_uri($robots_txt_uri);

DESCRIPTION

WWW::RobotRules::Parser allows you to simply parse robots.txt files as
described in http://www.robotstxt.org/wc/norobots.html. Unlike
WWW::RobotRules (which is very cool), this module does not take your user
agent name into consideration when parsing. It just parses the structure
and returns a hash containing the whole set of rules. You can then use
this to do whatever you like with it.

I mainly wrote this to store the parsed data structure away elsewhere for
later use, without having to specify a user agent.

METHODS

new

Creates a new instance of WWW::RobotRules::Parser.

parse($uri, $text)

Given the URI of the robots.txt file and its contents, parses the content
and returns a data structure that looks like the following:

    {
        '*' => [ '/private', '/also_private' ],
        'Another UserAgent' => [ '/dont_look' ]
    }

Each key is a user agent name, and each value is an arrayref of all paths
that are prohibited for that user agent.

parse_uri($uri)

Given the URI of the robots.txt file, retrieves and parses the file.

SEE ALSO

WWW::RobotRules

AUTHOR

Copyright (c) 2006-2007 Daisuke Maki <daisuke@endeworks.jp>

LICENSE

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
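As a rough illustration of how the parsed structure can be used, here is a
minimal sketch (not part of the module's documented examples). It parses an
inline robots.txt document and then checks a path against the returned
rules. It assumes parse() returns a reference to the hash shown under
parse() above; the wildcard lookup and prefix match at the end are purely
illustrative helpers, not methods provided by this module.

    use strict;
    use warnings;
    use WWW::RobotRules::Parser;

    # Example robots.txt content matching the structure shown above.
    my $robots_txt = <<'ROBOTS';
    User-agent: *
    Disallow: /private
    Disallow: /also_private

    User-agent: Another UserAgent
    Disallow: /dont_look
    ROBOTS

    my $p = WWW::RobotRules::Parser->new;

    # Assumption: parse() returns a hash reference shaped like the
    # example under parse() above.
    my $rules = $p->parse('http://www.example.com/robots.txt', $robots_txt);

    # Gather the disallowed path prefixes for one agent plus the '*'
    # wildcard rules, then do a simple prefix match (illustrative only).
    my $agent = 'Another UserAgent';
    my $path  = '/dont_look/secret.html';

    my @disallowed = (
        @{ $rules->{$agent} || [] },
        @{ $rules->{'*'}    || [] },
    );

    if ( grep { index( $path, $_ ) == 0 } @disallowed ) {
        print "$path is disallowed for '$agent'\n";
    }
    else {
        print "$path appears to be allowed for '$agent'\n";
    }

The same rules could be obtained directly from a live site with
$p->parse_uri($robots_txt_uri), which fetches and parses the file in one
step.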