NAME

File::Stream - Regular expression delimited records from streams

SYNOPSIS

  use File::Stream;
  my $stream = File::Stream->new($filehandle);

  $/ = qr/\s*,\s*/;
  print "$_\n" while <$stream>;

  # or:
  ($handler, $stream) = File::Stream->new(
    $filehandle,
    read_length => 1024,
    separator   => qr{to_be_used_instead_of_$/},
  );

  while(<$stream>) {...}

  my $line = $stream->readline(); # similar

  # extended usage:
  use URI;
  my $uri = URI->new('http://steffen-mueller.net');
  my ($pre_match, $match) =
    $handler->find('literal_string', qr/regex/, $uri);

  # $match contains whichever argument to find() was found first.
  # $pre_match contains all that was before the first token that was found.
  # Both the contents of $match and $pre_match have been removed from the
  # data stream (buffer).

  # Since version 2.10 of the module, you can use seek() and tell() on
  # File::Stream objects:
  my $position = tell($stream);
  # ...
  seek($stream, $position, 0); # rewind to $position (whence 0 = from start)

DESCRIPTION

Perl filehandles are streams, but sometimes they just aren't powerful enough. This module lets you search streams from filehandles with regexes and allows the global input record separator variable to contain regexes.

Thus, readline() and the <> operator can now return records delimited by regular expression matches.

There are some very important problems with applying regular expressions to (possibly infinite) streams. Please read the CAVEATS section of this documentation carefully.

EXPORT

None.

new

The new() constructor takes a filehandle (or a glob reference) as its first argument. The remaining arguments are interpreted as key/value pairs, such as the read_length and separator parameters shown in the SYNOPSIS.
The new() method returns a fresh File::Stream object (the handler) and a filehandle that has been tied to it. All the usual file operations should work on the filehandle, and the File::Stream methods should work on the object.

readline

The readline method on a File::Stream object works just like the builtin, except that it honours regular expressions and uses the object's record separator instead of $/ if one has been set via new().

This method is also used internally when readline() is called on the tied filehandle.

find

Finds the first occurrence of one of its arguments in the stream. For example,

  $stream_handler->find('a', 'b');

finds the first character 'a' or 'b' in the stream, whichever comes first. It returns two strings: the data read from the stream before the match, and the match itself. The arguments to find() may be regular expressions, but please see the CAVEATS section of this documentation about that. If any of the arguments is an object, it will be evaluated in stringification context and the result will be matched literally, i.e. not as a regular expression.

As with readline(), this is a method on the stream handler object.

fill_buffer

It is unlikely that you will need to call this method directly. It reads more data from the internal filehandle into the buffer. The first argument may be the number of bytes to read; otherwise the 'read_length' attribute is used.

Again, call this on the handler object, not the filehandle.

CAVEATS

There are several important issues to keep in mind when using this module. First, setting $/ to a regular expression will most certainly break badly when $/ is used on filehandles that are not File::Stream objects.
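One way to avoid touching $/ at all is to give the stream its own separator when it is constructed. The following is only a minimal sketch: it assumes File::Stream is installed, and the in-memory filehandles, the sample data, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Stream;

# Illustrative in-memory data; any readable filehandle would do.
my $csv  = "alpha , beta,gamma ,delta";
my $text = "first line\nsecond line\n";

open my $csv_fh,  '<', \$csv  or die "Cannot open in-memory handle: $!";
open my $text_fh, '<', \$text or die "Cannot open in-memory handle: $!";

# The regex separator belongs to this one stream; $/ is left untouched.
my $stream = File::Stream->new(
    $csv_fh,
    separator => qr/\s*,\s*/,
);

# Read regex-delimited records one at a time (scalar context).
my @records;
while (defined(my $record = <$stream>)) {
    push @records, $record;
}

# An ordinary filehandle still splits on "\n" because $/ was never changed.
my @lines = <$text_fh>;

print scalar(@records), " records, ", scalar(@lines), " lines\n";
```

Because the separator is stored on the File::Stream object, code elsewhere in the program that reads from plain filehandles keeps its usual line-by-line behaviour.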
Please consider setting the "separator" attribute of the File::Stream object instead for a more robust solution.

In a later version of this module, either $/ may be tied to do magic that only applies the regex to File::Stream objects, or CORE::readline() might be overridden to use the builtin readline() whenever the handle at hand is not a File::Stream. It is currently unclear which would be less bad.

Most importantly, however, there are some inherent problems with regular expressions applied to (possibly infinite) streams. The implementation of Perl's regular expression engine requires that the string you apply a regular expression to be completely in memory. That means applying a regular expression that matches infinitely long strings (like .*) to a stream will lead to the module reading in the whole file or, worse yet, an infinite string.

Anchors like ^ or $ don't make sense with streams either. Since version 1.11 of File::Stream, the module throws a fatal error when it finds an anchor in a regular expression. Using infinitely long matches on infinite streams may still result in your machine running out of memory. So don't do that!

Since version 1.10, the buffer is extended whenever the regex engine reaches the end of the buffer. That means the module has to tokenize the regex and insert weird constructs in many places. This is a rather slow and fragile process.

AUTHOR

Steffen Mueller, <stream-module at steffen-mueller dot net>

Many thanks to Simon Cozens for his advice and the original idea, Autrijus Tang for much help with the fiendish regexes I couldn't handle, and Ben Tilly for suggesting the use of the ${} regex construct.

Furthermore, since version 2.10, File::Stream includes a patch implementing "seek()" and "tell()" for File::Stream objects. The idea and code were kindly supplied by Phil Whineray.
COPYRIGHT AND LICENSE

Copyright (C) 2003-2011 by Steffen Mueller

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.6 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

perltie, Tie::Handle, perlre, YAPE::Regex

Perl6::Slurp