Iterator::File(3) User Contributed Perl Documentation
Iterator::File -- A file iterator, optionally stateful and verbose.
    use Iterator::File;

    ## Simplest form...
    $i = iterator_file( 'mydata.txt' );
    while( $i++ ) {
        &something_interesting( $i );
    }

    ## Disable auto-chomp, emit status, and allow us to resume if ^C...
    $i = iterator_file( 'mydata.txt',
                        'chomp'  => 0,
                        'status' => 1,
                        'resume' => 1,
                      );
    while( $i++ ) {
        &something_interesting( $i );
    }

    ## OO style...
    $i = iterator_file( 'mydata.txt' );
    while( $i->next() ) {
        &something_interesting( $i->value() );
    }

"Iterator::File" is an attempt to take some of the
repetition & tedium out of processing a flat file. Whenever I wrote such a
script, I found myself adapting earlier ones so that the process could be
resumed, emit status, etc. Hence an itch (and this module) was born.
- iterator_file($file, %config)
- Returns an "Iterator::File" object. See
%config section below for additional information
on options.
- new(%config)
- The constructor returns a new
"Iterator::File" object, handling
argument defaults & validation, and automatically invoking
"initialize".
- initialize()
- Executes all startup work required before iteration. E.g., opening
resources, detecting if a prior process terminated early & resuming,
etc.
- next(), '++'
- Increment the iterator & return the new value.
- value(), string context
- Return the current value, without advancing.
- advance_to( $location )
- Advance the iterator to $location. If
$location is behind the current location,
behavior is undefined. (I.e., don't do that.)
- finish()
- Automatically invoked when the complete list has been processed. If the
process dies before the last item of the list, this method is
intentionally not invoked.
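
The contract of next(), value() and advance_to() described above can be sketched with a small in-memory stand-in. This is only an illustration, not the module's implementation: the class name Sketch::ListIterator is hypothetical, and it assumes that $location is a 1-based position, which the text above does not spell out.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in, for illustration only.
# This is NOT how Iterator::File is implemented.
package Sketch::ListIterator;

sub new {
    my ($class, @items) = @_;
    # pos is the 1-based index of the current item; 0 means "not started".
    return bless { items => [@items], pos => 0 }, $class;
}

# Advance by one item and return the new current value (undef when exhausted).
sub next {
    my ($self) = @_;
    $self->{pos}++ if $self->{pos} <= @{ $self->{items} };
    return $self->value;
}

# Return the current value without advancing.
sub value {
    my ($self) = @_;
    my $pos = $self->{pos};
    return undef if $pos < 1 || $pos > @{ $self->{items} };
    return $self->{items}[ $pos - 1 ];
}

# Skip forward to $location (assumed here to be a 1-based position).
# Moving backwards is refused, since that behavior is left undefined.
sub advance_to {
    my ($self, $location) = @_;
    die "cannot move backwards\n" if $location < $self->{pos};
    $self->{pos} = $location;
    return $self->value;
}

package main;

my $it = Sketch::ListIterator->new(qw(alpha beta gamma delta));
$it->next;                  # move onto the first item
my $same = $it->value;      # reading the value does not advance
$it->advance_to(3);         # skip ahead to the third item
my @rest;
while ( defined( my $v = $it->next ) ) {
    push @rest, $v;         # whatever remains after position 3
}
print "current after first next: $same; remaining: @rest\n";
```

The loop tests defined() rather than truth so that a legitimate empty or "0" item would not end the iteration early in this sketch.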
- chomp
- Automatically chomp each line. Default: enabled.
- verbose
- Enable verbose messaging for things such as temporary files. Default:
disabled.
Note: for status messages, see
"Status" below
- debug
- Enable debugging messages. It can also be enabled by setting the
environment variable ITERATOR_FILE_DEBUG to something true (to avoid
modifying code to enable it). Default: disabled.
- resume
- If enabled, "Iterator::File" will keep
track of which lines you've seen, even between invocations. That way if
your program unexpectedly dies (e.g., via a bug or ^C), you can pick up
where you left off just by running your program again. Default:
disabled.
- repeat_on_resume
- If enabled, "Iterator::File" will err
on the side of giving you the same line twice between invocations. E.g.,
if your program were to be restarted after dying on the 100th line,
"repeat_on_resume" would give you the
100th line on the 2nd invocation (versus the 101st). Default:
disabled.
- update_frequency
- How often to update state. For very large data sets with light individual
processing requirements, it may be worth setting to something other than
1. Default: 1.
- state_class
- Options:
"Iterator::File::State::TempFile" and
"Iterator::File::State::IPCShareable".
TempFile is the default and in a lot of cases should be good enough. If
you have philosophical objections to a frequently changing value living on
disk (or a really, really slow disk), you can use shared memory via
IPC::Shareable.
- status_method
- What algorithm to use to display status. Options are
"emit_status_logarithmic",
"emit_status_fixed_line_interval", and
"emit_status_fixed_time_interval".
"emit_status_logarithmic"
will display status logarithmically. I.e., 1, 2, 3 ... 9, 10, 20, 30 ...
90, 100, 200, 300 ... 900, 1000, 2000, etc.
"emit_status_fixed_line_interval"
will display status every X lines, where X is defined by
"status_line_interval".
"emit_status_fixed_time_interval"
will display status every X seconds, where X is defined by
"status_time_interval".
Default: emit_status_logarithmic.
- status_line_interval
- If "status_method" is
"emit_status_fixed_line_interval",
controls how frequently to display status. Default: 10 (lines).
- status_time_interval
- If "status_method" is
"emit_status_fixed_time_interval",
controls how frequently to display status. Default: 2 (seconds).
- status_filehandle
- Filehandle to use for printing status. Default: STDERR.
- status_line
- Format of status line. Default: "Processing row '%d'...\n".
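
To make the logarithmic schedule (1, 2, 3 ... 9, 10, 20 ... 90, 100, 200 ...) concrete, here is a minimal sketch of one way to decide whether a given row number falls on it. The helper should_emit_logarithmic is hypothetical and is not taken from the module's source; the module may compute this differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Iterator::File: returns true when row
# number $n falls on the schedule 1..9, 10, 20..90, 100, 200..900, 1000, ...
sub should_emit_logarithmic {
    my ($n) = @_;
    return 0 if $n < 1;
    my $step = 1;
    $step *= 10 while $n >= $step * 10;    # largest power of 10 that is <= $n
    return $n % $step == 0 ? 1 : 0;        # emit only on multiples of that power
}

# Collect every row in 1..250 that would trigger a status line.
my @shown = grep { should_emit_logarithmic($_) } 1 .. 250;

# Use the default format and filehandle documented above.
printf STDERR "Processing row '%d'...\n", $_ for @shown;
```

With this rule the number of status lines grows with the number of digits in the row count rather than with the row count itself, which is why it suits files of unknown size.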
Do not call chop or chomp on the iterator!! Unfortunately, doing so
destroys your object & leaves you with a plain ol' string. :(
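
The effect can be reproduced with any object that overloads string context, because chomp modifies its argument in place. The sketch below uses a hypothetical class, Sketch::Line, rather than Iterator::File itself, and shows the safer pattern of copying the string value out before chomping the copy.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; it only mimics an object that
# yields its text in string context. It is not Iterator::File.
package Sketch::Line;
use overload '""' => sub { $_[0]{text} }, fallback => 1;

sub new {
    my ($class, $text) = @_;
    return bless { text => $text }, $class;
}

package main;

# Chomping the object itself: the variable is rewritten in place.
my $obj        = Sketch::Line->new("first line\n");
my $was_object = ref($obj) ? 1 : 0;
chomp($obj);
my $is_object  = ref($obj) ? 1 : 0;

# Safer: take a string copy and chomp the copy instead.
my $keep = Sketch::Line->new("second line\n");
my $copy = "$keep";
chomp($copy);
my $keep_is_object = ref($keep) ? 1 : 0;

print "object before chomp: $was_object, object after chomp: $is_object\n";
print "copy is '$copy', original still an object: $keep_is_object\n";
```

With the real module, the equivalent would presumably be to chomp a copy of value() rather than the iterator variable.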
William Reardon, <wdr1@pobox.com>
Copyright (C) 2008 by William Reardon
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.