PERL_PERFORMANCE(1)                Makepp                PERL_PERFORMANCE(1)
makepp_perl_performance -- How to make Perl faster
The biggest tuning gains will usually come from algorithmic improvements. But
while these can be hard to find, there is also a lot you can do mechanically.
Makepp is a big heavy-duty program, where speed is a must. A lot
of effort has been put into optimizing it. This page documents some general
things we have found. Currently the concrete tests leading to these results
have mostly been discarded, but I plan to gradually add them.
If you are looking at how to speed up makepp (beyond the Perl code
you put into your makefiles), look at makepp_speedup. This page is
completely independent of makepp, and is only intended to make our results
available to the Perl community. Some of these measures are common sense,
but you sometimes forget them. Others need measuring to believe them,
so:
- Profile your program
- Makepp comes with a module profiler.pm in its CVS repository. You
first run it as a program on a copy(!) of your code, which it instruments.
Then you run your copy and get configurable statistics per interval, plus a
final total of the most frequently called functions and of the functions in
which the most time was spent (minus subcalls). Both are reported as absolute
figures and per caller-callee pair. (Documentation within.)
This tells you which functions are the most promising
candidates for tuning. It also gives you a hint where your algorithm
might be wrong, either within surprisingly expensive functions, or
through surprisingly frequent calls.
- Time your solution
- Either one of
perl -Mstrict -MBenchmark -we 'my <initialization>; timethis -10, sub { <code> }'
time perl -Mstrict -we 'my <initialization>; for( 0..999_999 ) { <code> }'
when run on the different variants of code you can think of, can
give surprising results. Even small modifications can matter a lot. Be
careful not to "measure" code that can get optimized away
because you discard the result or because it depends only on constants.
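As a minimal sketch of such a comparison, the following uses the standard Benchmark module to time two variants side by side. The variant names and the test data are invented for illustration. The input is built at run time and each variant stores its result in a variable that is read afterwards, so neither can be optimized away.

```perl
use strict;
use warnings;
use Benchmark qw(timethese timestr);

# Hypothetical input; building it at run time prevents constant folding.
my @words = map { "word$_" } 1 .. 100;
my $sink  = '';    # each variant stores its result here, so it is not discarded

# A negative count means "run for at least that many CPU seconds".
my $results = timethese( -1, {
    join_builtin => sub { $sink = join '', @words },
    concat_loop  => sub { my $s = ''; $s .= $_ for @words; $sink = $s },
}, 'none' );       # 'none' suppresses the default report

printf "%-12s %s\n", $_, timestr( $results->{$_} ) for sort keys %$results;
```

The timings depend on your machine and Perl version, so treat them as relative figures only.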
Depending on your system, this will tell you in KB how fat
Perl got:
perl -Mstrict -we '<build huge data>; system "ps -ovsz $$"'
Below we only show the code within the
"-e" option, as one-liners.
- Use simple regexps
- Several matches combined with "||" are
faster than a big one with "|".
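A small sketch of the two forms, using invented input lines. Both select the same lines; which one is faster can depend on the Perl version, since some versions optimize alternations of literal strings, so time both on yours.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ( 'make all', 'include foo.mk', 'ifdef DEBUG', 'clean:' );

# One big regexp with alternation.
my @big    = grep { /^(?:include|ifdef|endif)\b/ } @lines;

# Several simple matches combined with ||.
my @simple = grep { /^include\b/ || /^ifdef\b/ || /^endif\b/ } @lines;

print join( ', ', @simple ), "\n";    # include foo.mk, ifdef DEBUG
```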
- Use precompiled regexps
- Instead of interpolating strings into regexps (except if the string will
never change and you use the "o"
modifier), precompile the regexp with
"qr//" and interpolate that.
- Use (?:...)
- If you don't use what the grouping matches, don't make Perl save it with
"(...)".
- Anchor at beginning of string
- Don't make Perl look through your whole string, if you want a match only
at the beginning.
- Don't anchor at end after greedy
- If you have a "*" or
"+" that will match till the end of
string, don't put a "$" after it.
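The regexp advice in the preceding items (precompiling with qr//, grouping without capturing, anchoring at the beginning, and not anchoring after a greedy match) can be sketched together as follows. The suffix list, file names and comment line are made up for illustration.

```perl
use strict;
use warnings;

my @suffixes = qw(c cc cpp);                       # hypothetical run-time list
my $alt      = join '|', map { quotemeta $_ } @suffixes;
my $source   = qr/\.(?:$alt)\z/;                   # compiled once; group not captured

my @files   = qw(main.c util.cpp README lib.c.orig);
my @sources = grep { $_ =~ $source } @files;       # reuses the compiled regexp

# Anchored at the start, so Perl need not search the whole string.
# The greedy (.*) already runs to the end of the line; no trailing $ needed.
my $line = '# build everything';
my( $text ) = $line =~ /^#\s*(.*)/;

print "@sources\n";    # main.c util.cpp
print "$text\n";       # build everything
```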
- Use tr///
- This is twice as fast as s/// when it is applicable.
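A short sketch of where tr/// applies, on a made-up string: replacing one character by another, and counting characters without changing the string.

```perl
use strict;
use warnings;

my $name = 'my long option name';          # hypothetical input

( my $by_s  = $name ) =~ s/ /_/g;          # regexp substitution
( my $by_tr = $name ) =~ tr/ /_/;          # same result with tr///
my $spaces = ( $name =~ tr/ // );          # empty replacement: count only

print "$by_tr ($spaces spaces replaced)\n";
```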
- Avoid object orientation
- Dynamic method lookup is slower in any language, and Perl, being loosely
typed, can never do it at compile time. Don't use it, unless you need the
benefit of polymorphism through inheritance. The following ways of calling
a method are ordered from slowest to fastest:
$o->method( ... ); # searched in class of $o and its @ISA
Class::method( $o, ... ); # static function, new stack
Class::method $o, ...; # static function, new stack, checked at compile time
&Class::method; # static function, reuse stack
This last form is always possible if the method (or normal function)
takes no arguments. If it does take arguments, watch out that you don't
inadvertently supply any optional ones! If you use this form a lot, it
is best to keep track of the minimum and maximum number of arguments
each function can take. Reusing a stack with extra arguments is no
problem; they'll get ignored.
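The four call forms can be tried out with a small sketch like this one. The class Counter and its methods are hypothetical, made up only to show that all forms reach the same function.

```perl
use strict;
use warnings;

package Counter;                       # hypothetical class for illustration
sub new    { my( $class, $n ) = @_; bless { n => $n }, $class }
sub value  { $_[0]{n} }
sub report { &value }                  # & without parens: value sees report's @_

package main;
my $o = Counter->new( 5 );
my @r;
push @r, $o->value;                    # dynamic lookup via the class of $o and @ISA
push @r, Counter::value( $o );         # plain function call
push @r, Counter::value $o;            # same without parens; sub must be declared first
push @r, $o->report;                   # report hands its own @_ on to value
print "@r\n";                          # 5 5 5 5
```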
- Don't modify stack
- The following sin is frequently found even in the Perl doc:
my $self = shift;
Unless you have a pertinent reason for this, use this:
my( $self, $x, $y, @z ) = @_;
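For comparison, here is a sketch of both styles on two invented functions. They return the same value, but the second leaves @_ untouched and needs only one list assignment.

```perl
use strict;
use warnings;

# Hypothetical functions for illustration.
sub area_shift { my $w = shift; my $h = shift; $w * $h }   # modifies @_ twice
sub area_list  { my( $w, $h ) = @_; $w * $h }              # one assignment, @_ intact

print area_shift( 3, 4 ), ' ', area_list( 3, 4 ), "\n";    # 12 12
```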
- Use few functions and modules
- Every function (and that alas includes constants) takes up over 1kb for
its mere existence. With each module requiring other ones, most of which
you never need, that can add up. Don't pull in a big module just to
replace two lines of Perl code with a single more elegant looking function
call.
If you have a function only called in one place, and the two
combined would still be reasonably short, merge them with due
comments.
Don't have one function only call another with the same
arguments. Alias it instead:
*alias = \&function;
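A minimal sketch of the difference, with invented function names. The wrapper costs an extra call every time it is used, while the alias is just a second name for the same code.

```perl
use strict;
use warnings;

sub function { "called with @_" }     # hypothetical function
sub wrapper  { function( @_ ) }       # extra sub and extra call on every use
*alias = \&function;                  # second name for the same code

my $out = alias( 1, 2 );
print "$out\n";                       # called with 1 2
```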
- Group calls to print
- Individual calls to print, or print with separate arguments are very
expensive. Build up the string in memory and print it in one go. If you
can accumulate over 3kb, syswrite is more efficient.
perl -MBenchmark -we 'timethis -10, sub { print STDERR $_ for 1..5 }' 2>/dev/null
perl -MBenchmark -we 'timethis -10, sub { print STDERR 1..5 }' 2>/dev/null
perl -MBenchmark -we 'timethis -10, sub { my $str = ""; $str .= $_ for 1..5; print STDERR $str }' 2>/dev/null
- Avoid hashes
- Perl becomes quite slow with many small hashes. If you don't need them,
use something else. Object orientation works just as well on an array,
except that the members can't be accessed by name. But you can use numeric
constants to name the members. For the sake of comparability we use plain
numeric keys here:
my $i = 0; our %a = map +($i++, $_), "a".."j"; timethis -10, sub { $b = $a{int rand 10} }
our @a = "a".."j"; timethis -10, sub { $b = $a[rand 10] }
my $i = 0; my %a = map +($i++, $_), "a".."j"; timethis -10, sub { $b = $a{int rand 10} }
my @a = "a".."j"; timethis -10, sub { $b = $a[rand 10] }
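An array-based object with numeric constants naming the members might look like the following sketch. The class FileInfo and its members are hypothetical.

```perl
use strict;
use warnings;

package FileInfo;                         # hypothetical class for illustration
use constant { NAME => 0, SIZE => 1 };    # numeric constants name the slots

sub new  { my( $class, $name, $size ) = @_; bless [ $name, $size ], $class }
sub name { $_[0][NAME] }
sub size { $_[0][SIZE] }

package main;
my $f = FileInfo->new( 'Makeppfile', 1234 );
my $summary = $f->name . ': ' . $f->size;
print "$summary\n";                       # Makeppfile: 1234
```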
- Use int keys for ref sets
- When you need a unique reference representation, e.g. for set ops with
hashes, using the integer form of refs is three times as fast as using the
pretty printed default string representation. Caveat: the HP-UX 64bitall
variant of Perl, at least up to 5.8.8, has a buggy
"int" function, where this doesn't work
reliably. There a hex form is still a fair bit faster than default
strings.
my @list = map { bless { $_ => 1 }, "someclass" } 0..9; my( %a, %b );
timethis -10, sub { $a{$_} = 1 for @list };
timethis -10, sub { $b{int()} = 1 for @list };
timethis -10, sub { $b{sprintf '%x', $_} = 1 for @list }
- Beware of strings
- Perl is awful for always copying strings around, even if you're never
going to modify them. This wastes CPU and memory. Try to avoid that
wherever reasonably possible. If the string is a function parameter and
the function has a modest length, don't copy the string into a
"my" variable, access it with
$_[0] and document the function well. Elsewhere,
the aliasing feature of "for(each)" can
help. Or just use references to strings, which are fast to copy. If you
somehow ensure that same strings get stored only once, you can do
numerical comparison for equality.
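One way to store equal strings only once is a small pool of string references, sketched below. The helper names intern and is_c_source are invented for this example; both read their argument through $_[0] rather than copying it into a "my" variable first.

```perl
use strict;
use warnings;

# Reads the argument through $_[0]; the caller's string is not copied.
sub is_c_source { $_[0] =~ /\.c\z/ }

# Hypothetical interning helper: one shared reference per distinct string.
my %pool;
sub intern { $pool{$_[0]} ||= do { my $copy = $_[0]; \$copy } }

my $first  = intern( 'main.c' );
my $second = intern( 'main' . '.c' );        # equal content, built separately
my $same   = $first == $second ? 1 : 0;      # numeric comparison of references
my $c_src  = is_c_source( $$first ) ? 1 : 0;

print "same=$same c_source=$c_src\n";        # same=1 c_source=1
```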
- Avoid bit operations
- If you have disjoint bit patterns you can add them instead of or'ing them.
Shifting can be performed by multiplication or integer division. Retaining
only the lowest bits can be achieved with modulo.
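The following sketch checks these equivalences on made-up values. It assumes non-negative integers and flags whose bit patterns really are disjoint.

```perl
use strict;
use warnings;

my( $read, $write, $exec ) = ( 1, 2, 4 );   # hypothetical disjoint flags
my $n = 45;

# Each pair holds the bit operation and its arithmetic replacement.
my @pairs = (
    [ $read | $exec, $read + $exec ],       # combine disjoint flags
    [ $n << 3,       $n * 8 ],              # shift left by 3
    [ $n >> 2,       int( $n / 4 ) ],       # shift right by 2
    [ $n & 7,        $n % 8 ],              # keep the lowest 3 bits
);
printf "%d %d\n", @$_ for @pairs;
```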
Separate boolean hash members are faster than stuffing
everything into an integer with bit operations or into a string with
"vec".
- Use order of boolean operations
- If you only care whether an expression is true or false, check the cheap
things, like boolean variables, first, and call functions last.
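Because "&&" and "||" stop at the first operand that decides the result, the order matters. In this sketch expensive_check is a hypothetical stand-in for a costly function; it is never called while the cheap variable is false.

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive_check { $calls++; return 1 }   # hypothetical costly function

my $verbose = 0;                             # cheap boolean, tested first
my $printed = 0;
for my $item ( 1 .. 1000 ) {
    $printed++ if $verbose && expensive_check( $item );
}
print "expensive_check ran $calls times\n";  # expensive_check ran 0 times
```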
- Use undef instead of 0
- It takes up a few percent less memory, at least as hash or list values.
You can still query it as a boolean.
my %x; $x{$_} = 0 for 0..999_999; system "ps -ovsz $$"
my %x; undef $x{$_} for 0..999_999; system "ps -ovsz $$"
my @x = (0) x 999_999; system "ps -ovsz $$"
my @x = (undef) x 999_999; system "ps -ovsz $$"
- Choose for or map
- These are definitely not equivalent. Depending on your use (i.e. the list
and the complexity of your code), one or the other may be faster.
my @l = 0..99;
for( 0..99_999 ) { map $a = " $_ ", @l }
for( 0..99_999 ) { map $a = " $_ ", 0..99 }
for( 0..99_999 ) { $a = " $_ " for @l }
for( 0..99_999 ) { $a = " $_ " for 0..99 }
- Don't alias $_
- While it is convenient, it is rather expensive; even copying reasonably
sized strings is faster. The last example is twice as fast as the first
"for".
my $x = "abcdefg"; my $b = 0;
for( "$x" ) { $b = 1 - $b if /g/ } # Copy needed only if modifying.
for( $x ) { $b = 1 - $b if /g/ }
local *_ = \$x; $b = 1 - $b if /g/;
local $_ = $x; $b = 1 - $b if /g/; # Copy cheaper than alias.
my $y = $x; $b = 1 - $b if $y =~ /g/;
Daniel Pfeiffer <occitan@esperanto.org>