LMBENCH(3) LMBENCH LMBENCH(3)
lmbench - benchmarking toolbox
#include "lmbench.h"
typedef u_long iter_t
typedef void (*benchmp_f)(iter_t iterations, void* cookie)
void benchmp(benchmp_f initialize, benchmp_f benchmark,
benchmp_f cleanup, int enough, int parallel, int warmup, int repetitions,
void* cookie)
uint64 get_n()
void milli(char *s, uint64 n)
void micro(char *s, uint64 n)
void nano(char *s, uint64 n)
void mb(uint64 bytes)
void kb(uint64 bytes)
Creating benchmarks using the lmbench timing harness is easy. Since it is
so easy to measure performance using lmbench, it is possible to
quickly answer questions that arise during system design, development, or
tuning. For example, image processing
There are two attributes that are critical for performance,
latency and bandwidth, and lmbench's timing harness makes it
easy to measure and report results for both. Latency is usually important
for frequently executed operations, and bandwidth is usually important when
moving large chunks of data.
There are a number of factors to consider when building
benchmarks.
The timing harness requires that the benchmarked operation be
idempotent so that it can be repeated indefinitely.
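As an illustration of the idempotence requirement, here is a minimal sketch of a benchmark body that leaves the process in the same state after every iteration. The names benchmark_open_close and struct open_state are hypothetical and not part of lmbench, and iter_t is redeclared locally so that the sketch compiles without lmbench.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

typedef unsigned long iter_t;   /* stand-in for the iter_t from lmbench.h */

/* Hypothetical cookie: the file to open and a count of failed opens. */
struct open_state {
    const char *path;
    unsigned long failures;
};

/*
 * Idempotent body: each iteration opens the file and closes it again,
 * so the process holds the same set of descriptors before and after.
 */
void
benchmark_open_close(iter_t iterations, void *cookie)
{
    struct open_state *state = (struct open_state *)cookie;

    assert(state != NULL && state->path != NULL);
    while (iterations-- > 0) {
        int fd = open(state->path, O_RDONLY);

        if (fd < 0) {
            state->failures++;  /* record the failure, keep looping */
            continue;
        }
        close(fd);              /* undo the open before the next pass */
    }
}
```

Without the close, every iteration would leak one descriptor and open would start failing once the per-process descriptor limit was reached, so that variant could not be repeated indefinitely.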
The timing subsystem, benchmp, is passed up to three
function pointers. Some benchmarks may need as few as one function pointer
(for benchmark).
- void benchmp(initialize, benchmark, cleanup, enough, parallel, warmup,
repetitions, cookie)
- measures the performance of benchmark repeatedly and reports the
median result. benchmp creates parallel sub-processes which
run benchmark in parallel. This allows lmbench to measure the
system's ability to scale as the number of client processes increases.
Each sub-process first calls initialize with iterations set to 0,
before the benchmarking cycle starts. It then calls
initialize, benchmark, and cleanup, with
iterations set to the number of iterations in the timing loop,
several times in order to collect repetitions results. The calls to
benchmark are bracketed by calls to start and stop, which
measure how long it takes to do the benchmarked operation
iterations times. After all the benchmark results have been
collected, cleanup is called with iterations set to 0 to clean up
any resources which may have been allocated by initialize or
benchmark. cookie is a void pointer to a hunk of memory that
can be used to store any parameters or state needed by the
benchmark.
- void* benchmp_getstate()
- returns a void pointer to the lmbench-internal state used during
benchmarking. Clients should not use or access the state directly;
it is meant to be passed to benchmp_interval.
- iter_t benchmp_interval(void* state)
- returns the number of times the benchmark should execute its benchmark
loop during this timing interval. This is used only for weird benchmarks
which cannot implement the benchmark body in a function which can return,
such as the page fault handler. Please see lat_sig.c for sample
usage.
- uint64 get_n()
- returns the number of times loop_body was executed during the
timing interval.
- void milli(char *s, uint64 n)
- print out the time per operation in milli-seconds. n is the number
of operations during the timing interval, which is passed as a parameter
because each loop_body can contain several operations.
- void micro(char *s, uint64 n)
- print the time per operation in micro-seconds.
- void nano(char *s, uint64 n)
- print the time per operation in nano-seconds.
- void mb(uint64 bytes)
- print the bandwidth in megabytes per second.
- void kb(uint64 bytes)
- print the bandwidth in kilobytes per second.
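The order in which benchmp is described as calling its three callbacks can be traced with a small stand-in driver. This is only a sketch of that call order: mock_benchmp, struct trace, and the trace_* callbacks are hypothetical names, and the driver runs in a single process with a fixed iteration count and no timing. The real harness chooses the iteration count itself and may run the cycle more often than repetitions times.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long iter_t;   /* stand-in for the iter_t from lmbench.h */
typedef void (*benchmp_f)(iter_t iterations, void *cookie);

/* Hypothetical cookie: a text log of every callback invocation. */
struct trace {
    char log[256];
};

/* Append "tag(iterations) " to the log held in the cookie. */
void
trace_add(void *cookie, const char *tag, iter_t iterations)
{
    struct trace *t = (struct trace *)cookie;
    size_t used = strlen(t->log);

    snprintf(t->log + used, sizeof(t->log) - used, "%s(%lu) ", tag, iterations);
}

void trace_initialize(iter_t n, void *cookie) { trace_add(cookie, "init", n); }
void trace_benchmark(iter_t n, void *cookie)  { trace_add(cookie, "bench", n); }
void trace_cleanup(iter_t n, void *cookie)    { trace_add(cookie, "cleanup", n); }

/*
 * NOT the lmbench implementation: replays only the documented call
 * order, with a caller-supplied iteration count.
 */
void
mock_benchmp(benchmp_f initialize, benchmp_f benchmark, benchmp_f cleanup,
             iter_t iterations, int repetitions, void *cookie)
{
    int i;

    assert(benchmark != NULL && repetitions > 0);
    if (initialize)
        initialize(0, cookie);              /* before the cycle starts */
    for (i = 0; i < repetitions; i++) {
        if (initialize)
            initialize(iterations, cookie);
        /* the real harness starts its timer here */
        benchmark(iterations, cookie);
        /* ... and stops it here */
        if (cleanup)
            cleanup(iterations, cookie);
    }
    if (cleanup)
        cleanup(0, cookie);                 /* after all results are in */
}
```

Calling mock_benchmp(trace_initialize, trace_benchmark, trace_cleanup, 5, 2, &t) on a zeroed struct trace leaves the log reading init(0), then two rounds of init(5) bench(5) cleanup(5), then cleanup(0).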
Here is an example of a simple benchmark that measures the latency of the random
number generator lrand48():
#include "lmbench.h"

/* Benchmark body: call lrand48() once per iteration. */
void
benchmark_lrand48(iter_t iterations, void* cookie) {
    while (iterations-- > 0)
        lrand48();
}

int
main(int argc, char *argv[])
{
    /* No initialize or cleanup needed; one process, TRIES repetitions. */
    benchmp(NULL, benchmark_lrand48, NULL, 0, 1, 0, TRIES, NULL);
    micro("lrand48()", get_n());
    exit(0);
}
Here is a simple benchmark that measures and reports the bandwidth
of bcopy:
#include "lmbench.h"

#define MB (1024 * 1024)
#define SIZE (8 * MB)

/* Parameters and buffers shared by the three callbacks. */
struct _state {
    int size;
    char* a;
    char* b;
};

void
initialize_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    if (!iterations) return;
    state->a = malloc(state->size);
    state->b = malloc(state->size);
    if (state->a == NULL || state->b == NULL)
        exit(1);
}

void
benchmark_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    while (iterations-- > 0)
        bcopy(state->a, state->b, state->size);
}

void
cleanup_bcopy(iter_t iterations, void* cookie) {
    struct _state* state = (struct _state*)cookie;

    if (!iterations) return;
    free(state->a);
    free(state->b);
}

int
main(int argc, char *argv[])
{
    struct _state state;

    state.size = SIZE;
    benchmp(initialize_bcopy, benchmark_bcopy, cleanup_bcopy,
            0, 1, 0, TRIES, &state);
    /* Total bytes moved: copies performed times bytes per copy. */
    mb(get_n() * state.size);
    exit(0);
}
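The figures reported in the examples above follow from simple arithmetic on the operation count and the elapsed time. The sketch below spells that arithmetic out under stated assumptions: usecs_per_op and mbytes_per_sec are hypothetical helpers, not lmbench functions, and the elapsed time is passed in explicitly, whereas lmbench's reporting routines take it from the timing harness. The size of a megabyte is also a parameter, because whether the library divides by 1024 * 1024 or by 1000000 is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average cost of one operation, in microseconds. */
double
usecs_per_op(double elapsed_usecs, uint64_t n)
{
    assert(n > 0);
    return elapsed_usecs / (double)n;
}

/*
 * Hypothetical helper: throughput in megabytes per second.
 * bytes_per_mb is a parameter because the unit is an assumption here
 * (1024 * 1024 to match the MB macro in the example, or 1000000).
 */
double
mbytes_per_sec(uint64_t bytes, double elapsed_usecs, double bytes_per_mb)
{
    assert(elapsed_usecs > 0.0 && bytes_per_mb > 0.0);
    return ((double)bytes / bytes_per_mb) / (elapsed_usecs / 1e6);
}
```

For instance, ten copies of an 8 MB buffer completed in two seconds work out to 40 megabytes per second under the 1024 * 1024 convention.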
A slightly more complex version of the bcopy benchmark
might measure bandwidth as a function of memory size and parallelism. The
main procedure in this case might look something like this:
int
main(int argc, char *argv[])
{
    int size, par;
    struct _state state;

    /* Sweep buffer size and parallelism in powers of two. */
    for (size = 64; size <= SIZE; size <<= 1) {
        for (par = 1; par < 32; par <<= 1) {
            state.size = size;
            benchmp(initialize_bcopy, benchmark_bcopy,
                    cleanup_bcopy, 0, par, 0, TRIES, &state);
            fprintf(stderr, "%d\t%d\t", size, par);
            mb(par * get_n() * state.size);
        }
    }
    exit(0);
}
There are three environment variables that can be used to modify the
lmbench timing subsystem: ENOUGH, TIMING_O, and LOOP_O.
Development of lmbench is continuing.
lmbench(8), timing(3), reporting(3), results(3).
Carl Staelin and Larry McVoy
Comments, suggestions, and bug reports are always welcome.