rwcount(1)    SiLK Tool Suite    rwcount(1)
rwcount - Print traffic summary across time
rwcount [--bin-size=SIZE] [--load-scheme=LOADSCHEME]
[--start-time=START_TIME] [--end-time=END_TIME]
[--skip-zeroes] [--bin-slots] [--epoch-slots]
[--timestamp-format=FORMAT] [--no-titles]
[--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--site-config-file=FILENAME]
[{--legacy-timestamps | --legacy-timestamps={1,0}}]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcount --help
rwcount --version
rwcount summarizes SiLK flow records across time. It counts the records
in the input stream and groups their byte and packet totals into time bins.
rwcount produces textual output with one row for each bin.
rwcount reads SiLK Flow records from the files named on the
command line or from the standard input when no file names are specified and
--xargs is not present. To read the standard input in addition to the
named files, use "-" or
"stdin" as a file name. If an input file
name ends in ".gz", the file is
uncompressed as it is read. When the --xargs switch is provided,
rwcount reads the names of the files to process from the named text
file or from the standard input if no file name argument is provided to the
switch. The input to --xargs must contain one file name per line.
rwcount splits each flow record into bins whose size is
determined by the argument to the --bin-size switch. When that switch
is not provided, rwcount uses 30-second bins by default.
By default, the first row of data rwcount prints is the bin
containing the starting time of the earliest record that appears in the
input. rwcount then prints a row for every bin until it reaches the
bin containing the most recent ending time. Rows whose counts are zero are
printed unless the --skip-zeroes switch is specified.
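The default choice of rows can be modeled as below. This is a minimal sketch, not rwcount's source: it assumes that, without --start-time, bins are aligned to whole multiples of the bin size since the UNIX epoch (the hourly examples later on this page show bins falling on the hour), and it works in integer milliseconds because rwcount accepts millisecond bin sizes. The names bin_start and bin_rows are hypothetical.

```python
from typing import Dict, List, Tuple

def bin_start(t_ms: int, bin_ms: int) -> int:
    """Start of the bin containing t_ms, assuming epoch-aligned bins."""
    return t_ms // bin_ms * bin_ms

def bin_rows(flows: List[Tuple[int, int]], totals: Dict[int, float],
             bin_ms: int, skip_zeroes: bool = False) -> List[Tuple[int, float]]:
    """Return (bin start, total) rows from the bin holding the earliest
    start time through the bin holding the latest end time.

    flows  -- (start_ms, end_ms) pairs; only used to find the row range
    totals -- per-bin totals already produced by some load scheme
    """
    first = bin_start(min(start for start, _ in flows), bin_ms)
    last = bin_start(max(end for _, end in flows), bin_ms)
    rows = []
    for b in range(first, last + bin_ms, bin_ms):
        total = totals.get(b, 0.0)
        if skip_zeroes and total == 0.0:
            continue  # --skip-zeroes suppresses empty bins
        rows.append((b, total))
    return rows

# Two flows with 30-second bins (30000 ms): the two middle bins are empty.
flows = [(5_000, 20_000), (95_000, 100_000)]
totals = {0: 1.0, 90_000: 1.0}
print(bin_rows(flows, totals, 30_000))
# [(0, 1.0), (30000, 0.0), (60000, 0.0), (90000, 1.0)]
print(bin_rows(flows, totals, 30_000, skip_zeroes=True))
# [(0, 1.0), (90000, 1.0)]
```

Zero rows are still emitted for the empty bins between the first and last bin unless skip_zeroes is set, mirroring the behavior described above.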
The --start-time and --end-time switches tell
rwcount to use a specific time for the first row and the final row.
The --start-time switch always sets the time stamp on the first bin
to the specified time. With the --end-time switch, rwcount
computes a maximum end-time by setting any unspecified hour, minute, second,
and millisecond field to its maximum value, and the final bin is the one that
contains the maximum end-time.
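The way the two switches fill in missing fields can be sketched as a small parser. It accepts the yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] form (with "T" allowed before the hour) and epoch seconds, and returns epoch milliseconds. It assumes UTC, whereas a real SiLK build may use the local timezone, and it assumes an optional ".sss" suffix on epoch values is read as milliseconds; the name parse_bound is hypothetical.

```python
import calendar
import re
from typing import Optional

# yyyy/mm/dd[:HH[:MM[:SS[.sss]]]], with "T" allowed in place of ":" before the hour
DATE_RE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})"
                     r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{1,3}))?)?)?)?$")
# seconds since the UNIX epoch, optionally with milliseconds (assumed form)
EPOCH_RE = re.compile(r"^(\d+)(?:\.(\d{1,3}))?$")

def _fill(raw: Optional[str], maximum: int, is_end: bool) -> int:
    """Use the given field, else 0 for a start time or its maximum for an end time."""
    if raw is None:
        return maximum if is_end else 0
    return int(raw)

def _millis(raw: Optional[str], is_end: bool) -> int:
    """Milliseconds field: ".5" means 500 ms; missing means 0 or 999."""
    if raw is None:
        return 999 if is_end else 0
    return int(raw.ljust(3, "0"))

def parse_bound(text: str, is_end: bool) -> int:
    """Convert a --start-time/--end-time argument to epoch milliseconds (UTC)."""
    m = EPOCH_RE.match(text)
    if m:
        return int(m.group(1)) * 1000 + _millis(m.group(2), is_end)
    m = DATE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized time: {text!r}")
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    hour = _fill(m.group(4), 23, is_end)
    minute = _fill(m.group(5), 59, is_end)
    second = _fill(m.group(6), 59, is_end)
    seconds = calendar.timegm((year, month, day, hour, minute, second))
    return seconds * 1000 + _millis(m.group(7), is_end)

print(parse_bound("2009/02/12", is_end=False))  # 1234396800000 (00:00:00.000)
print(parse_bound("2009/02/12", is_end=True))   # 1234483199999 (23:59:59.999)
```

The same text therefore names the first millisecond of a period when used as a start time and the last millisecond when used as an end time.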
When --start-time and --end-time are both specified,
rwcount reserves the memory for the bins before it begins processing
the records. If the memory cannot be allocated, rwcount exits. If
this happens, try reducing the time span or increasing the bin size.
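Because the bins are reserved up front, their number grows with the time span divided by the bin size. The sketch below estimates it. The 24 bytes per bin (three 8-byte floating-point totals for records, bytes, and packets) is an assumption chosen for illustration, not rwcount's documented memory layout, and the function names are hypothetical.

```python
def bin_count(start_ms: int, end_ms: int, bin_ms: int) -> int:
    """Bins needed to cover start_ms..end_ms inclusive (ceiling division)."""
    span = end_ms - start_ms + 1
    return -(-span // bin_ms)

def approx_bytes(n_bins: int, bytes_per_bin: int = 24) -> int:
    """Rough memory estimate; bytes_per_bin is an illustrative assumption."""
    return n_bins * bytes_per_bin

YEAR_MS = 365 * 86_400_000
for bin_s in (1, 60, 3600):
    n = bin_count(0, YEAR_MS - 1, bin_s * 1000)
    print(f"{bin_s:>5}-second bins: {n:>10} bins, ~{approx_bytes(n) / 2**20:.1f} MiB")
```

A year of 1-second bins needs 60 times as many bins as a year of 1-minute bins, which is why enlarging the bins or shortening the span is the suggested remedy.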
A router or other flow generator summarizes the traffic it sees into records. In
addition to the five-tuple (source port and address, destination port and
address, and protocol), the record has its start time, end time, total byte
count, and total packet count. There is no way to know how the bytes and
packets were distributed during the duration of the record: their distribution
could be front-loaded, back-loaded, uniform, et cetera.
When the start and end times of an individual flow record put that
record into a single bin, rwcount can simply add that record's volume
(byte and packet counts) to the bin.
When the duration of a flow record causes it to span multiple
bins, rwcount must be told how to allocate the volume among the bins.
The --load-scheme switch determines this, and it supports the
following allocation schemes:
- time-proportional
- Each bin a flow spans is allocated a percentage of the flow's volume
proportional to the amount of the flow's active time that spans the bin.
Specifically, rwcount divides the total volume of the flow by the
duration of the flow, and multiplies the quotient by the time spent in the
bin. This models a flow where the volume/second ratio is uniform
throughout the flow.
- bin-uniform
- Each bin a flow spans is allocated an equal portion of the flow's volume.
rwcount divides the volume of the flow by the number of bins the
flow spans, and adds the quotient to each of the bins. In this scheme, the
volume/bin ratio is uniform.
- start-spike
- The bin that contains the flow's start time is allocated all of the flow's
volume regardless of the flow's duration. rwcount adds the total
volume for the flow into the bin containing the start time of the flow.
This models a flow that is front-loaded to the point where the entire
volume is a single spike occurring in the initial millisecond of the
flow.
- middle-spike
- The bin that contains the midpoint between the flow's start time and end
time is allocated all of the flow's volume regardless of the flow's
duration.
- end-spike
- The bin that contains the flow's end time is allocated all of the flow's
volume regardless of the flow's duration. This models a flow that is
back-loaded to the point where the entire volume is a single spike
occurring in the final millisecond of the flow.
- maximum-volume
- Each bin the flow spans is allocated all of the flow's volume.
rwcount adds the entire volume for the flow into every bin
that contains any part of the flow. In theory, the distribution of the
bytes in the record could be a spike that occurs at any point during the
flow's duration. This scheme allows one to determine, in aggregate, the
maximum possible volume that could have occurred during this bin. In this
scheme, the "Records" column gives the
number of records that were active during the bin.
- minimum-volume
- For a record that spans multiple bins, each bin is allocated none
of the flow's volume. That is, rwcount acts as though the volume
for the flow occurred in some other bin. Since it is possible that a
record that spans multiple bins did not contribute any volume to the
current bin, this scheme allows one to determine, in aggregate, the
minimum possible volume that may have occurred during this bin. The
"Records" column in this scheme, as in
the "maximum-volume" scheme, gives the
number of flow records that were active during the bin.
Be aware that the "spike" load-schemes allocate the
entire flow to a single bin. This can create the impression that there is
more traffic occurring during a particular time window than the physical
network supports.
The "maximum-volume" and
"minimum-volume" schemes are used to
compute the maximum and minimum volumes that could have been transferred
during any one bin. "maximum-volume"
intentionally over-counts the flow volume and
"minimum-volume" intentionally
under-counts.
To see the effect of the various load-schemes, suppose
rwcount is using 60-second bins and the input contains two records.
The first record begins at 12:03:50, ends at 12:06:20, and contains 9,000
bytes (60 bytes/second for 150 seconds). This record may contribute to bins
at 12:03, 12:04, 12:05, and 12:06. The second record begins at 12:04:05 and
lasts 15 seconds; this record's volume always contributes its 200 bytes to
the 12:04 bin. The --load-scheme option splits the byte-counts of the
records as follows:
BIN                 12:03:00  12:04:00  12:05:00  12:06:00
time-proportional        600      3800      3600      1200
bin-uniform             2250      2450      2250      2250
start-spike             9000       200         0         0
middle-spike               0       200      9000         0
end-spike                  0       200         0      9000
maximum-volume          9000      9200      9000      9000
minimum-volume             0       200         0         0
For the record that spans multiple bins: the
"time-proportional" scheme assumes 60
bytes/second, the "bin-uniform" scheme
divides the volume evenly by the four bins, the
"middle-spike" scheme assumes all the
volume occurs at 12:05:05, the
"maximum-volume" scheme adds the volume to
every bin, and the "minimum-volume" scheme
ignores the record.
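The table above can be reproduced with a short model of the seven schemes. This is an illustrative sketch, not rwcount's implementation: times are integer milliseconds since midnight (standing in for full timestamps), bins are assumed to be aligned to multiples of 60 seconds, and only byte counts are modeled, not the Records column, which the maximum-volume and minimum-volume schemes treat specially. The names allocate and byte_table are hypothetical.

```python
from typing import Dict, List

BIN_MS = 60_000  # 60-second bins, as in the example above

def allocate(start_ms: int, end_ms: int, volume: float, scheme: str,
             bin_ms: int = BIN_MS) -> Dict[int, float]:
    """Return {bin start: share of volume} for one flow under one scheme."""
    first = start_ms // bin_ms * bin_ms
    last = end_ms // bin_ms * bin_ms
    bins = range(first, last + bin_ms, bin_ms)
    if first == last:
        return {first: volume}  # flow fits in one bin: all schemes agree
    if scheme == "time-proportional":
        duration = end_ms - start_ms
        # share = volume * (time the flow spends in the bin) / duration
        return {b: volume * (min(end_ms, b + bin_ms) - max(start_ms, b)) / duration
                for b in bins}
    if scheme == "bin-uniform":
        return {b: volume / len(bins) for b in bins}
    if scheme == "start-spike":
        return {first: volume}
    if scheme == "middle-spike":
        midpoint = (start_ms + end_ms) // 2
        return {midpoint // bin_ms * bin_ms: volume}
    if scheme == "end-spike":
        return {last: volume}
    if scheme == "maximum-volume":
        return {b: volume for b in bins}
    if scheme == "minimum-volume":
        return {}  # the flow spans several bins, so no bin is credited
    raise ValueError(f"unknown load scheme {scheme!r}")

def t(hh: int, mm: int, ss: int) -> int:
    """Milliseconds since midnight, standing in for full timestamps."""
    return ((hh * 60 + mm) * 60 + ss) * 1000

RECORDS = [
    (t(12, 3, 50), t(12, 6, 20), 9000),  # 150 s at 60 bytes/second
    (t(12, 4, 5), t(12, 4, 20), 200),    # 15 s, wholly inside the 12:04 bin
]
COLUMNS = [t(12, m, 0) for m in (3, 4, 5, 6)]
SCHEMES = ["time-proportional", "bin-uniform", "start-spike", "middle-spike",
           "end-spike", "maximum-volume", "minimum-volume"]

def byte_table() -> Dict[str, List[int]]:
    """Sum every record's allocation per bin, for each scheme."""
    table = {}
    for scheme in SCHEMES:
        totals: Dict[int, float] = {}
        for start, end, volume in RECORDS:
            for b, share in allocate(start, end, volume, scheme).items():
                totals[b] = totals.get(b, 0.0) + share
        table[scheme] = [round(totals.get(c, 0.0)) for c in COLUMNS]
    return table

if __name__ == "__main__":
    for name, row in byte_table().items():
        print(f"{name:<18}" + "".join(f"{v:>10}" for v in row))
```

Running the script prints the same seven rows as the table. The single-bin shortcut at the top of allocate reflects the rule that a flow wholly inside one bin is simply added to that bin.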
Option names may be abbreviated if the abbreviation is unique or is an exact
match for an option. A parameter to an option may be specified as
--arg=param or --arg param, though the
first form is required for options that take optional parameters.
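The abbreviation rule can be sketched as follows: an exact match wins, and otherwise the prefix must select exactly one name. The option list is copied from the synopsis plus the deprecated --start-epoch and --end-epoch aliases listed with the options, and the load-scheme numbers come from the --load-scheme description. The resolver itself is a hypothetical model, not SiLK's option parser.

```python
from typing import Iterable

RWCOUNT_OPTIONS = [
    "bin-size", "load-scheme", "start-time", "end-time", "skip-zeroes",
    "bin-slots", "epoch-slots", "timestamp-format", "no-titles", "no-columns",
    "column-separator", "no-final-delimiter", "delimited", "print-filenames",
    "copy-input", "output-path", "pager", "site-config-file",
    "legacy-timestamps", "xargs", "help", "version",
    "start-epoch", "end-epoch",  # deprecated aliases
]

# Load-scheme names and numbers as listed under --load-scheme.
LOAD_SCHEMES = {
    "bin-uniform": 0, "start-spike": 1, "end-spike": 2, "middle-spike": 3,
    "time-proportional": 4, "maximum-volume": 5, "minimum-volume": 6,
}

def resolve(word: str, names: Iterable[str]) -> str:
    """Return the name that word abbreviates: an exact match wins,
    otherwise word must be a prefix of exactly one name."""
    names = list(names)
    if word in names:
        return word
    matches = [n for n in names if n.startswith(word)]
    if not matches:
        raise ValueError(f"unknown name {word!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous name {word!r}: {', '.join(matches)}")
    return matches[0]

def load_scheme_id(arg: str) -> int:
    """Accept a load-scheme number or a (possibly abbreviated) name."""
    if arg.isdigit():
        if int(arg) not in LOAD_SCHEMES.values():
            raise ValueError(f"unknown load scheme {arg!r}")
        return int(arg)
    return LOAD_SCHEMES[resolve(arg, LOAD_SCHEMES)]

print(resolve("column-sep", RWCOUNT_OPTIONS))  # column-separator
print(load_scheme_id("max"))                   # 5
```

Under this rule "--bin-s" is rejected as ambiguous (bin-size, bin-slots), and "--start" is ambiguous because of the deprecated --start-epoch alias.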
- --bin-size=SIZE
- Denote the size of each time bin, in seconds; defaults to 30 seconds.
rwcount supports millisecond size bins; SIZE may be a
floating point value equal to or greater than 0.001.
- --load-scheme=LOADSCHEME
- Specify how a flow record that spans multiple bins allocates its bytes and
packets among the bins. The default scheme is
"time-proportional", which assumes the
volume/second ratio of the flow record is constant. See the "Load
Scheme" section for additional information on the load-scheme
choices. The LOADSCHEME may be one of the following names or
numbers; names may be abbreviated to the shortest prefix that is
unique.
- time-proportional,4
- Allocate the volume in proportion to the amount of time the flow spent in
the bin.
- bin-uniform,0
- Allocate the volume evenly across the bins that contain any part of the
flow's duration.
- start-spike,1
- Allocate the entire volume to the bin containing the start time of the
flow.
- middle-spike,3
- Allocate the entire volume to the bin containing the time at the midpoint
of the flow.
- end-spike,2
- Allocate the entire volume to the bin containing the end time of the
flow.
- maximum-volume,5
- Allocate the entire volume to all of the bins containing any part
of the flow.
- minimum-volume,6
- Allocate the flow's volume to a bin only if the flow is completely
contained within the bin; otherwise ignore the flow.
- --start-time=START_TIME
- Set the time of the first bin to START_TIME. When this switch is
not given, the first bin is the one that holds the starting time of the
earliest record. The START_TIME may be specified in a format of
"yyyy/mm/dd[:HH[:MM[:SS[.sss]]]]" (or
"T" may be used in place of
":" to separate the day and hour). The
time must be specified to at least day precision, and unspecified hour,
minute, second, and millisecond values are set to zero. Whether the date
strings represent times in UTC or the local timezone depends on how SiLK
was compiled, which can be determined from the
"Timezone support" setting in the output
from rwcount --version. Alternatively, the time may be specified as
seconds since the UNIX epoch, and an unspecified milliseconds value is set
to 0.
- --end-time=END_TIME
- Set the time of the final bin to END_TIME. When this switch is not
given, the final bin is the one that holds the ending time of the latest
record. The format of END_TIME is the same as that for
START_TIME. Unspecified hour, minute, second, and millisecond
values are set to 23, 59, 59, and 999 respectively. When END_TIME
is specified as seconds since the UNIX epoch, an unspecified milliseconds
value is set to 999. When both --start-time and --end-time
are used, the END_TIME is adjusted so that the final bin represents
a complete interval.
- --skip-zeroes
- Disable printing of bins with no traffic. By default, all bins are
printed.
- --bin-slots
- Use the internal bin index as the label for each bin in the output; the
default is to label each bin with the time in a human-readable
format.
- --epoch-slots
- Use the UNIX epoch time (number of seconds since midnight UTC on
1970-01-01) as the label for each bin in the output; the default is to
label each bin with the time in a human-readable format. This switch is
equivalent to --timestamp-format=epoch. This switch is deprecated
as of SiLK 3.11.0, and it will be removed in the SiLK 4.0 release.
- --timestamp-format=FORMAT
- Specify the format and/or timezone to use when printing timestamps. When
this switch is not specified, the SILK_TIMESTAMP_FORMAT environment
variable is checked for a default format and/or timezone. If it is empty
or contains invalid values, timestamps are printed in the default format,
and the timezone is UTC unless SiLK was compiled with local timezone
support. FORMAT is a comma-separated list of a format and/or a
timezone. The format is one of:
- default
- Print the timestamps as
"YYYY/MM/DDThh:mm:ss".
- iso
- Print the timestamps as
"YYYY-MM-DD hh:mm:ss".
- m/d/y
- Print the timestamps as
"MM/DD/YYYY hh:mm:ss".
- epoch
- Print the timestamps as the number of seconds since 00:00:00 UTC on
1970-01-01.
When a timezone is specified, it is used regardless of the default
timezone support compiled into SiLK. The timezone is one of:
- utc
- Use Coordinated Universal Time to print timestamps.
- local
- Use the TZ environment variable or the local timezone.
- --no-titles
- Turn off column titles. By default, titles are printed.
- --no-columns
- Disable fixed-width columnar output.
- --column-separator=C
- Use the specified character between columns and after the final column. When
this switch is not specified, the default of '|' is used.
- --no-final-delimiter
- Do not print the column separator after the final column. Normally a
delimiter is printed.
- --delimited
- --delimited=C
- Run as if --no-columns --no-final-delimiter
--column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used
as the delimiter between columns instead of the default '|'.
- --print-filenames
- Print to the standard error the names of input files as they are
opened.
- --copy-input=PATH
- Copy all binary SiLK Flow records read as input to the specified file or
named pipe. PATH may be "stdout"
or "-" to write flows to the standard
output as long as the --output-path switch is specified to redirect
rwcount's textual output to a different location.
- --output-path=PATH
- Write the textual output to PATH, where PATH is a filename,
a named pipe, the keyword "stderr" to
write the output to the standard error, or the keyword
"stdout" or
"-" to write the output to the standard
output (and bypass the paging program). If PATH names an existing
file, rwcount exits with an error unless the SILK_CLOBBER
environment variable is set, in which case PATH is overwritten. If
this switch is not given, the output is either sent to the pager or
written to the standard output.
- --pager=PAGER_PROG
- When output is to a terminal, invoke the program PAGER_PROG to view
the output one screenful at a time. This switch overrides the SILK_PAGER
environment variable, which in turn overrides the PAGER variable. If the
--output-path switch is given or if the value of the pager is
determined to be the empty string, no paging is performed and all output
is written to the terminal.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When
this switch is not provided, rwcount searches for the site
configuration file in the locations specified in the "FILES"
section.
- --legacy-timestamps
- --legacy-timestamps=NUM
- When NUM is not specified or is 1, this switch is equivalent to
--timestamp-format=m/d/y. Otherwise, the switch has no effect. This
switch is deprecated as of SiLK 3.0.0, and it will be removed in the SiLK
4.0 release.
- --xargs
- --xargs=FILENAME
- Read the names of the input files from FILENAME or from the
standard input if FILENAME is not provided. The input is expected
to have one filename per line. rwcount opens each named file in
turn and reads records from it as if the filenames had been listed on the
command line.
- --help
- Print the available options and exit.
- --version
- Print the version number and information about how SiLK was configured,
then exit the application.
- --start-epoch=START_TIME
- Alias for the --start-time switch. This switch is deprecated as of SiLK
3.8.0.
- --end-epoch=END_TIME
- Alias for the --end-time switch. This switch is deprecated as of SiLK
3.8.0.
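The four --timestamp-format styles can be sketched with Python's datetime module. The UTC default is an assumption matching a SiLK build without local-timezone support, and the helper names parse_format and format_timestamp are hypothetical. Per the descriptions above, --epoch-slots corresponds to the "epoch" format and --legacy-timestamps to "m/d/y".

```python
from datetime import datetime, timezone
from typing import Tuple

PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}

def parse_format(arg: str) -> Tuple[str, str]:
    """Split a --timestamp-format value such as "iso,utc" into (format, zone)."""
    fmt, zone = "default", "utc"  # assumes a build without local-timezone support
    for item in arg.split(","):
        if item in ("utc", "local"):
            zone = item
        elif item in PATTERNS or item == "epoch":
            fmt = item
        else:
            raise ValueError(f"unknown timestamp format item {item!r}")
    return fmt, zone

def format_timestamp(epoch_s: int, arg: str = "default") -> str:
    """Render a bin label in the chosen format and timezone."""
    fmt, zone = parse_format(arg)
    if fmt == "epoch":
        return str(epoch_s)
    tz = timezone.utc if zone == "utc" else None
    dt = datetime.fromtimestamp(epoch_s, tz=tz)  # tz=None means local time
    return dt.strftime(PATTERNS[fmt])

for arg in ("default", "iso", "m/d/y", "epoch"):
    print(f"{arg:>8}: {format_timestamp(1234396800, arg)}")
```

With the UTC assumption, the "default" style yields labels such as 2009/02/12T00:00:00, the form seen in the example output.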
In the following examples, the dollar sign
("$") represents the shell prompt. The text
after the dollar sign represents the command line. Lines have been wrapped for
improved readability, and the back slash
("\") is used to indicate a wrapped line.
To count all web traffic on Feb 12, 2009, into 1 hour bins:
$ rwfilter --pass=stdout --start-date=2009/02/12:00 \
--end-date=2009/02/12:23 --proto=6 --aport=80 \
| rwcount --bin-size=3600
Date| Records| Bytes| Packets|
2009/02/12T00:00:00| 1490.49| 578270918.16| 463951.55|
2009/02/12T01:00:00| 1459.33| 596455716.52| 457487.80|
2009/02/12T02:00:00| 1529.06| 562602842.44| 451456.41|
2009/02/12T03:00:00| 1503.89| 562683116.38| 455554.81|
2009/02/12T04:00:00| 1561.89| 590554569.78| 489273.81|
...
To bin the records according to their start times, use the
--load-scheme switch:
$ rwfilter ... --pass=stdout \
| rwcount --bin-size=3600 --load-scheme=1
Date| Records| Bytes| Packets|
2009/02/12T00:00:00| 1494.00| 580350969.00| 464952.00|
2009/02/12T01:00:00| 1462.00| 596145212.00| 457871.00|
2009/02/12T02:00:00| 1526.00| 561629416.00| 451088.00|
2009/02/12T03:00:00| 1502.00| 563500618.00| 455262.00|
2009/02/12T04:00:00| 1562.00| 589265818.00| 489279.00|
...
To bin the records by their end times:
$ rwfilter ... --pass=stdout \
| rwcount --bin-size=3600 --load-scheme=2
Date| Records| Bytes| Packets|
2009/02/12T00:00:00| 1488.00| 577132372.00| 463393.00|
2009/02/12T01:00:00| 1458.00| 596956697.00| 457376.00|
2009/02/12T02:00:00| 1530.00| 562806395.00| 451551.00|
2009/02/12T03:00:00| 1506.00| 562101791.00| 455671.00|
2009/02/12T04:00:00| 1562.00| 591408602.00| 489371.00|
...
To force the hourly bins to run from 30 minutes past the hour, use
the --start-time switch:
$ rwfilter ... --pass=stdout \
| rwcount --bin-size=3600 --start-time=2009/02/12:00:30
Date| Records| Bytes| Packets|
2009/02/12T00:30:00| 1483.26| 581251364.04| 456554.40|
2009/02/12T01:30:00| 1494.00| 575037453.00| 449280.00|
2009/02/12T02:30:00| 1486.36| 559700466.61| 447700.15|
2009/02/12T03:30:00| 1555.23| 588882400.58| 480724.48|
2009/02/12T04:30:00| 1537.79| 564756248.52| 472003.45|
...
- SILK_TIMESTAMP_FORMAT
- This environment variable is used as the value for
--timestamp-format when that switch is not provided. Since
SiLK 3.11.0.
- SILK_PAGER
- When set to a non-empty string, rwcount automatically invokes this
program to display its output a screen at a time. If set to an empty
string, rwcount does not automatically page its output.
- PAGER
- When set and SILK_PAGER is not set, rwcount automatically invokes
this program to display its output a screen at a time.
- SILK_CLOBBER
- The SiLK tools normally refuse to overwrite existing files. Setting
SILK_CLOBBER to a non-empty value removes this restriction.
- SILK_CONFIG_FILE
- This environment variable is used as the value for the
--site-config-file switch when that switch is not provided.
- SILK_DATA_ROOTDIR
- This environment variable specifies the root directory of the data repository.
As described in the "FILES" section, rwcount may use this
environment variable when searching for the SiLK site configuration
file.
- SILK_PATH
- This environment variable gives the root of the install tree. When
searching for configuration files, rwcount may use this environment
variable. See the "FILES" section for details.
- TZ
- When the argument to the --timestamp-format switch includes
"local" or when a SiLK installation is
built to use the local timezone, the value of the TZ environment variable
determines the timezone in which rwcount displays timestamps. (If
both of those are false, the TZ environment variable is ignored.) If the
TZ environment variable is not set, the machine's default timezone is
used. Setting TZ to the empty string or 0 causes timestamps to be
displayed in UTC. For system information on the TZ variable, see
tzset(3) or environ(7). (To
determine if SiLK was built with support for the local timezone, check the
"Timezone support" value in the output
of rwcount --version.) The TZ environment variable is also used
when rwcount parses the timestamp specified in the
--start-time or --end-time switches if SiLK is built with
local timezone support.
- ${SILK_CONFIG_FILE}
- ${SILK_DATA_ROOTDIR}/silk.conf
- /data/silk.conf
- ${SILK_PATH}/share/silk/silk.conf
- ${SILK_PATH}/share/silk.conf
- /usr/local/share/silk/silk.conf
- /usr/local/share/silk.conf
- Possible locations for the SiLK site configuration file which are checked
when the --site-config-file switch is not provided.
rwfilter(1), rwuniq(1),
silk(7), tzset(3),
environ(7)
Unlike rwuniq(1), rwcount does not support counting
the number of distinct IPs in a bin. However, using the --bin-time
switch on rwuniq can provide time-based binning similar to what
rwcount supports. Note that rwuniq always bins by each
record's start time (similar to rwcount --load-scheme=1), and there is
no support in rwuniq for dividing a SiLK record among multiple time
bins.