rwcombine(1)                SiLK Tool Suite                rwcombine(1)
rwcombine - Combine flows denoting a long-lived session into a single flow
rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS]
[--max-idle-time=NUM]
[{--print-statistics | --print-statistics=FILENAME}]
[--temp-directory=DIR_PATH] [--buffer-size=SIZE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--print-filenames]
[--output-path=PATH] [--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcombine --help
rwcombine --help-fields
rwcombine --version
rwcombine reads SiLK Flow records from one or more input sources,
searches for flow records where the attributes field denotes records
that were prematurely created or were continuations of prematurely created
flows, and attempts to combine those records into a single record. All the
unmodified SiLK records and the combined records are written to the file
specified by the --output-path switch or to the standard output when
the --output-path switch is not provided and the standard output is not
connected to a terminal.
Some flow exporters, such as yaf(1), provide
fields that describe characteristics of the flow record, and these
characteristics are stored in the attributes field of SiLK Flow
records. The two flags that rwcombine considers are:
- "T"
- The flow generator prematurely created a record for a long-lived session
due to the connection's lifetime reaching the active timeout
of the flow generator. (Also, when yaf is run with the
--silk switch, it prematurely creates a flow and marks it with
"T" if the byte count of the flow cannot
be stored in a 32-bit value.)
- "C"
- The flow generator created this flow as a continuation of a long-running
connection, where the previous flow for this connection met a timeout.
(yaf only sets this flag when it is invoked with the --silk
switch.)
A very long-running session may be represented by multiple flow
records, where the first record is marked with the
"T" flag, the final record is marked with
the "C" flag, and intermediate records are
marked with both "C" (this record
continues an earlier flow) and "T" (this
record also met the active time-out). rwcombine attempts to combine
these multiple flow records into a single record.
The input to rwcombine does not need to be sorted. As part
of its processing, rwcombine may re-order the records before writing
them.
rwcombine reads SiLK Flow records from the files named on
the command line or from the standard input when no file names are specified
and --xargs is not present. To read the standard input in addition to
the named files, use "-" or
"stdin" as a file name. If an input file
name ends in ".gz", the file is
uncompressed as it is read. When the --xargs switch is provided,
rwcombine reads the names of the files to process from the named text
file or from the standard input if no file name argument is provided to the
switch. The input to --xargs must contain one file name per line.
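The following is a minimal sketch of feeding file names to --xargs on
the standard input. The function names and the "*.rw" naming pattern are
assumptions made for illustration; they are not part of SiLK. The commands
are wrapped in functions so that nothing runs until a function is called.

```shell
# Hypothetical helper: print the flow files under directory $1, one name
# per line, which is the format that --xargs expects.
list_flow_files() {
    find "$1" -type f -name '*.rw' | sort
}

# Pass that list to rwcombine on its standard input and write the
# result to the path given as $2 (requires SiLK to be installed).
combine_directory() {
    list_flow_files "$1" | rwcombine --xargs --output-path="$2"
}
```

For example, "combine_directory /some/dir combined.rw" would process every
matching file below /some/dir.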
The algorithm rwcombine uses to combine records is:
- 1.
- rwcombine reads SiLK Flow records, examines the attributes
field on each record, and immediately writes to the destination stream all
records where neither the time-out flag ("T") nor the continuation flag
("C") is set. Records where one or both of those flags are set are stored
until all input records have been read.
- 2.
- rwcombine groups the stored records into bins where the following
fields for each record in each bin are identical: sIP, dIP,
sPort, dPort, protocol, sensor, in,
out, nhIP, application, class, and
type.
- 3.
- For each bin, the records are sorted by time (sTime and
elapsed).
- 4.
- Within a bin, rwcombine combines two records into a single record
when the attributes field of the first record has the
"T" (time-out) flag set and the second
record has the "C" (continuation) flag
set. When combining records, the bytes and packets fields
are summed, the initialFlags value from the first record is used,
the sessionFlags field becomes the bit-wise OR of both
sessionFlags fields and the second record's initialFlags
field, and the eTime is set to that of the second flow.
- 5.
- If the second record's "T" flag was set,
rwcombine checks to see if the third record's
"C" flag is set. If it is, the third
record becomes part of the new record.
- 6.
- The previous step repeats for the records in the bin until the bin
contains a single record, the most recently added record did not have the
"T" flag set, or the next record in the
bin does not have the "C" flag set.
- 7.
- After examining a bin, rwcombine writes the record(s) the bin
contains to the destination stream.
- 8.
- Steps 3 through 7 are repeated for each bin.
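The steps above can be illustrated with a toy model that works on text
records instead of binary SiLK Flow records. This is a sketch under stated
assumptions, not the implementation used by rwcombine: the record layout,
the function name, and the rule for the attributes of a partially combined
record are all invented for illustration, and TCP flag handling is left out.

```shell
# Toy model of the steps above, applied to text records of the form
#   KEY ATTRS START END BYTES PACKETS
# KEY stands in for the Step 2 grouping fields; ATTRS is "-", "T", "C",
# or "TC"; START and END are integer seconds.
combine_sketch() {
    # Steps 2-3: bring the records of one bin together, ordered by time.
    sort -k1,1 -k3,3n -k4,4n |
    awk '
        # Emit the record being assembled, if any.
        function flush(    a) {
            if (!have) return
            # Assumption: the result keeps the C flag of its first part
            # and the T flag of its last part.
            a = (open_t ? "T" : "") (first_c ? "C" : "")
            print key, (a == "" ? "-" : a), st, et, by, pk
            have = 0
        }
        {
            t = ($2 ~ /T/); c = ($2 ~ /C/)
            if (have && $1 == key && open_t && c) {
                # Steps 4-6: the held record timed out and this one
                # continues it, so extend the held record.
                et = $4; by += $5; pk += $6; open_t = t
            } else {
                flush()
                key = $1; st = $3; et = $4; by = $5; pk = $6
                first_c = c; open_t = t; have = 1
            }
        }
        END { flush() }     # Step 7 for the final record
    '
}
```

Unlike Step 1, the sketch does not write unflagged records immediately; it
passes them through the same loop, where they can never be combined because
neither flag is set.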
The --ignore-fields switch allows the user to remove fields
from the set that rwcombine uses when grouping records in Step 2.
When combining two records into one (Step 4), rwcombine
completely disregards the difference between the first record's end-time and
the second record's start-time (the idle time). To tell
rwcombine not to combine those records when the difference is greater
than a limit, specify that value as the argument to the
--max-idle-time switch.
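The idle-time test can be sketched as a small predicate. The function name
is hypothetical, times are given as seconds (possibly fractional), and the
comparison follows the wording above: records are not combined only when the
gap is greater than the limit.

```shell
# Succeed when the gap between the end of the first record ($1) and the
# start of the second record ($2) does not exceed the limit ($3).
idle_ok() {
    awk -v e="$1" -v s="$2" -v max="$3" 'BEGIN { exit !((s - e) <= max) }'
}
```

For example, "idle_ok 100.000 100.003 0.005" succeeds, while
"idle_ok 100 102 1.5" fails.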
To see information on the number of flows combined and the minimum
and maximum idle times, specify the --print-statistics switch.
During its processing, rwcombine will try to allocate a
large (near 2GB) in-memory array to hold the records. (You may use the
--buffer-size switch to change this maximum buffer size.) If more
records are read than will fit into memory, the in-core records are
temporarily stored on disk as described by the --temp-directory
switch. When all records have been read, the on-disk files are merged to
produce the output.
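The SIZE argument to --buffer-size accepts the suffixes described with
that switch. As a rough sketch of how such a value maps to bytes, assuming
only the upper-case suffixes "K", "M", and "G" and using a hypothetical
function name:

```shell
# Convert a SIZE such as 1.5K, 512M, or 2G to a byte count.
# A value with no suffix is returned as given.
size_to_bytes() {
    printf '%s\n' "$1" | awk '
        {
            mult = 1
            suffix = substr($0, length($0))
            if (suffix == "K") mult = 1024
            else if (suffix == "M") mult = 1048576
            else if (suffix == "G") mult = 1073741824
            num = (mult == 1) ? $0 : substr($0, 1, length($0) - 1)
            printf "%.0f\n", num * mult
        }'
}
```

With this helper, "size_to_bytes 1.5K" prints 1536, which matches the
example given for the --buffer-size switch.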
By default, the temporary files are stored in the /tmp
directory. Because the sizes of the temporary files may be large, it is
strongly recommended that /tmp not be used as the temporary
directory, and rwcombine will print a warning when /tmp is
used. To modify the temporary directory used by rwcombine, provide
the --temp-directory switch, set the SILK_TMPDIR environment
variable, or set the TMPDIR environment variable.
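The order of precedence among these settings can be sketched as follows.
The function is hypothetical, and it assumes that an empty environment
variable is treated the same as an unset one, which this page does not
state.

```shell
# Print the directory that would hold temporary files. The first
# argument stands in for the value of --temp-directory and may be empty.
resolve_tmpdir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${SILK_TMPDIR:-${TMPDIR:-/tmp}}"
    fi
}
```

A script might check the result before a large run and refuse to continue
when it resolves to /tmp.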
Option names may be abbreviated if the abbreviation is unique or is an exact
match for an option. A parameter to an option may be specified as
--arg=param or --arg param, though the first form
is required for options that take optional parameters.
- --actions=ACTIONS
- Select the type of action(s) that rwcombine should take to combine
the input records. The default action is
"all", and the following actions are
supported:
- all
- Perform all the actions described below.
- timeout
- Combine into a single flow record those records where the timeout flags in
the attributes field indicate that the flow exporter has divided a
long-lived session into multiple flow records.
This switch is provided for future expansion of rwcombine,
since at present rwcombine supports a single action. When writing a
script that uses rwcombine, specify --actions=timeout for
compatibility with future versions of rwcombine.
- --ignore-fields=FIELDS
- Ignore the fields listed in FIELDS when determining if two flow
records should be grouped into the same bin; that is, treat FIELDS
as being identical across all flows. By default, rwcombine puts
records into a bin when the records have identical values for the
following fields: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP,
application, class, and type.
FIELDS is a comma separated list of field-names,
field-integers, and ranges of field-integers; a range is specified by
separating the start and end of the range with a hyphen (-).
Field-names are case-insensitive. Example:
--ignore-fields=sensor,12-15
The supported fields are:
- sIP,1
- source IP address
- dIP,2
- destination IP address
- sPort,3
- source port for TCP and UDP, or equivalent
- dPort,4
- destination port for TCP and UDP, or equivalent
- protocol,5
- IP protocol
- sensor,12
- name or ID of sensor at the collection point
- in,13
- router SNMP input interface or vlanId if packing tools were configured to
capture it (see sensor.conf(5))
- out,14
- router SNMP output interface or postVlanId
- nhIP,15
- router next hop IP
- class,20,type,21
- class and type of sensor at the collection point (represented internally
by a single value)
- application,29
- guess as to the content of the flow. Some software that generates flow
records from packet data, such as yaf(1), will
inspect the contents of the packets that make up a flow and use traffic
signatures to label the content of the flow. SiLK calls this label the
application; yaf refers to it as the appLabel. The
application is the port number that is traditionally used for that type of
traffic (see the /etc/services file on most UNIX systems). For
example, traffic that the flow generator recognizes as FTP will have a
value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
- --max-idle-time=NUM
- Do not combine flow records when the second flow record starts more than
NUM seconds after the end time of the first flow record.
NUM may be fractional. If not specified, the maximum idle time may
be considered infinite.
- --print-statistics
- --print-statistics=FILENAME
- Print to the standard error or to the specified FILENAME the number
of flow records read and written, the number of flows that did not
require combining, the number of flows combined, the number that could not
be combined, and the minimum and maximum idle time between combined flow
records.
- --temp-directory=DIR_PATH
- Specify the name of the directory in which to store data files temporarily
when more records have been read than will fit into RAM. This switch
overrides the directory specified in the SILK_TMPDIR environment variable,
which overrides the directory specified in the TMPDIR variable, which
overrides the default, /tmp.
- --buffer-size=SIZE
- Set the maximum size of the buffer to use for holding the records, in
bytes. A larger buffer means fewer temporary files need to be created,
reducing the I/O wait times. The default maximum for this buffer is near
2GB. The SIZE may be given as an ordinary integer, or as a real
number followed by a suffix "K",
"M" or
"G", which represents the numerical
value multiplied by 1,024 (kilo), 1,048,576 (mega), and 1,073,741,824
(giga), respectively. For example, 1.5K represents 1,536 bytes, or one and
one-half kilobytes. (This value does not represent the absolute
maximum amount of RAM that rwcombine will allocate, since
additional buffers will be allocated for reading the input and writing the
output.)
- --output-path=PATH
- Write the binary SiLK Flow records to PATH, where PATH is a
filename, a named pipe, the keyword
"stderr" to write the output to the
standard error, or the keyword "stdout"
or "-" to write the output to the
standard output. If PATH names an existing file, rwcombine
exits with an error unless the SILK_CLOBBER environment variable is set,
in which case PATH is overwritten. If this switch is not given, the
output is written to the standard output. Attempting to write the binary
output to a terminal causes rwcombine to exit with an error.
- --note-add=TEXT
- Add the specified TEXT to the header of the output file as an
annotation. This switch may be repeated to add multiple annotations to a
file. To view the annotations, use the rwfileinfo(1)
tool.
- --note-file-add=FILENAME
- Open FILENAME and add the contents of that file to the header of
the output file as an annotation. This switch may be repeated to add
multiple annotations. Currently the application makes no effort to ensure
that FILENAME contains text; be careful that you do not attempt to
add a SiLK data file as an annotation.
- --compression-method=COMP_METHOD
- Specify the compression library to use when writing output files. If this
switch is not given, the value in the SILK_COMPRESSION_METHOD environment
variable is used if the value names an available compression method. When
no compression method is specified, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the
default chosen when SiLK was compiled. The valid values for
COMP_METHOD are determined by which external libraries were found
when SiLK was compiled. To see the available compression methods and the
default method, use the --help or --version switch. SiLK can
support the following COMP_METHOD values when the required
libraries are available.
- none
- Do not compress the output using an external library.
- zlib
- Use the zlib(3) library for compressing the output,
and always compress the output regardless of the destination. Using zlib
produces the smallest output files at the cost of speed.
- lzo1x
- Use the lzo1x algorithm from the LZO real time compression library
for compression, and always compress the output regardless of the
destination. This method provides good compression with less memory
and CPU overhead.
- snappy
- Use the snappy library for compression, and always compress the
output regardless of the destination. This method provides good
compression with less memory and CPU overhead. Available since SiLK
3.13.0.
- best
- Use lzo1x if available, otherwise use snappy if available, otherwise use
zlib if available. Only compress the output when writing to a file.
- --print-filenames
- Print to the standard error the names of input files as they are
opened.
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When
this switch is not provided, rwcombine searches for the site
configuration file in the locations specified in the "FILES"
section.
- --xargs
- --xargs=FILENAME
- Read the names of the input files from FILENAME or from the
standard input if FILENAME is not provided. The input is expected
to have one filename per line. rwcombine opens each named file in
turn and reads records from it as if the filenames had been listed on the
command line.
- --help
- Print the available options and exit.
- --help-fields
- Print the description and alias(es) of each field and exit.
- --version
- Print the version number and information about how SiLK was configured,
then exit the application.
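The FIELDS syntax accepted by --ignore-fields mixes field names, field
integers, and integer ranges. The sketch below expands such a list into
field names, using the name and integer pairs listed for that switch. The
function name is hypothetical, names are passed through unchanged, and
integers that do not correspond to a listed field are silently skipped;
how rwcombine itself treats such integers is not covered here.

```shell
# Expand a FIELDS list such as "sensor,13-15" into one field name per line.
expand_fields() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F- '
        BEGIN {
            split("sIP dIP sPort dPort protocol", low, " ")
            for (i = 1; i <= 5; i++) name[i] = low[i]
            name[12] = "sensor"; name[13] = "in"; name[14] = "out"
            name[15] = "nhIP"; name[20] = "class"; name[21] = "type"
            name[29] = "application"
        }
        $1 ~ /^[0-9]+$/ {
            # A single integer, or a range START-END.
            hi = (NF > 1 ? $2 : $1)
            for (i = $1 + 0; i <= hi + 0; i++)
                if (i in name) print name[i]
            next
        }
        { print }    # already a field name
    '
}
```

For example, "expand_fields sensor,13-15" prints sensor, in, out, and
nhIP on separate lines.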
In the following examples, the dollar sign
("$") represents the shell prompt. The text
after the dollar sign represents the command line. Lines have been wrapped for
improved readability, and the backslash
("\") is used to indicate a wrapped line.
Use rwfilter(1) to find ssh flow
records that involve the host 192.168.126.252. The output from
rwcut(1) shows the flow exporter split this long-lived
ssh session into multiple flow records:
$ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \
    | rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|
   PA   |TC      |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|
Here is the other half of that conversation:
$ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \
    | rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|
   PA   |TC      |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|
Use rwuniq(1) to compute the byte and packet
counts for that ssh session:
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
    | rwuniq --fields=sip,dip,sport,dport --values=records,byte,packets
            sIP|            dIP|sPort|dPort|Records|  Bytes|Packets|
  10.11.156.107|192.168.126.252|   22|28975|      5|4677240|   3881|
192.168.126.252|  10.11.156.107|28975|   22|      5| 281939|   3891|
Invoke rwcombine on these records and store the result in
the file combined.rw:
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
| rwcombine --print-statistics --output-path=combined.rw
FLOW RECORD COUNTS:
Read: 10
Initially Complete: - 0 *
Sorted & Examined: = 10
Missing end: - 0 *
Missing start & end: - 0 *
Missing start: - 0 *
Prior to combining: = 10
Eliminated: - 8
Made complete: = 2 *
Written: 2 (sum of *)
IDLE TIMES:
Minimum: 0:00:00:00.000
Penultimate: 0:00:00:00.000
Maximum: 0:00:00:00.003
View the resulting records:
$ rwcut --fields=sip,dip,sport,dport,bytes,packets,flags combined.rw
            sIP|            dIP|sPort|dPort|  bytes|packets|   flags|
  10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |
192.168.126.252|  10.11.156.107|28975|   22| 281939|   3891|FS PA   |
$ rwcut --fields=sip,attributes,stime,etime combined.rw
            sIP|attribut|                  sTime|                  eTime|
  10.11.156.107|        |2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|
192.168.126.252|        |2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|
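For use in a script, the invocation above might be wrapped as in the
following sketch. The function name, the ".stats" suffix for the statistics
file, and the default idle limit of one second are assumptions chosen for
illustration; only the switches themselves come from this page.

```shell
# Combine timed-out flows from input file $1 into output file $2,
# allowing at most $3 seconds of idle time (default: 1 second).
# Statistics are written next to the output file.
combine_timeouts() {
    rwcombine --actions=timeout --max-idle-time="${3:-1}" \
        --print-statistics="$2.stats" --output-path="$2" "$1"
}
```

For the session shown above, whose maximum idle time was 0.003 seconds,
"combine_timeouts data.rw combined.rw" would be expected to combine the
same records.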
- SILK_TMPDIR
- When set and --temp-directory is not specified, rwcombine
writes the temporary files it creates to this directory. SILK_TMPDIR
overrides the value of TMPDIR.
- TMPDIR
- When set and SILK_TMPDIR is not set, rwcombine writes the temporary
files it creates to this directory.
- SILK_CLOBBER
- The SiLK tools normally refuse to overwrite existing files. Setting
SILK_CLOBBER to a non-empty value removes this restriction.
- SILK_COMPRESSION_METHOD
- This environment variable is used as the value for
--compression-method when that switch is not provided. Available since
SiLK 3.13.0.
- SILK_CONFIG_FILE
- This environment variable is used as the value for the
--site-config-file switch when that switch is not provided.
- SILK_DATA_ROOTDIR
- This environment variable specifies the root directory of the data repository.
As described in the "FILES" section, rwcombine may use
this environment variable when searching for the SiLK site configuration
file.
- SILK_PATH
- This environment variable gives the root of the install tree. When
searching for configuration files, rwcombine may use this
environment variable. See the "FILES" section for details.
- SILK_TEMPFILE_DEBUG
- When set to 1, rwcombine prints debugging messages to the standard
error as it creates, re-opens, and removes temporary files.
- ${SILK_CONFIG_FILE}
- ${SILK_DATA_ROOTDIR}/silk.conf
- /data/silk.conf
- ${SILK_PATH}/share/silk/silk.conf
- ${SILK_PATH}/share/silk.conf
- /usr/local/share/silk/silk.conf
- /usr/local/share/silk.conf
- Possible locations for the SiLK site configuration file which are checked
when the --site-config-file switch is not provided.
- ${SILK_TMPDIR}/
- ${TMPDIR}/
- /tmp/
- Directory in which to create temporary files.
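The site configuration locations listed above can be walked with a short
function. This is a sketch only: it assumes the locations are tried in the
order listed and that the first existing file is used, neither of which
this page states explicitly, and the function name is hypothetical.

```shell
# Print the first existing candidate for the SiLK site configuration
# file, or return non-zero when none of the candidates exists.
find_silk_conf() {
    for f in \
        "${SILK_CONFIG_FILE:-}" \
        "${SILK_DATA_ROOTDIR:+$SILK_DATA_ROOTDIR/silk.conf}" \
        /data/silk.conf \
        "${SILK_PATH:+$SILK_PATH/share/silk/silk.conf}" \
        "${SILK_PATH:+$SILK_PATH/share/silk.conf}" \
        /usr/local/share/silk/silk.conf \
        /usr/local/share/silk.conf
    do
        # Candidates built from unset variables expand to an empty string.
        if [ -n "$f" ] && [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

The result could be passed explicitly, for example as
--site-config-file="$(find_silk_conf)", to make a script independent of
the search.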
rwfilter(1), rwcut(1),
rwuniq(1), rwfileinfo(1),
sensor.conf(5), silk(7),
yaf(1), zlib(3)
The first release of rwcombine occurred in SiLK 3.9.0.