rwgroup(1) SiLK Tool Suite rwgroup(1)
rwgroup - Tag similar SiLK records with a common next hop IP value
rwgroup
{--id-fields=KEY | --delta-field=FIELD --delta-value=DELTA}
[--objective] [--summarize] [--rec-threshold=THRESHOLD]
[--group-offset=IP]
[--note-add=TEXT] [--note-file-add=FILE] [--output-path=PATH]
[--copy-input=PATH] [--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[FILE]
rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwgroup --version
rwgroup reads sorted SiLK Flow records (cf.
rwsort(1)) from the standard input or from a
single file name listed on the command line, marks records that form a
group with an identifier in the Next Hop IP field, and prints the
binary SiLK Flow records to the standard output. In some ways rwgroup
is similar to rwuniq(1), but rwgroup writes SiLK
flow records instead of textual output.
Two SiLK records are defined as being in the same group when the
fields specified in the --id-fields switch match exactly and when the
field listed in the --delta-field switch matches within the value given by
the --delta-value switch. Either --id-fields or
--delta-field is required; both may be specified. A
--delta-value must be given when --delta-field is
present.
The first group of records gets the identifier 0, and
rwgroup writes that value into each record's Next Hop IP field. The
ID for each subsequent group is incremented by 1. The --group-offset
switch may be used to set the identifier of the initial group.
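The grouping rule and the numbering of groups can be modeled in a few lines of Python. This is only a simplified sketch, not rwgroup's implementation: records are plain dictionaries, the function name assign_groups and its parameters are hypothetical, and only a numeric (non-IP) delta field with consecutive comparison is handled.

```python
def assign_groups(records, id_fields=(), delta_field=None, delta_value=0, offset=0):
    """Return one group ID per record, in input order.

    Assumes `records` is already sorted on id_fields followed by delta_field.
    """
    group_ids = []
    current = offset          # models --group-offset (0 when not given)
    prev = None
    for rec in records:
        if prev is not None:
            same_key = all(rec[f] == prev[f] for f in id_fields)
            within = (delta_field is None
                      or abs(rec[delta_field] - prev[delta_field]) <= delta_value)
            if not (same_key and within):
                current += 1  # start a new group
        group_ids.append(current)
        prev = rec
    return group_ids


flows = [
    {"dip": "10.0.0.1", "stime": 100},
    {"dip": "10.0.0.1", "stime": 102},
    {"dip": "10.0.0.1", "stime": 110},
    {"dip": "10.0.0.2", "stime": 111},
]
print(assign_groups(flows, id_fields=["dip"], delta_field="stime", delta_value=3))
# prints [0, 0, 1, 2]
```

The third record starts a new group because its start time is 8 seconds after the previous record, and the fourth starts another because its destination address differs.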
The --rec-threshold switch may be used to write only those groups
that contain at least a given number of records. The --summarize switch
attempts to merge the records in each group into a single output record.
rwgroup requires that the records are sorted on the fields
listed in the --id-fields and --delta-field switches. For
example, a call using
rwgroup --id-field=2 --delta-field=9 --delta-value=3
should read the output of
rwsort --field=2,9
otherwise the results are unpredictable.
Option names may be abbreviated if the abbreviation is unique or is an exact
match for an option. A parameter to an option may be specified as
--arg=param or --arg param, though the first form
is required for options that take optional parameters.
At least one value for --id-fields or --delta-field
must be provided; rwgroup terminates with an error if no fields are
specified.
- --id-fields=KEY
- KEY contains the list of flow attributes (a.k.a. fields or columns)
that must match exactly for flows to be considered part of the same group.
Each field may be specified once only. KEY is a comma separated
list of field-names, field-integers, and ranges of field-integers; a range
is specified by separating the start and end of the range with a hyphen
(-). Field-names are case insensitive. Example:
--id-fields=stime,10,1-5
There is no default value for the --id-fields
switch.
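As an illustration of the KEY syntax, the following Python sketch expands a KEY string into a flat list of field names and field integers. It is an assumption-laden model rather than SiLK's parser: the function name parse_key is hypothetical, and it does not know that a name and an integer (for example stime and 9) can refer to the same field.

```python
def parse_key(key):
    """Expand a KEY string into a list of field names and field integers."""
    fields = []
    for token in key.split(","):
        token = token.strip()
        start, sep, end = token.partition("-")
        if sep and start.isdigit() and end.isdigit():
            # A range of field-integers, inclusive at both ends.
            fields.extend(range(int(start), int(end) + 1))
        elif token.isdigit():
            fields.append(int(token))
        else:
            # Field names are case insensitive; names such as
            # src-map-name contain hyphens but are not ranges.
            fields.append(token.lower())
    if len(fields) != len(set(fields)):
        raise ValueError("each field may be specified once only")
    return fields


print(parse_key("stime,10,1-5"))
# prints ['stime', 10, 1, 2, 3, 4, 5]
```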
The complete list of built-in fields that the SiLK tool suite
supports follows, though note that not all fields are present in all
SiLK file formats; when a field is not present, its value is 0.
- sIP,1
- source IP address
- dIP,2
- destination IP address
- sPort,3
- source port for TCP and UDP, or equivalent
- dPort,4
- destination port for TCP and UDP, or equivalent
- protocol,5
- IP protocol
- packets,pkts,6
- packet count
- bytes,7
- byte count
- flags,8
- bit-wise OR of TCP flags over all packets
- sTime,9
- starting time of flow (seconds resolution)
- duration,10
- duration of flow (seconds resolution)
- eTime,11
- end time of flow (seconds resolution)
- sensor,12
- name or ID of sensor at the collection point
- class,20
- class of sensor at the collection point
- type,21
- type of sensor at the collection point
- iType
- the ICMP type value for ICMP or ICMPv6 flows and zero for non-ICMP flows.
Internally, SiLK stores the ICMP type and code in the
"dPort" field, so there is no need to have
both "dPort" and
"iType" or
"iCode" in the sort key. This field was
introduced in SiLK 3.8.1.
- iCode
- the ICMP code value for ICMP or ICMPv6 flows and zero for non-ICMP flows.
See note at "iType".
- icmpTypeCode,25
- equivalent to
"iType","iCode"
in --id-fields. This field may not be mixed with
"iType" or
"iCode", and this field is deprecated as
of SiLK 3.8.1. As of SiLK 3.8.1,
"icmpTypeCode" may no longer be used as
the argument to --delta-field; the
"dPort" field will provide an equivalent
result as long as the input is limited to ICMP flow records.
Many SiLK file formats do not store the following fields and their
values will always be 0; they are listed here for completeness:
- in,13
- router SNMP input interface or vlanId if packing tools were configured to
capture it (see sensor.conf(5))
- out,14
- router SNMP output interface or postVlanId
SiLK can store flows generated by enhanced collection software
that provides more information than NetFlow v5. These flows may support some
or all of these additional fields; for flows without this additional
information, the field's value is always 0.
- initialFlags,26
- TCP flags on first packet in the flow
- sessionFlags,27
- bit-wise OR of TCP flags over all packets except the first in the
flow
- attributes,28
- flow attributes set by the flow generator:
- "S"
- all the packets in this flow record are exactly the same size
- "F"
- flow generator saw additional packets in this flow following a packet with
a FIN flag (excluding ACK packets)
- "T"
- flow generator prematurely created a record for a long-running connection
due to a timeout. (When the flow generator yaf(1) is
run with the --silk switch, it will prematurely create a flow and
mark it with "T" if the byte count of
the flow cannot be stored in a 32-bit value.)
- "C"
- flow generator created this flow as a continuation of long-running
connection, where the previous flow for this connection met a timeout (or
a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow
generator's active timeout. (This is the active timeout since the
flow generator creates a flow for a connection that still has activity.) The
flow generator will create multiple flow records for this ssh session, each
spanning some portion of the total session. The first flow record will be
marked with a "T" indicating that it hit
the timeout. The second through next-to-last records will be marked with
"TC" indicating that this flow both timed
out and is a continuation of a flow that timed out. The final flow will be
marked with a "C", indicating that it was
created as a continuation of an active flow.
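The marking pattern in the ssh example can be written out as a small Python sketch. The function name continuation_attributes is hypothetical, and the code only reproduces the pattern described here; it does not model how a flow generator decides when to split a connection.

```python
def continuation_attributes(n_records):
    """Return the T/C marks for a connection reported as n_records flows."""
    if n_records < 1:
        raise ValueError("a connection produces at least one flow record")
    marks = []
    for i in range(n_records):
        mark = ""
        if i < n_records - 1:
            mark += "T"  # this record was ended by the active timeout
        if i > 0:
            mark += "C"  # this record continues an earlier record
        marks.append(mark)
    return marks


print(continuation_attributes(4))
# prints ['T', 'TC', 'TC', 'C']
```

A connection that fits in a single flow record gets neither mark.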
- application,29
- guess as to the content of the flow. Some software that generates flow
records from packet data, such as yaf, will inspect the contents of
the packets that make up a flow and use traffic signatures to label the
content of the flow. SiLK calls this label the application;
yaf refers to it as the appLabel. The application is the
port number that is traditionally used for that type of traffic (see the
/etc/services file on most UNIX systems). For example, traffic that
the flow generator recognizes as FTP will have a value of 21, even if that
traffic is being routed through the standard HTTP/web
port (80).
The following fields provide a way to label the IPs or ports on a
record. These fields require external files to provide the mapping from the
IP or port to the label:
- sType,16
- categorize the source IP address as
"non-routable",
"internal", or
"external" and group based on the
category. Uses the mapping file specified by the SILK_ADDRESS_TYPES
environment variable, or the address_types.pmap mapping file, as
described in addrtype(3).
- dType,17
- as sType for the destination IP address
- scc,18
- the country code of the source IP address. Uses the mapping file specified
by the SILK_COUNTRY_CODES environment variable, or the
country_codes.pmap mapping file, as described in
ccfilter(3).
- dcc,19
- as scc for the destination IP
- src-map-name
- label contained in the prefix map file associated with map-name. If
the prefix map is for IP addresses, the label is that associated with the
source IP address. If the prefix map is for protocol/port pairs, the label
is that associated with the protocol and source port. See also the
description of the --pmap-file switch below and the
pmapfilter(3) manual page.
- dst-map-name
- as src-map-name for the destination IP address
or the protocol and destination port.
- sval
- as src-map-name when no map-name is associated
with the prefix map file
- dval
- as dst-map-name when no map-name is associated
with the prefix map file
Finally, the list of built-in fields may be augmented by the
run-time loading of PySiLK code or plug-ins written in C (also called shared
object files or dynamic libraries), as described by the --python-file
and --plugin switches.
- --delta-field=FIELD
- Specify a single field that can differ by a specified delta-value among
the SiLK records that make up a group. The FIELD identifiers
include most of those specified for --id-fields. The exceptions are
that plug-in fields are not supported, nor are fields that do not have
numeric values (e.g., class, type, flags). The most common value for this
switch is "stime", which allows records
that are identical in the id-fields but temporally far apart to be
in different groups. The switch takes a single argument; multiple delta
fields cannot be specified. When this switch is specified, the
--delta-value switch is required.
- --delta-value=DELTA_VALUE
- Specify the acceptable difference between the values of the
--delta-field. The --delta-value switch is required when the
--delta-field switch is provided. For fields other than those
holding IPs, two consecutive records are considered members of the same
group when their values differ by DELTA_VALUE or less.
When the delta-field refers to an IP field, DELTA_VALUE is
the number of least significant bits of the IPs to remove
before comparing them. For example, when --delta-field=sIP
--delta-value=8 is specified, two records are in the same group if their
source IPv4 addresses belong to the same /24 or if their source IPv6
addresses belong to the same /120. The --objective switch affects
the meaning of this switch.
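For IP fields, removing the least significant bits amounts to a right shift of the address value. The Python sketch below shows the comparison using the standard ipaddress module; the function name same_ip_group is hypothetical, and refusing to compare an IPv4 address with an IPv6 address is a simplification of this sketch, not a statement about rwgroup.

```python
import ipaddress


def same_ip_group(ip_a, ip_b, delta_value):
    """Compare two addresses after dropping delta_value low-order bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        return False  # simplification: mixed address families never match
    return (int(a) >> delta_value) == (int(b) >> delta_value)


print(same_ip_group("192.0.2.10", "192.0.2.200", 8))    # prints True
print(same_ip_group("192.0.2.10", "192.0.3.10", 8))     # prints False
print(same_ip_group("2001:db8::1", "2001:db8::ff", 8))  # prints True
```

With a delta value of 8, the first pair shares a /24 and the last pair shares a /120, matching the example in the text.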
- --objective
- Change the behavior of the --delta-value switch so that a record is
considered part of a group if the value of its --delta-field is
within the DELTA_VALUE of the first record in the group.
(When this switch is not specified, consecutive records are
compared.)
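The difference between the two comparison modes is easiest to see on a short run of values. The following Python sketch is a model only; group_by_delta is a hypothetical name, and the input stands for the delta-field values of records whose id-fields already match.

```python
def group_by_delta(values, delta_value, objective=False):
    """Split sorted values into groups using either comparison mode."""
    groups = []
    for v in values:
        if groups:
            # --objective compares against the first record of the group;
            # the default compares against the previous record.
            reference = groups[-1][0] if objective else groups[-1][-1]
            if abs(v - reference) <= delta_value:
                groups[-1].append(v)
                continue
        groups.append([v])
    return groups


stimes = [0, 8, 16, 24]
print(group_by_delta(stimes, 10))                  # prints [[0, 8, 16, 24]]
print(group_by_delta(stimes, 10, objective=True))  # prints [[0, 8], [16, 24]]
```

Without the switch a chain of closely spaced records can grow without bound; with it, every member stays within the delta of the group's first record.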
- --summarize
- Cause rwgroup to print (typically) a single record for each group.
By default, all records in each group having at least
--rec-threshold members are printed. When --summarize is
active, the record that is written for the group is the first record in
the group with the following modifications:
- The packets and bytes values are the sum of the packets and bytes values,
respectively, for all records in the group.
- The start-time value is the earliest start time for the records in the
group.
- The end-time value is the latest end time for the records in the
group.
- The flags and session-flags values are the bitwise-OR of all flags and
session-flags values, respectively, for the records in the group.
Note that multiple records for a group may be printed if the
bytes, packets, or elapsed time values are too large to be stored in a SiLK
flow record.
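The merge rules listed for --summarize can be sketched in Python as follows. The record layout (a dictionary with keys such as packets and session_flags) and the function name summarize are assumptions made for illustration, and the sketch does not model the overflow case in which more than one record is written for a group.

```python
from functools import reduce
from operator import or_


def summarize(group):
    """Merge a non-empty list of record dicts into one summary record."""
    summary = dict(group[0])  # start from the first record in the group
    summary["packets"] = sum(r["packets"] for r in group)
    summary["bytes"] = sum(r["bytes"] for r in group)
    summary["stime"] = min(r["stime"] for r in group)
    summary["etime"] = max(r["etime"] for r in group)
    summary["flags"] = reduce(or_, (r["flags"] for r in group))
    summary["session_flags"] = reduce(or_, (r["session_flags"] for r in group))
    return summary


group = [
    {"sip": "10.0.0.1", "packets": 3, "bytes": 180, "stime": 100, "etime": 105,
     "flags": 0x02, "session_flags": 0x10},
    {"sip": "10.0.0.1", "packets": 5, "bytes": 400, "stime": 103, "etime": 120,
     "flags": 0x18, "session_flags": 0x11},
]
merged = summarize(group)
print(merged["packets"], merged["bytes"], merged["stime"], merged["etime"],
      hex(merged["flags"]))
# prints 8 580 100 120 0x1a
```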
- --plugin=PLUGIN
- Augment the list of fields by using run-time loading of the plug-in
(shared object) whose path is PLUGIN. The switch may be repeated to
load multiple plug-ins. The creation of plug-ins is described in the
silk-plugin(3) manual page. When PLUGIN does
not contain a slash ("/"),
rwgroup will attempt to find a file named PLUGIN in the
directories listed in the "FILES" section. If rwgroup
finds the file, it uses that path. If PLUGIN contains a slash or if
rwgroup does not find the file, rwgroup relies on your
operating system's dlopen(3) call to find the file.
When the SILK_PLUGIN_DEBUG environment variable is non-empty,
rwgroup prints status messages to the standard error as it attempts
to find and open each of its plug-ins.
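The lookup described for PLUGIN can be modeled as below. This is a sketch under assumptions: the function names are hypothetical, the directory list is copied from the "FILES" section, and the final return value merely stands for the string that would be handed to dlopen(3).

```python
import os


def plugin_search_dirs(silk_path=None):
    """Build the plug-in directory list given in the FILES section."""
    dirs = []
    for root in (silk_path, "/usr/local"):
        if root:
            for sub in ("lib64/silk", "lib64", "lib/silk", "lib"):
                dirs.append(os.path.join(root, sub))
    return dirs


def resolve_plugin(plugin, search_dirs):
    """Return the path that would be passed to dlopen(3)."""
    if "/" not in plugin:
        for directory in search_dirs:
            candidate = os.path.join(directory, plugin)
            if os.path.isfile(candidate):
                return candidate  # found in one of the listed directories
    # PLUGIN contains a slash, or no file was found: leave it to dlopen(3).
    return plugin


dirs = plugin_search_dirs(os.environ.get("SILK_PATH"))
print(resolve_plugin("./my-fields.so", dirs))
# prints ./my-fields.so
```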
- --rec-threshold=THRESHOLD
- Specify the minimum number of SiLK records a group must contain before the
records in the group are written to the output stream. The default is 1;
i.e., write all records. The maximum threshold is 65535.
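The effect of the threshold can be sketched as a filter over tagged records. The names below are hypothetical and the input is a list of (group ID, record) pairs in output order; the sketch leaves the surviving group IDs untouched and makes no claim about how rwgroup numbers groups that are dropped.

```python
from itertools import groupby


def apply_rec_threshold(tagged, threshold=1):
    """Keep only records whose group has at least `threshold` members."""
    if not 1 <= threshold <= 65535:
        raise ValueError("threshold must be between 1 and 65535")
    kept = []
    for _, members in groupby(tagged, key=lambda pair: pair[0]):
        members = list(members)
        if len(members) >= threshold:
            kept.extend(members)
    return kept


tagged = [(0, "a"), (0, "b"), (1, "c"), (2, "d"), (2, "e")]
print(apply_rec_threshold(tagged, threshold=2))
# prints [(0, 'a'), (0, 'b'), (2, 'd'), (2, 'e')]
```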
- --group-offset=IP
- Specify the value to write into the Next Hop IP for the records that
comprise the first group. The value IP may be an integer, or an
IPv4 or IPv6 address in the canonical presentation form. If not specified,
counting begins at 0. The value for each subsequent group is incremented
by 1.
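Because the identifier lives in an IP field, consecutive group IDs look like consecutive addresses. The Python sketch below, using the standard ipaddress module, shows the values that would result for a given offset; group_identifiers is a hypothetical helper, and treating a small integer as an IPv4 address is an assumption of this model.

```python
import ipaddress


def group_identifiers(offset, count):
    """Return the first `count` group identifiers as IP address objects."""
    try:
        start = ipaddress.ip_address(int(offset))  # offset given as an integer
    except ValueError:
        start = ipaddress.ip_address(offset)       # offset given as an IP
    return [start + i for i in range(count)]


print([str(ip) for ip in group_identifiers("10.1.0.0", 3)])
# prints ['10.1.0.0', '10.1.0.1', '10.1.0.2']
print([str(ip) for ip in group_identifiers("0", 2)])
# prints ['0.0.0.0', '0.0.0.1']
```

Choosing an offset inside an otherwise unused address block makes the group IDs easy to tell apart from real next hop addresses.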
- --note-add=TEXT
- Add the specified TEXT to the header of the output file as an
annotation. This switch may be repeated to add multiple annotations to a
file. To view the annotations, use the rwfileinfo(1)
tool.
- --note-file-add=FILENAME
- Open FILENAME and add the contents of that file to the header of
the output file as an annotation. This switch may be repeated to add
multiple annotations. Currently the application makes no effort to ensure
that FILENAME contains text; be careful that you do not attempt to
add a SiLK data file as an annotation.
- --copy-input=PATH
- Copy all binary SiLK Flow records read as input to the specified file or
named pipe. PATH may be "stdout"
or "-" to write flows to the standard
output as long as the --output-path switch is specified to redirect
rwgroup's output to a different location.
- --output-path=PATH
- Write the binary SiLK Flow records to PATH, where PATH is a
filename, a named pipe, the keyword
"stderr" to write the output to the
standard error, or the keyword "stdout"
or "-" to write the output to the
standard output. If PATH names an existing file, rwgroup
exits with an error unless the SILK_CLOBBER environment variable is set,
in which case PATH is overwritten. If this switch is not given, the
output is written to the standard output. Attempting to write the binary
output to a terminal causes rwgroup to exit with an error.
- --compression-method=COMP_METHOD
- Specify the compression library to use when writing output files. If this
switch is not given, the value in the SILK_COMPRESSION_METHOD environment
variable is used if the value names an available compression method. When
no compression method is specified, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the
default chosen when SiLK was compiled. The valid values for
COMP_METHOD are determined by which external libraries were found
when SiLK was compiled. To see the available compression methods and the
default method, use the --help or --version switch. SiLK can
support the following COMP_METHOD values when the required
libraries are available.
- none
- Do not compress the output using an external library.
- zlib
- Use the zlib(3) library for compressing the output,
and always compress the output regardless of the destination. Using zlib
produces the smallest output files at the cost of speed.
- lzo1x
- Use the lzo1x algorithm from the LZO real time compression library
for compression, and always compress the output regardless of the
destination. This compression provides good compression with less memory
and CPU overhead.
- snappy
- Use the snappy library for compression, and always compress the
output regardless of the destination. This compression provides good
compression with less memory and CPU overhead. Since SiLK
3.13.0.
- best
- Use lzo1x if available, otherwise use snappy if available, otherwise use
zlib if available. Only compress the output when writing to a file.
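The selection rules above, including the fallback order used by best, can be summarized in a short Python sketch. The function name and its parameters are hypothetical; the set of available methods stands for whatever libraries were found when SiLK was compiled, and raising an error for an unavailable method is an assumption of the sketch.

```python
def effective_compression(method, available, to_file):
    """Return the compression that would be applied to the output."""
    if method == "none":
        return "none"
    if method == "best":
        if not to_file:
            return "none"  # best only compresses when writing to a file
        for candidate in ("lzo1x", "snappy", "zlib"):
            if candidate in available:
                return candidate
        return "none"
    if method in available:
        return method      # explicit methods compress every destination
    raise ValueError(f"compression method not available: {method}")


print(effective_compression("best", {"zlib", "snappy"}, to_file=True))
# prints snappy
print(effective_compression("best", {"zlib", "snappy"}, to_file=False))
# prints none
print(effective_compression("zlib", {"zlib", "snappy"}, to_file=False))
# prints zlib
```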
- --site-config-file=FILENAME
- Read the SiLK site configuration from the named file FILENAME. When
this switch is not provided, rwgroup searches for the site
configuration file in the locations specified in the "FILES"
section.
- --help
- Print the available options and exit. Specifying switches that add new
fields or additional switches before --help will allow the output
to include descriptions of those fields or switches.
- --help-fields
- Print the description and alias(es) of each field and exit. Specifying
switches that add new fields before --help-fields will allow the
output to include descriptions of those fields.
- --version
- Print the version number and information about how SiLK was configured,
then exit the application.
- --pmap-file=PATH
- --pmap-file=MAPNAME:PATH
- Load the prefix map file located at PATH and create fields named
src-map-name and dst-map-name where map-name is
either the MAPNAME part of the argument or the map-name specified
when the file was created (see rwpmapbuild(1)). If no
map-name is available, rwgroup names the fields
"sval" and
"dval". Specify PATH as
"-" or
"stdin" to read from the standard input.
The switch may be repeated to load multiple prefix map files, but each
prefix map must use a unique map-name. The --pmap-file switch(es)
must precede the --id-fields switch. See also
pmapfilter(3).
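How the switch argument maps to field names can be sketched in Python. The helper pmap_field_names is hypothetical; name_in_file stands for the map-name stored in the prefix map file (which this sketch does not read), and letting the MAPNAME part of the argument take precedence over it is an assumption. A PATH that itself contains a colon is not handled.

```python
def pmap_field_names(argument, name_in_file=None):
    """Return (path, (source field, destination field)) for one --pmap-file."""
    mapname, sep, path = argument.partition(":")
    if not sep:
        # No MAPNAME: prefix; fall back to the name stored in the file.
        path, mapname = argument, name_in_file
    if mapname:
        return path, (f"src-{mapname}", f"dst-{mapname}")
    return path, ("sval", "dval")


print(pmap_field_names("service:/tmp/ports.pmap"))
# prints ('/tmp/ports.pmap', ('src-service', 'dst-service'))
print(pmap_field_names("/tmp/ip.pmap"))
# prints ('/tmp/ip.pmap', ('sval', 'dval'))
```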
- --python-file=PATH
- When the SiLK Python plug-in is used, rwgroup reads the Python code
from the file PATH to define additional fields that can be used as
part of the group key. This file should call
register_field() for each field it wishes to define.
For details and examples, see the silkpython(3) and
pysilk(3) manual pages.
rwgroup requires sorted data. The application works by comparing
records in the order that the records are received (similar to the UNIX
uniq(1) command), so records that arrive out of order produce unexpected groupings.
In the following example, the dollar sign
("$") represents the shell prompt. The text
after the dollar sign represents the command line. Lines have been wrapped for
improved readability, and the back slash
("\") is used to indicate a wrapped line.
As a rule of thumb, the --id-fields and
--delta-field parameters should match
rwsort(1)'s call, with --delta-field being the
last parameter. A call to group all web traffic by queries to the same
destination address (field=2) within 10 seconds (field=9) of the first query
to that address is:
$ rwfilter --proto=6 --dport=80 --pass=stdout \
| rwsort --field=2,9 \
| rwgroup --id-field=2 --delta-field=9 --delta-value=10 \
--objective
- PYTHONPATH
- This environment variable is used by Python to locate modules. When
--python-file is specified, rwgroup must load the Python
files that comprise the PySiLK package, such as silk/__init__.py.
If this silk/ directory is located outside Python's normal search
path (for example, in the SiLK installation tree), it may be necessary to
set or modify the PYTHONPATH environment variable to include the parent
directory of silk/ so that Python can find the PySiLK module.
- SILK_PYTHON_TRACEBACK
- When set, Python plug-ins will output traceback information on Python
errors to the standard error.
- SILK_COUNTRY_CODES
- This environment variable allows the user to specify the country code
mapping file that rwgroup uses when computing the scc and dcc
fields. The value may be a complete path or a file relative to the
SILK_PATH. See the "FILES" section for standard locations of
this file.
- SILK_ADDRESS_TYPES
- This environment variable allows the user to specify the address type
mapping file that rwgroup uses when computing the sType and dType
fields. The value may be a complete path or a file relative to the
SILK_PATH. See the "FILES" section for standard locations of
this file.
- SILK_CLOBBER
- The SiLK tools normally refuse to overwrite existing files. Setting
SILK_CLOBBER to a non-empty value removes this restriction.
- SILK_COMPRESSION_METHOD
- This environment variable is used as the value for
--compression-method when that switch is not provided. Since
SiLK 3.13.0.
- SILK_CONFIG_FILE
- This environment variable is used as the value for the
--site-config-file when that switch is not provided.
- SILK_DATA_ROOTDIR
- This environment variable specifies the root directory of the data repository.
As described in the "FILES" section, rwgroup may use this
environment variable when searching for the SiLK site configuration
file.
- SILK_PATH
- This environment variable gives the root of the install tree. When
searching for configuration files and plug-ins, rwgroup may use
this environment variable. See the "FILES" section for
details.
- SILK_PLUGIN_DEBUG
- When set to 1, rwgroup prints status messages to the standard error
as it attempts to find and open each of its plug-ins. In addition, when an
attempt to register a field fails, rwgroup prints a message
specifying the additional function(s) that must be defined to register the
field in rwgroup. Be aware that the output can be rather
verbose.
- ${SILK_ADDRESS_TYPES}
- ${SILK_PATH}/share/silk/address_types.pmap
- ${SILK_PATH}/share/address_types.pmap
- /usr/local/share/silk/address_types.pmap
- /usr/local/share/address_types.pmap
- Possible locations for the address types mapping file required by the
sType and dType fields.
- ${SILK_CONFIG_FILE}
- ${SILK_DATA_ROOTDIR}/silk.conf
- /data/silk.conf
- ${SILK_PATH}/share/silk/silk.conf
- ${SILK_PATH}/share/silk.conf
- /usr/local/share/silk/silk.conf
- /usr/local/share/silk.conf
- Possible locations for the SiLK site configuration file which are checked
when the --site-config-file switch is not provided.
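The list of locations can be turned into a simple search, sketched here in Python. The function names are hypothetical, and checking the candidates in exactly the order listed and stopping at the first existing file are assumptions of this model.

```python
import os


def site_config_candidates(env):
    """List candidate silk.conf paths in the order given above."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(os.path.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        for sub in ("share/silk", "share"):
            candidates.append(os.path.join(env["SILK_PATH"], sub, "silk.conf"))
    for sub in ("share/silk", "share"):
        candidates.append(os.path.join("/usr/local", sub, "silk.conf"))
    return candidates


def find_site_config(env):
    """Return the first candidate that exists, or None."""
    for path in site_config_candidates(env):
        if os.path.isfile(path):
            return path
    return None


print(site_config_candidates({"SILK_PATH": "/opt/silk"}))
```

Calling find_site_config(os.environ) applies the same search to the current environment.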
- ${SILK_COUNTRY_CODES}
- ${SILK_PATH}/share/silk/country_codes.pmap
- ${SILK_PATH}/share/country_codes.pmap
- /usr/local/share/silk/country_codes.pmap
- /usr/local/share/country_codes.pmap
- Possible locations for the country code mapping file required by the scc
and dcc fields.
- ${SILK_PATH}/lib64/silk/
- ${SILK_PATH}/lib64/
- ${SILK_PATH}/lib/silk/
- ${SILK_PATH}/lib/
- /usr/local/lib64/silk/
- /usr/local/lib64/
- /usr/local/lib/silk/
- /usr/local/lib/
- Directories that rwgroup checks when attempting to load a
plug-in.
rwfilter(1), rwfileinfo(1),
rwsort(1), rwuniq(1),
rwpmapbuild(1), addrtype(3),
ccfilter(3), pmapfilter(3),
pysilk(3), silkpython(3),
silk-plugin(3), sensor.conf(5),
uniq(1), silk(7),
yaf(1), dlopen(3),
zlib(3)