NAME

PySiLK - Silk in Python

DESCRIPTION

This document describes the features of PySiLK, the SiLK Python extension. It documents the objects and methods that allow one to read, manipulate, and write SiLK Flow records, IPsets, Bags, and Prefix Maps (pmaps) from within python(1). PySiLK may be used in a stand-alone Python script or as a plug-in from within the SiLK tools rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1). This document describes the objects and methods that PySiLK provides; the details of using those from within a plug-in are documented in the silkpython(3) manual page.

The SiLK Python extension defines the following objects and modules:

IPAddr object: Represents an IP Address.
IPv4Addr object: Represents an IPv4 Address.
IPv6Addr object: Represents an IPv6 Address.
IPWildcard object: Represents CIDR blocks or SiLK IP wildcard addresses.
IPSet object: Represents a SiLK IPset.
PrefixMap object: Represents a SiLK Prefix Map.
Bag object: Represents a SiLK Bag.
TCPFlags object: Represents TCP flags.
RWRec object: Represents a SiLK Flow record.
SilkFile object: Represents a channel for writing to or reading from SiLK Flow files.
FGlob object: Allows retrieval of filenames in a SiLK data store. See also the silk.site module.
silk.site module: Defines several functions that relate to the SiLK site configuration and allow iteration over the files in a SiLK data store.
silk.plugin module: Defines functions that may only be used in SiLK Python plug-ins.

The SiLK Python extension provides the following functions:

silk.get_configuration(name=None): When name is None, return a dictionary whose keys specify aspects of how SiLK was compiled. When name is provided, return the dictionary value for that key, or None when name is an unknown key. The dictionary's keys and their meanings are:

COMPRESSION_METHODS: A list of strings specifying the compression methods that were compiled into this build of SiLK. The list will contain one or more of "NO_COMPRESSION", "ZLIB", "LZO1X", and/or "SNAPPY".
INITIAL_TCPFLAGS_ENABLED: True if SiLK was compiled with support for initial TCP flags; False otherwise.
IPV6_ENABLED: True if SiLK was compiled with IPv6 support; False otherwise.
SILK_VERSION: The version of SiLK linked with PySiLK, as a string.
TIMEZONE_SUPPORT: The string "UTC" if SiLK was compiled to use UTC, or the string "local" if SiLK was compiled to use the local timezone.

Since SiLK 3.8.1.

silk.ipv6_enabled(): Return True if SiLK was compiled with IPv6 support, False otherwise.
silk.initial_tcpflags_enabled(): Return True if SiLK was compiled with support for initial TCP flags, False otherwise.
silk.init_country_codes(filename=None): Initialize PySiLK's country code database. filename should be the path to a country code prefix map, as created by rwgeoip2ccmap(1). If filename is not supplied, SiLK will look first for the file specified by $SILK_COUNTRY_CODES, and then for a file named country_codes.pmap in $SILK_PATH/share/silk, $SILK_PATH/share, /usr/local/share/silk, and /usr/local/share. (The latter two assume that SiLK was installed in /usr/local.) This function throws a RuntimeError if loading the country code prefix map fails.
silk.silk_version(): Return the version of SiLK linked with PySiLK, as a string.
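
The build configuration can guide choices such as which compression method to request when writing files. The following is a minimal sketch, not part of PySiLK: pick_compression is a hypothetical helper, and example_methods is a stand-in for the list that silk.get_configuration("COMPRESSION_METHODS") would return.

```python
def pick_compression(available, preferred=("LZO1X", "SNAPPY", "ZLIB")):
    """Return the first preferred method name found in `available`.

    `available` is expected to look like the list returned by
    silk.get_configuration("COMPRESSION_METHODS").  This helper is
    hypothetical; it is not part of PySiLK.
    """
    for name in preferred:
        if name in available:
            return name
    # Fall back to the name that means "no compression".
    return "NO_COMPRESSION"


# Stand-in for silk.get_configuration("COMPRESSION_METHODS")
example_methods = ["NO_COMPRESSION", "ZLIB"]
chosen = pick_compression(example_methods)
```

Because the strings in COMPRESSION_METHODS share their names with the constants silk.ZLIB, silk.LZO1X, silk.SNAPPY, and silk.NO_COMPRESSION, the chosen name could then be resolved with getattr(silk, chosen) when PySiLK is available.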

IPAddr Object

An IPAddr object represents an IPv4 or IPv6 address. These two types of addresses are represented by two subclasses of IPAddr: IPv4Addr and IPv6Addr.

class silk.IPAddr(address)

The constructor takes a string address, which must be a string representation of either an IPv4 or IPv6 address, or an IPAddr object. IPv6 addresses are only accepted if silk.ipv6_enabled() returns True. The IPAddr object that the constructor returns will be either an IPv4Addr object or an IPv6Addr object.

For compatibility with releases prior to SiLK 2.2.0, the IPAddr constructor will also accept an integer address, in which case it converts that integer to an IPv4Addr object. This behavior is deprecated. Use the IPv4Addr and IPv6Addr constructors instead.

Examples:

 >>> addr1 = IPAddr('192.160.1.1')
 >>> addr2 = IPAddr('2001:db8::1428:57ab')
 >>> addr3 = IPAddr('::ffff:12.34.56.78')
 >>> addr4 = IPAddr(addr1)
 >>> addr5 = IPAddr(addr2)
 >>> addr6 = IPAddr(0x10000000) # Deprecated as of SiLK 2.2.0

Supported operations and methods:

Inequality Operations: In all the below inequality operations, whenever an IPv4 address is compared to an IPv6 address, the IPv4 address is converted to an IPv6 address before comparison. This means that IPAddr("0.0.0.0") == IPAddr("::ffff:0.0.0.0").

addr1 == addr2: Return True if addr1 is equal to addr2; False otherwise.
addr1 != addr2: Return False if addr1 is equal to addr2; True otherwise.
addr1 < addr2: Return True if addr1 is less than addr2; False otherwise.
addr1 <= addr2: Return True if addr1 is less than or equal to addr2; False otherwise.
addr1 >= addr2: Return True if addr1 is greater than or equal to addr2; False otherwise.
addr1 > addr2: Return True if addr1 is greater than addr2; False otherwise.

addr.is_ipv6()

Return True if addr is an IPv6 address, False otherwise.

addr.isipv6()

(DEPRECATED in SiLK 2.2.0) An alias for is_ipv6().

addr.to_ipv6()

If addr is an IPv6Addr, return a copy of addr. Otherwise, return a new IPv6Addr mapping addr into the ::ffff:0:0/96 prefix.

addr.to_ipv4()

If addr is an IPv4Addr, return a copy of addr. If addr is in the ::ffff:0:0/96 prefix, return a new IPv4Addr containing the IPv4 address. Otherwise, return None.
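
The mapping that to_ipv6() and to_ipv4() describe can be illustrated without PySiLK. The sketch below uses Python's standard ipaddress module to mimic the documented behavior; the function names are reused only for illustration and operate on strings, not on IPAddr objects.

```python
import ipaddress

# The IPv4-mapped IPv6 prefix named in the text above.
V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def to_ipv6(text):
    """Map an IPv4 address into ::ffff:0:0/96; return IPv6 unchanged."""
    addr = ipaddress.ip_address(text)
    if addr.version == 6:
        return addr
    return ipaddress.IPv6Address(int(V4_MAPPED.network_address) | int(addr))


def to_ipv4(text):
    """Return the embedded IPv4 address, or None when there is none."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return addr
    if addr in V4_MAPPED:
        # The IPv4 address occupies the low 32 bits.
        return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
    return None
```

The same mapping explains why an IPv4 address compares equal to its ::ffff:0:0/96 counterpart in the comparison operations.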

int(addr)

Return the integer representation of addr. For an IPv4 address, this is a 32-bit number. For an IPv6 address, this is a 128-bit number.

str(addr)

Return a human-readable representation of addr in its canonical form.

addr.padded()

Return a human-readable representation of addr which is fully padded with zeroes. With IPv4, it will return a string of the form "xxx.xxx.xxx.xxx". With IPv6, it will return a string of the form "xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx".

addr.octets()

Return a tuple of integers representing the octets of addr. The tuple's length is 4 for an IPv4 address and 16 for an IPv6 address.

addr.mask(mask)

Return a copy of addr masked by the IPAddr mask.

When both addresses are either IPv4 or IPv6, applying the mask is straightforward.

If addr is IPv6 but mask is IPv4, mask is converted to IPv6 and then the mask is applied. This may produce an unexpected result.

If addr is IPv4 and mask is IPv6, addr will remain an IPv4 address if masking mask with "::ffff:0000:0000" results in "::ffff:0000:0000" (namely, if bytes 10 and 11 of mask are 0xFFFF). Otherwise, addr is converted to an IPv6 address and the mask is applied in IPv6 space, which may produce an unexpected result.
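
The test applied to an IPv6 mask in the last case can be written as a bit operation. The following sketch uses the standard ipaddress module rather than PySiLK; mask_keeps_ipv4 is a hypothetical name chosen for this illustration.

```python
import ipaddress

# ::ffff:0:0 as an integer: bytes 10 and 11 set to 0xFF, all others zero.
V4_MAPPED_PREFIX = int(ipaddress.IPv6Address("::ffff:0:0"))


def mask_keeps_ipv4(mask_text):
    """Return True if an IPv4 address masked by this IPv6 mask stays IPv4.

    This restates the rule in the text: masking the mask itself with
    ::ffff:0000:0000 must give ::ffff:0000:0000.
    """
    mask = int(ipaddress.IPv6Address(mask_text))
    return (mask & V4_MAPPED_PREFIX) == V4_MAPPED_PREFIX
```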

addr.mask_prefix(prefix)

Return a copy of addr masked by the high prefix bits. All bits below the prefixth bit will be set to zero. The maximum value for prefix is 32 for an IPv4Addr, and 128 for an IPv6Addr.
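
The arithmetic behind mask_prefix() is a bitwise AND with a mask whose top prefix bits are set. The sketch below reproduces that arithmetic with the standard ipaddress module; it is not PySiLK code, and raising ValueError for an out-of-range prefix is an assumption of this sketch, not documented PySiLK behavior.

```python
import ipaddress


def mask_prefix(text, prefix):
    """Zero every bit of the address below the top `prefix` bits."""
    addr = ipaddress.ip_address(text)
    width = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= prefix <= width:
        # Assumption of this sketch; PySiLK's error behavior is not stated.
        raise ValueError("prefix must be between 0 and %d" % width)
    keep = ((1 << prefix) - 1) << (width - prefix)
    # Rebuild an address of the same family from the masked integer.
    return type(addr)(int(addr) & keep)
```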

addr.country_code()

Return the two character country code associated with addr. If no country code is associated with addr, return None. The country code association is initialized by the silk.init_country_codes() function. If init_country_codes() is not called before calling this method, it will act as if init_country_codes() was called with no argument.

IPv4Addr Object

An IPv4Addr object represents an IPv4 address. IPv4Addr is a subclass of IPAddr, and supports all operations and methods that IPAddr supports.

class silk.IPv4Addr(address)

The constructor takes an address, which must be a string representation of an IPv4 address, an IPAddr object, or an integer. A string will be parsed as an IPv4 address. An IPv4Addr object will be copied. An IPv6Addr object will be converted to an IPv4 address, or throw a ValueError if the conversion is not possible. A 32-bit integer will be converted to an IPv4 address.

Examples:

 >>> addr1 = IPv4Addr('192.160.1.1')
 >>> addr2 = IPv4Addr(IPAddr('::ffff:12.34.56.78'))
 >>> addr3 = IPv4Addr(addr1)
 >>> addr4 = IPv4Addr(0x10000000)

IPv6Addr Object

An IPv6Addr object represents an IPv6 address. IPv6Addr is a subclass of IPAddr, and supports all operations and methods that IPAddr supports.

class silk.IPv6Addr(address)

The constructor takes an address, which must be a string representation of an IPv6 address, an IPAddr object, or an integer. A string will be parsed as an IPv6 address. An IPv6Addr object will be copied. An IPv4Addr object will be converted to an IPv6 address. A 128-bit integer will be converted to an IPv6 address.

Examples:

 >>> addr1 = IPAddr('2001:db8::1428:57ab')
 >>> addr2 = IPv6Addr(IPAddr('192.160.1.1'))
 >>> addr3 = IPv6Addr(addr1)
 >>> addr4 = IPv6Addr(0x100000000000000000000000)

IPWildcard Object

An IPWildcard object represents a range or block of IP addresses. The IPWildcard object handles iteration over IP addresses with for x in wildcard.

class silk.IPWildcard(wildcard)

The constructor takes a string representation wildcard of the wildcard address. The string wildcard can be an IP address, an IP with a CIDR notation, an integer, an integer with a CIDR designation, or an entry in SiLK wildcard notation. In SiLK wildcard notation, a wildcard is represented as an IP address in canonical form with each octet (IPv4) or hexadectet (IPv6) represented by one of the following: a value, a range of values, a comma-separated list of values and ranges, or the character 'x' used to represent the entire octet or hexadectet. IPv6 wildcard addresses are only accepted if silk.ipv6_enabled() returns True. The wildcard argument can also be an IPWildcard, in which case a duplicate reference is returned.

Examples:

 >>> a = IPWildcard('1.2.3.0/24')
 >>> b = IPWildcard('ff80::/16')
 >>> c = IPWildcard('1.2.3.4')
 >>> d = IPWildcard('::ffff:0102:0304')
 >>> e = IPWildcard('16909056')
 >>> f = IPWildcard('16909056/24')
 >>> g = IPWildcard('1.2.3.x')
 >>> h = IPWildcard('1:2:3:4:5:6:7:x')
 >>> i = IPWildcard('1.2,3.4,5.6,7')
 >>> j = IPWildcard('1.2.3.0-255')
 >>> k = IPWildcard('::2-4')
 >>> l = IPWildcard('1-2:3-4:5-6:7-8:9-a:b-c:d-e:0-ffff')
 >>> m = IPWildcard(a)

Supported operations and methods:

addr in wildcard: Return True if addr is in wildcard, False otherwise.
addr not in wildcard: Return False if addr is in wildcard, True otherwise.
string in wildcard: Return the result of IPAddr(string) in wildcard.
string not in wildcard: Return the result of IPAddr(string) not in wildcard.
wildcard.is_ipv6(): Return True if wildcard contains IPv6 addresses, False otherwise.
str(wildcard): Return the string that was used to construct wildcard.
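
To make the SiLK wildcard notation concrete, the following toy expander handles only the dotted IPv4 form (values, ranges, comma-separated lists, and 'x'). It is a sketch written for this illustration and is not PySiLK's parser; the names parse_octet and expand_ipv4_wildcard are hypothetical, and CIDR, integer, and IPv6 forms are not handled.

```python
import itertools


def parse_octet(field):
    """Expand one IPv4 wildcard field into a sorted list of octet values."""
    if field == "x":
        return list(range(256))
    values = set()
    for part in field.split(","):
        if "-" in part:
            low, high = (int(p) for p in part.split("-"))
            values.update(range(low, high + 1))
        else:
            values.add(int(part))
    if not values or min(values) < 0 or max(values) > 255:
        raise ValueError("bad octet field: %r" % field)
    return sorted(values)


def expand_ipv4_wildcard(wildcard):
    """Yield every dotted-quad address matched by an IPv4 wildcard string."""
    fields = wildcard.split(".")
    if len(fields) != 4:
        raise ValueError("expected four dot-separated fields")
    # Every combination of the per-octet choices is a matched address.
    for octets in itertools.product(*(parse_octet(f) for f in fields)):
        yield ".".join(str(o) for o in octets)
```

For instance, '1.2,3.4,5.6,7' offers two choices in each of the last three octets and therefore matches eight addresses, while '1.2.3.x' and '1.2.3.0-255' both match 256.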

IPSet Object

An IPSet object represents a set of IP addresses, as produced by rwset(1) and rwsetbuild(1). The IPSet object handles iteration over IP addresses with for x in set, and iteration over CIDR blocks using for x in set.cidr_iter().

In the following documentation, an ip_iterable can be any of:

an IPAddr object representing an IP address
the string representation of a valid IP address
an IPWildcard object
the string representation of an IPWildcard
an iterable of any combination of the above
another IPSet object

class silk.IPSet([ip_iterable]): The constructor creates an empty IPset. If an ip_iterable is supplied as an argument, each member of ip_iterable will be added to the IPset.

Other constructors, all class methods:

silk.IPSet.load(path): Create an IPSet by reading a SiLK IPset file. path must be a valid location of an IPset.

Other class methods:

silk.IPSet.supports_ipv6(): Return whether this implementation of IPsets supports IPv6 addresses.

Supported operations and methods:

In the lists of operations and methods below,

set is an IPSet object
addr can be an IPAddr object or the string representation of an IP address.
set2 is an IPSet object. The operator versions of the methods require an IPSet object.
ip_iterable is an iterable over IP addresses as accepted by the IPSet constructor. Consider ip_iterable as creating a temporary IPSet to perform the requested method.

The following operations and methods do not modify the IPSet:

set.cardinality(): Return the cardinality of set.
len(set): Return the cardinality of set. In Python 2.x, this method will raise OverflowError if the number of IPs in the set cannot be represented by Python's Plain Integer type--that is, if the value is larger than "sys.maxint". The cardinality() method will not raise this exception.
set.is_ipv6(): Return True if set is a set of IPv6 addresses, and False if it is a set of IPv4 addresses. For the purposes of this method, IPv4-in-IPv6 addresses (that is, addresses in the ::ffff:0:0/96 prefix) are considered IPv6 addresses.
addr in set: Return True if addr is a member of set; False otherwise.
addr not in set: Return False if addr is a member of set; True otherwise.
set.copy(): Return a new IPSet with a copy of set.
set.issubset(ip_iterable)
set <= set2: Return True if every IP address in set is also in set2. Return False otherwise.
set.issuperset(ip_iterable)
set >= set2: Return True if every IP address in set2 is also in set. Return False otherwise.
set.union(ip_iterable[, ...])
set | other | ...: Return a new IPset containing the IP addresses in set and all others.
set.intersection(ip_iterable[, ...])
set & other & ...: Return a new IPset containing the IP addresses common to set and others.
set.difference(ip_iterable[, ...])
set - other - ...: Return a new IPset containing the IP addresses in set but not in others.
set.symmetric_difference(ip_iterable)
set ^ other: Return a new IPset containing the IP addresses in either set or in other but not in both.
set.isdisjoint(ip_iterable): Return True when none of the IP addresses in ip_iterable are present in set. Return False otherwise.
set.cidr_iter(): Return an iterator over the CIDR blocks in set. Each iteration returns a 2-tuple, the first element of which is the first IP address in the block, the second of which is the prefix length of the block. Can be used as for (addr, prefix) in set.cidr_iter().
set.save(filename, compression=DEFAULT): Save the contents of set in the file filename. The compression determines the compression method used when outputting the file. Valid values are the same as those in silk.silkfile_open().
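
The 2-tuples produced by cidr_iter() are convenient for printing an IPset in CIDR notation. The sketch below relies only on the cidr_iter() method described above; FakeIPSet is a hypothetical stand-in so that the example runs without PySiLK, and a real IPSet could be passed to cidr_strings() instead.

```python
def cidr_strings(ipset):
    """Format each (address, prefix) pair from cidr_iter() as 'addr/prefix'."""
    return ["%s/%d" % (addr, prefix) for addr, prefix in ipset.cidr_iter()]


class FakeIPSet(object):
    """Hypothetical stand-in that mimics only IPSet.cidr_iter()."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    def cidr_iter(self):
        return iter(self._blocks)


demo = cidr_strings(FakeIPSet([("10.0.0.0", 8), ("192.168.1.0", 24)]))
```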

The following operations and methods will modify the IPSet:

set.add(addr): Add addr to set and return set. To add multiple IP addresses, use the add_range() or update() methods.
set.discard(addr): Remove addr from set if addr is present; do nothing if it is not. Return set. To discard multiple IP addresses, use the difference_update() method. See also the remove() method.
set.remove(addr): Similar to discard(), but raise KeyError if addr is not a member of set.
set.pop(): Remove and return an arbitrary address from set. Raise KeyError if set is empty.
set.clear(): Remove all IP addresses from set and return set.
set.convert(version): Convert set to an IPv4 IPset if version is 4 or to an IPv6 IPset if version is 6. Return set. Raise ValueError if version is not 4 or 6. If version is 4 and set contains IPv6 addresses outside of the ::ffff:0:0/96 prefix, raise ValueError and leave set unchanged.
set.add_range(start, end): Add all IP addresses between start and end, inclusive, to set. Raise ValueError if end is less than start.
set.update(ip_iterable[, ...])
set |= other | ...: Add the IP addresses specified in others to set; the result is the union of set and others.
set.intersection_update(ip_iterable[, ...])
set &= other & ...: Remove from set any IP address that does not appear in others; the result is the intersection of set and others.
set.difference_update(ip_iterable[, ...])
set -= other | ...: Remove from set any IP address found in others; the result is the difference of set and others.
set.symmetric_difference_update(ip_iterable)
set ^= other: Update set, keeping the IP addresses found in set or in other but not in both.

RWRec Object

An RWRec object represents a SiLK Flow record.

class silk.RWRec([rec],[field=value],...)

This constructor creates an empty RWRec object. If an RWRec rec is supplied, the constructor will create a copy of it. rec may also be a dictionary, such as that returned by the as_dict() method. Initial values for record fields can be included.

Example:

 >>> recA = RWRec(input=10, output=20)
 >>> recB = RWRec(recA, output=30)
 >>> (recA.input, recA.output)
 (10, 20)
 >>> (recB.input, recB.output)
 (10, 30)

Instance attributes:

Accessing or setting attributes on an RWRec whose descriptions mention functions in the silk.site module causes the silk.site.init_site() function to be called with no argument if it has not yet been called successfully---that is, if silk.site.have_site_config() returns False.

rec.application: The service port of the flow rec as set by the flow meter if the meter supports it, a 16-bit integer. The yaf(1) flow meter refers to this value as the appLabel. The default application value is 0.
rec.bytes: The count of the number of bytes in the flow rec, a 32-bit integer. The default bytes value is 0.
rec.classname: (READ ONLY) The class name assigned to the flow rec, a string. This value is the first member of the tuple returned by the "rec.classtype" attribute, which see.
rec.classtype: A 2-tuple containing the classname and the typename of the flow rec. Getting the value returns the result of silk.site.classtype_from_id(rec.classtype_id). If that function throws an error, the result is a 2-tuple containing the string "?" and a string representation of "rec.classtype_id". Setting the value to (class,type) sets rec.classtype_id to the result of silk.site.classtype_id(class,type). If that function throws an error because the (class,type) pair is unknown, rec is unchanged and ValueError is thrown.
rec.classtype_id: The ID for the class and type of the flow rec, an 8-bit integer. The default classtype_id value is 255. Changes to this value are reflected in the "rec.classtype" attribute. The classtype_id attribute may be set to a value that is considered invalid by silk.site.
rec.dip: The destination IP of the flow rec, an IPAddr object. The default dip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.
rec.dport: The destination port of the flow rec, a 16-bit integer. The default dport value is 0. Since the destination port field is also used to store the values for the ICMP type and code, setting this value may modify rec.icmptype and rec.icmpcode.
rec.duration: The duration of the flow rec, a datetime.timedelta object. The default duration value is 0. Changing the rec.duration attribute will modify the rec.etime attribute such that (rec.etime - rec.stime) == the new rec.duration. The maximum possible duration is datetime.timedelta(milliseconds=0xffffffff). See also rec.duration_secs.
rec.duration_secs: The duration of the flow rec in seconds, a float that includes fractional seconds. The default duration_secs value is 0. Changing the rec.duration_secs attribute will modify the rec.etime attribute in the same way as changing rec.duration. The maximum possible duration_secs value is 4294967.295.
rec.etime: The end time of the flow rec, a datetime.datetime object. The default etime value is the UNIX epoch time, datetime.datetime(1970,1,1,0,0). Changing the rec.etime attribute modifies the flow record's duration. If the new duration would become negative or would become larger than RWRec supports, a ValueError will be raised. See also rec.etime_epoch_secs.
rec.etime_epoch_secs: The end time of the flow rec as a number of seconds since the epoch time, a float that includes fractional seconds. Epoch time is 1970-01-01 00:00:00 UTC. The default etime_epoch_secs value is 0. Changing the rec.etime_epoch_secs attribute modifies the flow record's duration. If the new duration would become negative or would become larger than RWRec supports, a ValueError will be raised.
rec.initial_tcpflags: The TCP flags on the first packet of the flow rec, a TCPFlags object. The default initial_tcpflags value is None. The rec.initial_tcpflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor. Setting rec.initial_tcpflags when rec.session_tcpflags is None sets the latter to TCPFlags(''). Setting rec.initial_tcpflags or rec.session_tcpflags sets rec.tcpflags to the binary OR of their values. Trying to set rec.initial_tcpflags when rec.protocol is not 6 (TCP) will raise an AttributeError.
rec.icmpcode: The ICMP code of the flow rec, an 8-bit integer. The default icmpcode value is 0. The value is only meaningful when rec.protocol is ICMP (1) or when rec.is_ipv6() is True and rec.protocol is ICMPv6 (58). Since a record's ICMP type and code are stored in the destination port, setting this value may modify rec.dport.
rec.icmptype: The ICMP type of the flow rec, an 8-bit integer. The default icmptype value is 0. The value is only meaningful when rec.protocol is ICMP (1) or when rec.is_ipv6() is True and rec.protocol is ICMPv6 (58). Since a record's ICMP type and code are stored in the destination port, setting this value may modify rec.dport.
rec.input: The SNMP interface where the flow rec entered the router or the vlanId if the packing tools are configured to capture it (see sensor.conf(5)), a 16-bit integer. The default input value is 0.
rec.nhip: The next-hop IP of the flow rec as set by the router, an IPAddr object. The default nhip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.
rec.output: The SNMP interface where the flow rec exited the router or the postVlanId if the packing tools are configured to capture it (see sensor.conf(5)), a 16-bit integer. The default output value is 0.
rec.packets: The packet count for the flow rec, a 32-bit integer. The default packets value is 0.
rec.protocol: The IP protocol of the flow rec, an 8-bit integer. The default protocol value is 0. Setting rec.protocol to a value other than 6 (TCP) causes rec.initial_tcpflags and rec.session_tcpflags to be set to None.
rec.sensor: The name of the sensor where the flow rec was collected, a string. Getting the value returns the result of silk.site.sensor_from_id(rec.sensor_id). If that function throws an error, the result is a string representation of "rec.sensor_id" or the string "?" when sensor_id is 65535. Setting the value to sensor_name sets rec.sensor_id to the result of silk.site.sensor_id(sensor_name). If that function throws an error because sensor_name is unknown, rec is unchanged and ValueError is thrown.
rec.sensor_id: The ID of the sensor where the flow rec was collected, a 16-bit integer. The default sensor_id value is 65535. Changes to this value are reflected in the "rec.sensor" attribute. The sensor_id attribute may be set to a value that is considered invalid by silk.site.
rec.session_tcpflags: The union of the flags of all but the first packet in the flow rec, a TCPFlags object. The default session_tcpflags value is None. The rec.session_tcpflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor. Setting rec.session_tcpflags when rec.initial_tcpflags is None sets the latter to TCPFlags(''). Setting rec.initial_tcpflags or rec.session_tcpflags sets rec.tcpflags to the binary OR of their values. Trying to set rec.session_tcpflags when rec.protocol is not 6 (TCP) will raise an AttributeError.
rec.sip: The source IP of the flow rec, an IPAddr object. The default sip value is IPAddr('0.0.0.0'). May be set using a string containing a valid IP address.
rec.sport: The source port of the flow rec, an integer. The default sport value is 0.
rec.stime: The start time of the flow rec, a datetime.datetime object. The default stime value is the UNIX epoch time, datetime.datetime(1970,1,1,0,0). Modifying the rec.stime attribute will modify the flow's end time such that rec.duration is constant. The maximum possible stime is 2038-01-19 03:14:07 UTC. See also rec.stime_epoch_secs.
rec.stime_epoch_secs: The start time of the flow rec as a number of seconds since the epoch time, a float that includes fractional seconds. Epoch time is 1970-01-01 00:00:00 UTC. The default stime_epoch_secs value is 0. Changing the rec.stime_epoch_secs attribute will modify the flow's end time such that rec.duration is constant. The maximum possible stime_epoch_secs is 2147483647 (2^31-1).
rec.tcpflags: The union of the TCP flags of all packets in the flow rec, a TCPFlags object. The default tcpflags value is TCPFlags(' '). The rec.tcpflags attribute may be set to a new TCPFlags object, or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor. Setting rec.tcpflags sets rec.initial_tcpflags and rec.session_tcpflags to None. Setting rec.initial_tcpflags or rec.session_tcpflags changes rec.tcpflags to the binary OR of their values.
rec.timeout_killed: Whether the flow rec was closed early due to timeout by the collector, a boolean. The default timeout_killed value is False.
rec.timeout_started: Whether the flow rec is a continuation from a timed-out flow, a boolean. The default timeout_started value is False.
rec.typename: (READ ONLY) The type name of the flow rec, a string. This value is the second member of the tuple returned by the "rec.classtype" attribute, which see.
rec.uniform_packets: Whether the flow rec contained only packets of the same size, a boolean. The default uniform_packets value is False.
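
The limits quoted for the time attributes follow from the stated storage sizes, and they can be checked with the standard datetime module. The sketch below does not use PySiLK; end_time is a hypothetical helper that restates the relation etime = stime + duration and the documented duration bounds.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

# Largest duration: 0xffffffff milliseconds.
MAX_DURATION = datetime.timedelta(milliseconds=0xFFFFFFFF)
max_duration_secs = MAX_DURATION.total_seconds()

# Largest start time: 2**31 - 1 seconds after the epoch.
max_stime = EPOCH + datetime.timedelta(seconds=2**31 - 1)


def end_time(stime, duration):
    """Return stime + duration, rejecting durations outside the valid range."""
    if duration < datetime.timedelta(0) or duration > MAX_DURATION:
        raise ValueError("duration out of range")
    return stime + duration
```

Here max_duration_secs matches the documented maximum duration_secs of 4294967.295, and max_stime matches the documented maximum stime of 2038-01-19 03:14:07.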

Supported operations and methods:

rec.is_icmp(): Return True if the protocol of rec is 1 (ICMP) or if the protocol of rec is 58 (ICMPv6) and rec.is_ipv6() is True. Return False otherwise.
rec.is_ipv6(): Return True if rec contains IPv6 addresses, False otherwise.
rec.is_web(): Return True if rec can be represented as a web record, False otherwise. A record can be represented as a web record if the protocol is TCP (6) and either the source or destination port is one of 80, 443, or 8080.
rec.as_dict(): Return a dictionary representing the contents of rec. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False.
rec.to_ipv4(): Return a new copy of rec with the IP addresses (sip, dip, and nhip) converted to IPv4. If any of these addresses cannot be converted to IPv4 (that is, if any address is not in the ::ffff:0:0/96 prefix), return None.
rec.to_ipv6(): Return a new copy of rec with the IP addresses (sip, dip, and nhip) converted to IPv6. Specifically, the function maps the IPv4 addresses into the ::ffff:0:0/96 prefix.
str(rec): Return the string representation of rec.as_dict().
rec1 == rec2: Return True if rec1 is structurally equivalent to rec2. Return False otherwise.
rec1 != rec2: Return True if rec1 is not structurally equivalent to rec2. Return False otherwise.
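
The rules stated for is_web() and is_icmp() can be restated as plain predicates over the protocol and port attributes. The sketch below is illustrative only: looks_like_web and looks_like_icmp are hypothetical names, and types.SimpleNamespace objects stand in for RWRec so that the example runs without PySiLK.

```python
from types import SimpleNamespace

WEB_PORTS = frozenset((80, 443, 8080))


def looks_like_web(rec):
    """TCP (6) with a source or destination port of 80, 443, or 8080."""
    return rec.protocol == 6 and (rec.sport in WEB_PORTS
                                  or rec.dport in WEB_PORTS)


def looks_like_icmp(rec, is_ipv6=False):
    """ICMP (1), or ICMPv6 (58) when the record holds IPv6 addresses."""
    return rec.protocol == 1 or (rec.protocol == 58 and is_ipv6)


# Stand-ins carrying only the attributes the predicates read.
web = SimpleNamespace(protocol=6, sport=51234, dport=443)
dns = SimpleNamespace(protocol=17, sport=51234, dport=53)
```

With a real RWRec, the built-in rec.is_web() and rec.is_icmp() methods should be preferred; the stand-ins exist only to show which attributes the rules depend on.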

SilkFile Object

A SilkFile object represents a channel for writing to or reading from SiLK Flow files. A SiLK file open for reading can be iterated over using for rec in file.

Creation functions:

silk.silkfile_open(filename, mode, compression=DEFAULT, notes=[], invocations=[]): This function takes a filename, a mode, and a set of optional keyword parameters. It returns a SilkFile object. The mode should be one of the following constant values:

silk.READ: Open file for reading
silk.WRITE: Open file for writing
silk.APPEND: Open file for appending

The filename should be the path to the file to open. A few filenames are treated specially. The filename stdin maps to the standard input stream when the mode is READ. The filenames stdout and stderr map to the standard output and standard error streams respectively when the mode is WRITE. A filename consisting of a single hyphen (-) maps to the standard input if the mode is READ, and to the standard output if the mode is WRITE.

The compression parameter may be one of the following constants. (This list assumes SiLK was built with the required libraries. To check which compression methods are available at your site, see silk.get_configuration("COMPRESSION_METHODS")).

silk.DEFAULT: Use the default compression scheme compiled into SiLK.
silk.NO_COMPRESSION: Use no compression.
silk.ZLIB: Use zlib block compression (as used by gzip(1)).
silk.LZO1X: Use lzo1x block compression.
silk.SNAPPY: Use snappy block compression.

If notes or invocations are set, they should be lists of strings. These add annotation and invocation headers to the file. These values are visible using the rwfileinfo(1) program.

Examples:

 >>> myinputfile = silkfile_open('/path/to/file', READ)
 >>> myoutputfile = silkfile_open('/path/to/file', WRITE,
                                  compression=LZO1X,
                                  notes=['My output file',
                                         'another annotation'])

silk.silkfile_fdopen(fileno, mode, filename=None, compression=DEFAULT, notes=[], invocations=[]): This function takes an integer file descriptor, a mode, and a set of optional keyword parameters. It returns a SilkFile object. The filename parameter is used to set the value of the name attribute of the resulting object. All other parameters work as described in the silk.silkfile_open() function.

Deprecated constructor:

class silk.SilkFile(filename, mode, compression=DEFAULT, notes=[], invocations=[]): This constructor creates a SilkFile object. The parameters are identical to those used by the silkfile_open() function. This constructor is deprecated as of SiLK 3.0.0. For future compatibility, please use the silkfile_open() function instead of the SilkFile() constructor to create SilkFile objects.

Instance attributes:

file.name: The filename that was used to create file.
file.mode: The mode that was used to create file. Valid values are READ, WRITE, or APPEND.

Instance methods:

file.read(): Return an RWRec representing the next record in the SilkFile file. If there are no records left in the file, return None.
file.write(rec): Write the RWRec rec to the SilkFile file. Return None.
file.next(): A SilkFile object is its own iterator. For example, iter(file) returns file. When the SilkFile is used as an iterator, the next() method is called repeatedly. This method returns the next record, or raises StopIteration once the end of file is reached.
file.skip(count): Skip the next count records in file and return the number of records skipped. If the return value is less than count, the end of the file has been reached. At end of file, return 0. Since SiLK 3.19.1.
file.notes(): Return the list of annotation headers for the file as a list of strings.
file.invocations(): Return the list of invocation headers for the file as a list of strings.
file.close(): Close the file and return None.
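
A common pattern combines iteration over an input SilkFile with write() on an output SilkFile. The sketch below depends only on those two behaviors; copy_matching and ListWriter are hypothetical names, and a list plus a small writer class stand in for SilkFile objects so that the example runs without PySiLK.

```python
def copy_matching(infile, outfile, predicate):
    """Write every record from infile that satisfies predicate to outfile.

    infile only needs to be iterable and outfile only needs a write()
    method, which matches the SilkFile behavior described above.
    Returns the number of records written.
    """
    written = 0
    for rec in infile:
        if predicate(rec):
            outfile.write(rec)
            written += 1
    return written


class ListWriter(object):
    """Hypothetical stand-in for a SilkFile opened with silk.WRITE."""

    def __init__(self):
        self.records = []

    def write(self, rec):
        self.records.append(rec)


sink = ListWriter()
# Plain integers stand in for records; keep those equal to 6.
count = copy_matching([1, 6, 17, 6], sink, lambda proto: proto == 6)
```

With PySiLK, infile and outfile would come from silk.silkfile_open() using the READ and WRITE modes, the predicate would inspect RWRec attributes such as rec.protocol, and both files would be closed with close() afterwards.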

PrefixMap Object

A PrefixMap object represents an immutable mapping from IP addresses or protocol/port pairs to labels. PrefixMap objects are created from SiLK prefix map files as created by rwpmapbuild(1).

class silk.PrefixMap(filename): The constructor creates a prefix map initialized from the filename. The PrefixMap object will be of one of the two subtypes of PrefixMap: an AddressPrefixMap or a ProtoPortPrefixMap.

Supported operations and methods:

pmap[key]: Return the string label associated with key in pmap. key must be of the correct type: either an IPAddr if pmap is an AddressPrefixMap, or a 2-tuple of integers (protocol, port), if pmap is a ProtoPortPrefixMap. The method raises TypeError when the type of the key is incorrect.
pmap.get(key, default=None): Return the string label associated with key in pmap. Return the value default if key is not in pmap, or if key is of the wrong type or value to be a key for pmap.
pmap.values(): Return a tuple of the labels defined by the PrefixMap pmap.
pmap.iterranges(): Return an iterator that will iterate over ranges of contiguous values with the same label. The return values of the iterator will be the 3-tuple (start, end, label), where start is the first element of the range, end is the last element of the range, and label is the label for that range.
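
The 3-tuples from iterranges() can be regrouped to answer the question of which ranges carry a given label. The sketch below uses only the iterranges() method described above; ranges_by_label and FakePrefixMap are hypothetical names, and the sample ranges and labels are invented for the illustration.

```python
from collections import defaultdict


def ranges_by_label(pmap):
    """Group the (start, end) ranges of a prefix map under their labels."""
    grouped = defaultdict(list)
    for start, end, label in pmap.iterranges():
        grouped[label].append((start, end))
    return dict(grouped)


class FakePrefixMap(object):
    """Hypothetical stand-in that mimics only PrefixMap.iterranges()."""

    def __init__(self, ranges):
        self._ranges = list(ranges)

    def iterranges(self):
        return iter(self._ranges)


# Invented sample data; strings stand in for IPAddr objects.
demo_pmap = FakePrefixMap([
    ("10.0.0.0", "10.255.255.255", "internal"),
    ("11.0.0.0", "11.0.0.255", "partner"),
    ("192.168.0.0", "192.168.255.255", "internal"),
])
by_label = ranges_by_label(demo_pmap)
```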

Bag Object

A Bag object is a representation of a multiset. Each key represents a potential element in the set, and the key's value represents the number of times that key is in the set. As such, it is also a reasonable representation of a mapping from keys to integers.

Please note, however, that despite their set-like properties, Bag objects are not nearly as efficient as IPSet objects when representing large contiguous ranges of key data.

In PySiLK, the Bag object is designed to look and act similar to Python dictionary objects, and in many cases Bags and dicts can be used interchangeably. There are differences, however. The primary one is that bag[key] returns a value for every key in the key range of the bag; that value is the integer zero for any key that has not been incremented.
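
As an analogy only, the standard library's collections.Counter shows the same zero-default lookup. This sketch does not use PySiLK, and the string keys are placeholders for whatever key type a real Bag would hold.

```python
from collections import Counter

hits = Counter()
hits["10.0.0.1"] += 2

seen = hits["10.0.0.1"]      # a key that has been incremented
unseen = hits["10.0.0.2"]    # a missing key reads as zero, no KeyError
```

A plain dict would raise KeyError on the second lookup; a Counter, like a Bag, reports zero instead.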

class silk.Bag(mapping=None, key_type=None, key_len=None, counter_type=None, counter_len=None): The constructor creates a bag. All arguments are optional, and can be used as keyword arguments.
If mapping is included, the bag is initialized from that mapping. Valid mappings are:

a Bag
a key/value dictionary
an iterable of key/value pairs

The key_type and key_len arguments describe the key field of the bag. The key_type should be a string from the list of valid types below. The key_len should be an integer describing the number of bytes that will represent values of key_type. The key_type argument is case-insensitive.

If key_type is not specified, it defaults to 'any-ipv6', unless silk.ipv6_enabled() is False, in which case the default is 'any-ipv4'. The one exception to this is when key_type is not specified, but key_len is specified with a value of less than 16. In this case, the default type is 'custom'.
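
The defaulting rule in the preceding paragraph can be restated as a small function. This is a sketch of the rule as worded here, not the constructor's actual code; default_key_type() is a hypothetical name, and its ipv6_enabled argument stands in for the result of silk.ipv6_enabled().

```python
def default_key_type(key_len=None, ipv6_enabled=True):
    """Return the key type used when key_type is not specified."""
    if key_len is not None and key_len < 16:
        # A short explicit key length selects the integer ('custom') type.
        return "custom"
    return "any-ipv6" if ipv6_enabled else "any-ipv4"
```

For instance, default_key_type(key_len=4) gives 'custom', while calling it with no arguments gives 'any-ipv6'.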

Note: Key types that specify IPv6 addresses are not valid if silk.ipv6_enabled() returns False. An error will be thrown if they are used in this case.

If key_len is not specified, it defaults to the default number of bytes for the given key_type (which can be determined by the chart below). If specified, key_len must be one of the following integers: 1, 2, 4, 16.

The counter_type and counter_len arguments describe the counter value of the bag. The counter_type should be a string from the list of valid types below. The counter_len should be an integer describing the number of bytes that will represent values of counter_type. The counter_type argument is case-insensitive.

If counter_type is not specified, it defaults to 'custom'.

If counter_len is not specified, it defaults to 8. Currently, 8 is the only valid value of counter_len.

Here is the list of valid key and counter types, along with their default key_len values:

'sIPv4', 4
'dIPv4', 4
'sPort', 2
'dPort', 2
'protocol', 1
'packets', 4
'bytes', 4
'flags', 1
'sTime', 4
'duration', 4
'eTime', 4
'sensor', 2
'input', 2
'output', 2
'nhIPv4', 4
'initialFlags', 1
'sessionFlags', 1
'attributes', 1
'application', 2
'class', 1
'type', 1
'icmpTypeCode', 2
'sIPv6', 16
'dIPv6', 16
'nhIPv6', 16
'records', 4
'sum-packets', 4
'sum-bytes', 4
'sum-duration', 4
'any-ipv4', 4
'any-ipv6', 16
'any-port', 2
'any-snmp', 2
'any-time', 4
'custom', 4

Deprecation Notice: For compatibility with SiLK 2.x, the key_type argument may be a Python class. An object of the key_type class must be constructable from an integer, and it must possess an __int__() method which retrieves that integer from the object. Regardless of the maximum integer value supported by the key_type class, internally the bag will store the keys as type 'custom' with length 4.

Other constructors, all class methods:

silk.Bag.ipaddr(mapping, counter_type=None, counter_len=None): Creates a Bag using 'any-ipv6' as the key type (or 'any-ipv4' if silk.ipv6_enabled() is False). counter_type and counter_len are used as in the standard Bag constructor. Equivalent to Bag(mapping).
silk.Bag.integer(mapping, key_len=None, counter_type=None, counter_len=None): Creates a Bag using 'custom' as the key_type (integer bag). key_len, counter_type, and counter_len are used as in the standard Bag constructor. Equivalent to Bag(mapping, key_type='custom').
silk.Bag.load(path, key_type=None): Creates a Bag by reading a SiLK bag file. path must be a valid location of a bag. When present, the key_type argument is used as in the Bag constructor, ignoring the key type specified in the bag file. When key_type is not provided and the bag file does not contain type information, the key is set to 'custom' with a length of 4.
silk.Bag.load_ipaddr(path): Creates an IP address bag from a SiLK bag file. Equivalent to Bag.load(path, key_type=IPv4Addr). This constructor is deprecated as of SiLK 3.2.0.
silk.Bag.load_integer(path): Creates an integer bag from a SiLK bag file. Equivalent to Bag.load(path, key_type=int). This constructor is deprecated as of SiLK 3.2.0.

Constants:

silk.BAG_COUNTER_MAX: This constant contains the maximum possible value for Bag counters.

Other class methods:

silk.Bag.field_types(): Returns a tuple of strings which are valid key_type or counter_type values.
silk.Bag.type_merge(type_a, type_b): Given two types from Bag.field_types(), returns the type that would be given (by default) to a bag that is a result of the co-mingling of two bags of the given types. For example: Bag.type_merge('sport','dport') == 'any-port'.

Supported operations and methods:

In the lists of operations and methods below,

bag and bag2 are Bag objects
key and key2 are IPAddrs for bags that contain IP addresses, or integers for other bags
value and value2 are integers which represent the counter associated with a key in the bag
ipset is an IPSet object
ipwildcard is an IPWildcard object

The following operations and methods do not modify the Bag:

bag.get_info(): Return information about the keys and counters of the bag. The return value is a dictionary with the following keys and values:

'key_type': The current key type, as a string.
'key_len': The current key length in bytes.
'counter_type': The current counter type, as a string.
'counter_len': The current counter length in bytes.

The keys have the same names as the keyword arguments to the bag constructor. As a result, a bag with the same key and value information as an existing bag can be generated by using the following idiom: Bag(**bag.get_info()).

bag.copy(): Return a new Bag which is a copy of bag.
bag[key]: Return the counter value associated with key in bag.
bag[key:key2] or bag[key,key2,...]: Return a new Bag which contains only the elements in the key range [key, key2), or a new Bag containing only the given elements in the comma-separated list. In point of fact, the argument(s) in brackets can be any number of comma separated keys or key ranges. For example: bag[1,5,15:18,20] will return a bag which contains the elements 1, 5, 15, 16, 17, and 20 from bag.
bag[ipset]: Return a new Bag which contains only elements in bag that are also contained in ipset. This is only valid for IP address bags. The ipset can be included as part of a comma-separated list of slices, as above.
bag[ipwildcard]: Return a new Bag which contains only elements that are also contained in ipwildcard. This is only valid for IP address bags. The ipwildcard can be included as part of a comma-separated list of slices, as above.
key in bag: Return True if bag[key] is non-zero, False otherwise.
bag.get(key, default=None): Return bag[key] if key is in bag, otherwise return default.
bag.items(): Return a list of (key, value) pairs for all keys in bag with non-zero values. This list is not guaranteed to be sorted in any order.
bag.iteritems(): Return an iterator over (key, value) pairs for all keys in bag with non-zero values. This iterator is not guaranteed to iterate over items in any order.
bag.sorted_iter(): Return an iterator over (key, value) pairs for all keys in bag with non-zero values. This iterator is guaranteed to iterate over items in key-sorted order.
bag.keys(): Return a list of keys for all keys in bag with non-zero values. This list is not guaranteed to be in key-sorted order.
bag.iterkeys(): Return an iterator over keys for all keys in bag with non-zero values. This iterator is not guaranteed to iterate over keys in any order.
bag.values(): Return a list of values for all keys in bag with non-zero values. The list is guaranteed to be in the same order as the list returned by keys().
bag.itervalues(): Return an iterator over values for all keys in bag with non-zero values. This iterator is not guaranteed to iterate over values in any order, but the order is consistent with that returned by iterkeys().
bag.group_iterator(bag2): Return an iterator over keys and values of a pair of Bags. For each key which is in either bag or bag2, this iterator will return a (key, value, value2) triple, where value is bag.get(key), and value2 is bag2.get(key). A value is None when key is not in the corresponding bag. This iterator is guaranteed to iterate over triples in key order.
bag + bag2: Add two bags together. Return a new Bag for which newbag[key] = bag[key] + bag2[key] for all keys in bag and bag2. Will raise an OverflowError if the resulting value for a key is greater than BAG_COUNTER_MAX. If the two bags are of different types, the resulting bag will be of a type determined by Bag.type_merge().
bag - bag2: Subtract two bags. Return a new Bag for which newbag[key] = bag[key] - bag2[key] for all keys in bag and bag2, as long as the resulting value for that key would be non-negative. If the resulting value for a key would be negative, the value of that key will be zero. If the two bags are of different types, the resulting bag will be of a type determined by Bag.type_merge().
bag.min(bag2): Return a new Bag for which newbag[key] = min(bag[key], bag2[key]) for all keys in bag and bag2.
bag.max(bag2): Return a new Bag for which newbag[key] = max(bag[key], bag2[key]) for all keys in bag and bag2.
bag.div(bag2): Divide two bags. Return a new Bag for which newbag[key] = bag[key] / bag2[key], rounded to the nearest integer, for all keys in bag and bag2, as long as bag2[key] is non-zero. newbag[key] = 0 when bag2[key] is zero. If the two bags are of different types, the resulting bag will be of a type determined by Bag.type_merge().
bag * integer
integer * bag: Multiply a bag by a scalar. Return a new Bag for which newbag[key] = bag[key] * integer for all keys in bag.
bag.intersect(set_like): Return a new Bag which contains bag[key] for each key where key in set_like is true. set_like is any argument that supports Python's in operator, including Bags, IPSets, IPWildcards, and Python sets, lists, tuples, et cetera.
bag.complement_intersect(set_like): Return a new Bag which contains bag[key] for each key where key in set_like is not true.
bag.ipset(): Return an IPSet consisting of the set of IP address key values from bag with non-zero values. This only works if bag is an IP address bag.
bag.inversion(): Return a new integer Bag for which all values from bag are inserted as key elements. Hence, if two keys in bag have a value of 5, newbag[5] will be equal to two.
bag == bag2: Return True if the contents of bag are equivalent to the contents of bag2, False otherwise.
bag != bag2: Return False if the contents of bag are equivalent to the contents of bag2, True otherwise.
bag.save(filename, compression=DEFAULT): Save the contents of bag in the file filename. The compression determines the compression method used when outputting the file. Valid values are the same as those in silk.silkfile_open().
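
Several of the non-modifying operations above can be modeled with plain dictionaries. The following sketch is such a model, not the PySiLK implementation: bag_subtract(), bag_inversion(), and group_iter() are hypothetical names that mirror the documented behavior of bag - bag2, bag.inversion(), and bag.group_iterator(bag2), with a key that is absent from a dict playing the role of a zero counter.

```python
def bag_subtract(a, b):
    """Model of a - b: differences that would be negative become zero."""
    result = {}
    for key, count in a.items():
        remaining = count - b.get(key, 0)
        if remaining > 0:
            result[key] = remaining
    return result


def bag_inversion(a):
    """Model of inversion(): map each counter value to how many keys have it."""
    inverted = {}
    for count in a.values():
        if count:
            inverted[count] = inverted.get(count, 0) + 1
    return inverted


def group_iter(a, b):
    """Model of group_iterator(): key-ordered (key, value, value2) triples."""
    for key in sorted(set(a) | set(b)):
        yield key, a.get(key), b.get(key)


left = {1: 5, 2: 5, 3: 1}
right = {2: 7, 3: 1, 4: 2}
difference = bag_subtract(left, right)
inverted = bag_inversion(left)
triples = list(group_iter(left, right))
```

Key 2 drops out of the difference because 5 - 7 would be negative, and key 3 drops out because its difference is exactly zero.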

The following operations and methods will modify the Bag:

bag.clear(): Empty bag, such that bag[key] is zero for all keys.
bag[key] = value: Set the number of key in bag to value.
del bag[key]: Remove key from bag, such that bag[key] is zero.
bag.update(mapping): For each item in mapping, bag is modified such that for each key in mapping, the value for that key in bag will be set to the mapping's value. Valid mappings are those accepted by the Bag() constructor.
bag.add(key[, key2[, ...]]): Add one of each key to bag. This is the same as incrementing the value for each key by one.
bag.add(iterable): Add one of each key in iterable to bag. This is the same as incrementing the value for each key by one.
bag.remove(key[, key2[, ...]]): Remove one of each key from bag. This is the same as decrementing the value for each key by one.
bag.remove(iterable): Remove one of each key in iterable from bag. This is the same as decrementing the value for each key by one.
bag.incr(key, value = 1): Increment the number of key in bag by value. value defaults to one.
bag.decr(key, value = 1): Decrement the number of key in bag by value. value defaults to one.
bag += bag2: Equivalent to bag = bag + bag2, unless an OverflowError is raised, in which case bag is no longer necessarily valid. When an error is not raised, this operation takes less memory than bag = bag + bag2. This operation can change the type of bag, as determined by Bag.type_merge().
bag -= bag2: Equivalent to bag = bag - bag2. This operation takes less memory than bag = bag - bag2. This operation can change the type of bag, as determined by Bag.type_merge().
bag *= integer: Equivalent to bag = bag * integer, unless an OverflowError is raised, in which case bag is no longer necessarily valid. When an error is not raised, this operation takes less memory than bag = bag * integer.
bag.constrain_values(min=None, max=None): Remove key from bag if that key's value is less than min or greater than max. At least one of min or max must be specified.
bag.constrain_keys(min=None, max=None): Remove key from bag if that key is less than min, or greater than max. At least one of min or max must be specified.
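
The effect of constrain_values() can be pictured with a dictionary model. This is a sketch under stated assumptions, not PySiLK code: the function below is a hypothetical stand-in, its parameters are named minimum and maximum to avoid shadowing Python built-ins, and the ValueError is an assumed error type, since the text above does not say which exception a Bag raises when neither bound is given.

```python
def constrain_values(counts, minimum=None, maximum=None):
    """Remove, in place, keys whose counter is outside [minimum, maximum]."""
    if minimum is None and maximum is None:
        # Assumed error type; the real Bag method may raise something else.
        raise ValueError("at least one of minimum or maximum is required")
    for key in list(counts):
        value = counts[key]
        too_small = minimum is not None and value < minimum
        too_large = maximum is not None and value > maximum
        if too_small or too_large:
            del counts[key]


counts = {"a": 1, "b": 5, "c": 50}
constrain_values(counts, minimum=2, maximum=10)
```

Iterating over list(counts) rather than counts itself allows keys to be deleted during the loop.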

TCPFlags Object

A TCPFlags object represents the eight bits of flags from a TCP session.

class silk.TCPFlags(value)

The constructor takes either a TCPFlags value, a string, or an integer. If a TCPFlags value, it returns a copy of that value. If an integer, the integer should represent the 8-bit representation of the flags. If a string, the string should consist of a concatenation of zero or more of the characters "F", "S", "R", "P", "A", "U", "E", and "C" (in upper or lower case), representing the FIN, SYN, RST, PSH, ACK, URG, ECE, and CWR flags. Spaces in the string are ignored.

Examples:

 >>> a = TCPFlags('SA')
 >>> b = TCPFlags(5)

Instance attributes (read-only):

flags.fin: True if the FIN flag is set on flags, False otherwise
flags.syn: True if the SYN flag is set on flags, False otherwise
flags.rst: True if the RST flag is set on flags, False otherwise
flags.psh: True if the PSH flag is set on flags, False otherwise
flags.ack: True if the ACK flag is set on flags, False otherwise
flags.urg: True if the URG flag is set on flags, False otherwise
flags.ece: True if the ECE flag is set on flags, False otherwise
flags.cwr: True if the CWR flag is set on flags, False otherwise

Supported operations and methods:

~flags: Return the bitwise inversion (not) of flags.
flags1 & flags2: Return the bitwise intersection (and) of the flags from flags1 and flags2.
flags1 | flags2: Return the bitwise union (or) of the flags from flags1 and flags2.
flags1 ^ flags2: Return the bitwise exclusive disjunction (xor) of the flags from flags1 and flags2.
int(flags): Return the integer value of the flags set in flags.
str(flags): Return a string representation of the flags set in flags.
flags.padded(): Return a string representation of the flags set in flags. This representation will be padded with spaces such that flags will line up if printed above each other.
flags: When used in a setting that expects a boolean, return True if any flag value is set in flags. Return False otherwise.
flags.matches(flagmask): Given flagmask, a string of the form high_flags/mask_flags, return True if the flags of flags match high_flags after being masked with mask_flags; False otherwise. Given a flagmask without the slash ("/"), return True if all bits in flagmask are set in flags. I.e., a flagmask without a slash is interpreted as "flagmask/flagmask".
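
The high_flags/mask_flags test performed by matches() can be modeled with plain integers. The sketch below is not the TCPFlags implementation; parse_flags() and matches() are hypothetical helpers, and the bit values are the standard positions of the flags in the TCP header (FIN = 0x01 through CWR = 0x80).

```python
FLAG_BITS = {
    "F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
    "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80,
}


def parse_flags(text):
    """Convert a string such as 'SA' to its 8-bit integer value."""
    value = 0
    for char in text.upper():
        if char == " ":
            continue  # spaces are ignored
        value |= FLAG_BITS[char]
    return value


def matches(flags, flagmask):
    """Return True when flags, masked by mask_flags, equals high_flags."""
    high_text, slash, mask_text = flagmask.partition("/")
    high = parse_flags(high_text)
    # Without a slash, the mask is the same as the high flags.
    mask = parse_flags(mask_text) if slash else high
    return (flags & mask) == high


syn_only = parse_flags("S")
syn_ack = parse_flags("sa")
```

With these definitions, "S/SA" accepts a SYN without ACK and rejects a SYN-ACK, whereas a bare "S" accepts both because only the SYN bit is examined.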

Constants:

The following constants are defined:

silk.TCP_FIN: A TCPFlags value with only the FIN flag set
silk.TCP_SYN: A TCPFlags value with only the SYN flag set
silk.TCP_RST: A TCPFlags value with only the RST flag set
silk.TCP_PSH: A TCPFlags value with only the PSH flag set
silk.TCP_ACK: A TCPFlags value with only the ACK flag set
silk.TCP_URG: A TCPFlags value with only the URG flag set
silk.TCP_ECE: A TCPFlags value with only the ECE flag set
silk.TCP_CWR: A TCPFlags value with only the CWR flag set

FGlob Object

An FGlob object is an iterable object which iterates over filenames from a SiLK data store. It does this internally by calling the rwfglob(1) program. The FGlob object assumes that the rwfglob program is in the PATH, and raises an exception when used if it is not.

Note: It is generally better to use the silk.site.repository_iter() function from the "silk.site Module" instead of the FGlob object, as that function does not require the external rwfglob program. However, the FGlob constructor allows you to use a different site configuration file every time, whereas the silk.site.init_site() function only supports a single site configuration file.

class silk.FGlob(classname=None, type=None, sensors=None, start_date=None, end_date=None, data_rootdir=None, site_config_file=None): Although all arguments have defaults, at least one of classname, type, sensors, or start_date must be specified. The arguments are:

classname: if given, should be a string representing the class name. If not given, defaults based on the site configuration file, silk.conf(5).
type: if given, can be either a string representing a type name or comma-separated list of type names, or can be a list of strings representing type names. If not given, defaults based on the site configuration file, silk.conf.
sensors: if given, should be either a string representing a comma-separated list of sensor names or IDs, an integer representing a sensor ID, or a list of strings or integers representing sensor names or IDs. If not given, defaults to all sensors.
start_date: if given, should be either a string in the format "YYYY/MM/DD[:HH]", a date object, a datetime object (which will be used to the precision of one hour), or a time object (which is used for the given hour on the current date). If not given, defaults to start of current day.
end_date: if given, should be either a string in the format "YYYY/MM/DD[:HH]", a date object, a datetime object (which will be used to the precision of one hour), or a time object (which is used for the given hour on the current date). If not given, defaults to start_date. The end_date cannot be specified without a start_date.
data_rootdir: if given, should be a string representing the directory in which to find the packed SiLK data files. If not given, defaults to the value in the SILK_DATA_ROOTDIR environment variable or the compiled-in default (/data).
site_config_file: if given, should be a string representing the path of the site configuration file, silk.conf. If not given, defaults to the value in the SILK_CONFIG_FILE environment variable or $SILK_DATA_ROOTDIR/silk.conf.

An FGlob object can be used as a standard iterator. For example:

 for filename in FGlob(classname="all", start_date="2005/09/22"):
     for rec in silkfile_open(filename):
         ...

silk.site Module

The silk.site module contains functions that load the SiLK site file, and query information from that file.

silk.site.init_site(siteconf=None, rootdir=None)

Initializes the SiLK system's site configuration. The siteconf parameter, if given, should be the path and name of a SiLK site configuration file (see silk.conf(5)). If siteconf is omitted, the value specified in the environment variable SILK_CONFIG_FILE will be used as the name of the configuration file. If SILK_CONFIG_FILE is not set, the module looks for a file named silk.conf in the following directories: the directory specified by the rootdir argument; the directory specified in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK (/data); and the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/.

The rootdir parameter, if given, should be the path to a SiLK data repository that has a configuration matching the SiLK site configuration. If rootdir is omitted, the value specified in the SILK_DATA_ROOTDIR environment variable will be used, or if that variable is not set, the data root directory that is compiled into SiLK (/data). The rootdir may be specified without a siteconf argument by using rootdir as a keyword argument, e.g., init_site(rootdir="/data").

This function should not generally be called explicitly unless one wishes to use a non-default site configuration file.

The init_site() function can only be called successfully once. The return value of init_site() will be True if the site configuration was successful, or False if a site configuration file was not found. If a siteconf parameter was specified but not found, or if a site configuration file was found but did not parse properly, an exception will be raised instead. Once init_site() has been successfully invoked, silk.site.have_site_config() will return True, and subsequent invocations of init_site() will raise a RuntimeError exception.

Some silk.site methods and RWRec members require information from the silk.conf file, and when these methods are called or members accessed, the silk.site.init_site() function is implicitly invoked with no arguments if it has not yet been called successfully. The list of functions, methods, and attributes that exhibit this behavior includes: silk.site.sensors(), silk.site.classtypes(), silk.site.classes(), silk.site.types(), silk.site.default_types(), silk.site.default_class(), silk.site.class_sensors(), silk.site.sensor_id(), silk.site.sensor_from_id(), silk.site.classtype_id(), silk.site.classtype_from_id(), silk.site.set_data_rootdir(), silk.site.repository_iter(), silk.site.repository_silkfile_iter(), silk.site.repository_full_iter(), rwrec.as_dict(), rwrec.classname, rwrec.typename, rwrec.classtype, and rwrec.sensor.

silk.site.have_site_config()

Return True if silk.site.init_site() has been called and was able to successfully find and load a SiLK configuration file, False otherwise.
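
Because a second successful call to init_site() raises RuntimeError, code that may run more than once can guard the call with have_site_config(). The sketch below shows that pattern under stated assumptions: ensure_site() is a hypothetical helper that takes the site module as an argument, and FakeSite is a stand-in that imitates only the two functions used, so the example runs without SiLK.

```python
def ensure_site(site, siteconf=None, rootdir=None):
    """Initialize the site configuration unless it is already loaded."""
    if site.have_site_config():
        return True
    return bool(site.init_site(siteconf, rootdir))


class FakeSite(object):
    """Hypothetical stand-in for the silk.site module."""

    def __init__(self):
        self.loaded = False
        self.calls = 0

    def have_site_config(self):
        return self.loaded

    def init_site(self, siteconf=None, rootdir=None):
        if self.loaded:
            raise RuntimeError("site already initialized")
        self.calls += 1
        self.loaded = True
        return True


fake = FakeSite()
first = ensure_site(fake)
second = ensure_site(fake)
```

With the real module, the call would presumably be ensure_site(silk.site, ...); the second call above returns without invoking init_site() again.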

silk.site.set_data_rootdir(rootdir)

Change the current SiLK data root directory once the silk.conf file has been loaded. This function can be used to change the directory used by the silk.site iterator functions. To change the SiLK data root directory before loading the silk.conf file, call silk.site.init_site() with a rootdir argument. set_data_rootdir() implicitly calls silk.site.init_site() with no arguments before changing the root directory if silk.site.have_site_config() returns False.

silk.site.get_site_config()

Return the current path to the SiLK site configuration file. Before silk.site.init_site() is called successfully, this will return the location where init_site(), called with no arguments, will first look for a configuration file. After init_site() has been successfully called, this will return the path to the file that init_site() loaded.

silk.site.get_data_rootdir()

Return the current SiLK data root directory.

silk.site.sensors()

Return a tuple of valid sensor names. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.

silk.site.classes()

Return a tuple of valid class names. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.

silk.site.types(class)

Return a tuple of valid type names for class class. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if class is not a valid class.

silk.site.classtypes()

Return a tuple of valid (class name, type name) tuples. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.

silk.site.default_class()

Return the default class name. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Returns None if no site file is available.

silk.site.default_types(class)

Return a tuple of default types associated with class class. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if class is not a valid class.

silk.site.class_sensors(class)

Return a tuple of sensors that are in class class. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if class is not a valid class.

silk.site.sensor_classes(sensor)

Return a tuple of classes that are associated with sensor. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if sensor is not a valid sensor.

silk.site.sensor_description(sensor)

Return the sensor description as a string, or None if there is no description. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if sensor is not a valid sensor.

silk.site.sensor_id(sensor)

Return the numeric sensor ID associated with the string sensor. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if sensor is not a valid sensor.

silk.site.sensor_from_id(id)

Return the sensor name associated with the numeric sensor ID id. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if id is not a valid sensor identifier.

silk.site.classtype_id( (class, type) )

Return the numeric ID associated with the tuple (class, type). Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available, if class is not a valid class, or if type is not a valid type in class.

silk.site.classtype_from_id(id)

Return the (class, type) name pair associated with the numeric ID id. Implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if id is not a valid identifier.

silk.site.repository_iter(start=None, end=None, classname=None, types=None, classtypes=None, sensors=None)

Return an iterator over file names in a SiLK repository. The repository is assumed to be in the data root directory that is returned by silk.site.get_data_rootdir() and to conform to the format of the current site configuration. This function implicitly calls silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. See also silk.site.repository_full_iter() and silk.site.repository_silkfile_iter().

The following types are accepted for start and end:

a datetime.datetime object, which is considered to be specified to hour precision
a datetime.date object, which is considered to be specified to day precision
a string in the SiLK date format "YYYY/MM/DD[:HH]", where the timezone depends on how SiLK was compiled; check the value of silk.get_configuration("TIMEZONE_SUPPORT").

The rules for interpreting start and end are:

When both start and end are specified to hour precision, files from all hours within that time range are returned.
When start is specified to day precision, the hour specified in end (if any) is ignored, and files for all dates between midnight at start and the end of the day represented by end are returned.
When end is not specified and start is specified to day precision, files for that complete day are returned.
When end is not specified and start is specified to hour precision, files for that single hour are returned.
When neither start nor end is specified, files for the current day are returned.
It is an error to specify end without start, or to give an end that precedes start.
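
The rules above can be restated as a function that computes the first and last hour covered by a request. This is a sketch of the rules as listed, not the library's code: hour_window() and its today parameter are hypothetical, only datetime.date and datetime.datetime arguments are handled (not date strings), and the combination of an hour-precision start with a day-precision end, which the rules do not cover, is rejected rather than guessed at.

```python
import datetime

HOUR = datetime.timedelta(hours=1)


def _midnight(value):
    """Return midnight on the day of a date or datetime."""
    return datetime.datetime(value.year, value.month, value.day)


def hour_window(start=None, end=None, today=None):
    """Return (first_hour, last_hour), inclusive, as datetime objects."""
    if start is None:
        if end is not None:
            raise ValueError("end specified without start")
        # Neither given: the current day, at day precision.
        start = today if today is not None else datetime.date.today()

    if isinstance(start, datetime.datetime):
        # Hour precision: truncate to the start of the hour.
        first = start.replace(minute=0, second=0, microsecond=0)
        if end is None:
            last = first
        elif isinstance(end, datetime.datetime):
            last = end.replace(minute=0, second=0, microsecond=0)
        else:
            raise ValueError("case not covered by the listed rules")
    else:
        # Day precision: any hour in end is ignored.
        first = _midnight(start)
        last = _midnight(start if end is None else end) + 23 * HOUR

    if last < first:
        raise ValueError("end precedes start")
    return first, last
```

The datetime check comes first because datetime.datetime is a subclass of datetime.date.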

To specify classes and types, either use the classname and types parameters or use the classtypes parameter. It is an error to use classname or types when classtypes is specified.

The classname parameter should be a named class that appears in silk.site.classes(). If neither classname nor classtypes is specified, classname will default to that returned by silk.site.default_class().

The types parameter should be either a named type that appears in silk.site.types(classname) or a sequence of said named types. If neither types nor classtypes is specified, types will default to silk.site.default_types(classname).

The classtypes parameter should be a sequence of (classname, type) pairs. These pairs must be in the sequence returned by silk.site.classtypes().

The sensors parameter should be either a sensor name or a sequence of sensor names from the sequence returned by silk.site.sensors(). If sensors is left unspecified, it will default to the list of sensors supported by the given class(es).

silk.site.repository_silkfile_iter(start=None, end=None, classname=None, types=None, classtypes=None, sensors=None): Works similarly to silk.site.repository_iter() except the file names that repository_iter() would return are opened as SilkFile objects and returned.
silk.site.repository_full_iter(start=None, end=None, classname=None, types=None, classtypes=None, sensors=None): Works similarly to silk.site.repository_iter(). Unlike repository_iter(), this iterator's output will include the names of files that do not exist in the repository. The iterator returns (filename, bool) pairs where the bool value represents whether the given filename exists. For more information, see the description of the --print-missing-files switch in rwfglob(1).
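
The (filename, bool) pairs produced by repository_full_iter() make it straightforward to list gaps in a repository. The following sketch assumes only that pair format; missing_files() is a hypothetical helper, and the paths in the sample list are made up for illustration.

```python
def missing_files(full_iter):
    """Return the file names whose existence flag is false."""
    return [filename for filename, exists in full_iter if not exists]


# Made-up pairs standing in for the output of repository_full_iter().
sample_pairs = [
    ("/data/example/file-a", True),
    ("/data/example/file-b", False),
    ("/data/example/file-c", True),
]
gaps = missing_files(sample_pairs)
```

With SiLK available, the argument would presumably be the iterator returned by silk.site.repository_full_iter() itself.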

silk.plugin Module

silk.plugin is a module to support using PySiLK code as a plug-in to the rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1) applications. The module defines the following methods, which are described in the silkpython(3) manual page:

silk.plugin.register_switch(switch_name, handler=handler, [arg=needs_arg], [help=help_string]): Define the command line switch --switch_name that can be used by the PySiLK plug-in.
silk.plugin.register_filter(filter, [finalize=finalize], [initialize=initialize]): Register the callback function filter that can be used by rwfilter to specify whether the flow record passes or fails.
silk.plugin.register_field(field_name, [add_rec_to_bin=add_rec_to_bin,] [bin_compare=bin_compare,] [bin_bytes=bin_bytes,] [bin_merge=bin_merge,] [bin_to_text=bin_to_text,] [column_width=column_width,] [description=description,] [initial_value=initial_value,] [initialize=initialize,] [rec_to_bin=rec_to_bin,] [rec_to_text=rec_to_text]): Define the new key field or aggregate value field named field_name. Key fields can be used in rwcut, rwgroup, rwsort, rwstats, and rwuniq. Aggregate value fields can be used in rwstats and rwuniq. Creating a field requires specifying one or more callback functions; the functions required depend on the application(s) where the field will be used. To simplify field creation for common field types, the remaining functions can be used instead.
silk.plugin.register_int_field(field_name, int_function, min, max, [width]): Create the key field field_name whose value is an unsigned integer.
silk.plugin.register_ipv4_field(field_name, ipv4_function, [width]): Create the key field field_name whose value is an IPv4 address.
silk.plugin.register_ip_field(field_name, ipv4_function, [width]): Create the key field field_name whose value is an IPv4 or IPv6 address.
silk.plugin.register_enum_field(field_name, enum_function, width, [ordering]): Create the key field field_name whose value is a Python object (often a string).
silk.plugin.register_int_sum_aggregator(agg_value_name, int_function, [max_sum], [width]): Create the aggregate value field agg_value_name that maintains a running sum as an unsigned integer.
silk.plugin.register_int_max_aggregator(agg_value_name, int_function, [max_max], [width]): Create the aggregate value field agg_value_name that maintains the maximum unsigned integer value.
silk.plugin.register_int_min_aggregator(agg_value_name, int_function, [max_min], [width]): Create the aggregate value field agg_value_name that maintains the minimum unsigned integer value.
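
As a rough illustration of the kind of callback that register_int_field() expects, the sketch below defines a function that maps a record to an unsigned integer. The field name "bpp", the helper bytes_per_packet(), and the FakeRec stand-in are hypothetical and exist only so the example runs without SiLK; in a real plug-in the argument would be an RWRec object, and the registration call is shown as a comment because it is only valid inside a plug-in (see silkpython(3)).

```python
from collections import namedtuple

# Hypothetical stand-in for an RWRec; only the two attributes used here.
FakeRec = namedtuple("FakeRec", ["bytes", "packets"])

def bytes_per_packet(rec):
    """Return the average bytes per packet as a non-negative integer."""
    if rec.packets == 0:
        return 0
    return rec.bytes // rec.packets

# Inside a plug-in, the registration would follow the signature given above
# (field_name, int_function, min, max); the name and bounds are assumptions:
#   register_int_field("bpp", bytes_per_packet, 0, (1 << 32) - 1)

print(bytes_per_packet(FakeRec(bytes=1500, packets=3)))
```

The callback does no I/O and has no side effects, which keeps it easy to test outside the SiLK tools before it is registered.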

EXAMPLE

Using PySiLK

The following is an example using the PySiLK bindings. The code is meant to show some standard PySiLK techniques, but is not otherwise meant to be useful.

The code reads each record in a SiLK flow file, checks whether the record's source port is 80/tcp or 8080/tcp and its volume is larger than 3 packets and 120 bytes, stores the destination IP of matching records in an IPset, and writes the IPset to a destination file. In addition, it prints the number of unique destination addresses and the addresses themselves to the standard output. Additional explanations can be found in-line in the comments.

 #! /usr/bin/python

 # Use print functions (Compatible with Python 3.0; Requires 2.6+)
 from __future__ import print_function #Python2.6 or later required

 # Import the PySiLK bindings
 from silk import *

 # Import sys for the command line arguments.
 import sys

 # Main function
 def main():
     if len(sys.argv) != 3:
         print ("Usage: %s infile outset" % sys.argv[0])
         sys.exit(1)

     # Open a silk flow file for reading
     infile = silkfile_open(sys.argv[1], READ)

     # Create an empty IPset
     destset = IPSet()

     # Loop over the records in the file
     for rec in infile:

         # Do comparisons based on rwrec field values
         if (rec.protocol == 6 and rec.sport in [80, 8080] and
                 rec.packets > 3 and rec.bytes > 120):

             # Add the dest IP of the record to the IPset
             destset.add(rec.dip)

     # Save the IPset for future use
     try:
         destset.save(sys.argv[2])
     except Exception:
         sys.exit("Unable to write to %s" % sys.argv[2])

     # count the items in the set
     count = 0
     for addr in destset:
         count = count + 1

     print("%d addresses" % count)

     # Another way to do the same
     print("%d addresses" % len(destset))

     # Print the ip blocks in the set
     for base_prefix in destset.cidr_iter():
         print("%s/%d" % base_prefix)

 # Call the main() function when this program is started
 if __name__ == '__main__':
     main()
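
The final loop in the script prints each CIDR block as a base address and prefix length. To illustrate what such blocks look like without requiring SiLK, the following sketch uses Python's standard ipaddress module as an analog. It is not PySiLK code; the addresses are made up, and the (address, prefix length) pairs are built by hand to mirror the "%s/%d" formatting used above.

```python
import ipaddress

# Made-up sample addresses: four contiguous hosts plus one isolated host.
hosts = ["10.0.0.0", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.9"]
networks = [ipaddress.ip_network(h) for h in hosts]

# Collapse the /32 networks into the smallest covering list of CIDR blocks.
blocks = list(ipaddress.collapse_addresses(networks))

# Build (base address, prefix length) pairs and print them as "addr/len".
pairs = [(str(b.network_address), b.prefixlen) for b in blocks]
for base_prefix in pairs:
    print("%s/%d" % base_prefix)
```

Here the four contiguous addresses collapse into a single /30 block, while the isolated address remains a /32.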

Adjusting the Class and Type Fields of a Flow File

Normally, SiLK flow records are stamped with a class and type when they are recorded in the repository. However, if you are importing raw packet data or need to change records that inadvertently have the wrong class/type, PySiLK makes this easy to fix.

The example below sets the class to "all" and assigns a type of "in", "inweb", "out", or "outweb" to each record in an input file. The direction (in or out) is defined by an IPset that represents the internal network (traffic that neither comes from nor goes to the internal network is discarded in this example). Web/non-web flows are separated based on port.

 #! /usr/bin/python

 from __future__ import print_function #Python2.6 or later required
 from silk import *
 import silk.site
 import sys                              # for command line args

 webports    = (80,443,8080)
 inwebtype   = ("all","inweb")
 intype      = ("all","in")
 outwebtype  = ("all","outweb")
 outtype     = ("all","out")

 def main():
     if len(sys.argv) != 4:
         print("Usage:  %s infile setfile outfile" % sys.argv[0])
         sys.exit(1)

     #  open the SiLK file for reading
     infile = silkfile_open(sys.argv[1], READ)

     #  open the set file which represents my internal network
     setfile = IPSet.load(sys.argv[2])

     # open the modified output file
     outfile = silkfile_open(sys.argv[3], WRITE)

     #  loop over the records in the file, set the class/type and write the record:
     for rec in infile:
         #
         #  If the src ip is in the set, it's going out.
         #  If the dst ip is in the set, it's coming in.
         #  If neither IP is in the set, discard the record.
         #
         if (rec.sport in webports) or (rec.dport in webports):
             if rec.sip in setfile:
                 rec.classtype = outwebtype
                 outfile.write(rec)
             elif rec.dip in setfile:
                 rec.classtype = inwebtype
                 outfile.write(rec)
         else:
             if rec.sip in setfile:
                 rec.classtype = outtype
                 outfile.write(rec)
             elif rec.dip in setfile:
                 rec.classtype = intype
                 outfile.write(rec)

     # clean up
     outfile.close()
     infile.close()

 if __name__ == '__main__':
     main()
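
The four branches in the loop above differ only in the type name they assign. One possible way to express the same decision is a small helper that returns the type name, or None when the record should be discarded. This is a sketch only: pick_type() is a hypothetical name, and a plain Python set of strings stands in for the IPSet so that the example runs without SiLK.

```python
WEBPORTS = (80, 443, 8080)

def pick_type(sip, dip, sport, dport, internal):
    """Return "in", "inweb", "out", "outweb", or None to discard."""
    if sip in internal:
        direction = "out"          # source is internal: traffic is leaving
    elif dip in internal:
        direction = "in"           # destination is internal: traffic is arriving
    else:
        return None                # neither side is internal
    is_web = sport in WEBPORTS or dport in WEBPORTS
    return direction + ("web" if is_web else "")

# Stand-in for the internal-network IPset (an assumption for illustration).
internal = {"10.0.0.1"}
print(pick_type("10.0.0.1", "192.0.2.7", 51000, 443, internal))

# In the script, the result would be applied along these lines:
#   flowtype = pick_type(rec.sip, rec.dip, rec.sport, rec.dport, setfile)
#   if flowtype is not None:
#       rec.classtype = ("all", flowtype)
#       outfile.write(rec)
```

As in the original loop, a record whose source and destination are both internal is treated as outgoing, because the source address is tested first.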

Changing Timestamps in a Flow File

Occasionally you may need to adjust all the timestamps in a SiLK flow file. For example, the flow file may have been generated from a packet capture that was collected in a different time zone, so its timestamps must be shifted by a number of hours. Another case is correcting files after you determine that the clock time was off.

It is relatively simple to change the timestamps using PySiLK. The sample code below shifts the data to another time zone by a whole number of hours; a minor change would shift the data by seconds instead of hours.

 #! /usr/bin/python

 from __future__ import print_function #Python2.6 or later required
 from silk import *
 import sys                              # for command line args
 from datetime import timedelta          # for date math

 def main():
     if len(sys.argv) != 4:
         print ("Usage:  %s infile offset-hours outfile" % sys.argv[0])
         sys.exit(1)

     #  open the SiLK file for reading
     infile = silkfile_open(sys.argv[1], READ)

     #  create the time offset object
     offset = timedelta(hours=int(sys.argv[2]))

     # open the modified output file
     outfile = silkfile_open(sys.argv[3], WRITE)

     #  loop over the records in the file, shift time and write the update:
     for rec in infile:
         rec.stime = rec.stime + offset
         outfile.write(rec)

     # clean up
     outfile.close()
     infile.close()

 if __name__ == '__main__':
     main()
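
The minor change mentioned earlier, shifting by seconds rather than hours, comes down to how the timedelta is constructed. The sketch below shows both forms using only the standard datetime module; make_offset() is a hypothetical helper, and a plain datetime value stands in for a record's start time.

```python
from datetime import datetime, timedelta

def make_offset(amount, unit="hours"):
    """Return a timedelta of `amount` hours or seconds."""
    if unit not in ("hours", "seconds"):
        raise ValueError("unit must be 'hours' or 'seconds'")
    return timedelta(**{unit: amount})

# A sample start time standing in for rec.stime.
stime = datetime(2024, 1, 1, 12, 0, 0)

shifted_hours = stime + make_offset(-5)               # five hours earlier
shifted_seconds = stime + make_offset(90, "seconds")  # ninety seconds later

print(shifted_hours)
print(shifted_seconds)
```

Negative amounts shift timestamps earlier, which is what a move to a time zone west of the original requires.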

Grouping FTP Flow Records

The following script attempts to group all flows representing one direction of an FTP session and print them together. It takes as an argument the name of a file containing raw SiLK records sorted by start time and port number ("rwsort --fields=stime,sport"). The script extracts from the file all flows that potentially represent FTP traffic. We define a possible FTP flow as any flow where:

the source port is 21 (FTP control channel)
the source port is 20 (FTP data transfer port)
both the source port and destination port are ephemeral (data transfer)

If a flow record has a source port of 21, the script adds its source and destination address pair to the list of possible FTP groups. The script categorizes each data transfer flow (source port 20, or ephemeral to ephemeral) according to its source and destination IP address pair. If a control-channel flow with the same source and destination IP addresses has already been seen, the flow's source and destination ports are added to the list of ports associated with that control-channel interaction; otherwise, the script lists the data transfer as unclassified. After the entire file is processed, all FTP sessions that have been grouped are displayed.

 #! /usr/bin/python

 from __future__ import print_function #Python2.6 or later required
 # import the necessary modules
 import silk
 import sys

 # Test that the argument number is correct
 if (len(sys.argv) != 2):
     print("Must supply a SiLK data file.")
     sys.exit(1)

 # open the SiLK file for reading
 rawFile=silk.silkfile_open(sys.argv[1], silk.READ)

 # Initialize the record structure
 # "Unclassified" collects the ephemeral-to-ephemeral
 # connections that don't appear to have a control channel
 interactions = {"Unclassified":[]}

 # Count of records processed
 count = 0

 # Process the input file
 for rec in rawFile:
     count += 1
     key="%15s <--> %15s"%(rec.sip,rec.dip)
     if (rec.sport==21):
         if not key in interactions:
             interactions[key] = []
     else:
         if key in interactions:
             interactions[key].append("%5d <--> %5d"%(rec.sport,rec.dport))
         else:
             interactions["Unclassified"].append(
                 "%15s:%5d <--> %15s:%5d"%(rec.sip,rec.sport,rec.dip,rec.dport))

 # Print the count of all records
 print(str(count) + " records processed")

 # Print the groups of FTP flows
 keyList = sorted(interactions.keys())
 for key in keyList:
     print("\n" + key + " " + str(len(interactions[key])))
     if (key != "Unclassified"):
         for line in interactions[key]:
             print("   " + line)

Example output of the script:

 184 records processed

 xxx.xxx.xxx.236 <--> yyy.yyy.yyy.231 3
       20 <--> 56180
       20 <--> 56180
       20 <--> 58354

 Unclassified 158
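
The three criteria for a possible FTP flow can also be written as a standalone predicate, which may be useful for pre-filtering records before grouping them. The sketch below is illustrative only: ftp_flow_kind() is a hypothetical name, and the ephemeral port range is assumed to start at 1024 because this document does not define it.

```python
FTP_CONTROL_PORT = 21
FTP_DATA_PORT = 20
EPHEMERAL_MIN = 1024   # assumption: ports at or above this count as ephemeral

def is_ephemeral(port):
    return port >= EPHEMERAL_MIN

def ftp_flow_kind(sport, dport):
    """Return "control", "data", or None if the flow is not a possible FTP flow."""
    if sport == FTP_CONTROL_PORT:
        return "control"
    if sport == FTP_DATA_PORT:
        return "data"
    if is_ephemeral(sport) and is_ephemeral(dport):
        return "data"
    return None

print(ftp_flow_kind(21, 56180))
print(ftp_flow_kind(20, 56180))
print(ftp_flow_kind(443, 56180))
```

With PySiLK, the arguments would come from rec.sport and rec.dport, and records for which the function returns None could be skipped.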

ENVIRONMENT

The following environment variables affect the tools in the SiLK tool suite.

SILK_CONFIG_FILE: This environment variable contains the location of the site configuration file, silk.conf. This variable will be used by silk.site.init_site() if no argument is passed to that method.
SILK_DATA_ROOTDIR: This variable gives the root of the directory tree where the data store of SiLK Flow files is maintained, overriding the location that is compiled into the tools (/data). This variable will be used by the FGlob constructor unless an explicit data_rootdir value is specified. In addition, silk.site.init_site() may search for the site configuration file, silk.conf, in this directory.
SILK_COUNTRY_CODES: This environment variable gives the location of the country code mapping file that the silk.init_country_codes() function will use when no name is given to that function. The value of this environment variable may be a complete path or a file relative to the SILK_PATH. See the "FILES" section for standard locations of this file.
SILK_CLOBBER: The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty value removes this restriction.
SILK_PATH: This environment variable gives the root of the install tree. When searching for configuration files, PySiLK may use this environment variable. See the "FILES" section for details.
PYTHONPATH: This is the search path that Python uses to find modules and extensions. The SiLK Python extension described in this document may be installed outside Python's installation tree; for example, in SiLK's installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so Python can find the SiLK extension.
PYTHONVERBOSE: If the SiLK Python extension fails to load, setting this environment variable to a non-empty string may help you debug the issue.
SILK_PYTHON_TRACEBACK: When set, Python plug-ins (see silkpython(3)) will output traceback information regarding Python errors to the standard error.
PATH: This is the standard search path for executable programs. The FGlob constructor will invoke the rwfglob(1) program; the directory containing rwfglob should be included in the PATH.
TZ: When a SiLK installation is built to use the local timezone (to determine if this is the case, check the value of silk.get_configuration("TIMEZONE_SUPPORT")), the value of the TZ environment variable determines the timezone in which silk.site.repository_iter() parses timestamp strings. If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses UTC. For system information on the TZ variable, see tzset(3).
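
When diagnosing a PYTHONPATH problem, it can help to check whether Python can locate a module named silk at all before trying to import it. The sketch below uses only the standard library; pysilk_status() is a hypothetical helper. Note that finding a module named silk does not prove it is the SiLK extension, since an unrelated package could use the same name.

```python
import importlib.util
import os

def pysilk_status():
    """Report whether a module named 'silk' is on the search path."""
    spec = importlib.util.find_spec("silk")
    return {
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),   # file the module would load from
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
    }

status = pysilk_status()
print(status)
```

If "found" is False, adding the directory that holds the SiLK extension to PYTHONPATH and rerunning the check is a reasonable next step.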

FILES

${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf: Possible locations for the SiLK site configuration file which are checked when no argument is passed to silk.site.init_site().
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap: Possible locations for the country code mapping file used by silk.init_country_codes() when no name is given to the function.
${SILK_DATA_ROOTDIR}/
/data/: Locations for the root directory of the data repository. silk.site.init_site() may search for the site configuration file, silk.conf, in this directory.
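
To make the list of possible silk.conf locations concrete, the sketch below builds the candidate paths from the environment. It is illustrative only and is not PySiLK's implementation: candidate_silk_conf_paths() is a hypothetical helper, the order simply follows the listing above, and /data is assumed as the compiled-in data root, which may differ in a given build.

```python
import os

def candidate_silk_conf_paths(environ=None):
    """Return possible silk.conf locations in the order listed above."""
    env = os.environ if environ is None else environ
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    # Assumption: /data is the compiled-in default data root.
    root = env.get("SILK_DATA_ROOTDIR") or "/data"
    paths.append(root.rstrip("/") + "/silk.conf")
    if env.get("SILK_PATH"):
        base = env["SILK_PATH"].rstrip("/")
        paths.append(base + "/share/silk/silk.conf")
        paths.append(base + "/share/silk.conf")
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths

for path in candidate_silk_conf_paths({"SILK_PATH": "/opt/silk"}):
    print(path)
```

Printing these paths alongside os.path.exists() for each one is a quick way to see which file, if any, would be available on a given host.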