NAME

silkpython - SiLK Python plug-in

SYNOPSIS

  rwfilter --python-file=FILENAME [--python-file=FILENAME ...] ...

  rwfilter --python-expr=PYTHON_EXPRESSION ...

  rwcut --python-file=FILENAME [--python-file=FILENAME ...]
        --fields=FIELDS ...

  rwgroup --python-file=FILENAME [--python-file=FILENAME ...]
        --id-fields=FIELDS ...

  rwsort --python-file=FILENAME [--python-file=FILENAME ...]
        --fields=FIELDS ...

  rwstats --python-file=FILENAME [--python-file=FILENAME ...]
        --fields=FIELDS --values=VALUES ...

  rwuniq --python-file=FILENAME [--python-file=FILENAME ...]
        --fields=FIELDS --values=VALUES ...

DESCRIPTION

The SiLK Python plug-in provides a way to use PySiLK (the SiLK extension for python(1) described in pysilk(3)) to extend the capability of several SiLK tools.
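To extend the SiLK tools using PySiLK, the user writes a Python file that calls Python functions defined in the silk.plugin Python module and described in this manual page. When the user specifies the --python-file switch to a SiLK application, the application loads the Python file and makes the new functionality available. The following sections describe how to add command line switches, how to create new partitioning rules for rwfilter, and how to define new key fields and aggregate value fields for rwcut, rwgroup, rwsort, rwstats, and rwuniq.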
Typically you will not need to explicitly import the silk.plugin module, since the --python-file switch does this for you. In a module used by a Python plug-in, the module can gain access to the functions defined in this manual page by importing them from silk.plugin:

    from silk.plugin import *

Hint: If you want to check whether the Python code in FILENAME is defining the switches and fields you expect, you can load the Python file and examine the output of --help, for example:

    rwcut --python-file=FILENAME --help

User-defined command line switches

Command line switches can be added and handled from within a SiLK Python plug-in. To add a new switch, use the following function:

    register_switch(switch_name, handler=handler_func, [arg=needs_arg], [help=help_string])
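As a minimal sketch (not taken from the SiLK distribution), the file below registers one switch that takes a value and one flag switch. The switch names min-bytes and plugin-verbose and the settings dictionary are hypothetical, and the assumption that a handler registered with arg=False is called with no arguments should be checked against the register_switch() argument descriptions. The try/except around the import exists only so the handlers can be exercised in a plain Python interpreter; inside a SiLK application the --python-file switch already provides register_switch().

```python
# Inside a SiLK application, silk.plugin supplies register_switch();
# the fallback lets the handlers be tested in a plain interpreter.
try:
    from silk.plugin import register_switch
except ImportError:
    register_switch = None

# Hypothetical plug-in settings, updated by the switch handlers.
settings = {"min_bytes": 0, "verbose": False}


def set_min_bytes(value):
    """Handler for --min-bytes; the argument arrives as a string."""
    settings["min_bytes"] = int(value)


def set_verbose():
    """Handler for --plugin-verbose (assumed to receive no arguments)."""
    settings["verbose"] = True


if register_switch is not None:
    register_switch("min-bytes", handler=set_min_bytes,
                    help="Hypothetical: byte threshold used by this plug-in")
    register_switch("plugin-verbose", handler=set_verbose, arg=False,
                    help="Hypothetical: report progress on stderr")
```

Once the file is loaded with --python-file, both switches should be listed in the application's --help output.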
rwfilter usage

When used in conjunction with rwfilter(1), the SiLK Python plug-in allows users to define arbitrary partitioning criteria using the SiLK extension to the Python programming language. To use this capability, the user creates a Python file and specifies its name with the --python-file switch in rwfilter. The file should call the register_filter() function for each filter that it wants to create:

    register_filter(filter_func, [finalize=finalize_func], [initialize=initialize_func])
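A minimal filter sketch follows. It relies only on behavior stated on this page: filter_func() receives a silk.RWRec and returns a value that decides whether the record passes, and finalize_func() takes no arguments. The threshold of 1000 bytes per packet and the function names are hypothetical. Outside rwfilter, any object with bytes and packets attributes can stand in for a record.

```python
import sys

try:
    from silk.plugin import register_filter
except ImportError:
    register_filter = None

# Hypothetical threshold: pass flows whose average packet size exceeds it.
THRESHOLD = 1000
counts = {"seen": 0, "passed": 0}


def large_packets(rec):
    """Return True to pass the record, False to fail it."""
    counts["seen"] += 1
    if rec.packets > 0 and rec.bytes // rec.packets > THRESHOLD:
        counts["passed"] += 1
        return True
    return False


def report():
    # stderr keeps the message out of any --pass=stdout output
    sys.stderr.write("passed %d of %d records\n"
                     % (counts["passed"], counts["seen"]))


if register_filter is not None:
    register_filter(large_packets, finalize=report)
```

Saved as a hypothetical file bigpkt.py, it could be used as:

    $ rwfilter --python-file=bigpkt.py --pass=outfile.rw flowrec.rw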
If register_filter() is called multiple times, the filter_func(), initialize_func(), and finalize_func() functions will be invoked in the order in which the register_filter() calls were seen.

NOTE: For backwards compatibility, when the file named by --python-file does not call register_filter(), rwfilter will search the Python file for functions named rwfilter() and finalize(). If it finds the rwfilter() function, rwfilter will act as if the file contained:

    register_filter(rwfilter, finalize=finalize)

The --python-file switch requires the user to create a file containing Python code. To allow the user to write a small filtering check in Python, rwfilter supports the --python-expr switch. The value of the switch should be a Python expression whose result determines whether a given record passes or fails, using the same criterion as the filter_func() function described above. In the expression, the variable "rec" is bound to the current silk.RWRec object. There is no support for the initialize_func() and finalize_func() functions. The user may consider --python-expr=PYTHON_EXPRESSION as being implemented by

    from silk import *

    def temp_filter(rec):
        return (PYTHON_EXPRESSION)

    register_filter(temp_filter)

The --python-file and --python-expr switches allow for much flexibility but at the cost of speed: converting a SiLK Flow record into an RWRec is expensive relative to most operations in rwfilter. The user should use rwfilter's built-in partitioning switches to whittle down the input as much as possible, and only use the Python code to do what is difficult or impossible to do otherwise.

Simple field registration functions

The silk.plugin module defines a function that can be used to define fields for use in rwcut, rwgroup, rwsort, rwstats, and rwuniq. That function is powerful, but it is also complex.
To make it easy to define fields for the common cases, the silk.plugin module provides the functions described in this section, which create a key field or an aggregate value field. The advanced function is described later in this manual page ("Advanced field registration function").

Once you have created a key field or aggregate value field, you must include the field's name in the argument to the --fields or --values switch to tell the application to use the field.

Integer key field

The following function is used to create a key field whose value is an unsigned integer.

    register_int_field(field_name, int_function, min, max, [width])
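For instance, a key field holding the average packet size might be registered as below. The field name bpp and the choice of 0 and 2^32-1 as min and max are assumptions for illustration; the maximum relies on the SiLK bytes field being 32 bits, since the average can never exceed the byte count.

```python
try:
    from silk.plugin import register_int_field
except ImportError:
    register_int_field = None


def bytes_per_packet(rec):
    """Average packet size as an unsigned integer (0 if no packets)."""
    if rec.packets == 0:
        return 0
    return rec.bytes // rec.packets


if register_int_field is not None:
    # min=0, max=2**32-1: the average is bounded by the 32-bit byte count
    register_int_field("bpp", bytes_per_packet, 0, (1 << 32) - 1)
```

With a hypothetical file name, the field could then be used as a key, for example rwuniq --python-file=bpp.py --fields=bpp --values=records.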
IPv4 address key field

This function is used to create a key field whose value is an IPv4 address. (See also register_ip_field().)

    register_ipv4_field(field_name, ipv4_function, [width])
IP address key field

The next function is used to create a key field whose value is an IPv4 or IPv6 address.

    register_ip_field(field_name, ip_function, [width])
This key field requires more memory internally than fields registered by the register_ipv4_field() function. If SiLK is compiled without IPv6 support, register_ip_field() works exactly like register_ipv4_field(), including the default width of 15.

Enumerated object key field

The following function is used to create a key field whose value is any Python object. The maximum number of different objects that can be represented is 4,294,967,296, or 2^32.

    register_enum_field(field_name, enum_function, width, [ordering])
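The sketch below groups records by a protocol label. The mapping, the field name proto-label, and the reading of ordering as a list of the objects in their desired sort order are assumptions for illustration; width is computed from the longest label so that the column fits.

```python
try:
    from silk.plugin import register_enum_field
except ImportError:
    register_enum_field = None

# Hypothetical mapping from IP protocol number to a short label.
PROTO_LABELS = {1: "icmp", 6: "tcp", 17: "udp"}
# Assumed meaning of ordering: labels sort in the order listed here.
ORDER = ["tcp", "udp", "icmp", "other"]
LABEL_WIDTH = max(len(label) for label in ORDER)


def proto_label(rec):
    """Return the label for the record's protocol, or "other"."""
    return PROTO_LABELS.get(rec.protocol, "other")


if register_enum_field is not None:
    register_enum_field("proto-label", proto_label, LABEL_WIDTH, ORDER)
```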
Integer sum aggregate value field

This function is used to create an aggregate value field that maintains a running unsigned integer sum.

    register_int_sum_aggregator(agg_value_name, int_function, [max_sum], [width])

Integer maximum aggregate value field

The following function is used to create an aggregate value field that maintains the maximum unsigned integer value.

    register_int_max_aggregator(agg_value_name, int_function, [max_max], [width])

Integer minimum aggregate value field

This function is used to create an aggregate value field that maintains the minimum unsigned integer value.

    register_int_min_aggregator(agg_value_name, int_function, [max_min], [width])
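As a sketch of the sum aggregator, the function below is assumed to be called once per record, with its result added to the running sum of the record's bin. The value name tcp-bytes is hypothetical; it counts bytes only from TCP flows (protocol 6), which the built-in bytes value cannot do on its own.

```python
try:
    from silk.plugin import register_int_sum_aggregator
except ImportError:
    register_int_sum_aggregator = None

TCP = 6  # IP protocol number for TCP


def tcp_bytes(rec):
    """Value added to the bin's running sum: bytes for TCP, else 0."""
    if rec.protocol == TCP:
        return rec.bytes
    return 0


if register_int_sum_aggregator is not None:
    # max_sum and width are left at their defaults
    register_int_sum_aggregator("tcp-bytes", tcp_bytes)
```

With a hypothetical file name, it could be requested like any other value:

    $ rwuniq --python-file=tcpbytes.py --fields=sip \
          --values=records,tcp-bytes flowrec.rw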
Advanced field registration function

The previous section provided functions to register a key field or an aggregate value field when dealing with common objects. When you need to use a complex object, or you want more control over how the object is handled in PySiLK, you can use the register_field() function described in this section.

Many of the arguments to the register_field() function are callback functions that you must create and that the application will invoke. (The simple registration functions above have already taken care of defining these callback functions.)

Often the callback functions for handling fields will either take (as a parameter) or return a representation of a numeric value that can be processed from C. The most efficient way to handle these representations is as a string containing binary characters, including the null byte. We will use the term "byte sequence" for these representations; other possible terms include "array of bytes", "byte strings", or "binary values". For hints on creating byte sequences from Python, see the "Byte sequences" section below.

To define a new field or aggregate value, the user calls:

    register_field(field_name,
                   [add_rec_to_bin=add_rec_to_bin_func,]
                   [bin_compare=bin_compare_func,]
                   [bin_bytes=bin_bytes_value,]
                   [bin_merge=bin_merge_func,]
                   [bin_to_text=bin_to_text_func,]
                   [column_width=column_width_value,]
                   [description=description_string,]
                   [initial_value=initial_value,]
                   [initialize=initialize_func,]
                   [rec_to_bin=rec_to_bin_func,]
                   [rec_to_text=rec_to_text_func])

Although the keyword arguments to register_field() are all optional from Python's perspective, certain keyword arguments must be present before an application will define the key or aggregate value. The following table summarizes the keyword arguments used by each application.
An "F" means the argument is required for a key field, an "A" means the argument is required for an aggregate value field, "f" and "a" mean the application will use the argument for a key field or an aggregate value if the argument is present, and a dot means the application completely ignores the argument.

                      rwcut    rwgroup  rwsort   rwstats  rwuniq
    add_rec_to_bin      .        .        .        A        A
    bin_compare         .        .        .        A        .
    bin_bytes           .        F        F       F,A      F,A
    bin_merge           .        .        .        A        A
    bin_to_text         .        .        .       F,A      F,A
    column_width        F        .        .       F,A      F,A
    description         f        f        f       f,a      f,a
    initial_value       .        .        .        a        a
    initialize          f        f        f       f,a      f,a
    rec_to_bin          .        F        F        F        F
    rec_to_text         F        .        .        .        .

The following sections describe how to use register_field() in each application.

rwcut usage

The purpose of rwcut(1) is to print attributes of (or attributes derived from) every SiLK record it reads as input. A plug-in used by rwcut must produce a printable (textual) attribute from a SiLK record. To define a new attribute, the register_field() function should be called as shown:

    register_field(field_name, column_width=column_width_value,
                   rec_to_text=rec_to_text_func,
                   [description=description_string,]
                   [initialize=initialize_func])
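A minimal rwcut field, sketched under the assumption of a hypothetical field name bpp-text: rec_to_text_func returns the average bytes per packet as text, and column_width is chosen to fit the widest possible value, which for a 32-bit byte count and one packet is "4294967295.00" (13 characters).

```python
try:
    from silk.plugin import register_field
except ImportError:
    register_field = None


def bpp_text(rec):
    """Average bytes per packet as text with two decimal places."""
    if rec.packets == 0:
        return "-"
    return "%.2f" % (float(rec.bytes) / rec.packets)


if register_field is not None:
    # 13 characters fits the widest value, "4294967295.00"
    register_field("bpp-text", column_width=13, rec_to_text=bpp_text,
                   description="Hypothetical: average bytes per packet")
```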
If the rec_to_text argument is not present, the register_field() function will do nothing when called from rwcut. If the column_width argument is missing, rwcut will complain that the textual width of the plug-in field is 0.

rwgroup and rwsort usage

The rwsort(1) tool sorts SiLK records by their attributes or attributes derived from them. rwgroup(1) reads sorted SiLK records and writes a common value into the next hop IP field of all records that have common attributes. The output from both of these tools is a stream of SiLK records (the output typically includes every record that was read as input). A plug-in used by these tools must return a value that the application can use internally to compare records. To define a new field that may be included in the --id-fields switch to rwgroup or the --fields switch to rwsort, the register_field() function should be invoked as follows:

    register_field(field_name, bin_bytes=bin_bytes_value,
                   rec_to_bin=rec_to_bin_func,
                   [description=description_string,]
                   [initialize=initialize_func])
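A minimal sketch of an rwsort key, assuming a hypothetical field name bpp-sort: rec_to_bin_func returns the average packet size packed as a 4-byte unsigned integer in network byte order, and bin_bytes is taken from the struct.Struct object's size attribute so the two values cannot disagree.

```python
import struct

try:
    from silk.plugin import register_field
except ImportError:
    register_field = None

# "!I": one unsigned 32-bit integer in network (big-endian) byte order,
# so comparing the packed bytes gives the same order as the numbers.
u32 = struct.Struct("!I")


def bpp_bin(rec):
    """Encode the average packet size as a 4-byte sort key."""
    bpp = rec.bytes // rec.packets if rec.packets else 0
    return u32.pack(bpp)


if register_field is not None:
    register_field("bpp-sort", bin_bytes=u32.size, rec_to_bin=bpp_bin)
```

With a hypothetical file name, the field is used like any other sort key:

    $ rwsort --python-file=bppsort.py --fields=bpp-sort \
          flowrec.rw > outfile.rw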
If the rec_to_bin argument is not present, the register_field() function will do nothing when called from rwgroup or rwsort. If the bin_bytes argument is missing, rwgroup or rwsort will complain that the binary width of the plug-in field is 0.

rwstats and rwuniq usage

rwstats(1) and rwuniq(1) group SiLK records into bins based on key fields. Once a record is matched to a bin, the record is used to update the aggregate values (e.g., the sum of bytes) that are being computed, and the record is discarded. Once all records have been processed, the key fields and the aggregate values are printed.

Key Field

A plug-in used by rwstats or rwuniq for creating a new key field must return a value that the application can use internally to compare records, and there must be a function that converts that value to a printable representation. The following invocation of register_field() will produce a key field that can be used in the --fields switch of rwstats or rwuniq:

    register_field(field_name, bin_bytes=bin_bytes_value,
                   bin_to_text=bin_to_text_func,
                   column_width=column_width_value,
                   rec_to_bin=rec_to_bin_func,
                   [description=description_string,]
                   [initialize=initialize_func])

The arguments are:
Aggregate Value

A plug-in used by rwstats or rwuniq for creating a new aggregate value must be able to use a SiLK record to update an aggregate value, take two aggregate values and merge them to a new value, and convert that aggregate value to a printable representation. To use an aggregate value for ordering the bins in rwstats, the plug-in must also define a function to compare two aggregate values. The aggregate values are represented as byte sequences.

To define a new aggregate value in rwstats, the user calls:

    register_field(agg_value_name,
                   add_rec_to_bin=add_rec_to_bin_func,
                   bin_bytes=bin_bytes_value,
                   bin_merge=bin_merge_func,
                   bin_to_text=bin_to_text_func,
                   column_width=column_width_value,
                   [bin_compare=bin_compare_func,]
                   [description=description_string,]
                   [initial_value=initial_value,]
                   [initialize=initialize_func])

The call to define a new aggregate value in rwuniq is nearly identical:

    register_field(agg_value_name,
                   add_rec_to_bin=add_rec_to_bin_func,
                   bin_bytes=bin_bytes_value,
                   bin_merge=bin_merge_func,
                   bin_to_text=bin_to_text_func,
                   column_width=column_width_value,
                   [description=description_string,]
                   [initial_value=initial_value,]
                   [initialize=initialize_func])

The arguments are:
Byte sequences

The rwgroup, rwsort, rwstats, and rwuniq programs make extensive use of "byte sequences" (a.k.a., "array of bytes", "byte strings", or "binary values") in their plug-in functions. The byte sequences are used in both key fields and aggregate values.

When used as key fields, the values can represent uniqueness or indicate sort order. Two records with the same byte sequence for a field will be considered identical with respect to that field. When sorting, the byte sequences are compared in network byte order. That is, the most significant byte is compared first, followed by the next-most-significant byte, etc. This equates to string comparison starting with the left-hand side of the string.

When used as an aggregate field, the byte sequences are expected to behave more like numbers, with the ability to add a value taken from a SiLK record to a byte sequence, or to merge (e.g., add) two byte sequences outside the context of a SiLK record.

Every byte sequence has an associated length, which is passed into the register_field() function in the bin_bytes argument. The length determines how many values the byte sequence can represent. A byte sequence with a length of 1 can represent up to 256 unique values (from 0 to 255 inclusive). A byte sequence with a length of 2 can represent up to 65536 unique values (0 to 65535). To generalize, a byte sequence with a length of n can represent up to 2^(8n) unique values (0 to 2^(8n)-1).

How byte sequences are represented in Python depends on the version of Python. Python represents a sequence of characters using either the bytes type (introduced in 2.6) or the unicode type. The bytes type can encode byte sequences while the unicode type cannot. In Python 2, the str (string) type is an alias for bytes, so any Python 2 string is in effect a byte sequence. In Python 3, str is the unicode type, so Python 3 strings cannot represent byte sequences.
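The following sketch, which uses only Python's struct module, checks two statements from this section: a length of n bytes gives 2^(8n) distinct values, and byte sequences packed in network byte order ("!") compare in the same order as the numbers they encode, while little-endian ("<") sequences do not. The sample values are arbitrary.

```python
import struct


def capacity(n):
    """Number of distinct values a byte sequence of length n can hold."""
    return 2 ** (8 * n)


values = [1, 255, 256, 65535]  # already in ascending numeric order

big = [struct.pack("!H", v) for v in values]     # network byte order
little = [struct.pack("<H", v) for v in values]  # little-endian

# Sorting the big-endian sequences as bytes keeps numeric order ...
big_sorted_ok = sorted(big) == big
# ... but sorting the little-endian sequences scrambles it.
little_sorted_ok = sorted(little) == little
```

This is why the examples on this page pack key fields with a leading "!" in the format string.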
Python does not make conversions between integers and byte sequences particularly natural. As a result, here are some pointers on how to do these conversions:

Use the bytes() and ord() functions

If you are converting a single integer value that is less than 256, the easiest way to convert it to a byte sequence is to use the bytes() function; to convert it back, use the ord() function.

    seq = bytes([num])
    num = ord(seq)

The bytes() function takes a list of integers between 0 and 255 inclusive, and returns a byte sequence of the length of that list. To convert a single byte, use a list of a single element. The ord() function takes a byte sequence of a single byte and returns an integer between 0 and 255.

Note: In Python 2, use the chr() function instead of the bytes() function; chr() takes a single number as its argument. The bytes type does not exist before Python 2.6, and in Python 2.6 and 2.7 it is an alias for str, so bytes([num]) returns the text of the list (for example, "[65]") rather than a one-byte sequence. Do not use chr() for this purpose in Python 3, where it returns a unicode string.

Use the struct module

When the value you are converting to a byte sequence is greater than 255, you have to go with another option. One of the simpler options is to use Python's built-in struct module. With this module, you can encode a number or a set of numbers into a byte sequence and convert the result back using a struct.Struct object. Encoding the numbers to a byte sequence uses the object's pack() method. To convert that byte sequence back to the number or set of numbers, use the object's unpack() method. The length of the resulting byte sequence can be found in the size attribute of the struct.Struct object. A format string is used to indicate how the numbers are encoded into binary.
For example:

    import struct

    # Set up the format for two 64-bit numbers
    two64 = struct.Struct("!QQ")

    # Encode two 64-bit numbers as a byte sequence
    seq = two64.pack(num1, num2)

    # Unpack a byte sequence back into two 64-bit numbers
    (num1, num2) = two64.unpack(seq)

    # Length of the encoded byte sequence
    bin_bytes = two64.size

In the above, "Q" represents a single unsigned 64-bit number (an unsigned long long or quad). The "!" at the beginning of the string forces network byte order. (For sort comparison purposes, always pack in network byte order.)

Here is another example, which encodes a signed 16-bit integer and a floating point number:

    import struct

    # Set up the format for a 16-bit signed integer and a float
    obj = struct.Struct("!hf")

    # Encode a 16-bit signed integer and a float as a byte sequence
    seq = obj.pack(intval, floatval)

    # Unpack a byte sequence back into a 16-bit signed integer and a float
    (intval, floatval) = obj.unpack(seq)

    # Length of the encoded byte sequence
    bin_bytes = obj.size

Note that unpack() returns a sequence. When unpacking a single value, assign the result of unpack() to (variable_name,), as shown:

    import struct

    u32 = struct.Struct("!I")

    # Encode an unsigned 32-bit integer as a byte sequence
    seq = u32.pack(num1)

    # Unpack a byte sequence back into an unsigned 32-bit integer
    (num1,) = u32.unpack(seq)

    # Length of the encoded byte sequence
    bin_bytes = u32.size

The full list of codes can be found in the Python library documentation for the struct module, <http://docs.python.org/library/struct.html>.

Note: Python versions prior to 2.5 do not include support for the struct.Struct object. For older versions of Python, you have to use struct's functional interface.
For example:

    import struct

    # Encode a 16-bit signed integer and a float as a byte sequence
    seq = struct.pack("!hf", intval, floatval)

    # Unpack a byte sequence back into a 16-bit signed integer and a float
    (intval, floatval) = struct.unpack("!hf", seq)

    # Length of the encoded byte sequence
    bin_bytes = struct.calcsize("!hf")

This method works in Python 2.5 and above as well, but it is somewhat slower, since each call must look up the format string rather than reusing a precompiled struct.Struct object. Only use this if there is a need to inter-operate with older versions of Python.

Use the array module

The Python array module provides another way to create byte sequences. Beware that the array module does not provide an automatic way to encode the values in network byte order.

OPTIONS

The following options are available when the SiLK Python plug-in is used from rwfilter.
The following options are available when the SiLK Python plug-in is used from rwcut, rwgroup, rwsort, rwstats, or rwuniq:
EXAMPLES

In the following examples, the dollar sign ("$") represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the back slash ("\") is used to indicate a wrapped line.

rwfilter --python-expr

Suppose you want to find traffic destined to a particular host, 10.0.0.23, that is either ICMP or coming from 1434/udp. If you attempt to use:

    $ rwfilter --daddr=10.0.0.23 --proto=1,17 --sport=1434 \
          --pass=outfile.rw flowrec.rw

the --sport option will not match any of the ICMP traffic, and your result will not contain ICMP records. To avoid having to use two invocations of rwfilter, you can use the SiLK Python plug-in to do the check in a single pass:

    $ rwfilter --daddr=10.0.0.23 --proto=1,17 \
          --python-expr 'rec.protocol==1 or rec.sport==1434' \
          --pass=outfile.rw flowrec.rw

Since the Python code is slower than the C code used internally by rwfilter, we want to limit the number of records processed in Python as much as possible. We use the rwfilter switches to do the address check and protocol check, and in Python we only need to check whether the record is ICMP or if the source port is 1434 (if the record is not ICMP we know it is UDP because of the --proto switch).

rwfilter --python-file

To see all records whose protocol is different from the preceding record, use the following Python code.
The code also prints a message to the standard output on completion.

    import sys

    def filter(rec):
        global lastproto
        if rec.protocol != lastproto:
            lastproto = rec.protocol
            return True
        return False

    def initialize():
        global lastproto
        lastproto = None

    def finalize():
        sys.stdout.write("Finished processing records.\n")

    register_filter(filter, initialize = initialize, finalize = finalize)

The preceding file, if called lastproto.py, can be used like this:

    $ rwfilter --python-file lastproto.py --pass=outfile.rw flowrec.rw

Note: Be careful when using a Python plug-in to write to the standard output, since the Python output could get intermingled with the output from --pass=stdout and corrupt the SiLK output file. In general, printing to the standard error is safer.

Command line switch

The following code registers the command line switch "count-protocols". This switch is similar to the standard --protocol switch on rwfilter, in that it passes records whose protocol matches a value specified in a list.
In addition, when rwfilter exits, the plug-in prints a count of the number of records that matched each specified protocol.

    import sys
    from silk.plugin import *

    pro_count = {}

    def proto_count(rec):
        global pro_count
        if rec.protocol in pro_count:
            pro_count[rec.protocol] += 1
            return True
        return False

    def print_counts():
        # items() works in both Python 2 and Python 3
        for p, c in pro_count.items():
            sys.stderr.write("%3d|%10d|\n" % (p, c))

    def parse_protocols(protocols):
        global pro_count
        for p in protocols.split(","):
            pro_count[int(p)] = 0

    register_filter(proto_count, finalize = print_counts)
    register_switch("count-protocols", handler=parse_protocols,
                    help="Like --proto, but prints count of flow records")

When this code is saved to the file count-proto.py, it can be used with rwfilter as shown to get a count of TCP and UDP flow records:

    $ rwfilter --start-date=2008/08/08 --type=out \
          --python-file=count-proto.py --count-protocols=6,17 \
          --print-statistics=/dev/null

rwfilter does not know that the plug-in will be generating output, and rwfilter will complain unless an output switch is given, such as --pass or --print-statistics. Since our plug-in is printing the data we want, we send the output to /dev/null.

Create integer key field with simple API

This example creates a field that contains the sum of the source and destination ports.
While this value may not be interesting to display in rwcut, it provides a way to sort records so that traffic between two low ports will usually be sorted before traffic between a low port and a high port.

    def port_sum(rec):
        return rec.sport + rec.dport

    # the sum of two 16-bit ports ranges from 0 to 131070, or (1 << 17) - 2
    register_int_field("port-sum", port_sum, 0, (1 << 17) - 2)

If the above code is saved in a file named portsum.py, it can be used to sort traffic prior to printing it (low-port to low-port will appear first):

    $ rwfilter --start-date=2008/08/08 --type=out,outweb \
          --proto=6,17 --pass=stdout \
      | rwsort --python-file=portsum.py --fields=port-sum \
      | rwcut

To see high-port to high-port traffic first, reverse the sort:

    $ rwfilter --start-date=2008/08/08 --type=out,outweb \
          --proto=6,17 --pass=stdout \
      | rwsort --python-file=portsum.py --fields=port-sum \
          --reverse \
      | rwcut

Create IP key field with simple API

SiLK stores uni-directional flows. For network conversations that cross the network border, the source and destination hosts are swapped depending on the direction of the flow. For analysis, you often want to know the internal and external hosts.

The following Python plug-in file defines two new fields: "internal-ip" will display the destination IP for an incoming flow and the source IP for an outgoing flow, and "external-ip" shows the reverse.
    import silk

    # for convenience, create lists of the types
    in_types = ['in', 'inweb', 'innull', 'inicmp']
    out_types = ['out', 'outweb', 'outnull', 'outicmp']

    def internal(rec):
        "Returns the IP Address of the internal side of the connection"
        if rec.typename in out_types:
            return rec.sip
        else:
            return rec.dip

    def external(rec):
        "Returns the IP Address of the external side of the connection"
        if rec.typename in in_types:
            return rec.sip
        else:
            return rec.dip

    register_ip_field("internal-ip", internal)
    register_ip_field("external-ip", external)

If the above code is saved in a file named direction.py, it can be used to show the internal and external IP addresses for all traffic on 1434/udp from Aug 8, 2008.

    $ rwfilter --start-date=2008/08/08 --type=all \
          --proto=17 --aport=1434 --pass=stdout \
      | rwcut --python-file direction.py \
          --fields internal-ip,external-ip,3-12

Create enumerated key field with simple API

This example expands the previous example. Suppose that instead of printing the internal and external IP addresses, you want to group by the label associated with the internal and external addresses in a prefix map file. The pmapfilter(3) manual page specifies how to print labels for source and destination IP addresses, but it does not support internal and external IPs.

Here we take the previous example, add a command line switch to specify the path to a prefix map file, and have the internal and external functions return the label.
    import silk

    # for convenience, create lists of the types
    in_types = ['in', 'inweb', 'innull', 'inicmp']
    out_types = ['out', 'outweb', 'outnull', 'outicmp']

    # handler for the --int-ext-pmap command line switch
    def set_pmap(arg):
        global pmap
        pmap = silk.PrefixMap(arg)
        labels = pmap.values()
        width = max(len(x) for x in labels)
        register_enum_field("internal-label", internal, width, labels)
        register_enum_field("external-label", external, width, labels)

    def internal(rec):
        "Returns the label for the internal side of the connection"
        global pmap
        if rec.typename in out_types:
            return pmap[rec.sip]
        else:
            return pmap[rec.dip]

    def external(rec):
        "Returns the label for the external side of the connection"
        global pmap
        if rec.typename in in_types:
            return pmap[rec.sip]
        else:
            return pmap[rec.dip]

    register_switch("int-ext-pmap", handler=set_pmap,
                    help="Prefix map file for internal-label, external-label")

Assuming the above is saved in the file int-ext-pmap.py, the following will group the flows by the internal and external labels contained in the file ip-map.pmap.

    $ rwfilter --start-date=2008/08/08 --type=all \
          --proto=17 --aport=1434 --pass=stdout \
      | rwuniq --python-file int-ext-pmap.py \
          --int-ext-pmap ip-map.pmap \
          --fields internal-label,external-label

Create minimum/maximum integer value field with simple API

The following example will create new aggregate fields to print the minimum and maximum byte values:

    register_int_min_aggregator("min-bytes", lambda rec: rec.bytes,
                                (1 << 32) - 1)
    register_int_max_aggregator("max-bytes", lambda rec: rec.bytes,
                                (1 << 32) - 1)

The lambda expression allows one to create an anonymous function. In this code, we need to return the number of bytes for the given record, and we can easily do that with the anonymous function. Since the SiLK bytes field is 32 bits, the maximum 32-bit number is passed to the registration functions.
Assuming the code is stored in a file bytes.py, it can be used with rwuniq to see the minimum and maximum byte counts for each source IP address:

    $ rwuniq --python-file=bytes.py --fields=sip \
          --values=records,bytes,min-bytes,max-bytes

Create IP key for rwcut with advanced API

This example is similar to the simple IP example above, but it uses the advanced API. It also creates another field to indicate the direction of the flow, and it does not print the IPs when the traffic does not cross the border. Note that this code has to determine the column width itself.

    import silk, os

    # for convenience, create lists of the types
    in_types = ['in', 'inweb', 'innull', 'inicmp']
    out_types = ['out', 'outweb', 'outnull', 'outicmp']
    internal_only = ['int2int']
    external_only = ['ext2ext']

    # determine the width of the IP field depending on whether SiLK
    # was compiled with IPv6 support, and allow the IP_WIDTH environment
    # variable to override that width.
    ip_len = 15
    if silk.ipv6_enabled():
        ip_len = 39
    ip_len = int(os.getenv("IP_WIDTH", ip_len))

    def cut_internal(rec):
        "Returns the IP Address of the internal side of the connection"
        if rec.typename in in_types:
            return rec.dip
        if rec.typename in out_types:
            return rec.sip
        if rec.typename in internal_only:
            return "both"
        if rec.typename in external_only:
            return "neither"
        return "unknown"

    def cut_external(rec):
        "Returns the IP Address of the external side of the connection"
        if rec.typename in in_types:
            return rec.sip
        if rec.typename in out_types:
            return rec.dip
        if rec.typename in internal_only:
            return "neither"
        if rec.typename in external_only:
            return "both"
        return "unknown"

    def internal_external_direction(rec):
        """Generates a string pointing from the sip to the dip, assuming
        internal is on the left, and external is on the right."""
        if rec.typename in in_types:
            return "<---"
        if rec.typename in out_types:
            return "--->"
        if rec.typename in internal_only:
            return "-><-"
        if rec.typename in external_only:
            return "<-->"
        return "????"
    register_field("internal-ip", column_width = ip_len,
                   rec_to_text = cut_internal)
    register_field("external-ip", column_width = ip_len,
                   rec_to_text = cut_external)
    register_field("int_to_ext", column_width = 4,
                   rec_to_text = internal_external_direction)

The cut_internal() and cut_external() functions may return an IPAddr object instead of a string. For those cases, the Python str() function is invoked automatically to convert the IPAddr to a string.

If the above code is saved in a file named direction.py, it can be used to show the internal and external IP addresses and flow direction for all traffic on 1434/udp from Aug 8, 2008.

    $ rwfilter --start-date=2008/08/08 --type=all \
          --proto=17 --aport=1434 --pass=stdout \
      | rwcut --python-file direction.py \
          --fields internal-ip,int_to_ext,external-ip,3-12

Create integer key field for rwsort with the advanced API

The following example Python plug-in creates one new field, "lowest_port", for use in rwsort. Using this field will sort records based on the lesser of the source port or destination port; for example, flows where either the source or destination port is 22 will occur before flows where either port is 25. This example shows using the Python struct module with multiple record attributes.

    import struct

    portpair = struct.Struct("!HH")

    def lowest_port(rec):
        if rec.sport < rec.dport:
            return portpair.pack(rec.sport, rec.dport)
        else:
            return portpair.pack(rec.dport, rec.sport)

    register_field("lowest_port", bin_bytes = portpair.size,
                   rec_to_bin = lowest_port)

To use this example to sort the records in flowrec.rw, one saves the code to the file sort.py and uses it as shown:

    $ rwsort --python-file=sort.py --fields=lowest_port \
          flowrec.rw > outfile.rw

Create integer key for rwstats and rwuniq with advanced API

The following example defines two key fields for use by rwstats or rwuniq: "prefixed-sip" and "prefixed-dip".
Using these fields, the user can count flow records based on the source and/or destination IPv4 address blocks (CIDR blocks). The default CIDR prefix is 16, but it can be changed by specifying the --prefix switch that the example creates. This example uses the Python struct module to convert between the IP address and a binary string.

    import os, struct
    from silk import *

    default_prefix = 16

    u32 = struct.Struct("!L")

    def set_mask(prefix):
        global mask
        mask = 0xFFFFFFFF
        # the value we are handed is a string
        prefix = int(prefix)
        if 0 < prefix < 32:
            mask = mask ^ (mask >> prefix)

    # Convert from an IPv4Addr to a byte sequence
    def cidr_to_bin(ip):
        if ip.is_ipv6():
            raise ValueError("Does not support IPv6")
        return u32.pack(int(ip) & mask)

    # Convert from a byte sequence to an IPv4Addr
    def cidr_bin_to_text(string):
        (num,) = u32.unpack(string)
        return IPv4Addr(num)

    register_field("prefixed-sip", column_width = 15,
                   rec_to_bin = lambda rec: cidr_to_bin(rec.sip),
                   bin_to_text = cidr_bin_to_text,
                   bin_bytes = u32.size)

    register_field("prefixed-dip", column_width = 15,
                   rec_to_bin = lambda rec: cidr_to_bin(rec.dip),
                   bin_to_text = cidr_bin_to_text,
                   bin_bytes = u32.size)

    register_switch("prefix", handler=set_mask,
                    help="Set prefix for prefixed-sip/prefixed-dip fields")

    set_mask(default_prefix)

The lambda expression allows one to create an anonymous function. In this code, the lambda function is used to pass the appropriate IP address into the cidr_to_bin() function. To write the code without the lambda would require separate functions for the source and destination IP addresses:

    def sip_cidr_to_bin(rec):
        return cidr_to_bin(rec.sip)

    def dip_cidr_to_bin(rec):
        return cidr_to_bin(rec.dip)

The lambda expression helps to simplify the code.

If the code is saved in the file mask.py, it can be used as follows to count the number of flow records seen in the /8 of each source IP address. The flow records are read from flowrec.rw.
The --ipv6-policy=ignore switch is used to restrict processing to IPv4 addresses.

    $ rwuniq --ipv6-policy=ignore --python-file mask.py \
          --prefix 8 --fields prefixed-sip flowrec.rw

Create new average bytes value field for rwstats and rwuniq

The following example creates a new aggregate value that can be used by rwstats and rwuniq. The value is "avg-bytes", which calculates the average number of bytes seen across all flows that match the key. It does this by maintaining running totals of the byte count and number of flows.

    import struct

    fmt = struct.Struct("QQ")
    initial = fmt.pack(0, 0)
    textsize = 15
    textformat = "%%%d.2f" % textsize

    # add byte and flow count from 'rec' to 'current'
    def avg_bytes(rec, current):
        (total, count) = fmt.unpack(current)
        return fmt.pack(total + rec.bytes, count + 1)

    # return printable representation
    def avg_to_text(bin):
        (total, count) = fmt.unpack(bin)
        return textformat % (float(total) / count)

    # merge two encoded values.
    def avg_merge(rec1, rec2):
        (total1, count1) = fmt.unpack(rec1)
        (total2, count2) = fmt.unpack(rec2)
        return fmt.pack(total1 + total2, count1 + count2)

    # compare two encoded values
    def avg_compare(rec1, rec2):
        (total1, count1) = fmt.unpack(rec1)
        (total2, count2) = fmt.unpack(rec2)
        # Python 2:
        #return cmp((float(total1) / count1), (float(total2) / count2))
        # Python 3:
        avg1 = float(total1) / count1
        avg2 = float(total2) / count2
        if avg1 < avg2:
            return -1
        return avg1 > avg2

    register_field("avg-bytes",
                   column_width = textsize,
                   bin_bytes = fmt.size,
                   add_rec_to_bin = avg_bytes,
                   bin_to_text = avg_to_text,
                   bin_merge = avg_merge,
                   bin_compare = avg_compare,
                   initial_value = initial)

To use this code, save it as avg-bytes.py, specify the name of the Python file in the --python-file switch, and list the field in the --values switch:

    $ rwuniq --python-file=avg-bytes.py --fields=sip \
          --values=avg-bytes infile.rw

This particular example will compute the average number of bytes per flow for each distinct source IP address in the file
infile.rw.

Create integer key field for all tools that use fields

The following example Python plug-in file defines two fields, "sport-service" and "dport-service". These fields convert the source port and destination port to the name of the "service" as defined in the file /etc/services. For example, port 80 is converted to "http". This plug-in can be used by any of rwcut, rwgroup, rwsort, rwstats, or rwuniq.

    import os, socket, struct

    u16 = struct.Struct("!H")

    # utility function to convert number to a service name,
    # or to a string if no service is defined
    def num_to_service(num):
        try:
            serv = socket.getservbyport(num)
        except socket.error:
            serv = "%d" % num
        return serv

    # convert the encoded port to a service name
    def bin_to_service(bin):
        (port,) = u16.unpack(bin)
        return num_to_service(port)

    # width of service columns can be specified with the
    # SERVICE_WIDTH environment variable; default is 12
    col_width = int(os.getenv("SERVICE_WIDTH", 12))

    register_field("sport-service", bin_bytes = u16.size,
                   column_width = col_width,
                   rec_to_text = lambda rec: num_to_service(rec.sport),
                   rec_to_bin = lambda rec: u16.pack(rec.sport),
                   bin_to_text = bin_to_service)
    register_field("dport-service", bin_bytes = u16.size,
                   column_width = col_width,
                   rec_to_text = lambda rec: num_to_service(rec.dport),
                   rec_to_bin = lambda rec: u16.pack(rec.dport),
                   bin_to_text = bin_to_service)

If this file is named service.py, it can be used by rwcut to print the source port and its service:

    $ rwcut --python-file service.py \
          --fields sport,sport-service flowrec.rw

Although the plug-in can be used with rwsort, the records will be sorted in numerical order of the source or destination port, not alphabetically by service name.
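The numeric ordering comes from the binary encoding. rec_to_bin packs the port with the struct format "!H", which writes the most significant byte first. These examples consistently use the "!" prefix, which suggests that the tools order keys by comparing the packed bytes. Under that assumption, big-endian encoding makes byte order agree with numeric order. The stand-alone sketch below illustrates the effect. It uses only the Python standard library and does not load SiLK. The port list is arbitrary sample data, and the little-endian ("<H") packing is included only for contrast.

```python
import struct

# Network byte order ("!"), as used by the service.py plug-in above.
u16_be = struct.Struct("!H")
# Little-endian packing, shown only for contrast.
u16_le = struct.Struct("<H")

# Arbitrary sample port numbers (not taken from real flow data).
ports = [443, 22, 8080, 25, 80, 1434]

def sort_by_packed_bytes(values, packer):
    """Sort values by the raw bytes that packer produces for each one."""
    return sorted(values, key=lambda v: packer.pack(v))

big_endian_order = sort_by_packed_bytes(ports, u16_be)
little_endian_order = sort_by_packed_bytes(ports, u16_le)

print("numeric:      ", sorted(ports))
print("'!H' bytewise:", big_endian_order)
print("'<H' bytewise:", little_endian_order)
```

Only the "!H" ordering matches the numeric one. The "<H" ordering places 8080 before 443 because the low-order byte is compared first.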

    $ rwsort --python-file service.py \
          --fields sport-service flowrec.rw > outfile.rw

When used with rwuniq, it can count flows, bytes, and packets indexed by the service of the destination port:

    $ rwuniq --python-file service.py --fields dport-service \
          --values=flows,bytes,packets flowrec.rw

Create human-readable fields for all tools that use fields

The following example adds two fields, "hu-bytes" and "hu-packets", which can be used as either key fields or aggregate value fields. The example uses the formatting capabilities of netsa-python (<http://tools.netsa.cert.org/netsa-python/index.html>) to present the bytes and packets fields in a more human-friendly manner.

When used as a key, the "hu-bytes" field presents the value 1234567 as 1205.6Ki. It presents the same value as 1234.6k when the HUMAN_USE_BINARY environment variable is set to "false" (in any letter case) or to "0". When used as a key, the "hu-packets" field adds a comma, or the character specified by the HUMAN_THOUSANDS_SEP environment variable, to the display of the packets field. The value 1234567 becomes 1,234,567. The "hu-bytes" and "hu-packets" fields can also be used as aggregate value fields. In that case they compute the sum of the bytes and packets, respectively, and display it in the same way as the key field.

The code for the plug-in is shown here, and an example of using the plug-in follows the code.

    import silk, silk.plugin
    import os, struct
    from netsa.data.format import num_prefix, num_fixed

    # Whether to use Base-2 (True) or Base-10 (False) values for
    # Kibi/Mebi/Gibi/Tebi/... vs Kilo/Mega/Giga/Tera/...
    use_binary = True
    if (os.getenv("HUMAN_USE_BINARY")):
        if (os.getenv("HUMAN_USE_BINARY").lower() == "false"
                or os.getenv("HUMAN_USE_BINARY") == "0"):
            use_binary = False
        else:
            use_binary = True

    # Character to use for Thousands separator
    thousands_sep = ','
    if (os.getenv("HUMAN_THOUSANDS_SEP")):
        thousands_sep = os.getenv("HUMAN_THOUSANDS_SEP")

    # Number of significant digits
    sig_fig=5

    # Use a 64-bit number for packing the bytes or packets data
    fmt = struct.Struct("Q")
    initial = fmt.pack(0)

    ### Bytes functions

    # add_rec_to_bin
    def hu_ar2b_bytes(rec, current):
        global fmt
        (cur,) = fmt.unpack(current)
        return fmt.pack(cur + rec.bytes)

    # rec_to_binary
    def hu_r2b_bytes(rec):
        global fmt
        return fmt.pack(rec.bytes)

    # bin_to_text
    def hu_b2t_bytes(current):
        global use_binary, sig_fig, fmt
        (cur,) = fmt.unpack(current)
        return num_prefix(cur, use_binary=use_binary, sig_fig=sig_fig)

    # rec_to_text
    def hu_r2t_bytes(rec):
        global use_binary, sig_fig
        return num_prefix(rec.bytes, use_binary=use_binary, sig_fig=sig_fig)

    ### Packets functions

    # add_rec_to_bin
    def hu_ar2b_packets(rec, current):
        global fmt
        (cur,) = fmt.unpack(current)
        return fmt.pack(cur + rec.packets)

    # rec_to_binary
    def hu_r2b_packets(rec):
        global fmt
        return fmt.pack(rec.packets)

    # bin_to_text
    def hu_b2t_packets(current):
        global thousands_sep, fmt
        (cur,) = fmt.unpack(current)
        return num_fixed(cur, dec_fig=0, thousands_sep=thousands_sep)

    # rec_to_text
    def hu_r2t_packets(rec):
        global thousands_sep
        return num_fixed(rec.packets, dec_fig=0, thousands_sep=thousands_sep)

    ### Non-specific functions

    # bin_compare: unpack the values first, because comparing the
    # packed "Q" byte strings directly does not give numeric order
    # on little-endian hosts
    def hu_bin_compare(current1, current2):
        global fmt
        (cur1,) = fmt.unpack(current1)
        (cur2,) = fmt.unpack(current2)
        if (cur1 < cur2):
            return -1
        return (cur1 > cur2)

    # bin_merge
    def hu_bin_merge(current1, current2):
        global fmt
        (cur1,) = fmt.unpack(current1)
        (cur2,) = fmt.unpack(current2)
        return fmt.pack(cur1 + cur2)

    ### Register the fields
    register_field("hu-bytes", column_width=10, bin_bytes=fmt.size,
                   rec_to_text=hu_r2t_bytes, rec_to_bin=hu_r2b_bytes,
                   bin_to_text=hu_b2t_bytes, add_rec_to_bin=hu_ar2b_bytes,
                   bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
                   initial_value=initial)
    register_field("hu-packets", column_width=10, bin_bytes=fmt.size,
                   rec_to_text=hu_r2t_packets, rec_to_bin=hu_r2b_packets,
                   bin_to_text=hu_b2t_packets, add_rec_to_bin=hu_ar2b_packets,
                   bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
                   initial_value=initial)

This shows an example of the plug-in's invocation and output when the code above is stored in the file human.py.

    $ rwstats --count=5 --no-percent --python-file=human.py \
          --fields=proto,hu-bytes,hu-packets \
          --values=records,hu-bytes,hu-packets data.rw
    INPUT: 501876 Records for 305417 Bins and 501876 Total Records
    OUTPUT: Top 5 Bins by Records
    pro|  hu-bytes|hu-packets|   Records|  hu-bytes|hu-packets|
     17|       328|         1|     15922|    4.98Mi|    15,922|
     17|      76.0|         1|     15482|    1.12Mi|    15,482|
      1|       840|        10|      5895|    4.72Mi|    58,950|
     17|      68.0|         1|      4249|     282Ki|     4,249|
     17|      67.0|         1|      4203|     275Ki|     4,203|

Identifying SMTP Servers

To demonstrate the use of --python-file in rwfilter(1), we walk through a Python plug-in script that evaluates the behavior of a set of IP addresses and determines whether each host is likely to be an SMTP server or relay. We expect, based on traffic studies, that more than 85% of a legitimate SMTP server's activity is devoted to sending or providing mail. If we find that the host exhibits this behavior, we include the IP address in a set called SMTP.set. Regardless of whether the IP address is included in the set, we pass all records that appear to be legitimate mail flows.

We run the rwfilter command as follows:

    $ rwfilter --start-date=2008/4/21 --end-date=2008/4/21 \
          --type=out,outweb --sipset=possible_SMTP_servers.set \
          --python-file=SMTP.py --print-statistics

This command first collects all records of type "out" and "outweb" that have a start date on April 21, 2008 and whose source IP address is in possible_SMTP_servers.set. Since there are no additional command line options to filter records, all of these records are passed to the "rwfilter(rec)" function in SMTP.py.
"rec" is an instance of the object "RWRec", which represents the record being passed.

The function "rwfilter(rec)" in SMTP.py begins by declaring the global variables "counts" and "smtpports". "counts" is a dictionary that maps each source IP address to a two-element list. The first element is the total number of bytes that the IP address has transferred. The second element is the number of those bytes that are likely to be related to mail delivery.

Using the source IP address from the record, the function retrieves the current byte counts from the "counts" dictionary. If this is the first occurrence of the IP address, a new entry is added. The function then adds the byte count of this record to the total byte count and determines whether the record is mail-related. A record counts as mail-related if it is a TCP flow (protocol 6) whose source port is in "smtpports" and that has more than 3 packets and more than 120 bytes. If it is mail-related, the function adds the bytes to the total of bytes transferred as mail and returns True. Otherwise, it returns False.

After rwfilter processes all records, it calls the "finalize()" function, which evaluates the collection of IP addresses. If more than 85% of the bytes that a host transferred were mail-related, the IP address is added to a final set of SMTP servers. The final set of SMTP servers is then saved to the SMTP.set file, and rwfilter exits.

    from silk import *

    # Collection of ports commonly used by SMTP servers
    smtpports = set([25, 109, 110, 143, 220, 273, 993, 995, 113])

    # Minimum fraction of mail traffic before being considered a mail server
    threshold = 0.85

    # Collection of byte counts
    counts = dict()

    # This function is run over all records.
    # Input: An instance of the RWRec class representing the
    #        current record being processed
    # Output: True or False value indicating if the record passes
    #         or fails the filter
    def rwfilter(rec):
        # Declare the global variables needed for processing the record
        global smtpports, counts

        # Pull data from the record
        sip = rec.sip
        bytes = rec.bytes

        # Get a reference to the current data on the IP address in question
        data = counts.setdefault(sip, [0, 0])

        # Update the total byte count for the IP address
        data[0] += bytes

        # Is the flow mail related?  If so add the byte count to the mail bytes
        if (rec.protocol == 6 and rec.sport in smtpports and
                rec.packets > 3 and rec.bytes > 120):
            data[1] += bytes
            return True

        # If not mail related, fail the record
        return False

    # This is run after all records have been processed
    def finalize():
        # Declare the global variables needed to evaluate the results
        global counts, threshold

        # The IP set of SMTP servers
        smtp = IPSet()

        # Iterate through all of the IP addresses.
        for ip, data in counts.items():
            if (float(data[1]) / data[0]) > threshold:
                smtp.add(ip)

        # Generate the IPset of all SMTP servers.
        smtp.save('SMTP.set')

    # Register these functions with rwfilter
    register_filter(rwfilter, finalize=finalize)

UPGRADING LEGACY PLUGINS

Some functions were marked as deprecated in SiLK 2.0 and have been removed in SiLK 3.0.

Prior to SiLK 2.0, the register_field() function was called register_plugin_field(), and it had the following signature:

    register_plugin_field(field_name,
                          [bin_len=bin_bytes_value,]
                          [bin_to_text=bin_to_text_func,]
                          [text_len=column_width_value,]
                          [rec_to_bin=rec_to_bin_func,]
                          [rec_to_text=rec_to_text_func])

To convert from register_plugin_field to register_field, change text_len to column_width, and change bin_len to bin_bytes. (Even older code may use field_len; this should be changed to column_width as well.)

The register_filter() function was introduced in SiLK 2.0.
In versions of SiLK prior to SiLK 3.0, when rwfilter was invoked with --python-file and the named Python file did not call register_filter(), rwfilter would search the Python input for functions named rwfilter() and finalize(). If it found the rwfilter() function, rwfilter would act as if the file contained:

    register_filter(rwfilter, finalize=finalize)

To update your pre-SiLK 2.0 rwfilter plug-ins, simply add the above line to your Python file. If the file does not define a finalize() function, add register_filter(rwfilter) instead, since the finalize argument is optional.

ENVIRONMENT

SEE ALSO

pysilk(3), rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), pmapfilter(3), silk(7), python(1), <http://docs.python.org/>