NAME

ipaggcreate - produce aggregate statistics of network traffic or trace

SYNOPSIS

ipaggcreate [-r | -i | --netflow-summary] [--src, --dst, --sport, --dport, ...] [other options] [files or interfaces]

DESCRIPTION

The ipaggcreate program reads IP packets from one or more data sources, maps each packet to a label (such as "source address 192.4.10.9" or "length 10"), and outputs a simply-formatted "aggregate" file reporting the number of packets or bytes observed per label. The resulting file is easy to process with text-based tools. (But see the --binary option, which generates a compressed, quick-to-process binary file.)

Here are a few lines of ipaggcreate output, from `ipaggcreate -s /home/kohler/largedump.gz':

  !IPAggregate 1.0
  !creator "src/ipaggcreate -s /home/kohler/largedump.gz"
  !counts packets
  !times 976937726.638704 977337361.804592 399635.165888
  !num_nonzero 1437
  !ip
  4.2.49.2 1
  4.2.49.4 1
  4.17.143.9 1
  4.21.203.29 104

The `-s' option, which is equivalent to `--src', tells ipaggcreate to categorize each packet by its source IP address. `/home/kohler/largedump.gz' is a compressed tcpdump(1) file. Each data line represents a label; the first field is the label number (here, an IP source address), and the second field the number of packets that had that label. Labels with 0 counts are not reported.
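Because the text format is line-oriented, it is easy to load into a script. The following Python sketch reads the header lines (those starting with "!") and the label/count lines shown above. The function name parse_aggregate is hypothetical and not part of ipaggcreate, and the sketch assumes every data line has exactly two whitespace-separated fields.

```python
def parse_aggregate(text):
    """Split aggregate text into header directives and per-label counts."""
    headers = {}
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("!"):
            # Header line: "!name value"; a bare "!ip" has an empty value.
            name, _, value = line[1:].partition(" ")
            headers[name] = value
        else:
            # Data line: label, whitespace, count (assumed two fields).
            label, count = line.split()
            counts[label] = int(count)
    return headers, counts


sample = """\
!IPAggregate 1.0
!counts packets
!num_nonzero 1437
!ip
4.2.49.2 1
4.2.49.4 1
4.17.143.9 1
4.21.203.29 104
"""

headers, counts = parse_aggregate(sample)
print(headers["counts"], counts["4.21.203.29"], sum(counts.values()))
```

Labels are kept as strings here, so the same code works whether the label is an IP address or a plain number such as a length.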

OPTIONS

Data Sources

Data source options tell ipaggcreate what kind of data source to use: tcpdump(1) raw-packet files (--tcpdump), live network interfaces (--interface), NetFlow summary files (--netflow-summary), ipsumdump output files (--ipsumdump), DAG or NLANR-formatted files (--dag, --nlanr), or others.

Non-option arguments specify the files, or interfaces, to read. For example, `ipaggcreate -r eth0 eth1' will read two tcpdump(1) files, named "eth0" and "eth1"; `ipaggcreate -i eth0 eth1' will read from two live network interfaces, "eth0" and "eth1".

Options that read files read from the standard input when you supply a single dash "-" as a filename, or when you give no filenames at all.

--tcpdump, -r

Read from one or more files produced by tcpdump(1)'s -w option (also known as "pcap files"). Stop when all the files are exhausted. This is the default. Files (except for standard input) may be compressed by gzip(1) or bzip2(1); ipaggcreate will uncompress them on the fly.

--interface, -i

Read from live network interfaces. When run this way, ipaggcreate will continue until interrupted with SIGINT or SIGHUP. When stopped, ipaggcreate appends a comment to its output file, indicating how many packets were dropped by the kernel before output.

--ipsumdump

Read from one or more ipsumdump files. Any packet characteristics not specified by the input files are set to 0.

--format=format

Read from one or more ipsumdump files, using the specified default format. The format should be a space-separated list of content types; see ToIPSummaryDump(n) for a list.

--dag[=encap]

Read from one or more DAG-formatted trace files. For new-style ERF dumps, which contain encapsulation type information, just say --dag. For old-style dumps, you must supply the right encap argument: "ATM" for ATM RFC-1483 encapsulation (the most common), "ETHER" for Ethernet, "PPP" for PPP, "IP" for raw IP, "HDLC" for Cisco HDLC, "PPP_HDLC" for PPP HDLC, or "SUNATM" for Sun ATM. See <http://dag.cs.waikato.ac.nz/>.

--nlanr

Read from one or more NLANR-formatted trace files (fr, fr+, or tsh format). See <http://pma.nlanr.net/Traces/>.

--ip-addresses

Read files containing IP addresses, one address per line. The label must be either --src or --dst.

--tu-summary

Read TCP/UDP summary files. Each line represents one packet, and carries the following information: timestamp, source address, source port, destination address, destination port, protocol, payload length. For example:

  976937735.345744 18.26.4.9 22 64.55.139.202 26876 T 0
  976937770.197008 128.10.5.110 63749 64.55.139.202 113 T 5
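As an illustration of the field order, here is a small Python sketch that splits one such line into named fields. The TUPacket type and parse_tu_line function are hypothetical names, and the protocol field is kept as the raw string from the file (the meaning of "T" is assumed, not confirmed here, to be TCP).

```python
from typing import NamedTuple


class TUPacket(NamedTuple):
    timestamp: float
    src: str
    sport: int
    dst: str
    dport: int
    proto: str        # raw protocol field, e.g. "T"
    payload_len: int


def parse_tu_line(line):
    """Parse one TCP/UDP summary line with seven whitespace-separated fields."""
    ts, src, sport, dst, dport, proto, length = line.split()
    return TUPacket(float(ts), src, int(sport), dst, int(dport), proto, int(length))


pkt = parse_tu_line("976937770.197008 128.10.5.110 63749 64.55.139.202 113 T 5")
print(pkt.src, pkt.dport, pkt.payload_len)
```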

--bro-conn-summary

Read Bro connection summary files. Each line represents one connection attempt, and carries the following information: timestamp, source address, destination address, direction (inbound/outbound).

--netflow-summary

Read from one or more NetFlow summary files. These are line-oriented ASCII files; blank lines, and lines starting with '!' or '#', are ignored. Other lines should contain 15 or more fields separated by vertical bars '|'. Ipaggcreate pays attention to some of these fields:

  Field  Meaning                       Example
  -----  ----------------------------  ----------
  0      Source IP address             192.4.1.32
  1      Destination IP address        18.26.4.44
  5      Packet count in flow          5
  6      Byte count in flow            10932
  7      Flow timestamp (UNIX-style)   998006995
  8      Flow end timestamp            998006999
  9      Source port                   3917
  10     Destination port              80
  12     TCP flags (OR of all pkts)    18
  13     IP protocol                   6
  14     IP TOS bits                   0
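
The field positions in the table can be exercised with a short Python sketch. This is not how ipaggcreate itself reads the file; parse_netflow_line and the FIELDS mapping are hypothetical, and the sample line fills the fields not listed in the table (2, 3, 4, and 11) with placeholder zeros.

```python
# Field index -> name, taken from the table above.
FIELDS = {
    0: "src", 1: "dst", 5: "packets", 6: "bytes", 7: "start", 8: "end",
    9: "sport", 10: "dport", 12: "tcp_flags", 13: "proto", 14: "tos",
}


def parse_netflow_line(line):
    """Return a dict of the documented fields, or None for ignored lines."""
    line = line.strip()
    if not line or line[0] in "!#":
        return None                      # blank lines and comments are skipped
    parts = line.split("|")
    if len(parts) < 15:
        raise ValueError("expected at least 15 '|'-separated fields")
    record = {}
    for index, name in FIELDS.items():
        value = parts[index].strip()
        if name in ("src", "dst"):
            record[name] = value         # addresses stay as dotted strings
        elif name in ("start", "end"):
            record[name] = float(value)  # UNIX-style timestamps
        else:
            record[name] = int(value)
    return record


# Placeholder zeros stand in for the undocumented fields 2, 3, 4, and 11.
sample = "192.4.1.32|18.26.4.44|0|0|0|5|10932|998006995|998006999|3917|80|0|18|6|0"
rec = parse_netflow_line(sample)
print(rec["src"], rec["packets"], rec["bytes"], rec["dport"])
```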

--tcpdump-text

Read from one or more files containing tcpdump(1) textual output. It's much better to use the binary files produced by 'tcpdump -w', but if someone threw those away and all you have is the ASCII output, you can still make do. Only works with tcpdump versions 3.7 and earlier.

Label

These options determine how packets are labeled; you can supply at most one.

--src, -s: Label by IP source address; all packets with the same source address form an aggregate.
--dst, -d: Label by IP destination address. This is the default.
--length, -l: Label by IP length.
--ip field: Label by the named IP field. Examples include "ip src" (equivalent to --src), "ip ttl", "ip off", "udp sport", and so forth. See AggregateIP(n) for a full list.
--flows: Label by TCP or UDP flow, or, essentially, by end-to-end transport-level connection. Two packets have the same label if and only if they are part of the same TCP or UDP connection. Each flow is assigned its own label. The label number is not meaningful; non-TCP/UDP packets are ignored.
--unidirectional-flows: Label by unidirectional TCP or UDP flow. Like --flows, but packets from a single connection but heading in different directions are assigned different labels.
--address-pairs: Label by address pair. Two packets have the same label if and only if they involve the same pair of IP addresses. The label number is not meaningful.
--unidirectional-address-pairs: Label by unidirectional address pair. Two packets have the same label if and only if their source addresses match and their destination addresses match.
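
The difference between the bidirectional and unidirectional labels can be sketched in Python. This only illustrates the grouping rule described above and is not ipaggcreate's implementation; flow_key, address_pair_key, and the labels dictionary are hypothetical names.

```python
def flow_key(src, sport, dst, dport, proto, bidirectional=True):
    """Key that is equal for two packets exactly when they share a flow."""
    a, b = (src, sport), (dst, dport)
    if bidirectional and b < a:
        a, b = b, a          # order the endpoints so both directions match
    return (proto, a, b)


def address_pair_key(src, dst, bidirectional=True):
    """Key that is equal for two packets exactly when they share an address pair."""
    if bidirectional and dst < src:
        src, dst = dst, src
    return (src, dst)


# Assign arbitrary label numbers in order of first appearance.
labels = {}
forward = flow_key("18.26.4.9", 22, "64.55.139.202", 26876, "T")
reverse = flow_key("64.55.139.202", 26876, "18.26.4.9", 22, "T")
for key in (forward, reverse):
    labels.setdefault(key, len(labels) + 1)

one_way = flow_key("18.26.4.9", 22, "64.55.139.202", 26876, "T", bidirectional=False)
other_way = flow_key("64.55.139.202", 26876, "18.26.4.9", 22, "T", bidirectional=False)
print(forward == reverse, one_way == other_way, len(labels))
```

With the bidirectional key both directions of a connection collapse to a single label; with the unidirectional key they stay separate.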

Measurement Options

These options specify whether ipaggcreate should count packets or bytes.

--packets: Count packets: the output file will report the number of packets per label. This is the default.
--bytes, -B: Count bytes: the output file will report the number of bytes per label. This number includes IP and transport headers, but not any link headers.

Limit and Split Options

These options select portions of the trace file, and allow the user to split trace data into multiple aggregate files.

--time-offset=time, -T time: Ignore the first time worth of packets in the input trace. If the first packet has timestamp T, then all packets (including the first) with timestamp less than T+time are ignored. The time argument can be an absolute number of seconds (938.42), or use suffixes such as "100s", "12ms", "1.5min", "2hr", and so forth.
--start-time=time: Ignore packets with timestamps less than time.
--interval=time, -t time: Stop after recording aggregate information for time worth of packets. That is, if the first recorded packet has timestamp T, then ipaggcreate will exit just before the first packet with timestamp T+time, or the end of the trace, whichever comes first.
--limit-labels=count: Stop after recording information for count distinct labels. That is, exit just before encountering a packet with the (count+1)th distinct label, or at the end of the trace, whichever comes first.
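
The interaction between --time-offset and --interval can be sketched in Python as follows. This is a reading of the descriptions above rather than ipaggcreate's actual code: select_window is a hypothetical name, timestamps are assumed to be sorted, and a packet with timestamp exactly T+time is assumed to fall outside the interval.

```python
def select_window(timestamps, time_offset=0.0, interval=None):
    """Return the timestamps that would be recorded for the given options."""
    if not timestamps:
        return []
    first = timestamps[0]
    kept = []
    start = None                      # timestamp of the first recorded packet
    for ts in timestamps:
        if ts < first + time_offset:
            continue                  # --time-offset: skip the leading portion
        if start is None:
            start = ts
        if interval is not None and ts >= start + interval:
            break                     # --interval: stop just before this packet
        kept.append(ts)
    return kept


print(select_window([0, 1, 2, 3, 4, 5], time_offset=2, interval=2))
```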

The four --split options generate multiple aggregate output files based on characteristics of the input. To use a --split option, you must supply an explicit --output filename containing a "%d"-style template; a file number is substituted into that template. For example, the template "file%03d.txt" will generate files "file001.txt", "file002.txt", and so forth.

--split-time=time: Start a new output file every time period. That is, each file will contain data for at most time worth of packets.
--split-labels=count: Start a new output file every count distinct labels. That is, each file will contain at most count different labels.
--split-packets=count: Start a new output file every count packets.
--split-bytes=count: Start a new output file every count bytes.
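
To preview the file names a template will produce, the same "%d"-style substitution can be tried in Python, whose % operator follows printf conventions for %d. The sketch assumes numbering starts at 1, as the "file001.txt" example suggests; split_filename and split_by_packet_count are hypothetical helpers, and the chunking only illustrates the idea behind --split-packets.

```python
def split_filename(template, file_number):
    """Substitute a file number into a printf-style template."""
    return template % file_number


def split_by_packet_count(packets, count):
    """Group packets into consecutive chunks of at most count packets."""
    return [packets[i:i + count] for i in range(0, len(packets), count)]


names = [split_filename("file%03d.txt", n) for n in (1, 2, 3)]
chunks = split_by_packet_count(list(range(7)), 3)
print(names)
print([len(chunk) for chunk in chunks])
```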

Other Options

--output=file, -o file: Write the aggregate output to file instead of to the standard output.
--binary, -b: Write the aggregate output in binary format. See BINARY FORMAT, below, for more information.
--write-tcpdump=file, -w file: Write processed packets to a tcpdump(1) file -- or to the standard output, if file is a single dash "-" -- in addition to the usual aggregate output.
--filter=filter, -f filter: Only include packets and flows matching a tcpdump(1) filter. For example, `ipaggcreate -f "tcp && src net 18/8"' will aggregate data only for TCP packets from net 18. (The syntax for filter is currently a subset of tcpdump's syntax.)
--anonymize, -A: Anonymize IP addresses in the output. The anonymization preserves prefix and class. This means, first, that two anonymized addresses will share the same prefix when their non-anonymized counterparts share the same prefix; and second, that anonymized addresses will be in the same class (A, B, C, or D) as their non-anonymized counterparts. The anonymization algorithm comes from tcpdpriv(1); it works like `tcpdpriv -A50 -C4'.
If --anonymize and --write-tcpdump are both on, the tcpdump output file will have anonymized IP addresses. However, the file will contain actual packet data, unlike tcpdpriv output.
--no-promiscuous: Do not place interfaces into promiscuous mode. Promiscuous mode is the default.
--sample=p: Sample packets with probability p. That is, p is the chance that a packet will cause output to be generated. The actual probability may differ from the specified probability, due to fixed-point arithmetic; check the output for a "!sampling_prob" comment to see the real probability. Strictly speaking, this option samples records, not packets, so for NetFlow summaries without --multipacket, it will sample flows.
--multipacket: Supply this option if you are reading NetFlow or IP summaries -- files where each record might represent multiple packets -- and you would like the output summary to have one line per packet, instead of the default one line per record. See also --packet-count, above.
--collate: Sort output packets by increasing timestamp. Use this option when reading from multiple tcpdump(1) files to ensure that the output has sorted timestamps. Combine --collate with --write-tcpdump to collate overlapping tcpdump(1) files into a single, sorted tcpdump(1) file.
--random-seed=seed: Set the random seed deterministically to seed, an unsigned integer. By default, the random seed is initialized to a random value using /dev/random, if it exists, combined with other data. The random seed indirectly determines which packets are sampled, and the values of anonymized IP addresses.
--quiet, -q: Do not print a progress bar to standard error. This is the default when ipaggcreate isn't running interactively.
--config: Do not produce aggregate output. Instead, write the Click configuration that ipaggcreate would run to the standard output.
--verbose, -V: Produce more verbose error messages.
--help, -h: Print a help message to the standard output, then exit.
--version, -v: Print version number and license information to the standard output, then exit.

SIGNALS

When killed with SIGTERM or SIGINT, ipaggcreate will exit cleanly (and generate an output file). If you want it to flush its buffers without exiting, kill it with SIGHUP.

BINARY FORMAT

Binary ipaggcreate files begin with several ASCII lines, just like regular ipaggcreate files. A line "!packed_be" or "!packed_le" indicates that the rest of the file, starting immediately after the newline, consists of binary records (in big-endian or little-endian order, respectively). Each record is 8 bytes long, and looks like this:

   +---------------+---------------+
   |     label     |     count     |
   +---------------+---------------+
    <---4 bytes---> <---4 bytes--->

The initial word of data contains the label number, the second the count.
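
A reader for this layout can be sketched in a few lines of Python using the standard struct module. The function name parse_binary_aggregate is hypothetical, and the sketch assumes both words are unsigned 32-bit integers and that every header line before the marker is plain ASCII.

```python
import struct


def parse_binary_aggregate(data):
    """Return (header_lines, [(label, count), ...]) from a binary aggregate."""
    headers = []
    pos = 0
    fmt = None
    while fmt is None:
        end = data.index(b"\n", pos)   # ValueError if no marker line exists
        line = data[pos:end].decode("ascii")
        pos = end + 1                  # records start right after this newline
        if line == "!packed_be":
            fmt = ">II"                # big-endian label, count
        elif line == "!packed_le":
            fmt = "<II"                # little-endian label, count
        else:
            headers.append(line)
    # iter_unpack raises struct.error if the body is not a multiple of 8 bytes.
    records = list(struct.iter_unpack(fmt, data[pos:]))
    return headers, records


# Build a tiny big-endian example in memory and read it back.
blob = (b"!IPAggregate 1.0\n!counts packets\n!packed_be\n"
        + struct.pack(">II", 40, 3)
        + struct.pack(">II", 1500, 104))
headers, records = parse_binary_aggregate(blob)
print(headers, records)
```

The header lines are scanned from the start of the file rather than searched for anywhere in the data, since the binary records themselves may contain arbitrary bytes.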

CLICK

The ipaggcreate program uses the Click modular router, an extensible system for processing packets. Click routers consist of C++ components called elements. While some elements run only in a Linux kernel, most can run either in the kernel or in user space, and there are user-level elements for reading packets from libpcap or from tcpdump files.

Ipaggcreate creates and runs a user-level Click configuration. However, you don't need to install Click to run ipaggcreate; the libclick directory contains all the relevant parts of Click, bundled into a library.

If you're curious, try running `ipaggcreate --config' with some other options to see the Click configuration ipaggcreate would run.

This is, I think, a pleasant way to write a packet processor!

AUTHOR

Eddie Kohler <kohler@cs.ucla.edu>, based on the Click modular router.

Anonymization algorithm from tcpdpriv(1) by Greg Minshall.