NAME

rwscan - Detect scanning activity in a SiLK dataset

SYNOPSIS

  rwscan [--scan-model=MODEL] [--output-path=PATH]
        [--trw-internal-set=SETFILE]
        [--trw-theta0=PROB] [--trw-theta1=PROB]
        [--no-titles] [--no-columns] [--column-separator=CHAR]
        [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
        [--integer-ips] [--model-fields] [--scandb]
        [--threads=THREADS] [--queue-depth=DEPTH]
        [--verbose-progress=CIDR] [--verbose-flows]
        [{--verbose-results | --verbose-results=NUM}]
        [--site-config-file=FILENAME]
        [FILES...]

  rwscan --help

  rwscan --version

DESCRIPTION

rwscan reads sorted SiLK Flow records, performs scan detection analysis on those records, and produces textual columnar output for the scanning IP addresses. rwscan writes its output to the --output-path or to the standard output when --output-path is not specified.

The types of scan detection analysis that rwscan supports are Threshold Random Walk (TRW) and Bayesian Logistic Regression (BLR). Details about these techniques are described in the "METHOD OF OPERATION" section below.

rwscan is designed to write its data into a database. This database can be queried using the rwscanquery(1) tool. See the "EXAMPLES" section for the recommended database schema.

The input to rwscan should be pre-sorted using rwsort(1) by the source IP, protocol, and destination IP (i.e., --fields=sip,proto,dip).

rwscan reads SiLK Flow records from the files named on the command line or from the standard input when no file names are specified. To read the standard input in addition to the named files, use "-" or "stdin" as a file name. If an input file name ends in ".gz", the file is uncompressed as it is read.

OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A parameter to an option may be specified as --arg=param or --arg param, though the first form is required for options that take optional parameters.

METHOD OF OPERATION

rwscan's default behavior is to consult two scan detection models to determine whether a source is a scanner. The primary model used is the Threshold Random Walk (TRW) model. The TRW algorithm takes advantage of the tendency of scanners to attempt to contact a large number of IPs that do not exist on the target network.

By keeping track of the number of "hits" (successful connections) and "misses" (attempts to connect to IP addresses that are not active on the target network), scanners can be detected quickly and with a high degree of accuracy. Sequential hypothesis testing is used to analyze the probability that a source is a scanner as each flow record is processed. Once the scan probability exceeds a configured maximum, the source is flagged as a scanner, and no further analysis of traffic from that host is necessary.

The TRW model is not 100% accurate, however, and only finds scans in TCP flow data. In the case where the TRW model is inconclusive, a secondary model called BLR is invoked. BLR stands for "Bayesian Logistic Regression." Unlike TRW, the BLR approach must analyze all traffic from a given source IP to determine whether that IP is a scanner. Because of this, BLR operates much more slowly than TRW. However, the BLR model has been shown to detect scans that are not detected by the TRW model, particularly scans in UDP and ICMP data, and vertical TCP scans which focus on finding services on a single host. It does this by calculating metrics from the flow data from each source, and using those metrics to arrive at an overall likelihood that the flow data represents scanning activity.

The metrics BLR uses for detecting scans in TCP flow data are:
The metrics BLR uses for detecting scans in UDP flow data are:
The metrics BLR uses for detecting scans in ICMP flow data are:
Because the TRW model has a lower false positive rate than the BLR model, any source identified as a scanner by TRW will be identified as a scanner by the hybrid model without consulting BLR. BLR is only invoked in the following cases:
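The sequential hypothesis test that the TRW description earlier in this section relies on can be illustrated with a short Python sketch. This is the textbook likelihood-ratio form of the test, not rwscan's actual implementation. The parameter names theta0 and theta1 echo the --trw-theta0 and --trw-theta1 switches in the synopsis, but the default values used here (0.8 and 0.2), the error targets alpha and beta, and the function name trw_classify are all assumptions made for illustration.

```python
from typing import Iterable


def trw_classify(
    outcomes: Iterable[bool],
    theta0: float = 0.8,   # assumed P(hit | source is benign)
    theta1: float = 0.2,   # assumed P(hit | source is a scanner)
    alpha: float = 0.01,   # assumed tolerated false positive rate
    beta: float = 0.99,    # assumed desired detection rate
) -> str:
    """Classify one source from its ordered hit/miss outcomes.

    Each element of `outcomes` is True for a "hit" (the destination is
    an active internal host) and False for a "miss".
    """
    upper = beta / alpha              # at or above this: scanner
    lower = (1 - beta) / (1 - alpha)  # at or below this: benign
    ratio = 1.0                       # likelihood ratio scanner : benign
    for hit in outcomes:
        if hit:
            ratio *= theta1 / theta0
        else:
            ratio *= (1 - theta1) / (1 - theta0)
        # Stop as soon as either threshold is crossed; later flows from
        # this source do not need to be examined.
        if ratio >= upper:
            return "scanner"
        if ratio <= lower:
            return "benign"
    # Neither threshold was reached: the test is inconclusive.
    return "unknown"


if __name__ == "__main__":
    print(trw_classify([False, False, False, False]))
    print(trw_classify([True, False, True, False]))
```

With these assumed parameters, each miss multiplies the ratio by about 4 and each hit divides it by 4, so a short unbroken run of misses is enough to cross the upper threshold, while a mixed sequence stays inconclusive. In the hybrid model described above, an inconclusive result is the kind of case that would be handed to BLR.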
In situations where the use of one model is preferred, the other model can be disabled using the --scan-model switch. This may have an impact on the performance and/or accuracy of the system.

LIMITATIONS

rwscan detects scans in IPv4 flows only.

EXAMPLES

In the following examples, the dollar sign ("$") represents the shell prompt. The text after the dollar sign represents the command line. Lines have been wrapped for improved readability, and the backslash ("\") is used to indicate a wrapped line.

Basic Usage

Assuming a properly sorted SiLK Flow file as input, the basic usage for Bayesian Logistic Regression (BLR) scan detection requires only the input file, data.rw, and output file, scans.txt, arguments.

  $ rwscan --scan-model=2 --output-path=scans.txt data.rw

Basic usage of Threshold Random Walk (TRW) scan detection requires the IP addresses of the targeted network (i.e., the internal IP space), specified in the internal.set IPset file.

  $ rwscan --trw-internal-set=internal.set --output-path=scans.txt data.rw

Typical Usage

More commonly, an analyst uses rwfilter(1) to query the data repository for flow records within a time window. First, the analyst has rwset(1) put the source addresses of outgoing flow records into an IPset, resulting in the IPset containing the IPs of active hosts on the internal network. Next, the incoming traffic is piped to rwsort(1) and then to rwscan.

  $ rwfilter --start=2004/12/29:00 --type=out,outweb --all-dest=stdout \
      | rwset --sip=internal.set
  $ rwfilter --start=2004/12/29:00 --type=in,inweb --all-dest=stdout \
      | rwsort --fields=sip,proto,dip \
      | rwscan --trw-internal-set=internal.set --scan-model=0 \
          --output-path=scans.txt

Storing Scans in a PostgreSQL Database

Instead of having the analyst run rwscan directly, often the output from rwscan is put into a database where it can be queried by rwscanquery(1). The output produced by the --scandb switch is suitable for loading into a database of scans.

The process for using the PostgreSQL database is described in this section. Schemas for Oracle, MySQL, and SQLite are provided below, but the details to create users with the proper roles are not included.

Here is the schema for PostgreSQL:

  CREATE DATABASE scans;

  CREATE SCHEMA scans
      CREATE SEQUENCE scans_id_seq
      CREATE TABLE scans (
          id          BIGINT      NOT NULL DEFAULT nextval('scans_id_seq'),
          sip         BIGINT      NOT NULL,
          proto       SMALLINT    NOT NULL,
          stime       TIMESTAMP without time zone NOT NULL,
          etime       TIMESTAMP without time zone NOT NULL,
          flows       BIGINT      NOT NULL,
          packets     BIGINT      NOT NULL,
          bytes       BIGINT      NOT NULL,
          scan_model  INTEGER     NOT NULL,
          scan_prob   FLOAT       NOT NULL,
          PRIMARY KEY (id)
      )
      CREATE INDEX scans_stime_idx ON scans (stime)
      CREATE INDEX scans_etime_idx ON scans (etime)
  ;

A database user should be created for the purposes of populating the scan database, e.g.:

  CREATE USER rwscan WITH PASSWORD 'secret';

  GRANT ALL PRIVILEGES ON DATABASE scans TO rwscan;

Additionally, a user with read-only access should be created for use by the rwscanquery tool:

  CREATE USER rwscanquery WITH PASSWORD 'secret';

  GRANT SELECT ON DATABASE scans TO rwscanquery;

To import rwscan's --scandb output into a PostgreSQL database, use a command similar to the following:

  $ cat /tmp/scans.import.txt \
      | psql -c \
        "COPY scans \
          (sip, proto, stime, etime, \
           flows, packets, bytes, \
           scan_model, scan_prob) \
        FROM stdin DELIMITER as '|'" scans

Sample Schema for Oracle

  CREATE TABLE scans (
      id          integer unsigned    not null unique,
      sip         integer unsigned    not null,
      proto       tinyint unsigned    not null,
      stime       datetime            not null,
      etime       datetime            not null,
      flows       integer unsigned    not null,
      packets     integer unsigned    not null,
      bytes       integer unsigned    not null,
      scan_model  integer unsigned    not null,
      scan_prob   float unsigned      not null,
      primary key (id)
  );

Sample Schema for MySQL

  CREATE TABLE scans (
      id          integer unsigned    not null auto_increment,
      sip         integer unsigned    not null,
      proto       tinyint unsigned    not null,
      stime       datetime            not null,
      etime       datetime            not null,
      flows       integer unsigned    not null,
      packets     integer unsigned    not null,
      bytes       integer unsigned    not null,
      scan_model  integer unsigned    not null,
      scan_prob   float unsigned      not null,
      primary key (id),
      INDEX (stime),
      INDEX (etime)
  ) ENGINE=InnoDB;

Sample Schema and Import Command for SQLite

  CREATE TABLE scans (
      id          INTEGER PRIMARY KEY AUTOINCREMENT,
      sip         INTEGER             NOT NULL,
      proto       SMALLINT            NOT NULL,
      stime       TIMESTAMP           NOT NULL,
      etime       TIMESTAMP           NOT NULL,
      flows       INTEGER             NOT NULL,
      packets     INTEGER             NOT NULL,
      bytes       INTEGER             NOT NULL,
      scan_model  INTEGER             NOT NULL,
      scan_prob   FLOAT               NOT NULL
  );
  CREATE INDEX scans_stime_idx ON scans (stime);
  CREATE INDEX scans_etime_idx ON scans (etime);

To import rwscan's --scandb output into a SQLite database, use the following command:

  $ perl -nwe 'chomp;
        print "INSERT INTO scans VALUES (NULL,",
          (join ",",map { / / ? qq("$_") : $_ } split /\|/),
          ");\n";' \
      scans.txt | sqlite3 scans.sqlite

ENVIRONMENT
FILES
SEE ALSO

rwscanquery(1), rwfilter(1), rwsort(1), rwset(1), rwsetbuild(1), silk(7)

BUGS

When used in an IPv6 environment, rwscan converts IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4. IPv6 records outside of that prefix are silently ignored.
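
To make the ::ffff:0:0/96 rule concrete, the following Python sketch shows which addresses would survive such a conversion. It uses only the standard ipaddress module and is an illustration of the prefix test, not a description of how rwscan implements it. The helper name to_ipv4_if_mapped is hypothetical.

```python
import ipaddress
from typing import Optional

# The IPv4-mapped IPv6 prefix named in the BUGS section.
MAPPED_PREFIX = ipaddress.IPv6Network("::ffff:0:0/96")


def to_ipv4_if_mapped(addr: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 form of `addr`, or None if it has none.

    IPv4 input is returned unchanged, IPv6 input inside ::ffff:0:0/96
    is converted, and any other IPv6 address yields None (mirroring
    the "silently ignored" case).
    """
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv4Address):
        return ip
    if ip in MAPPED_PREFIX:
        return ip.ipv4_mapped
    return None


if __name__ == "__main__":
    for text in ("::ffff:192.0.2.1", "2001:db8::1", "198.51.100.7"):
        print(text, "->", to_ipv4_if_mapped(text))
```

A check like this can be run over a list of addresses before analysis to estimate how many IPv6 records would fall outside the prefix and therefore never be examined for scanning activity.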