PT-QUERY-DIGEST(1)          User Contributed Perl Documentation          PT-QUERY-DIGEST(1)
pt-query-digest - Analyze MySQL queries from logs, processlist, and tcpdump.
Usage: pt-query-digest [OPTIONS] [FILES] [DSN]
pt-query-digest analyzes MySQL queries from slow, general, and
binary log files. It can also analyze queries from
"SHOW PROCESSLIST" and MySQL protocol data
from tcpdump. By default, queries are grouped by fingerprint and reported in
descending order of query time (i.e. the slowest queries first). If no
"FILES" are given, the tool reads
"STDIN". The optional
"DSN" is used for certain options like
"--since" and "--until".
Report the slowest queries from
"slow.log":
pt-query-digest slow.log
Report the slowest queries from the processlist on host1:
pt-query-digest --processlist h=host1
Capture MySQL protocol data with tcpdump, then report the slowest
queries:
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt
Save query data from "slow.log"
to host2 for later review and trend analysis:
pt-query-digest --review h=host2 --no-report slow.log
Percona Toolkit is mature, proven in the real world, and well tested, but all
database tools can pose a risk to the system and the database server. Before
using this tool, please:
- Read the tool's documentation
- Review the tool's known "BUGS"
- Test the tool on a non-production server
- Backup your production server and verify the backups
pt-query-digest is a sophisticated but easy to use tool for analyzing MySQL
queries. It can analyze queries from MySQL slow, general, and binary logs.
(Binary logs must first be converted to text, see "--type"). It can
also use "SHOW PROCESSLIST" and MySQL
protocol data from tcpdump. By default, the tool reports which queries are the
slowest, and therefore the most important to optimize. More complex and
custom-tailored reports can be created by using options like
"--group-by", "--filter", and
"--embedded-attributes".
Query analysis is a best practice that should be done frequently.
To make this easier, pt-query-digest has two features: query review
("--review") and query history ("--history"). When the
"--review" option is used, all unique queries are saved to a
database. When the tool is run again with "--review", queries
marked as reviewed in the database are not printed in the report. This
highlights new queries that need to be reviewed. When the
"--history" option is used, query metrics (query time, lock time,
etc.) for each unique query are saved to the database. Each time the tool is run
with "--history", more historical data is saved, which can be
used to analyze query performance trends over time.
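For example, a nightly job could both review and record history in one pass (the host name and log path are placeholders):
pt-query-digest --review h=host2 --history h=host2 --no-report slow.log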
pt-query-digest works on events, which are a collection of key-value pairs
called attributes. You'll recognize most of the attributes right away:
"Query_time",
"Lock_time", and so on. You can just look at
a slow log and see them. However, there are some that don't exist in the slow
log, and slow logs may actually include different kinds of attributes (for
example, you may have a server with the Percona patches).
See "ATTRIBUTES REFERENCE" near the end of this
documentation for a list of common and "--type" specific
attributes. A familiarity with these attributes is necessary for working
with "--filter", "--ignore-attributes", and other
attribute-related options.
With creative use of "--filter", you can create new
attributes derived from existing attributes. For example, to create an
attribute called "Row_ratio" for examining
the ratio of "Rows_sent" to
"Rows_examined", specify a filter
like:
--filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1'
The "&& 1" trick is
needed to create a valid one-line syntax that is always true, even if the
assignment happens to evaluate false. The new attribute will automatically
appear in the output:
# Row ratio 1.00 0.00 1 0.50 1 0.71 0.50
Attributes created this way can be specified for
"--order-by" or any option that requires an attribute.
The default "--output" is a query analysis report. The
"--[no]report" option controls whether or not this report is
printed. Sometimes you may want to parse all the queries but suppress the
report, for example when using "--review" or "--history".
There is one paragraph for each class of query analyzed. A
"class" of queries all have the same value for the
"--group-by" attribute which is
"fingerprint" by default. (See
"ATTRIBUTES".) A fingerprint is an abstracted version of the query
text with literals removed, whitespace collapsed, and so forth. The report
is formatted so it's easy to paste into emails without wrapping, and all
non-query lines begin with a comment, so you can save it to a .sql file and
open it in your favorite syntax-highlighting text editor. There is a
response-time profile at the beginning.
The output described here is controlled by
"--report-format". That option allows you to specify what to print
and in what order. The default output in the default order is described
here.
The report, by default, begins with a paragraph about the entire
analysis run. The information is very similar to what you'll see for each
class of queries in the log, but it doesn't have some information that would
be too expensive to keep globally for the analysis. It also has some
statistics about the code's execution itself, such as the CPU and memory
usage, the local date and time of the run, and a list of input files
read/parsed.
Following this is the response-time profile over the events. This
is a highly summarized view of the unique events in the detailed query
report that follows. It contains the following columns:
Column         Meaning
=============  ==========================================================
Rank           The query's rank within the entire set of queries analyzed
Query ID       The query's fingerprint
Response time  The total response time, and percentage of overall total
Calls          The number of times this query was executed
R/Call         The mean response time per execution
V/M            The variance-to-mean ratio of response time
Item           The distilled query
A final line whose rank is shown as MISC contains aggregate
statistics on the queries that were not included in the report, due to
options such as "--limit" and "--outliers". For details
on the variance-to-mean ratio, please see
http://en.wikipedia.org/wiki/Index_of_dispersion.
Next, the detailed query report is printed. Each query appears in
a paragraph. Here is a sample, slightly reformatted so 'perldoc' will not
wrap lines in a terminal. The following will all be one paragraph, but we'll
break it up for commentary.
# Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665
This line identifies the sequential number of the query in the
sort order specified by "--order-by". Then there's the queries per
second, and the approximate concurrency for this query (calculated as a
function of the timespan and total Query_time). Next there's a query ID.
This ID is a hex version of the query's checksum in the database, if you're
using "--review". You can select the reviewed query's details from
the database with a query like "SELECT
.... WHERE checksum=0xFDEA8D2993C9CAF3".
If you are investigating the report and want to print out every
sample of a particular query, then the following "--filter" may be
helpful:
pt-query-digest slow.log \
--no-report \
--output slowlog \
--filter '$event->{fingerprint} \
&& make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'
Notice that you must remove the
"0x" prefix from the checksum.
Finally, in case you want to find a sample of the query in the log
file, there's the byte offset where you can look. (This is not always
accurate, due to some anomalies in the slow log format, but it's usually
right.) The position refers to the worst sample, which we'll see more about
below.
Next is the table of metrics about this class of queries.
#            pct   total     min     max     avg     95%  stddev  median
# Count        0       2
# Exec time   13   1105s    552s    554s    553s    554s      2s    553s
# Lock time    0   216us    99us   117us   108us   117us    12us   108us
# Rows sent   20   6.26M   3.13M   3.13M   3.13M   3.13M   12.73   3.13M
# Rows exam    0   6.26M   3.13M   3.13M   3.13M   3.13M   12.73   3.13M
The first line is column headers for the table. The percentage is
the percent of the total for the whole analysis run, and the total is the
actual value of the specified metric. For example, in this case we can see
that the query executed 2 times, and its total execution time accounts for
13% of the total for the whole analysis run. The min, max and avg columns
are self-explanatory. The 95%
column shows the 95th percentile; 95% of the values are less than or equal
to this value. The standard deviation shows you how tightly grouped the
values are. The standard deviation and median are both calculated from the
95th percentile, discarding the extremely large values.
The stddev, median and 95th percentile statistics are approximate.
Exact statistics require keeping every value seen, sorting, and doing some
calculations on them. This uses a lot of memory. To avoid this, we keep 1000
buckets, each of them 5% bigger than the one before, ranging from .000001 up
to a very big number. When we see a value we increment the bucket into which
it falls. Thus we have fixed memory per class of queries. The drawback is
the imprecision, which typically falls in the 5 percent range.
Next we have statistics on the users, databases and time range for
the query.
# Users        1   user1
# Databases    2   db1(1), db2(1)
# Time range   2008-11-26 04:55:18 to 2008-11-27 00:15:15
The users and databases are shown as a count of distinct values,
followed by the values. If there's only one, it's shown alone; if there are
many, we show each of the most frequent ones, followed by the number of
times it appears.
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms #####
# 100ms ####################
# 1s ##########
# 10s+
The execution times show a logarithmic chart of time clustering.
Each query goes into one of the "buckets" and is counted up. The
buckets are powers of ten. The first bucket is all values in the
"single microsecond range" -- that is, less than 10us. The second
is "tens of microseconds," which is from 10us up to (but not
including) 100us; and so on. The charted attribute can be changed by
specifying "--report-histogram" but is limited to time-based
attributes.
# Tables
# SHOW TABLE STATUS LIKE 'table1'\G
# SHOW CREATE TABLE `table1`\G
# EXPLAIN
SELECT * FROM table1\G
This section is a convenience: if you're trying to optimize the
queries you see in the slow log, you probably want to examine the table
structure and size. These are copy-and-paste-ready commands to do that.
Finally, we see a sample of the queries in this class of query.
This is not a random sample. It is the query that performed the worst,
according to the sort order given by "--order-by". You will
normally see a commented "# EXPLAIN" line
just before it, so you can copy-paste the query to examine its EXPLAIN plan.
But that isn't possible for non-SELECT queries, so the tool tries to
transform the query into a roughly equivalent SELECT query, and adds that
below.
If you want to find this sample event in the log, use the offset
mentioned above, and something like the following:
tail -c +<offset> /path/to/file | head
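For the sample above, which was found at byte 160665, that would be (the log path is a placeholder):
tail -c +160665 /path/to/slow.log | head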
See also "--report-format".
A query "--review" is the process of storing all the query
fingerprints analyzed. This has several benefits:
- You can add metadata to classes of queries, such as marking them for
follow-up, adding notes to queries, or marking them with an issue ID for
your issue tracking system.
- You can refer to the stored values on subsequent runs so you'll know
whether you've seen a query before. This can help you cut down on
duplicated work.
- You can store historical data such as the row count, query times, and
generally anything you can see in the report.
To use this feature, you run pt-query-digest with the
"--review" option. It will store the fingerprints and other
information into the table you specify. Next time you run it with the same
option, it will do the following:
- It won't show you queries you've already reviewed. A query is considered
to be already reviewed if you've set a value for the
"reviewed_by" column. (If you want to
see queries you've already reviewed, use the "--report-all"
option.)
- Queries that you've reviewed, and don't appear in the output, will cause
gaps in the query number sequence in the first line of each paragraph. And
the value you've specified for "--limit" will still be honored.
So if you've reviewed all queries in the top 10 and you ask for the top
10, you won't see anything in the output.
- If you want to see the queries you've already reviewed, you can specify
"--report-all". Then you'll see the normal analysis output, but
you'll also see the information from the review table, just below the
execution time graph. For example,
# Review information
# comments: really bad IN() subquery, fix soon!
# first_seen: 2008-12-01 11:48:57
# jira_ticket: 1933
# last_seen: 2008-12-18 11:49:07
# priority: high
# reviewed_by: xaprb
# reviewed_on: 2008-12-18 15:03:11
This metadata is useful because, as you analyze your queries,
you get your comments integrated right into the report.
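Adding that metadata is just an UPDATE against the review table. A sketch, assuming the default percona_schema.query_review table and the checksum shown earlier:
UPDATE percona_schema.query_review
SET reviewed_by='xaprb', reviewed_on=NOW(),
    comments='really bad IN() subquery, fix soon!'
WHERE checksum=0xFDEA8D2993C9CAF3;  -- checksum value/format is a placeholder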
A query fingerprint is the abstracted form of a query, which makes it possible
to group similar queries together. Abstracting a query removes literal values,
normalizes whitespace, and so on. For example, consider these two queries:
SELECT name, password FROM user WHERE id='12823';
select name, password from user
where id=5;
Both of those queries will fingerprint to
select name, password from user where id=?
Once the query's fingerprint is known, we can then talk about a
query as though it represents all similar queries.
What "pt-query-digest" does is
analogous to a GROUP BY statement in SQL. (But note that "multiple
columns" doesn't define a multi-column grouping; it defines multiple
reports!) If your command-line looks like this,
pt-query-digest \
--group-by fingerprint \
--order-by Query_time:sum \
--limit 10 \
slow.log
The corresponding pseudo-SQL looks like this:
SELECT WORST(query BY Query_time), SUM(Query_time), ...
FROM /path/to/slow.log
GROUP BY FINGERPRINT(query)
ORDER BY SUM(Query_time) DESC
LIMIT 10
You can also use the value
"distill", which is a kind of
super-fingerprint. See "--group-by" for more.
Query fingerprinting accommodates many special cases, which have
proven necessary in the real world. For example, an
"IN" list with 5 literals is really
equivalent to one with 4 literals, so lists of literals are collapsed to a
single one. If you find something that is not fingerprinted properly, please
submit a bug report with a reproducible test case.
Here is a list of transformations during fingerprinting, which
might not be exhaustive:
- Group all SELECT queries from mysqldump together, even if they are against
different tables. The same applies to all queries from
pt-table-checksum.
- Shorten multi-value INSERT statements to a single VALUES()
list.
- Strip comments.
- Abstract the databases in USE statements, so all USE statements are
grouped together.
- Replace all literals, such as quoted strings. For efficiency, the code
that replaces literal numbers is somewhat non-selective, and might replace
some things as numbers when they really are not. Hexadecimal literals are
also replaced. NULL is treated as a literal. Numbers embedded in
identifiers are also replaced, so tables named similarly will be
fingerprinted to the same values (e.g. users_2009 and users_2010 will
fingerprint identically).
- Collapse all whitespace into a single space.
- Lowercase the entire query.
- Replace all literals inside of IN() and VALUES() lists with
a single placeholder, regardless of cardinality.
- Collapse multiple identical UNION queries into a single one.
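A sketch of several of these rules applied at once (the exact placeholder text can vary between versions):
SELECT /* login */ Name, Password FROM users_2010 WHERE id IN (12, 13, 14)
fingerprints to something like:
select name, password from users_? where id in(?+)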
This tool accepts additional command-line arguments. Refer to the
"SYNOPSIS" and usage information for details.
- --ask-pass
- Prompt for a password when connecting to MySQL.
- --attribute-aliases
- type: array; default: db|Schema
List of attribute|alias,etc.
Certain attributes have multiple names, like db and Schema. If
an event does not have the primary attribute, pt-query-digest looks for
an alias attribute. If it finds an alias, it creates the primary
attribute with the alias attribute's value and removes the alias
attribute.
If the event has the primary attribute, all alias attributes
are deleted.
This helps simplify event attributes so that, for example,
there will not be report lines for both db and Schema.
- --attribute-value-limit
- type: int; default: 0
A sanity limit for attribute values.
This option deals with bugs in slow-logging functionality that
cause large values for attributes. If the attribute's value is bigger
than this, the last-seen value for that class of query is used instead.
Disabled by default.
- --charset
- short form: -A; type: string
Default character set. If the value is utf8, sets Perl's
binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to
DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other
value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES
after connecting to MySQL.
- --config
- type: Array
Read this comma-separated list of config files; if specified,
this must be the first option on the command line.
- --[no]continue-on-error
- default: yes
Continue parsing even if there is an error. The tool will not
continue forever: it stops once any process causes 100 errors, in which
case there is probably a bug in the tool or the input is invalid.
- --[no]create-history-table
- default: yes
Create the "--history" table if it does not
exist.
This option causes the table specified by
"--history" to be created with the default structure shown in
the documentation for "--history".
- --[no]create-review-table
- default: yes
Create the "--review" table if it does not
exist.
This option causes the table specified by "--review"
to be created with the default structure shown in the documentation for
"--review".
- --daemonize
- Fork to the background and detach from the shell. POSIX operating systems
only.
- --database
- short form: -D; type: string
Connect to this database.
- --defaults-file
- short form: -F; type: string
Only read mysql options from the given file. You must give an
absolute pathname.
- --embedded-attributes
- type: array
Two Perl regex patterns to capture pseudo-attributes embedded
in queries.
Embedded attributes might be special attribute-value pairs
that you've hidden in comments. The first regex should match the entire
set of attributes (in case there are multiple). The second regex should
match and capture attribute-value pairs from the first regex.
For example, suppose your query looks like the following:
SELECT * from users -- file: /login.php, line: 493;
You might run pt-query-digest with the following option:
pt-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'
The first regular expression captures the whole comment:
" -- file: /login.php, line: 493;"
The second one splits it into attribute-value pairs and adds
them to the event:
ATTRIBUTE VALUE
========= ==========
file /login.php
line 493
NOTE: All commas in the regex patterns must be escaped
with a backslash, otherwise the pattern will break.
- --expected-range
- type: array; default: 5,10
Explain items when there are more or fewer than expected.
Defines the number of items expected to be seen in the report
given by "--[no]report", as controlled by "--limit"
and "--outliers". If there are more or fewer items in the
report, each one will explain why it was included.
- --explain
- type: DSN
Run EXPLAIN for the sample query with this DSN and print
results.
This works only when "--group-by" includes
fingerprint. It causes pt-query-digest to run EXPLAIN and include the
output into the report. For safety, queries that appear to have a
subquery that EXPLAIN will execute won't be EXPLAINed. Those are
typically "derived table" queries of the form
select ... from ( select .... ) der;
The EXPLAIN results are printed in full vertical format
("\G") at the end of each event report, just like the mysql
command-line client prints them.
- --filter
- type: string
Discard events for which this Perl code doesn't return
true.
This option is a string of Perl code or a file containing Perl
code that gets compiled into a subroutine with one argument:
$event. This is a hashref. If the given value is
a readable file, then pt-query-digest reads the entire file and uses its
contents as the code. The file should not contain a shebang
(#!/usr/bin/perl) line.
If the code returns true, the chain of callbacks continues;
otherwise it ends. The code is the last statement in the subroutine
other than "return $event". The
subroutine template is:
sub { $event = shift; filter && return $event; }
Filters given on the command line are wrapped inside
parentheses, like "( filter )".
For complex, multi-line filters, you must put the code inside a file so
it will not be wrapped inside parentheses. Either way, the filter must
produce syntactically valid code given the template. For example, an
if-else branch given on the command line would not be valid:
--filter 'if () { } else { }' # WRONG
Since it's given on the command line, the if-else branch would
be wrapped inside parentheses which is not syntactically valid. So to
accomplish something more complex like this would require putting the
code in a file, for example filter.txt:
my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
Then specify "--filter
filter.txt" to read the code from filter.txt.
If the filter code won't compile, pt-query-digest will die
with an error. If the filter code does compile, an error may still occur
at runtime if the code tries to do something wrong (like pattern match
an undefined value). pt-query-digest does not provide any safeguards so
code carefully!
An example filter that discards everything but SELECT
statements:
--filter '$event->{arg} =~ m/^select/i'
This is compiled into a subroutine like the following:
sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
It is permissible for the code to have side effects (to alter
$event).
See "ATTRIBUTES REFERENCE" for a list of common and
"--type" specific attributes.
Here are more examples of filter code:
- Host/IP matches domain.com
- --filter '($event->{host} || $event->{ip} ||
"") =~ m/domain.com/'
Sometimes MySQL logs the host where the IP is expected.
Therefore, we check both.
- User matches john
- --filter '($event->{user} || "") =~ m/john/'
- More than 1 warning
- --filter '($event->{Warning_count} || 0) > 1'
- Query does full table scan or full join
- --filter '(($event->{Full_scan} || "") eq "Yes") ||
(($event->{Full_join} || "") eq "Yes")'
- Query was not served from query cache
- --filter '($event->{QC_Hit} || "") eq "No"'
- Query is 1 MB or larger
- --filter '$event->{bytes} >= 1_048_576'
Since "--filter" allows you to alter
$event, you can use it to do other things, like
create new attributes. See "ATTRIBUTES" for an example.
- --group-by
- type: Array; default: fingerprint
Which attribute of the events to group by.
In general, you can group queries into classes based on any
attribute of the query, such as "user"
or "db", which will by default show
you which users and which databases get the most
"Query_time". The default attribute,
"fingerprint", groups similar,
abstracted queries into classes; see below and see also
"FINGERPRINTS".
A report is printed for each "--group-by" value
(unless "--no-report" is given).
Therefore, "--group-by user,db" means
"report on queries with the same user and report on queries with
the same db"; it does not mean "report on queries with the
same user and db." See also "OUTPUT".
Every value must have a corresponding value in the same
position in "--order-by". However, adding values to
"--group-by" will automatically add values to
"--order-by", for your convenience.
There are several magical values that cause some extra data
mining to happen before the grouping takes place:
- fingerprint
- This causes events to be fingerprinted to abstract queries into a
canonical form, which is then used to group events together into a class.
See "FINGERPRINTS" for more about fingerprinting.
- tables
- This causes events to be inspected for what appear to be tables, and then
aggregated by that. Note that a query that contains two or more tables
will be counted as many times as there are tables; so a join against two
tables will count the Query_time against both tables.
- distill
- This is a sort of super-fingerprint that collapses queries down into a
suggestion of what they do, such as "INSERT SELECT
table1 table2".
- --help
- Show help and exit.
- --history
- type: DSN
Save metrics for each query class in the given table.
pt-query-digest saves query metrics (query time, lock time, etc.) to
this table so you can see how query classes change over time.
The default table is
"percona_schema.query_history".
Specify database (D) and table (t) DSN options to override the default.
The database and table are automatically created unless
"--no-create-history-table" is
specified (see "--[no]create-history-table").
pt-query-digest inspects the columns in the table. The table
must have at least the following columns:
CREATE TABLE query_review_history (
checksum CHAR(32) NOT NULL,
sample TEXT NOT NULL
);
Any columns not mentioned above are inspected to see if they
follow a certain naming convention. The column is special if the name
ends with an underscore followed by any of these values:
pct|avg|cnt|sum|min|max|pct_95|stddev|median|rank
If the column ends with one of those values, then the prefix
is interpreted as the event attribute to store in that column, and the
suffix is interpreted as the metric to be stored. For example, a column
named "Query_time_min" will be used to
store the minimum "Query_time" for the
class of events.
The table should also have a primary key, but that is up to
you, depending on how you want to store the historical data. We suggest
adding ts_min and ts_max columns and making them part of the primary key
along with the checksum. But you could also just add a ts_min column and
make it a DATE type, so you'd get one row per class of queries per
day.
The following table definition is used for
"--[no]create-history-table":
CREATE TABLE IF NOT EXISTS query_history (
checksum CHAR(32) NOT NULL,
sample TEXT NOT NULL,
ts_min DATETIME,
ts_max DATETIME,
ts_cnt FLOAT,
Query_time_sum FLOAT,
Query_time_min FLOAT,
Query_time_max FLOAT,
Query_time_pct_95 FLOAT,
Query_time_stddev FLOAT,
Query_time_median FLOAT,
Lock_time_sum FLOAT,
Lock_time_min FLOAT,
Lock_time_max FLOAT,
Lock_time_pct_95 FLOAT,
Lock_time_stddev FLOAT,
Lock_time_median FLOAT,
Rows_sent_sum FLOAT,
Rows_sent_min FLOAT,
Rows_sent_max FLOAT,
Rows_sent_pct_95 FLOAT,
Rows_sent_stddev FLOAT,
Rows_sent_median FLOAT,
Rows_examined_sum FLOAT,
Rows_examined_min FLOAT,
Rows_examined_max FLOAT,
Rows_examined_pct_95 FLOAT,
Rows_examined_stddev FLOAT,
Rows_examined_median FLOAT,
-- Percona extended slowlog attributes
-- http://www.percona.com/docs/wiki/patches:slow_extended
Rows_affected_sum FLOAT,
Rows_affected_min FLOAT,
Rows_affected_max FLOAT,
Rows_affected_pct_95 FLOAT,
Rows_affected_stddev FLOAT,
Rows_affected_median FLOAT,
Rows_read_sum FLOAT,
Rows_read_min FLOAT,
Rows_read_max FLOAT,
Rows_read_pct_95 FLOAT,
Rows_read_stddev FLOAT,
Rows_read_median FLOAT,
Merge_passes_sum FLOAT,
Merge_passes_min FLOAT,
Merge_passes_max FLOAT,
Merge_passes_pct_95 FLOAT,
Merge_passes_stddev FLOAT,
Merge_passes_median FLOAT,
InnoDB_IO_r_ops_min FLOAT,
InnoDB_IO_r_ops_max FLOAT,
InnoDB_IO_r_ops_pct_95 FLOAT,
InnoDB_IO_r_ops_stddev FLOAT,
InnoDB_IO_r_ops_median FLOAT,
InnoDB_IO_r_bytes_min FLOAT,
InnoDB_IO_r_bytes_max FLOAT,
InnoDB_IO_r_bytes_pct_95 FLOAT,
InnoDB_IO_r_bytes_stddev FLOAT,
InnoDB_IO_r_bytes_median FLOAT,
InnoDB_IO_r_wait_min FLOAT,
InnoDB_IO_r_wait_max FLOAT,
InnoDB_IO_r_wait_pct_95 FLOAT,
InnoDB_IO_r_wait_stddev FLOAT,
InnoDB_IO_r_wait_median FLOAT,
InnoDB_rec_lock_wait_min FLOAT,
InnoDB_rec_lock_wait_max FLOAT,
InnoDB_rec_lock_wait_pct_95 FLOAT,
InnoDB_rec_lock_wait_stddev FLOAT,
InnoDB_rec_lock_wait_median FLOAT,
InnoDB_queue_wait_min FLOAT,
InnoDB_queue_wait_max FLOAT,
InnoDB_queue_wait_pct_95 FLOAT,
InnoDB_queue_wait_stddev FLOAT,
InnoDB_queue_wait_median FLOAT,
InnoDB_pages_distinct_min FLOAT,
InnoDB_pages_distinct_max FLOAT,
InnoDB_pages_distinct_pct_95 FLOAT,
InnoDB_pages_distinct_stddev FLOAT,
InnoDB_pages_distinct_median FLOAT,
-- Boolean (Yes/No) attributes. Only the cnt and sum are needed
-- for these. cnt is how many times the attribute was recorded,
-- and sum is how many of those times the value was Yes. So
-- sum/cnt * 100 equals the percentage of recorded times that
-- the value was Yes.
QC_Hit_cnt FLOAT,
QC_Hit_sum FLOAT,
Full_scan_cnt FLOAT,
Full_scan_sum FLOAT,
Full_join_cnt FLOAT,
Full_join_sum FLOAT,
Tmp_table_cnt FLOAT,
Tmp_table_sum FLOAT,
Tmp_table_on_disk_cnt FLOAT,
Tmp_table_on_disk_sum FLOAT,
Filesort_cnt FLOAT,
Filesort_sum FLOAT,
Filesort_on_disk_cnt FLOAT,
Filesort_on_disk_sum FLOAT,
PRIMARY KEY(checksum, ts_min, ts_max)
);
Note that we store the count (cnt) for the ts attribute only;
it would be redundant to store this for other attributes.
Starting from Percona Toolkit 3.0.11, the checksum function
has been updated to use 32 chars in the MD5 sum. This causes the
checksum field in the history table to have a different value than in
previous versions of the tool.
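Once populated, the history table can be queried directly to trend a query class over time. A sketch against the default table shown above (the checksum value is a placeholder):
SELECT ts_min, ts_cnt, Query_time_sum, Query_time_pct_95
FROM percona_schema.query_history
WHERE checksum='1234567890abcdef1234567890abcdef'  -- placeholder checksum
ORDER BY ts_min;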
- --host
- short form: -h; type: string
Connect to host.
- --ignore-attributes
- type: array; default: arg, cmd, insert_id, ip, port, Thread_id, timestamp,
exptime, flags, key, res, val, server_id, offset, end_log_pos, Xid
Do not aggregate these attributes. Some attributes are not
query metrics but metadata which doesn't need to be (or can't be)
aggregated.
- --inherit-attributes
- type: array; default: db,ts
If missing, inherit these attributes from the last event that
had them.
This option sets which attributes are inherited or carried
forward to events which do not have them. For example, if one event has
the db attribute equal to "foo", but the next event doesn't
have the db attribute, then it inherits "foo" for its db
attribute.
- --interval
- type: float; default: .1
How frequently to poll the processlist, in seconds.
- --iterations
- type: int; default: 1
How many times to iterate through the collect-and-report
cycle. If 0, iterate to infinity. Each iteration runs for
"--run-time" amount of time. An iteration is usually
determined by an amount of time and a report is printed when that amount
of time elapses. With "--run-time-mode"
"interval", an interval is instead
determined by the interval time you specify with "--run-time".
See "--run-time" and "--run-time-mode" for more
information.
- --limit
- type: Array; default: 95%:20
Limit output to the given percentage or count.
If the argument is an integer, report only the top N worst
queries. If the argument is an integer followed by the
"%" sign, report that percentage of
the worst queries. If the percentage is followed by a colon and another
integer, report the top percentage or the number specified by that
integer, whichever comes first.
The value is actually a comma-separated array of values, one
for each item in "--group-by". If you don't specify a value
for any of those items, the default is the top 95%.
See also "--outliers".
- --log
- type: string
Print all output to this file when daemonized.
- --max-hostname-length
- type: int; default: 10
Trim host names in reports to this length. 0=Do not trim host
names.
- --max-line-length
- type: int; default: 74
Trim lines to this length. 0=Do not trim lines.
- --order-by
- type: Array; default: Query_time:sum
Sort events by this attribute and aggregate function.
This is a comma-separated list of order-by expressions, one
for each "--group-by" attribute. The default
"Query_time:sum" is used for
"--group-by" attributes without explicitly given
"--order-by" attributes (that is, if you specify more
"--group-by" attributes than corresponding
"--order-by" attributes). The syntax is
"attribute:aggregate". See
"ATTRIBUTES" for valid attributes. Valid aggregates are:
Aggregate Meaning
========= ============================
sum Sum/total attribute value
min Minimum attribute value
max Maximum attribute value
cnt Frequency/count of the query
For example, the default
"Query_time:sum" means that queries in
the query analysis report will be ordered (sorted) by their total query
execution time ("Exec time").
"Query_time:max" orders the queries by
their maximum query execution time, so the query with the single largest
"Query_time" will be list first.
"cnt" refers more to the frequency of
the query as a whole, how often it appears; "Count" is its
corresponding line in the query analysis report. So any attribute and
"cnt" should yield the same report
wherein queries are sorted by the number of times they appear.
When parsing general logs ("--type"
"genlog"), the default
"--order-by" becomes
"Query_time:cnt". General logs do not
report query times so only the "cnt"
aggregate makes sense because all query times are zero.
If you specify an attribute that doesn't exist in the events,
then pt-query-digest falls back to the default
"Query_time:sum" and prints a notice
at the beginning of the report for each query class. You can create
attributes with "--filter" and order by them; see
"ATTRIBUTES" for an example.
- --outliers
- type: array; default: Query_time:1:10
Report outliers by attribute:percentile:count.
The syntax of this option is a comma-separated list of
colon-delimited strings. The first field is the attribute by which an
outlier is defined. The second is a number that is compared to the
attribute's 95th percentile. The third is optional, and is compared to
the attribute's cnt aggregate. Queries that pass this specification are
added to the report, regardless of any limits you specified in
"--limit".
For example, to report queries whose 95th percentile
Query_time is at least 60 seconds and which are seen at least 5 times,
use the following argument:
--outliers Query_time:60:5
You can specify an --outliers option for each value in
"--group-by".
- --output
- type: string; default: report
How to format and print the query analysis results. Accepted
values are:
VALUE           FORMAT
=======         ==============================
report          Standard query analysis report
slowlog         MySQL slow log
json            JSON, one array per query class
json-anon       JSON without example queries
secure-slowlog  JSON without example queries
The entire "report" output
can be disabled by specifying
"--no-report" (see
"--[no]report"), and its sections can be disabled or
rearranged by specifying "--report-format".
"json" output was introduced
in 2.2.1 and is still in development, so the data structure may change
in future versions.
- --password
- short form: -p; type: string
Password to use when connecting. If password contains commas
they must be escaped with a backslash: "exam\,ple"
- --pid
- type: string
Create the given PID file. The tool won't start if the PID
file already exists and the PID it contains is different than the
current PID. However, if the PID file exists and the PID it contains is
no longer running, the tool will overwrite the PID file with the current
PID. The PID file is removed automatically when the tool exits.
- --port
- short form: -P; type: int
Port number to use for connection.
- --preserve-embedded-numbers
- Preserve numbers in database/table names when fingerprinting queries. The
standard fingerprint method replaces numbers in db/table names, causing a
query like 'SELECT * FROM db1.table2' to be fingerprinted as 'SELECT * FROM
db?.table?'. This option changes that behaviour so the fingerprint
becomes 'SELECT * FROM db1.table2'.
- --processlist
- type: DSN
Poll this DSN's processlist for queries, with
"--interval" sleep between.
If the connection fails, pt-query-digest tries to reopen it
once per second.
- --progress
- type: array; default: time,30
Print progress reports to STDERR. The value is a
comma-separated list with two parts. The first part can be percentage,
time, or iterations; the second part specifies how often an update
should be printed, in percentage, seconds, or number of iterations.
- --read-timeout
- type: time; default: 0
Wait this long for an event from the input; 0 to wait
forever.
This option sets the maximum time to wait for an event from
the input. It applies to all types of input except
"--processlist". If an event is not received after the
specified time, the script stops reading the input and prints its
reports. If "--iterations" is 0 or greater than 1, the next
iteration will begin, else the script will exit.
This option requires the Perl POSIX module.
- --[no]report
- default: yes
Print query analysis reports for each "--group-by"
attribute. This is the standard slow log analysis functionality. See
"OUTPUT" for the description of what this does and what the
results look like.
If you don't need a report (for example, when using
"--review" or "--history"), it is best to specify
"--no-report" because this allows the
tool to skip some expensive operations.
- --report-all
- Report all queries, even ones that have been reviewed. This only affects
the "report" "--output" when
using "--review". Otherwise, all queries are always
printed.
- --report-format
- type: Array; default:
rusage,date,hostname,files,header,profile,query_report,prepared
Print these sections of the query analysis report.
SECTION       PRINTS
============  ======================================================
rusage        CPU times and memory usage reported by ps
date          Current local date and time
hostname      Hostname of machine on which pt-query-digest was run
files         Input files read/parsed
header        Summary of the entire analysis run
profile       Compact table of queries for an overview of the report
query_report  Detailed information about each unique query
prepared      Prepared statements
The sections are printed in the order specified. The rusage,
date, files and header sections are grouped together if specified
together; other sections are separated by blank lines.
See "OUTPUT" for more information on the various
parts of the query report.
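For example, to print only the compact profile table:
pt-query-digest --report-format profile slow.log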
- --report-histogram
- type: string; default: Query_time
Chart the distribution of this attribute's values.
The distribution chart is limited to time-based attributes, so
charting "Rows_examined", for example,
will produce a useless chart. Charts look like:
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms ###########################
# 100ms ########################################################
# 1s ########
# 10s+
See "OUTPUT" for more information.
- --resume
- type: string
If specified, the tool writes the last file offset, if there
is one, to the given filename. When run again with the same value for
this option, the tool reads the last file offset from the file, seeks to
that position in the log, and resumes parsing events from that point
onward.
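For example, repeated runs with the same offset file process a growing log incrementally (file names are placeholders):
pt-query-digest --resume /tmp/slow.offset slow.log
# later, after more queries have been logged:
pt-query-digest --resume /tmp/slow.offset slow.log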
- --review
- type: DSN
Save query classes for later review, and don't report already
reviewed classes.
The default table is
"percona_schema.query_review". Specify
database (D) and table (t) DSN options to override the default. The
database and table are automatically created unless
"--no-create-review-table" is
specified (see "--[no]create-review-table").
If the table was created manually, it must have at least the
following columns. You can add more columns for your own special
purposes, but they won't be used by pt-query-digest.
CREATE TABLE IF NOT EXISTS query_review (
checksum CHAR(32) NOT NULL PRIMARY KEY,
fingerprint TEXT NOT NULL,
sample TEXT NOT NULL,
first_seen DATETIME,
last_seen DATETIME,
reviewed_by VARCHAR(20),
reviewed_on DATETIME,
comments TEXT
)
The columns are:
COLUMN MEANING
=========== ====================================================
checksum A 64-bit checksum of the query fingerprint
fingerprint The abstracted version of the query; its primary key
sample The query text of a sample of the class of queries
first_seen The smallest timestamp of this class of queries
last_seen The largest timestamp of this class of queries
reviewed_by Initially NULL; if set, query is skipped thereafter
reviewed_on Initially NULL; not assigned any special meaning
comments Initially NULL; not assigned any special meaning
Note that the "fingerprint"
column is the true primary key for a class of queries. The
"checksum" is just a cryptographic
hash of this value, which provides a shorter value that is very likely
to also be unique.
After parsing and aggregating events, your table should
contain a row for each fingerprint. This option depends on
"--group-by fingerprint" (which is the
default). It will not work otherwise.
- --run-time
- type: time
How long to run for each "--iterations". The default
is to run forever (you can interrupt with CTRL-C). Because
"--iterations" defaults to 1, if you only specify
"--run-time", pt-query-digest runs for that amount of time and
then exits. The two options are specified together to do
collect-and-report cycles. For example, specifying
"--iterations 4 --run-time 15m" with a
continuous input (like STDIN or "--processlist") will cause
pt-query-digest to run for 1 hour (15 minutes x 4), reporting four
times, once at each 15-minute interval.
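The example above as a complete command line (the host name is a placeholder):
pt-query-digest --processlist h=host1 --iterations 4 --run-time 15m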
- --run-time-mode
- type: string; default: clock
Set what the value of "--run-time" operates on.
Following are the possible values for this option:
- clock
- "--run-time" specifies an amount of real clock time during which
the tool should run for each "--iterations".
- event
- "--run-time" specifies an amount of log time. Log time is
determined by timestamps in the log. The first timestamp seen is
remembered, and each timestamp after that is compared to the first to
determine how much log time has passed. For example, if the first
timestamp seen is "12:00:00" and the
next is "12:01:30", that is 1 minute and
30 seconds of log time. The tool will read events until the log time is
greater than or equal to the specified "--run-time" value.
Since timestamps in logs are not always printed, or not always
printed frequently, this mode varies in accuracy.
- interval
- "--run-time" specifies interval boundaries of log time into
which events are divided and reports are generated. This mode is different
from the others because it doesn't specify how long to run. The value of
"--run-time" must be an interval that divides evenly into
minutes, hours or days. For example,
"5m" divides evenly into hours (60/5=12,
so twelve 5-minute intervals per hour) but
"7m" does not (60/7=8.6).
Specifying "--run-time-mode interval
--run-time 30m --iterations 0" is similar to specifying
"--run-time-mode clock --run-time 30m --iterations
0". In the latter case, pt-query-digest will run forever,
producing reports every 30 minutes, but this only works effectively with
continuous inputs like STDIN and the processlist. For fixed inputs, like
log files, the former example produces multiple reports by dividing the
log into 30-minute intervals based on timestamps.
Intervals are calculated from the zeroth second/minute/hour in
which a timestamp occurs, not from whatever time it specifies. For
example, with 30-minute intervals and a timestamp of
"12:10:30", the interval is not
"12:10:30" to
"12:40:30", it is
"12:00:00" to
"12:29:59". Or, with 1 hour intervals,
it is "12:00:00" to
"12:59:59". When a new timestamp
exceeds the interval, a report is printed, and the next interval is
recalculated based on the new timestamp.
Since "--iterations" is 1 by default, you probably
want to specify a new value else pt-query-digest will only get and
report on the first interval from the log since 1 interval = 1
iteration. If you want to get and report every interval in a log,
specify "--iterations" 0.
- --sample
- type: int
Filter out all but the first N occurrences of each query. The
queries are filtered on the first value in "--group-by", so by
default, this will filter by query fingerprint. For example,
"--sample 2" will permit two sample
queries for each fingerprint. Useful in conjunction with
"--output slowlog" to print the
queries. You probably want to set
"--no-report" to avoid the overhead of
aggregating and reporting if you're just using this to print out samples
of queries. A complete example:
pt-query-digest --sample 2 --no-report --output slowlog slow.log
- --slave-user
- type: string
Sets the user to be used to connect to the slaves. This
parameter allows you to have a different user with fewer privileges on
the slaves, but that user must exist on all slaves.
- --slave-password
- type: string
Sets the password to be used to connect to the slaves. It can
be used with --slave-user, and the password for that user must be the
same on all slaves.
- --set-vars
- type: Array
Set the MySQL variables in this comma-separated list of
"variable=value" pairs.
By default, the tool sets:
wait_timeout=10000
Variables specified on the command line override these
defaults. For example, specifying "--set-vars
wait_timeout=500" overrides the default value of
10000.
The tool prints a warning and continues if a variable cannot
be set.
- --show-all
- type: Hash
Show all values for these attributes.
By default pt-query-digest only shows as many of an
attribute's values as fit on a single line. This option allows you to
specify attributes for which all values will be shown (line width is
ignored). This only works for attributes with string values like user,
host, db, etc. Multiple attributes can be specified,
comma-separated.
- --since
- type: string
Parse only queries newer than this value (parse queries since
this date).
This option allows you to ignore queries older than a certain
value and parse only those queries which are more recent than the value.
The value can be several types:
* Simple time value N with optional suffix: N[shmd], where
s=seconds, h=hours, m=minutes, d=days (default s if no suffix
given); this is like saying "since N[shmd] ago"
* Full date with optional hours:minutes:seconds:
YYYY-MM-DD [HH:MM:SS]
* Short, MySQL-style date:
YYMMDD [HH:MM:SS]
* Any time expression evaluated by MySQL:
CURRENT_DATE - INTERVAL 7 DAY
If you give a MySQL time expression, and you have not also
specified a DSN for "--explain", "--processlist", or
"--review", then you must specify a DSN on the command line so
that pt-query-digest can connect to MySQL to evaluate the
expression.
The MySQL time expression is wrapped inside a query like
"SELECT UNIX_TIMESTAMP(<expression>)", so be sure that
the expression is valid inside this query. For example, do not use
UNIX_TIMESTAMP() because UNIX_TIMESTAMP(UNIX_TIMESTAMP())
returns 0.
Events are assumed to be in chronological order: older events at the
beginning of the log and newer events at the end of the log.
"--since" is strict: it ignores all queries until one is found
that is new enough. Therefore, if the query events are not consistently
timestamped, some may be ignored which are actually new enough.
See also "--until".
- --socket
- short form: -S; type: string
Socket file to use for connection.
- --timeline
- Show a timeline of events.
This option makes pt-query-digest print another kind of
report: a timeline of the events. Each query is still grouped and
aggregated into classes according to "--group-by", but then
they are printed in chronological order. The timeline report prints out
the timestamp, interval, count and value of each class.
If all you want is the timeline report, then specify
"--no-report" to suppress the default
query analysis report. Otherwise, the timeline report will be printed at
the end before the response-time profile (see
"--report-format" and "OUTPUT").
For example, this:
pt-query-digest /path/to/log --group-by distill --timeline
will print something like:
# ########################################################
# distill report
# ########################################################
# 2009-07-25 11:19:27 1+00:00:01 2 SELECT foo
# 2009-07-27 11:19:30 00:01 2 SELECT bar
# 2009-07-27 11:30:00 1+06:30:00 2 SELECT foo
- --type
- type: Array; default: slowlog
The type of input to parse. The permitted types are
- binlog
- Parse a binary log file that has first been converted to text using
mysqlbinlog.
For example:
mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt
pt-query-digest --type binlog mysql-bin.000441.txt
- genlog
- Parse a MySQL general log file. General logs lack a lot of
"ATTRIBUTES", notably
"Query_time". The default
"--order-by" for general logs changes to
"Query_time:cnt".
- slowlog
- Parse a log file in any variation of MySQL slow log format.
- tcpdump
- Inspect network packets and decode the MySQL client protocol, extracting
queries and responses from it.
pt-query-digest does not actually watch the network (i.e. it
does NOT "sniff packets"). Instead, it's just parsing the
output of tcpdump. You are responsible for generating this output;
pt-query-digest does not do it for you. Then you send this to
pt-query-digest as you would any log file: as files on the command line
or to STDIN.
The parser expects the input to be formatted with the
following options: "-x -n
-q -tttt". For example, if you want to
capture output from your local machine, you can do something like the
following (the port must come last on FreeBSD):
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 \
> mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt
The other tcpdump parameters, such as -s, -c, and -i, are up
to you. Just make sure the output looks like this (there is a line break
in the first line to avoid man-page problems):
2009-04-12 09:50:16.804849 IP 127.0.0.1.42167
> 127.0.0.1.3306: tcp 37
0x0000: 4508 0059 6eb2 4000 4006 cde2 7f00 0001
0x0010: ....
Remember tcpdump has a handy -c option to stop after it
captures some number of packets! That's very useful for testing your
tcpdump command. Note that tcpdump can't capture traffic on a Unix
socket. Read <http://bugs.mysql.com/bug.php?id=31577> if you're
confused about this.
Devananda Van Der Veen explained on the MySQL Performance Blog
how to capture traffic without dropping packets on busy servers. Dropped
packets cause pt-query-digest to miss the response to a request, then
see the response to a later request and assign the wrong execution time
to the query. You can change the filter to something like the following
to help capture a subset of the queries. (See
<http://www.mysqlperformanceblog.com/?p=6092> for details.)
tcpdump -i any -s 65535 -x -n -q -tttt \
'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'
All MySQL servers running on port 3306 are automatically
detected in the tcpdump output. Therefore, if the tcpdump output contains
packets from multiple servers on port 3306 (for example, 10.0.0.1:3306,
10.0.0.2:3306, etc.), all packets/queries from all these servers will be
analyzed together as if they were one server.
If you're analyzing traffic for a MySQL server that is not
running on port 3306, see "--watch-server".
Also note that pt-query-digest may fail to report the database
for queries when parsing tcpdump output. The database is discovered only
in the initial connect events for a new client or when <USE db> is
executed. If the tcpdump output contains neither of these, then
pt-query-digest cannot discover the database.
Server-side prepared statements are supported. SSL-encrypted
traffic cannot be inspected and decoded.
- rawlog
- Raw logs are not MySQL logs but simple text files with one SQL statement
per line, like:
SELECT c FROM t WHERE id=1
/* Hello, world! */ SELECT * FROM t2 LIMIT 1
INSERT INTO t (a, b) VALUES ('foo', 'bar')
INSERT INTO t SELECT * FROM monkeys
Since raw logs do not have any metrics, many options and
features of pt-query-digest do not work with them.
One use case for raw logs is ranking queries by count when the
only information available is a list of queries, from polling
"SHOW PROCESSLIST" for example.
- --until
- type: string
Parse only queries older than this value (parse queries until
this date).
This option allows you to ignore queries newer than a certain
value and parse only those queries which are older than the value. The
value can be one of the same types listed for "--since".
Unlike "--since", "--until" is not strict:
all queries are parsed until one has a timestamp that is equal to or
greater than "--until". Then all subsequent queries are
ignored.
- --user
- short form: -u; type: string
User for login if not current user.
- --variations
- type: Array
Report the number of variations in these attributes'
values.
Variations show how many distinct values an attribute had
within a class. The usual value for this option is
"arg" which shows how many distinct
queries were in the class. This can be useful to determine a query's
cacheability.
Distinct values are determined by CRC32 checksums of the
attributes' values. These checksums are reported in the query report for
attributes specified by this option, like:
# arg crc 109 (1/25%), 144 (1/25%)... 2 more
In that class there were 4 distinct queries. The checksums of
the first two variations are shown, and each one occurred once (or, 25%
of the time).
The count of distinct variations is approximate because only
1,000 variations are saved. The mod (%) 1000 of the full CRC32 checksum
is saved, so some distinct checksums are treated as equal.
- --version
- Show version and exit.
- --[no]version-check
- default: yes
Check for the latest version of Percona Toolkit, MySQL, and
other programs.
This is a standard "check for updates automatically"
feature, with two additional features. First, the tool checks its own
version and also the versions of the following software: operating
system, Percona Monitoring and Management (PMM), MySQL, Perl, MySQL
driver for Perl (DBD::mysql), and Percona Toolkit. Second, it checks for
and warns about versions with known problems. For example, MySQL 5.5.25
had a critical bug and was re-released as 5.5.25a.
A secure connection to Percona's Version Check database
server is done to perform these checks. Each request is logged by the
server, including software version numbers and unique ID of the checked
system. The ID is generated by the Percona Toolkit installation script
or when the Version Check database call is done for the first time.
Any updates or known problems are printed to STDOUT before the
tool's normal output. This feature should never interfere with the
normal operation of the tool.
For more information, visit
<https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.
- --[no]vertical-format
- default: yes
Output a trailing "\G" in the reported SQL
queries.
This makes the mysql client display the result using vertical
format. Non-native MySQL clients like phpMyAdmin do not support
this.
- --watch-server
- type: string
This option tells pt-query-digest which server IP address and
port (like "10.0.0.1:3306") to watch when parsing tcpdump (for
"--type" tcpdump); all other servers are ignored. If you don't
specify it, pt-query-digest watches all servers by looking for any IP
address using port 3306 or "mysql". If you're watching a
server with a non-standard port, this won't work, so you must specify
the IP address and port to watch.
If you want to watch a mix of servers, some running on
standard port 3306 and some running on non-standard ports, you need to
create separate tcpdump outputs for the non-standard port servers and
then specify this option for each. At present pt-query-digest cannot
auto-detect servers on port 3306 and also be told to watch a server on a
non-standard port.
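For example, to watch only the first server from the example above:
pt-query-digest --type tcpdump --watch-server 10.0.0.1:3306 mysql.tcp.txt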
These DSN options are used to create a DSN. Each option is given like
"option=value". The options are
case-sensitive, so P and p are not the same option. There cannot be whitespace
before or after the "=" and if the value
contains whitespace it must be quoted. DSN options are comma-separated. See
the percona-toolkit manpage for full details.
- A
dsn: charset; copy: yes
Default character set.
- D
dsn: database; copy: yes
Default database to use when connecting to MySQL.
- F
dsn: mysql_read_default_file; copy: yes
Only read default options from the given file.
- h
dsn: host; copy: yes
Connect to host.
- p
dsn: password; copy: yes
Password to use when connecting. If password contains commas
they must be escaped with a backslash: "exam\,ple"
- P
dsn: port; copy: yes
Port number to use for connection.
- S
dsn: mysql_socket; copy: yes
Socket file to use for connection.
- t
The "--review" or "--history" table.
- u
dsn: user; copy: yes
User for login if not current user.
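Putting several of these together, a review DSN might look like this (all values are placeholders):
pt-query-digest --review h=host2,P=3306,u=bob,p=secret,D=percona_schema,t=query_review slow.log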
The environment variable "PTDEBUG" enables
verbose debugging output to STDERR. To enable debugging and capture all output
to a file, run the tool like:
PTDEBUG=1 pt-query-digest ... > FILE 2>&1
Be careful: debugging output is voluminous and can generate
several megabytes of output.
You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.
For a list of known bugs, see
<http://www.percona.com/bugs/pt-query-digest>.
Please report bugs at
<https://jira.percona.com/projects/PT>. Include the following
information in your bug report:
- Complete command-line used to run the tool
- Tool "--version"
- MySQL version of all servers involved
- Output from the tool including STDERR
- Input files (log/dump/config files, etc.)
If possible, include debugging output by running the tool with
"PTDEBUG"; see
"ENVIRONMENT".
Visit <http://www.percona.com/software/percona-toolkit/> to download the
latest release of Percona Toolkit. Or, get the latest release from the command
line:
wget percona.com/get/percona-toolkit.tar.gz
wget percona.com/get/percona-toolkit.rpm
wget percona.com/get/percona-toolkit.deb
You can also get individual tools from the latest release:
wget percona.com/get/TOOL
Replace "TOOL" with the name of
any tool.
Events may have the following attributes. If writing a "--filter", be
sure to check that an attribute is defined in each event before using it, else
the filter code may crash the tool with a "use of uninitialized
value" error.
You can dump event attributes for any input like:
$ pt-query-digest \
slow.log \
--filter 'print Dumper $event' \
--no-report \
--sample 1
That will produce a lot of output with "attribute =>
value" pairs like:
$VAR1 = {
Query_time => '0.033384',
Rows_examined => '0',
Rows_sent => '0',
Thread_id => '10',
Tmp_table => 'No',
Tmp_table_on_disk => 'No',
arg => 'SELECT col FROM tbl WHERE id=5',
bytes => 103,
cmd => 'Query',
db => 'db1',
fingerprint => 'select col from tbl where id=?',
host => '',
pos_in_log => 1334,
ts => '071218 11:48:27',
user => '[SQL_SLAVE]'
};
These attributes are common to all input "--type" values and
"--processlist", except where noted.
- arg
- The query text, or the command for admin commands like
"Ping".
- bytes
- The byte length of the "arg".
- cmd
- "Query" or "Admin".
- db
- The current database. The value comes from USE database statements. By
default, "Schema" is an alias which is
automatically changed to "db"; see
"--attribute-aliases".
- fingerprint
- An abstracted form of the query. See "FINGERPRINTS".
- host
- Client host which executed the query.
- pos_in_log
- The byte offset of the event in the log or tcpdump, except for
"--processlist".
- Query_time
- The total time the query took, including lock time.
- ts
- The timestamp of when the query ended.
Events have all the attributes available in the log file. Therefore, you only
need to look at the log file to see which attributes are available, but
remember: not all events have the same attributes.
Percona Server adds many attributes to the slow log; see
http://www.percona.com/docs/wiki/patches:slow_extended for more
information.
These attributes are available when parsing "--type" tcpdump.
- Error_no
- The MySQL error number if the query caused an error.
- ip
- The client's IP address. Certain log files may also contain this
attribute.
- No_good_index_used
- Yes or No if no good index existed for the query (flag set by
server).
- No_index_used
- Yes or No if the query did not use any index (flag set by server).
- port
- The client's port number.
- Warning_count
- The number of warnings, as otherwise shown by "SHOW
WARNINGS".
If using "--processlist", an "id"
attribute is available for the process ID, in addition to the common
attributes.
Baron Schwartz, Daniel Nichter, and Brian Fraser
This tool is part of Percona Toolkit, a collection of advanced command-line
tools for MySQL developed by Percona. Percona Toolkit was forked from two
projects in June, 2011: Maatkit and Aspersa. Those projects were created by
Baron Schwartz and primarily developed by him and Daniel Nichter. Visit
<http://www.percona.com/software/> to learn about other free,
open-source software from Percona.
This program is copyright 2008-2018 Percona LLC and/or its affiliates.
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 2; OR the Perl Artistic License. On
UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic'
to read these licenses.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.