|  | 
   
 |   |  |   
  
    | PT-STALK(1) | User Contributed Perl Documentation | PT-STALK(1) |  
pt-stalk - Collect forensic data about MySQL when problems
  occur. Usage: pt-stalk [OPTIONS] pt-stalk waits for a trigger condition to occur, then collects
    data to help diagnose problems. The tool is designed to run as a daemon with
    root privileges, so that you can diagnose intermittent problems that you
    cannot observe directly. You can also use it to execute a custom command, or
    to collect data on demand without waiting for the trigger to occur. Percona Toolkit is mature, proven in the real world, and well
    tested, but all database tools can pose a risk to the system and the
    database server. Before using this tool, please: 
  Read the tool's documentationReview the tool's known "BUGS"Test the tool on a non-production serverBackup your production server and verify the backups Sometimes a problem happens infrequently and for a short time,
    giving you no chance to see the system when it happens. How do you solve
    intermittent MySQL problems when you can't observe them? That's why pt-stalk
    exists. In addition to using it when there's a known problem on your
    servers, it is a good idea to run pt-stalk all the time, even when you think
    nothing is wrong. You will appreciate the data it collects when a problem
    occurs, because problems such as MySQL lockups or spikes in activity
    typically leave no evidence to use in root cause analysis. pt-stalk does two things: it watches a MySQL server and waits for
    a trigger condition to occur, and it collects diagnostic data when that
    trigger occurs. To avoid false-positives caused by short-lived problems, the
    trigger condition must be true at least "--cycles" times before a
    "--collect" is triggered. To use pt-stalk effectively, you need to define a good trigger. A
    good trigger is sensitive enough to fire reliably when a problem occurs, so
    that you don't miss a chance to solve problems. On the other hand, a good
    trigger isn't prone to false positives, so you don't gather information when
    the server is functioning normally. The most reliable triggers for MySQL tend to be the number of
    connections to the server, and the number of queries running concurrently.
    These are available in the SHOW GLOBAL STATUS command as Threads_connected
    and Threads_running. Sometimes Threads_connected is not a reliable indicator
    of trouble, but Threads_running usually is. Your job, as the tool's user, is
    to define an appropriate trigger condition for the tool. Choose carefully,
    because the quality of your results will depend on the trigger you
  choose. You define the trigger with the "--function",
    "--variable", "--threshold", and "--cycles"
    options. The default values for these options define a reasonable trigger,
    but you should adjust or change them to suite your particular system and
    needs. By default, pt-stalk tool watches MySQL forever until the trigger
    occurs, then it collects diagnostic data for a while, and sleeps afterwards
    to avoid repeatedly collecting data if the trigger remains true. The general
    order of operations is:    while true; do
      if --variable from --function > --threshold; then
         cycles_true++
         if cycles_true >= --cycles; then
            --notify-by-email
            if --collect; then
               if --disk-bytes-free and --disk-pct-free ok; then
                  (--collect for --run-time seconds) &
               fi
               rm files in --dest older than --retention-time
            fi
            iter++
            cycles_true=0
         fi
         if iter < --iterations; then
            sleep --sleep seconds
         else
            break
         fi
      else
         if iter < --iterations; then
            sleep --interval seconds
         else
            break
         fi
      fi
   done
   rm old --dest files older than --retention-time
   if --collect process are still running; then
      wait up to --run-time * 3 seconds
      kill any remaining --collect processes
   fi
The diagnostic data is written to files whose names begin with a
    timestamp, so you can distinguish samples from each other in case the tool
    collects data multiple times. The pt-sift tool is designed to help you
    browse and analyze the resulting data samples. Although this sounds simple enough, in practice there are a number
    of subtleties, such as detecting when the disk is beginning to fill up so
    that the tool doesn't cause the server to run out of disk space. This tool
    handles these types of potential problems, so it's a good idea to use this
    tool instead of writing something from scratch and possibly experiencing
    some of the hazards this tool is designed to avoid. You can use standard Percona Toolkit configuration files to set
    command line options. You will probably want to run the tool as a daemon and customize
    at least the "--threshold". Here's a sample configuration file for
    triggering when there are more than 20 queries running at once:   daemonize
  threshold=20
 If you don't run the tool as root, then you will need specify
    several options, such as "--pid", "--log", and
    "--dest", else the tool will probably fail to start. 
  --ask-passPrompt for a password when connecting to MySQL.--collectdefault: yes; negatable: yes
    Collect diagnostic data when the trigger occurs. Specify
        "--no-collect" to make the tool watch
        the system but not collect data. See also "--stalk".--collect-gdbCollect GDB stacktraces. This is achieved by attaching to MySQL and
      printing stack traces from all threads. This will freeze the server for
      some period of time, ranging from a second or so to much longer on very
      busy systems with a lot of memory and many threads in the server. For this
      reason, it is disabled by default. However, if you are trying to diagnose
      a server stall or lockup, freezing the server causes no additional harm,
      and the stack traces can be vital for diagnosis.
    In addition to freezing the server, there is also some risk of
        the server crashing or performing badly after GDB detaches from it.--collect-oprofileCollect oprofile data. This is achieved by starting an oprofile session,
      letting it run for the collection time, and then stopping and saving the
      resulting profile data in the system's default location. Please read your
      system's oprofile documentation to learn more about this.--collect-straceCollect strace data. This is achieved by attaching strace to the server,
      which will make it run very slowly until strace detaches. The same
      cautions apply as those listed in --collect-gdb. You should not enable
      this option together with --collect-gdb, because GDB and strace can't
      attach to the server process simultaneously.--collect-tcpdumpCollect tcpdump data. This option causes tcpdump to capture all traffic on
      all interfaces for the port on which MySQL is listening. You can later use
      pt-query-digest to decode the MySQL protocol and extract a log of query
      traffic from it.--configtype: string
    Read this comma-separated list of config files. If specified,
        this must be the first option on the command line.--cyclestype: int; default: 5
    How many times "--variable" must be greater than
        "--threshold" before triggering "--collect". This
        helps prevent false positives, and makes the trigger condition less
        likely to fire when the problem recovers quickly.--daemonizeDaemonize the tool. This causes the tool to fork into the background and
      log its output as specified in --log.--defaults-fileshort form: -F; type: string
    Only read mysql options from the given file. You must give an
        absolute pathname.--desttype: string; default: /var/lib/pt-stalk
    Where to save diagnostic data from "--collect". Each
        time the tool collects data, it writes to a new set of files, which are
        named with the current system timestamp.--disk-bytes-freetype: size; default: 100M
    Do not "--collect" if the disk has less than this
        much free space. This prevents the tool from filling up the disk with
        diagnostic data. If the "--dest" directory contains a previously
        captured sample of data, the tool will measure its size and use that as
        an estimate of how much data is likely to be gathered this time, too. It
        will then be even more pessimistic, and will refuse to collect data
        unless the disk has enough free space to hold the sample and still have
        the desired amount of free space. For example, if you'd like 100MB of
        free space and the previous diagnostic sample consumed 100MB, the tool
        won't collect any data unless the disk has 200MB free. Valid size value suffixes are k, M, G, and T.--disk-pct-freetype: int; default: 5
    Do not "--collect" if the disk has less than this
        percent free space. This prevents the tool from filling up the disk with
        diagnostic data. This option works similarly to "--disk-bytes-free"
        but specifies a percentage margin of safety instead of a bytes margin of
        safety. The tool honors both options, and will not collect any data
        unless both margins are satisfied.--functiontype: string; default: status
    What to watch for the trigger. The default value watches
        "SHOW GLOBAL STATUS", but you can also
        watch "SHOW PROCESSLIST" and specify a
        file with your own custom code. This function supplies the value of
        "--variable", which is then compared against
        "--threshold" to see if the the trigger condition is met.
        Additional options may be required as well; see below. Possible values
        are: 
  status
    Watch "SHOW GLOBAL STATUS"
        for the trigger. The value of "--variable" then defines which
        status counter is the trigger.processlist
    Watch "SHOW FULL
        PROCESSLIST" for the trigger. The trigger value is the count
        of processes whose "--variable" column matches the
        "--match" option. For example, to trigger
        "--collect" when more than 10 processes are in the
        "statistics" state, specify:    --function processlist \
   --variable State       \
   --match statistics     \
   --threshold 10
     
In addition, you can specify a file that contains your custom
    trigger function, written in Unix shell script. This can be a wrapper that
    executes anything you wish. If the argument to "--function" is a
    file, then it takes precedence over built-in functions, so if there is a
    file in the working directory named "status" or
    "processlist" then the tool will use that file even though are
    valid built-in values. The file works by providing a function called
    "trg_plugin", and the tool simply sources
    the file and executes the function. For example, the file might contain:    trg_plugin() {
      mysql $EXT_ARGV -e "SHOW ENGINE INNODB STATUS" \
        | grep -c "has waited at"
   }
This snippet will count the number of mutex waits inside InnoDB.
    It illustrates the general principle: the function must output a number,
    which is then compared to "--threshold" as usual. The
    $EXT_ARGV variable contains the MySQL options
    mentioned in the "SYNOPSIS" above. The file should not alter the tool's existing global variables.
    Prefix any file-specific global variables with
    "PLUGIN_" or make them local. 
  --helpPrint help and exit.--hostshort form: -h; type: string
    Host to connect to.--intervaltype: int; default: 1
    How often to check the if trigger is true, in seconds.--iterationstype: int
    How many times to "--collect" diagnostic data. By
        default, the tool runs forever and collects data every time the trigger
        occurs. Specify "--iterations" to collect data a limited
        number of times. This option is also useful with
        "--no-stalk" to collect data once and
        exit, for example.--logtype: string; default: /var/log/pt-stalk.log
    Print all output to this file when daemonized.--matchtype: string
    The pattern to use when watching SHOW PROCESSLIST. See
        "--function" for details.--notify-by-emailtype: string
    Send an email to these addresses for every
        "--collect".--passwordshort form: -p; type: string
    Password to use when connecting. If password contains commas
        they must be escaped with a backslash: "exam\,ple"--pidtype: string; default: /var/run/pt-stalk.pid
    Create the given PID file. The tool won't start if the PID
        file already exists and the PID it contains is different than the
        current PID. However, if the PID file exists and the PID it contains is
        no longer running, the tool will overwrite the PID file with the current
        PID. The PID file is removed automatically when the tool exits.--plugintype: string
    Load a plugin to hook into the tool and extend is
        functionality. The specified file does not need to be executable, nor
        does its first line need to be shebang line. It only needs to define one
        or more of these Bash functions: 
  before_stalkCalled before stalking.before_collectCalled when the trigger occurs, before running a "--collect"
      subprocesses in the background.after_collectCalled after running a collector process. The PID of the collector process
      is passed as the first argument. This hook is called before
      "after_collect_sleep".after_collect_sleepCalled after sleeping "--sleep" seconds for the collector
      process to finish. This hook is called after
      "after_collect".after_interval_sleepCalled after sleeping "--interval" seconds after each trigger
      check.after_stalkCalled after stalking. Since pt-stalk stalks forever by default, this hook
      is only called if "--iterations" is specified. 
For example, a very simple plugin that touches a file when
    "--collect" is triggered:    before_collect() {
      touch /tmp/foo
   }
Since the plugin is completely sourced (imported) into the tool's
    namespace, be careful not to define other functions or global variables that
    already exist in the tool. You should prefix all plugin-specific functions
    and global variables with "plugin_" or
    "PLUGIN_". Plugins have access to all command line options but they should
    not modify them. Each option is a global variable like
    $OPT_DEST which corresponds to "--dest".
    Therefore, the global variable for each command line option is
    "OPT_" plus the option name in all caps
    with hyphens replaced by underscores. Plugins can stop the tool by setting the global variable
    "OKTORUN" to 1. In
    this case, the global variable
    "EXIT_REASON" should also be set to
    indicate why the tool was stopped. Plugin writers should keep in mind that the file destination
    prefix currently in use should be accessed through the
    $prefix variable, rather than
    $OPT_PREFIX. 
  --mysql-onlyTrigger only MySQL related captures, ignoring all others. The only not
      MySQL related value being collected is the disk space, because it is
      needed to calculate the available free disk space to write the result
      files. This option is useful for RDS instances.--portshort form: -P; type: int
    Port number to use for connection.--prefixtype: string
    The filename prefix for diagnostic samples. By default, all
        files created by the same "--collect" instance have a
        timestamp prefix based on the current local time, like
        "2011_12_06_14_02_02", which is
        December 6, 2011 at 14:02:02.--retention-counttype: int; default: 0
    Keep the data for the last N runs. If N > 0, the program
        will keep the data for the last N runs and will delete the older
      data.--retention-sizetype: int; default: 0
    Keep up to --retention-size MB of data. It will keep at least
        1 run even if the size is bigger than the specified in this
      parameter--retention-timetype: int; default: 30
    Number of days to retain collected samples. Any samples that
        are older will be purged.--run-timetype: int; default: 30
    How long to "--collect" diagnostic data when the
        trigger occurs. The value is in seconds and should not be longer than
        "--sleep". It is usually not necessary to change this; if the
        default 30 seconds doesn't collect enough data, running longer is not
        likely to help because the system or MySQL server is probably too busy
        to respond. In fact, in many cases a shorter collection period is
        appropriate. This value is used two other times. After collecting, the
        collect subprocess will wait another "--run-time" seconds for
        its commands to finish. Some commands can take awhile if the system is
        running very slowly (which can likely be the case given that a
        collection was triggered). Since empty files are deleted, the extra wait
        gives commands time to finish and write their data. The value is
        potentially used again just before the tool exits to wait again for any
        collect subprocesses to finish. In most cases this won't happen because
        of the aforementioned extra wait. If it happens, the tool will log
        "Waiting up to N seconds for subprocesses to finish..." where
        N is three times "--run-time". In both cases, after waiting,
        the tool kills all of its subprocesses.--sleeptype: int; default: 300
    How long to sleep after "--collect". This prevents
        the tool from triggering continuously, which might be a problem if the
        collection process is intrusive. It also prevents filling up the disk or
        gathering too much data to analyze reasonably.--sleep-collecttype: int; default: 1
    How long to sleep between collection loop cycles. This is
        useful with "--no-stalk" to do long
        collections. For example, to collect data every minute for an hour,
        specify: "--no-stalk --run-time 3600
        --sleep-collect 60".--socketshort form: -S; type: string
    Socket file to use for connection.--stalkdefault: yes; negatable: yes
    Watch the server and wait for the trigger to occur. Specify
        "--no-stalk" to collect diagnostic
        data immediately, that is, without waiting for the trigger to occur. You
        probably also want to specify values for "--interval",
        "--iterations", and "--sleep". For example, to
        immediately collect data for 1 minute then exit, specify:    --no-stalk --run-time 60 --iterations 1
    "--cycles", "--daemonize",
        "--log" and "--pid" have no effect with
        "--no-stalk". Safeguard options, like
        "--disk-bytes-free" and "--disk-pct-free", are still
        respected. See also "--collect".--system-onlyTrigger only operating system related captures, ignoring all others.--thresholdtype: int; default: 25
    The maximum acceptable value for "--variable".
        "--collect" is triggered when the value of
        "--variable" is greater than "--threshold" for
        "--cycles" many times. Currently, there is no way to define a
        lower threshold to check for a "--variable" value that is too
        low. See also "--function".--usershort form: -u; type: string
    User for login if not current user.--variabletype: string; default: Threads_running
    The variable to compare against "--threshold". See
        also "--function".--verbosetype: int; default: 2
    Print more or less information while running. Since the tool
        is designed to be a long-running daemon, the default verbosity level
        only prints the most important information. If you run the tool
        interactively, you may want to use a higher verbosity level.   LEVEL PRINTS
  ===== =====================================
  0     Errors
  1     Warnings
  2     Matching triggers and collection info
  3     Non-matching triggers
    --versionPrint tool's version and exit. This tool does not require any environment variables for
    configuration, although it can be influenced to work differently by through
    several variables. Keep in mind that these are expert settings, and should
    not be used in most cases. Specifically, the variables that can be set are: 
  CMD_GDBCMD_IOSTATCMD_MPSTATCMD_MYSQLCMD_MYSQLADMINCMD_OPCONTROLCMD_OPREPORTCMD_PMAPCMD_STRACECMD_SYSCTLCMD_TCPDUMPCMD_VMSTAT For example, during collection iostat is called with a -dx
    argument, but because you have an NFS partition, you also need the -n flag
    there. Instead of editing the source, you can call pt-stalk as     CMD_IOSTAT="iostat -n" pt-stalk ...
 which will do exactly what you need. Combined with the plugin
    hooks, this gives you a fine-grained control of what the tool does. It is possible to enable "debug"
    mode in mysqladmin specifying: "CMD_MYSQLADMIN='mysqladmin debug' pt-stalk
    params ..." This tool requires Bash v3 or newer. Certain options require other
    programs: 
  "--collect-gdb" requires "gdb""--collect-oprofile" requires "opcontrol" and
    "opreport""--collect-strace" requires "strace""--collect-tcpdump" requires "tcpdump" For a list of known bugs, see
    <https://jira.percona.com/projects/PT/issues>. Please report bugs at
    <https://jira.percona.com/projects/PT>. Include the following
    information in your bug report: 
  Complete command-line used to run the toolTool "--version"MySQL version of all servers involvedOutput from the tool including STDERRInput files (log/dump/config files, etc.) If possible, include debugging output by running the tool with
    "PTDEBUG"; see
  "ENVIRONMENT". Using <PTDEBUG> might expose passwords. When debug is
    enabled, all command line parameters are shown in the output. Visit <http://www.percona.com/software/percona-toolkit/> to
    download the latest release of Percona Toolkit. Or, get the latest release
    from the command line:    wget percona.com/get/percona-toolkit.tar.gz
   wget percona.com/get/percona-toolkit.rpm
   wget percona.com/get/percona-toolkit.deb
 You can also get individual tools from the latest release:    wget percona.com/get/TOOL
 Replace "TOOL" with the name of
    any tool. Baron Schwartz, Justin Swanhart, Fernando Ipar, Daniel Nichter,
    and Brian Fraser This tool is part of Percona Toolkit, a collection of advanced
    command-line tools for MySQL developed by Percona. Percona Toolkit was
    forked from two projects in June, 2011: Maatkit and Aspersa. Those projects
    were created by Baron Schwartz and primarily developed by him and Daniel
    Nichter. Visit <http://www.percona.com/software/> to learn about other
    free, open-source software from Percona. This program is copyright 2011-2021 Percona LLC and/or its
    affiliates, 2010-2011 Baron Schwartz. THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS
    OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES
    OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as published by
    the Free Software Foundation, version 2; OR the Perl Artistic License. On
    UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic'
    to read these licenses. You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software Foundation,
    Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. 
  Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
 |