What will we cover?
The following commands are recommended by the Performance Engineering team at Netflix, who use them to diagnose performance issues on Linux servers quickly and effectively. This article outlines each command, followed by a brief description and example output. The original article can be found at http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html?m=1
uptime
uptime is a simple Linux/Unix command for quickly spotting potentially high CPU load. It can be used alongside other performance analysis tools to determine whether the CPU is the source of an issue. When executed, it returns the current time, how long the server has been up, the number of logged-in users and, most importantly, the load average. The load average is presented as three numbers covering the last 1, 5 and 15 minutes, so comparing them shows how the load has changed over a 15 minute window. On Linux the load average also counts tasks blocked in uninterruptible I/O, so a high value is a prompt to look further rather than proof of a CPU bottleneck.
08:11:22 up 146 days, 34 min, 3 users, load average: 1.05, 0.70, 5.09
How to interpret the data, assuming the server has one CPU:
Over the last 1 minute: the computer was overloaded by 5% on average, with 0.05 processes waiting for the CPU. (1.05)
Over the last 5 minutes: the CPU was idle 30% of the time. (0.70)
Over the last 15 minutes: the computer was overloaded by 409% on average, with 4.09 processes waiting for the CPU. (5.09)
Because the 1 and 5 minute figures are far lower than the 15 minute figure, the heavy load on this server has largely passed.
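The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function names parse_load_averages and describe_load are hypothetical, and the single-CPU default mirrors the assumption made in the interpretation above.

```python
import re

def parse_load_averages(uptime_line):
    """Return the 1, 5 and 15 minute load averages from one line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + repr(uptime_line))
    return tuple(float(value) for value in match.groups())

def describe_load(load, cpu_count=1):
    """Describe a load average relative to the number of CPUs (assumed, not detected)."""
    per_cpu = load / cpu_count
    if per_cpu > 1.0:
        return "overloaded by %d%%" % round((per_cpu - 1.0) * 100)
    return "idle %d%% of the time" % round((1.0 - per_cpu) * 100)

# The example line from this article.
SAMPLE = "08:11:22 up 146 days, 34 min, 3 users, load average: 1.05, 0.70, 5.09"

for window, load in zip(("1 min", "5 min", "15 min"), parse_load_averages(SAMPLE)):
    print(window, describe_load(load))
```

On a multi-core server, pass the real core count as cpu_count; the same 5.09 that overloads one CPU leaves an 8 CPU machine with spare capacity.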
Syntax: uptime [options]
Options: [-h, --help] [-V, --version]
dmesg | tail
The dmesg command is popular for quick error checking. When its output is piped through the tail command, the user sees the last 10 lines of system messages, which will show recent errors if any have occurred. The messages are read from the kernel ring buffer, which also holds boot information.
$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[…]
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
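To interpret this data, the system administrator reviews the output for messages that point to a problem source. In the example above, the kernel's out-of-memory killer terminated a perl process, and the TCP stack reported possible SYN flooding on port 7001. Either could explain a performance issue and can be acted upon accordingly.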
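Reviewing the messages by eye works for ten lines; for longer logs, a small filter can help. The sketch below is one possible approach, assuming a hand-picked pattern list (PATTERNS) based only on the messages in the example above. The names PATTERNS and classify_dmesg are hypothetical, and the list is not a complete catalogue of kernel errors.

```python
import re

# Illustrative patterns taken from the example output; extend for your own systems.
PATTERNS = {
    "oom": re.compile(r"oom-killer|Out of memory|Killed process"),
    "syn_flood": re.compile(r"Possible SYN flooding"),
}

def classify_dmesg(lines):
    """Return (label, message) pairs for lines that match a known pattern."""
    findings = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((label, line.strip()))
                break  # one label per line is enough for a quick review
    return findings

SAMPLE = """\
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
""".splitlines()

for label, message in classify_dmesg(SAMPLE):
    print(label, "->", message)
```

To use it on a live system, the lines could come from the output of dmesg instead of the SAMPLE list.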
Syntax: dmesg [options]
Options: refer to the help output [-h, --help]
vmstat
The vmstat command provides a broad overview of system performance. It reports runnable and blocked processes, free memory, swap-in/swap-out activity, block I/O and CPU statistics.
$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3532 148760  50700 1397880    0    0     1     2    6    6  3  1 97  0
vmstat is useful for error checking because it covers a wide range of key subsystems, and the data can be interpreted using the column headings. The output is grouped into processes, memory, swap, I/O, system and CPU. A quick look through the table shows whether any system resource is exhausted and causing issues. When vmstat is run without an interval, as in the example, the single line of output shows averages since boot.
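The "quick look through this table" can be expressed as a few checks. The following sketch assumes the column layout shown in the example; the function names and the 20% I/O wait threshold are illustrative choices, not values taken from the original article.

```python
def parse_vmstat(output):
    """Turn vmstat text into a list of {column: value} dicts, one per sample row."""
    lines = [line.split() for line in output.splitlines() if line.strip()]
    # The column-name line is the one that starts with "r b".
    header = next(i for i, fields in enumerate(lines) if fields[:2] == ["r", "b"])
    names = lines[header]
    return [
        dict(zip(names, map(int, fields)))
        for fields in lines[header + 1:]
        if fields[0].isdigit()  # skip headers that repeat in long runs
    ]

def vmstat_warnings(row, cpu_count):
    """Return human-readable warnings for one vmstat sample."""
    warnings = []
    if row["r"] > cpu_count:
        warnings.append("CPU saturation: more runnable tasks than CPUs")
    if row["si"] > 0 or row["so"] > 0:
        warnings.append("swapping: the system may be short of memory")
    if row["wa"] > 20:  # illustrative threshold, an assumption
        warnings.append("high I/O wait")
    return warnings

SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3532 148760  50700 1397880    0    0     1     2    6    6  3  1 97  0
"""

rows = parse_vmstat(SAMPLE)
print(vmstat_warnings(rows[0], cpu_count=1))
```

For the example output the list of warnings is empty, which matches a system that is 97% idle.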
mpstat -P ALL 1
This command provides a per-CPU breakdown of processor time. It allows system administrators to see whether one CPU is handling the majority of the load and causing performance issues.
$ mpstat -P ALL
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db)  07/09/2011  _x86_64_ (4 CPU)

10:28:04 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
10:28:04 PM  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00  99.99
10:28:04 PM    0   0.01   0.00   0.01    0.01   0.00   0.00   0.00   0.00  99.98
10:28:04 PM    1   0.00   0.00   0.01    0.00   0.00   0.00   0.00   0.00  99.98
10:28:04 PM    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00 100.00
10:28:04 PM    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00 100.00
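Each column shows the percentage of time the CPU spent in a particular state, such as running user code (%usr), running kernel code (%sys), waiting for I/O (%iowait) or idle (%idle). A single CPU that stays busy while the others are idle can indicate that a single-threaded application is the bottleneck.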
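Spotting one busy CPU among idle ones is easy to automate. The sketch below parses output in the layout shown above and lists CPUs whose %idle falls below a threshold. The function names and the 10% threshold are assumptions made for illustration.

```python
def parse_mpstat(output):
    """Map each CPU label ('all', '0', '1', ...) to its {column: percentage} dict."""
    stats = {}
    columns = None
    cpu_index = None
    for line in output.splitlines():
        fields = line.split()
        if "CPU" in fields and "%idle" in fields:
            # Header row: everything after the CPU column is a percentage column.
            cpu_index = fields.index("CPU")
            columns = fields[cpu_index + 1:]
            continue
        if columns is None or len(fields) != cpu_index + 1 + len(columns):
            continue  # banner, blank or summary line
        stats[fields[cpu_index]] = dict(zip(columns, map(float, fields[cpu_index + 1:])))
    return stats

def hot_cpus(stats, idle_threshold=10.0):
    """Return individual CPUs whose idle time is below the (assumed) threshold."""
    return sorted(
        cpu for cpu, values in stats.items()
        if cpu != "all" and values["%idle"] < idle_threshold
    )

SAMPLE = """\
Linux 2.6.32-100.28.5.el6.x86_64 (dev-db)  07/09/2011  _x86_64_ (4 CPU)

10:28:04 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest  %idle
10:28:04 PM  all   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00  99.99
10:28:04 PM    0   0.01   0.00   0.01    0.01   0.00   0.00   0.00   0.00  99.98
10:28:04 PM    1   0.00   0.00   0.01    0.00   0.00   0.00   0.00   0.00  99.98
10:28:04 PM    2   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00 100.00
10:28:04 PM    3   0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00 100.00
"""

stats = parse_mpstat(SAMPLE)
print(hot_cpus(stats))
```

If exactly one CPU appears in the result while the "all" row is mostly idle, a single-threaded workload is a likely suspect.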
pidstat
The pidstat command gives a rolling summary of the tasks currently managed by the Linux kernel. Each report is written to standard output with per-process data such as CPU usage. When run with an interval, it prints a new report each time rather than clearing the screen, which makes it helpful for watching patterns over time and for identifying applications with high resource usage.
Example output (MySQL server; these columns are the disk I/O statistics reported with the -d option):
Linux 2.6.32-279.el6.x86_64 (server1.cyberciti.biz)  08/21/2012  _x86_64_ (8 CPU)

05:24:35 PM   PID  kB_rd/s  kB_wr/s kB_ccwr/s  Command
05:24:36 PM  7114     0.00    40.00      0.00  mysqld
05:24:37 PM  7114     0.00    64.00     64.00  mysqld
05:24:38 PM  7114     0.00    44.00      8.00  mysqld
05:24:39 PM  7114     0.00    24.00      0.00  mysqld
05:24:40 PM  7114     0.00   128.00    128.00  mysqld
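Because pidstat emits one row per interval, summarising a run usually means aggregating the rows. The sketch below averages the kB_wr/s column per process. It assumes the column names shown in the example output, and the function name average_write_rate is hypothetical.

```python
from collections import defaultdict

def average_write_rate(output):
    """Average kB_wr/s per (PID, command) across all samples in the output."""
    samples = defaultdict(list)
    pid_col = wr_col = cmd_col = None
    for line in output.splitlines():
        fields = line.split()
        if "PID" in fields and "kB_wr/s" in fields:
            # Header row: remember where each column of interest sits.
            pid_col = fields.index("PID")
            wr_col = fields.index("kB_wr/s")
            cmd_col = fields.index("Command")
            continue
        if pid_col is None or len(fields) <= cmd_col:
            continue  # banner or blank line
        if not fields[pid_col].isdigit():
            continue  # not a data row
        key = (int(fields[pid_col]), " ".join(fields[cmd_col:]))
        samples[key].append(float(fields[wr_col]))
    return {key: sum(values) / len(values) for key, values in samples.items()}

SAMPLE = """\
Linux 2.6.32-279.el6.x86_64 (server1.cyberciti.biz)  08/21/2012  _x86_64_ (8 CPU)

05:24:35 PM   PID  kB_rd/s  kB_wr/s kB_ccwr/s  Command
05:24:36 PM  7114     0.00    40.00      0.00  mysqld
05:24:37 PM  7114     0.00    64.00     64.00  mysqld
05:24:38 PM  7114     0.00    44.00      8.00  mysqld
05:24:39 PM  7114     0.00    24.00      0.00  mysqld
05:24:40 PM  7114     0.00   128.00    128.00  mysqld
"""

print(average_write_rate(SAMPLE))
```

For the example, the five samples (40, 64, 44, 24 and 128 kB/s) average to 60 kB/s written by mysqld.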
iostat -xz 1
The iostat command reports CPU statistics alongside I/O statistics for block devices. Its purpose is to identify irregular or imbalanced load on any of the drives, as this can result in delays. The output includes the reads and writes delivered to each device (r/s, w/s, rkB/s, wkB/s), which can be used for workload characterisation. await is another key figure: the average time in milliseconds that requests spend queued and being serviced, which is the time applications have to wait. If await values are larger than previous averages, it can be an indication of device saturation. Device utilisation (%util) is another useful statistic. It shows the percentage of time that the drive was doing work. A value greater than 60% can indicate poor performance, which can be confirmed by checking the await column to see whether applications are waiting for long periods.
$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26
dm-0        0.00     0.00    0.69    2.32    10.47    31.69    28.01     0.01    3.23    0.71    3.98   0.13   0.04
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
dm-2        0.00     0.00    0.09    0.07     1.35     0.36    22.50     0.00    2.55    0.23    5.62   1.78   0.03
[…]
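The two checks described above, %util over 60% and unusually high await, can be sketched as follows. The code assumes the column layout shown in the example (other sysstat versions may name or omit columns differently), and the function names are hypothetical.

```python
def parse_iostat_devices(output):
    """Map each device name to its {column: value} dict from iostat -x output."""
    stats = {}
    columns = None
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].rstrip(":") == "Device":
            columns = fields[1:]  # header row of the device table
            continue
        if columns is None or len(fields) != len(columns) + 1:
            continue  # banner, avg-cpu block or blank line
        try:
            stats[fields[0]] = dict(zip(columns, map(float, fields[1:])))
        except ValueError:
            continue  # not a numeric data row
    return stats

def busy_devices(stats, util_threshold=60.0):
    """Devices whose %util exceeds the 60% guideline mentioned in the text."""
    return sorted(dev for dev, v in stats.items() if v["%util"] > util_threshold)

def slowest_device(stats):
    """Device with the highest await (assumes the 'await' column is present)."""
    return max(stats, key=lambda dev: stats[dev]["await"])

SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
"""

stats = parse_iostat_devices(SAMPLE)
print(busy_devices(stats), slowest_device(stats))
```

In this shortened sample no device crosses the utilisation threshold, yet dm-1 stands out with an await of 345.84 ms, which shows why the two columns are worth reading together.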
free -m
The free command prints the current memory statistics to standard output. This includes total, used and free memory, as well as buffer and cache details. It is very useful for quick troubleshooting because, once a system's memory is exhausted, performance drops drastically as memory pages are swapped out to disk, which is orders of magnitude slower than RAM.
$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0
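The "-/+ buffers/cache" line is derived from the "Mem:" line: buffers and cache are subtracted from used and added to free, since the kernel can reclaim them when applications need memory. The sketch below reproduces that arithmetic. It assumes the older column layout shown in the example, and the function names are hypothetical.

```python
def parse_free_mem(output):
    """Return the Mem: row of free output as a dict (older six-column layout assumed)."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(names, map(int, fields[1:])))
    raise ValueError("no Mem: line found")

def reclaimable_view(mem):
    """Recompute the figures on the '-/+ buffers/cache' line from the Mem: row."""
    cache = mem["buffers"] + mem["cached"]
    return {
        "used_by_applications": mem["used"] - cache,
        "free_plus_cache": mem["free"] + cache,
    }

SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0
"""

mem = parse_free_mem(SAMPLE)
print(reclaimable_view(mem))
```

For the example, 221453 + 59 + 541 gives 222053 MB effectively free, matching the second row. The recomputed used figure (23945) differs by one from the printed 23944, presumably because the printed values are rounded to megabytes.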
Syntax: free [options]
Options: [-m display data in megabytes] [-k display data in kilobytes] [-g display data in gigabytes] [--tera display data in terabytes] [-t display a line of column totals]