This post is going to use an java code example to show the process how to analysis linux system performance issue using all kinds of Linux internal commands without other third party tools.
System: Top
Here is a java code for an example
public class JavaDemo2 {
public static void main(String[] args) {
while (true){
System.out.println(new java.util.Random().nextInt(77778888));
}
}
}
top check system resource usage
[root@192 ~]# top
top – 11:28:50 up 7 min, 2 users, load average: 0.73, 0.39, 0.18 Tasks: 230 total, 1 running, 229 sleeping, 0 stopped, 0 zombie Cpu(s): 1.0%us, 2.3%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1906280k total, 649948k used, 1256332k free, 21164k buffers Swap: 2097148k total, 0k used, 2097148k free, 203820k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3672 root 20 0 3158m 94m 11m S 15.6 5.1 0:13.67 java 3397 root 20 0 100m 5388 3396 S 9.3 0.3 0:09.00 sshd 39 root 20 0 0 0 0 S 1.7 0.0 0:00.25 events/4 37 root 20 0 0 0 0 S 1.0 0.0 0:00.39 events/2 35 root 20 0 0 0 0 S 0.3 0.0 0:01.27 events/0 36 root 20 0 0 0 0 S 0.3 0.0 0:00.34 events/1 41 root 20 0 0 0 0 S 0.3 0.0 0:00.24 events/6 [root@192 ~]# uptime 11:29:09 up 7 min, 2 users, load average: 0.52, 0.36, 0.18
htop Usage:
yum install htop
CPU: vmstat
[root@192 ~]# vmstat -n 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1313028 21228 203868 0 0 54 3 112 1620 0 1 98 0 0
0 0 0 1312732 21228 203896 0 0 0 0 2579 25940 1 3 96 0 0
0 0 0 1312856 21228 203896 0 0 0 0 2677 24290 1 3 97 0 0
id:处于空闲的CPU百分比
wa:系统等待IO的CPU时间百分比
st:来自于一个虚拟机偷取的CPU时间百分比
查看额外的CPU信息
所有CPU核信息
[root@192 ~]# mpstat -P ALL 2
Linux 2.6.32-642.el6.x86_64 (192.168.1.101) 08/05/2019 _x86_64_ (8 CPU)
11:42:40 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
11:42:42 AM all 1.07 0.00 2.89 0.00 0.00 0.00 0.00 0.00 96.04
11:42:42 AM 0 0.00 0.00 1.52 0.00 0.00 0.00 0.00 0.00 98.48
11:42:42 AM 1 6.09 0.00 9.14 0.00 0.00 0.00 0.00 0.00 84.77
11:42:42 AM 2 0.50 0.00 4.02 0.00 0.00 0.00 0.00 0.00 95.48
11:42:42 AM 3 1.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 97.00
每个进程使用CPU的用量分解信息
[root@192 ~]# pidstat -u 1 -p 3790
Linux 2.6.32-642.el6.x86_64 (192.168.1.101) 08/05/2019 _x86_64_ (8 CPU)
11:45:00 AM PID %usr %system %guest %CPU CPU Command
11:45:01 AM 3790 6.00 10.00 0.00 16.00 7 java
11:45:02 AM 3790 4.00 11.00 0.00 15.00 7 java
11:45:03 AM 3790 9.00 15.00 0.00 24.00 7 java
11:45:04 AM 3790 7.00 14.00 0.00 21.00 7 java
[Expert@CP:0]# vmstat 2 |awk ‘{now=strftime(“%Y-%m-%d %T “); print now $0}’2014-10-29 09:26:47 procs ———–memory———- —swap– —–io—- –system– —–cpu——2014-10-29 09:26:47 r b swpd free buff cache si so bi bo in cs us sy id wa st2014-10-29 09:26:47 1 0 448004 10748 1928 126520 10 13 53 581 118 155 8 11 81 1 02014-10-29 09:26:49 1 0 448004 10748 1936 126520 0 0 0 84 1123 2197 5 10 84 0 02014-10-29 09:26:51 1 0 448004 10780 1936 126520 0 0 0 0 1123 2145 3 6 92 0 02014-10-29 09:26:53 1 0 448004 10500 1944 126512 0 0 0 82 1123 2204 6 13 82 0 02014-10-29 09:26:55 1 0 448004 10500 1944 126520 0 0 0 0 1125 2139 6 11 84 0 02014-10-29 09:26:58 3 0 448004 10484 1944 126520 0 0 0 0 1123 2112 6 10 84 0 0 |
The ‘procs’ field has 2 columns:
r – The number of processes waiting for run time.
b – The number of processes in uninterruptible sleep (blocked processes).
The ‘memory’ field has 4 columns: (see with vmstat -a)
swpd – The amount of used swap space(virtual memory) used.
free – The amount of idle memory(free RAM).
inact – The amount of inactive memory.
active – The amount of active memory.
The ‘swap’ field has 2 columns:
si – Amount of memory swapped in from disk (/s).
so – Amount of memory swapped to disk (/s).
The ‘io’ field has 2 columns:
bi – Blocks received from a block device (blocks in).
bo – Blocks sent to a block device (blocks out).
The ‘system’ field has 2 columns:
in – The number of interrupts per second, including the clock (System interrupts).
cs – The number of context switches per second (Process context switches).
The ‘cpu’ field has only 4 columns:
us: Time spent running non-kernel code. (user time, including nice time).
sy: Time spent running kernel code. (system time).
id: Time spent idle.
wa: Time spent waiting for IO.
CPU slow1:
r has numbers in it constantly, threads/tasks waiting to be processed by your gimp cpu
CPU slow2:
in is high, you are handling too many interrupts (likely from disk activity, but could be bad driver)
Processes:
us or sy is high? Some process is being a cpu hog, use top -n 1 to find it, and kill -9 the PID if needed
Disk Subsystem Overloaded:
wa is high? If you are waiting for IO then you need to upgrade your disk subsystem
Not Enough RAM:
si and so are high, swapping disk too much. You really shouldn’t swap at all for high performance. If these are high, in will be high too. Upgrade your RAM.
Low Memory2:
cs is high? The kernel is paging memory in and out of context. Likely you need more RAM, but it could be other issues too such as damaged hardware or pitiful software.
Out of Memory:
I ignore free, inact, active because it’s not as useful and understanding the actual reasons. Ie: if you are out of memory, you’ll know that, but unless you look at cs, so, si, etc you won’t know why. So it’s redundant.
Memory: Free
应用程序可用内存数
free -m
[root@192 ~]# free -g
total used free shared buffers cached
Mem: 1 0 1 0 0 0
-/+ buffers/cache: 0 1
Swap: 1 0 1
[root@192 ~]# free -m
total used free shared buffers cached
Mem: 1861 553 1308 1 20 199
-/+ buffers/cache: 333 1528
Swap: 2047 0 2047
查看额外
[root@192 ~]# pidstat -p 3823 -r 2
Linux 2.6.32-642.el6.x86_64 (192.168.1.101) 08/05/2019 _x86_64_ (8 CPU)
11:51:16 AM PID minflt/s majflt/s VSZ RSS %MEM Command
11:51:18 AM 3823 1.00 0.00 3234728 47068 2.47 java
11:51:20 AM 3823 2.50 0.00 3234728 47068 2.47 java
11:51:22 AM 3823 0.50 0.00 3234728 47068 2.47 java
11:51:24 AM 3823 2.50 0.00 3234728 47068 2.47 java
11:51:26 AM 3823 0.50 0.00 3234728 47068 2.47 java
Hard Drive: df
查看磁盘剩余空间
[root@192 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 15G 5.6G 8.4G 41% /
tmpfs 931M 72K 931M 1% /dev/shm
/dev/sda1 190M 39M 142M 22% /boot
Disk IO: iostat
[root@192 ~]# iostat -xdk 2 3
Linux 2.6.32-642.el6.x86_64 (192.168.1.101) 08/05/2019 _x86_64_ (8 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
scd0 0.00 0.00 0.38 0.00 1.50 0.00 8.00 0.00 0.67 0.67 0.00 0.67 0.03
sda 34.33 4.94 47.36 3.18 1631.23 32.44 65.84 0.07 1.38 1.22 3.72 0.88 4.43
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.50 0.00 4.00 0.00 16.00 0.00 1.00 1.00 0.00 1.00 0.05
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
查看额外
[root@192 ~]# pidstat -d 2 -p 3485
Linux 2.6.32-642.el6.x86_64 (192.168.1.101) 08/05/2019 _x86_64_ (8 CPU)
02:16:10 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
02:16:12 PM 3485 0.00 0.00 0.00 java
02:16:14 PM 3485 0.00 0.00 0.00 java
02:16:16 PM 3485 0.00 0.00 0.00 java
02:16:18 PM 3485 0.00 0.00 0.00 java
02:16:20 PM 3485 0.00 0.00 0.00 java
02:16:22 PM 3485 0.00 0.00 0.00 java
02:16:24 PM 3485 0.00 0.00 0.00 java
Network IO: ifstat
默认本地没有,需要下载ifstat
安装ifstat
[root@192 ifstat-1.1]# yum install flex byacc libpcap ncurses ncurses-devel libpcap-devel gcc-c++
wget http://gael.roualland.free.fr/ifstat/ifstat-1.1.tar.gz
tar xzvf ifstat-1.1.tar.gz
cd ifstat-1.1
./configure
make
make install
[root@192 ifstat-1.1]# ifstat 1
eth0
KB/s in KB/s out
0.06 0.13
0.06 0.12
0.06 0.12
0.06 0.12
0.06 0.12
0.06 0.12
0.12 0.17
0.06 0.12
Troubleshooting Steps
1. Top command to find out which process is using CPU most
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4294 root 20 0 3158m 37m 11m S 16.6 2.0 0:03.28 java
3428 root 20 0 100m 5444 3396 S 14.3 0.3 0:22.85 sshd
35 root 20 0 0 0 0 S 1.0 0.0 0:03.18 events/0
41 root 20 0 0 0 0 S 0.3 0.0 0:00.52 events/6
2012 root 20 0 199m 5192 4288 S 0.3 0.3 0:00.32 ManagementAgent
1 root 20 0 19344 1544 1228 S 0.0 0.1 0:01.83 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
2. ps -ef or jps to pin-point the process detail
[root@192 ifstat-1.1]# jps
4320 Jps
4294 JavaDemo2
3. Pin-point to code or specific process instance
[root@192 ifstat-1.1]# ps -mp 4294 -o THREAD,tid,time
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
root 18.6 - - - - - - 00:00:15
root 0.0 19 - futex_ - - 4294 00:00:00
root 15.6 19 - n_tty_ - - 4295 00:00:13
root 0.2 19 - futex_ - - 4296 00:00:00
root 0.2 19 - futex_ - - 4297 00:00:00
root 0.3 19 - futex_ - - 4298 00:00:00
root 0.2 19 - futex_ - - 4299 00:00:00
root 0.3 19 - futex_ - - 4300 00:00:00
root 0.3 19 - futex_ - - 4301 00:00:00
root 0.3 19 - futex_ - - 4302 00:00:00
root 0.2 19 - futex_ - - 4303 00:00:00
root 0.0 19 - futex_ - - 4304 00:00:00
root 0.0 19 - futex_ - - 4305 00:00:00
root 0.0 19 - futex_ - - 4306 00:00:00
root 0.0 19 - futex_ - - 4307 00:00:00
root 0.0 19 - futex_ - - 4308 00:00:00
root 0.1 19 - futex_ - - 4309 00:00:00
root 0.0 19 - futex_ - - 4310 00:00:00
root 0.0 19 - futex_ - - 4311 00:00:00
root 0.0 19 - futex_ - - 4312 00:00:00
root 0.0 19 - futex_ - - 4313 00:00:00
-m:显示所有的线程
-p: pid进程使用CPU的时间
-o:该参数后是用户自定义格式
4. Convert instance ID to hex code
线程ID转换为16进制格式(英文小写格式)
[root@192 ifstat-1.1]# printf "%x\n" 4295
10c7
5. jstack instance ID|grep tid(hex code – lower case) -A60
[root@192 ifstat-1.1]# jstack 4294 | grep 10c7 -A60
"main" #1 prio=5 os_prio=0 tid=0x00007f7f68009000 nid=0x10c7 runnable [0x00007f7f6f117000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000000e2e0e6e0> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000000e2e0bbd8> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000000e2e0e800> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.newLine(PrintStream.java:546)
- eliminated <0x00000000e2e0bbd8> (a java.io.PrintStream)
at java.io.PrintStream.println(PrintStream.java:737)
- locked <0x00000000e2e0bbd8> (a java.io.PrintStream)
at JavaDemo2.main(JavaDemo2.java:4)
"VM Thread" os_prio=0 tid=0x00007f7f68080000 nid=0x10d0 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f7f6801e800 nid=0x10c8 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f7f68020000 nid=0x10c9 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f7f68022000 nid=0x10ca runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f7f68024000 nid=0x10cb runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f7f68025800 nid=0x10cc runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f7f68027800 nid=0x10cd runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f7f68029800 nid=0x10ce runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f7f6802b000 nid=0x10cf runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f7f680e8000 nid=0x10d9 waiting on condition
JNI global references: 5