MySQL压测时Linux中断异常飚高，原来是因为..._飞道的博客

MySQL压测时Linux中断异常飚高，原来是因为...

2020-10-13 10:35 803人阅读评论(0)

1. 系统环境

OS: CentOS Linux release 7.8.2003 (Core)
Kernel: 3.10.0-1127.19.1.el7.x86_64
MySQL: 用5.0、5.7均有此问题，应该和版本无关

2. 压测工具

benchyou[1]
mysql_random_load[2]

3. 问题现象

利用 mysql_random_load 工具连接MySQL写入数据时，性能非常非常低。

由于 mysql_random_load 工具不支持通过socket连接，只好放弃，改用benchyou。顺便说一下，benchyou和sysbench极为相似，也非常好用。

改用benchyou工具后，压测正常。看来的确不是MySQL版本的问题。

用 mysql_random_load 工具进行压测时，系统负载非常高，同时可观测到系统的中断也很高并且也很不均衡。


   
    
     
      
     
     
      
       [root@yejr.run]# vmstat -S m 
       1
      
     
    
     
      
     
     
      
       procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
      
     
    
     
      
     
     
      
        r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       73585      
       2  
       41051    
       0    
       0   
       117    
       91    
       4    
       2  
       0  
       0 
       99  
       0  
       0
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       73585      
       2  
       41051    
       0    
       0     
       0 
       28320 
       55444 
       100207 
       18  
       2 
       80  
       0  
       0
      
     
    
     
      
     
     
      
        
       4  
       0      
       0  
       73584      
       2  
       41052    
       0    
       0     
       0  
       1936 
       52949 
       98607 
       18  
       2 
       81  
       0  
       0
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       73583      
       2  
       41052    
       0    
       0     
       0  
       4864 
       56375 
       101262 
       14  
       2 
       84  
       0  
       0
      
     
    
     
      
     
     
      
        
       4  
       0      
       0  
       73583      
       2  
       41052    
       0    
       0     
       0 
       29064 
       55806 
       103715 
       19  
       2 
       80  
       0  
       0
      
     
    
     
      
     
     
      
        
       5  
       0      
       0  
       73583      
       2  
       41052    
       0    
       0     
       0  
       5704 
       55854 
       98386 
       15  
       2 
       83  
       0  
       0

可以看到 system.in 这列的值非常高，改成benchyou工具后，这列的值从5.5万降到1.6万。


   
    
     
      
     
     
      
       [root@yejr.run]# vmstat -S m 
       1
      
     
    
     
      
     
     
      
       procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
      
     
    
     
      
     
     
      
        r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
      
     
    
     
      
     
     
      
        
       4  
       0      
       0  
       77238      
       2  
       38371    
       0    
       0   
       118    
       88    
       2    
       3  
       0  
       0 
       99  
       0  
       0
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       77234      
       2  
       38374    
       0    
       0     
       0 
       31620 
       16039 
       77988  
       3  
       2 
       95  
       0  
       0
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       77231      
       2  
       38377    
       0    
       0     
       0 
       31996 
       16091 
       78926  
       3  
       2 
       95  
       0  
       0
      
     
    
     
      
     
     
      
        
       3  
       0      
       0  
       77229      
       2  
       38378    
       0    
       0     
       0 
       33028 
       16347 
       81006  
       3  
       2 
       95  
       0  
       0
      
     
    
     
      
     
     
      
        
       0  
       0      
       0  
       77226      
       2  
       38383    
       0    
       0     
       0 
       52412 
       15496 
       75715  
       3  
       2 
       95  
       0  
       0
      
     
    
     
      
     
     
      
        
       2  
       0      
       0  
       77224      
       2  
       38384    
       0    
       0     
       0 
       32252 
       16167 
       79352  
       3  
       2 
       95  
       0  
       0

再看下有问题时的系统中断表现


   
    
     
      
     
     
      
       [root@yejr.run]# mpstat -I SUM -P ALL 
       1
      
     
    
     
      
     
     
      
       Linux 
       3.10
       .0
       -1127.19
       .1.el7.x86_64 (yejr.run)  
       09/
       28/
       2020  _x86_64_ (
       32 CPU)
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       05:
       37:
       40 PM  CPU    intr/s
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM  all  
       51833.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       0   
       2069.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       1   
       1159.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       2   
       2979.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       3   
       1580.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       4   
       1627.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       5   
       1461.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       6   
       1243.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       7   
       1825.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       8   
       2154.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM    
       9   
       1367.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       10   
       1277.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       11   
       1376.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       12   
       4085.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       13   
       1601.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       14   
       4045.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       15   
       1857.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       16   
       1692.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       17    
       722.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       18    
       118.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       19   
       1862.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       20   
       1637.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       21   
       1130.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       22   
       1750.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       23   
       1653.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       24   
       1417.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       25   
       1547.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       26   
       1500.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       27   
       1033.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       28     
       20.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       29   
       1683.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       30    
       888.00
      
     
    
     
      
     
     
      
       05:
       37:
       41 PM   
       31   
       1549.00

可以看到每秒中断总量有5.5万，但多个CPU间并不均衡。

4. 问题分析

初步认定是因为系统中断太高导致的写入性能差，并且也认定是因为多个CPU间中断不均衡导致的这个问题。

观察是都有哪些中断比较高，发现主要是 LOC 和 RES 这两个每秒的增长比较大。


   
    
     
      
     
     
      
       [root@yejr.run]# watch -d cat /proc/interrupts
      
     
    
     
      
     
     
      
       ...
      
     
    
     
      
     
     
      
       LOC: 
       2468939840 
       2374791518 
       2373834803 
       2373613050   Local timer interrupts
      
     
    
     
      
     
     
      
       SPU:          
       0          
       0          
       0          
       0   Spurious interrupts
      
     
    
     
      
     
     
      
       PMI:          
       0          
       0          
       0          
       0   Performance monitoring interrupts
      
     
    
     
      
     
     
      
       IWI:   
       50073298   
       45861632   
       45568755   
       45833911   IRQ work interrupts
      
     
    
     
      
     
     
      
       RTR:          
       0          
       0          
       0          
       0   APIC ICR read retries
      
     
    
     
      
     
     
      
       RES: 
       3472920231 
       3022439316 
       2990464825 
       3012790828   Rescheduling interrupts
      
     
    
     
      
     
     
      
       CAL:    
       5131479    
       6539715   
       17285454   
       11211131   Function call interrupts
      
     
    
     
      
     
     
      
       TLB:   
       23094853   
       24045725   
       24230472   
       24271286   TLB shootdowns
      
     
    
     
      
     
     
      
       TRM:          
       0          
       0          
       0          
       0   Thermal event interrupts
      
     
    
     
      
     
     
      
       ...

在尝试修改相关中断号绑定的CPU后（参考：SMP affinity and proper interrupt handling in Linux[3]），问题还是没有得到缓解。

后来某神秘大佬给指点了下，才发现原来是个内核的bug，涉及到参数 kernel.timer_migration，需要将其设置为 0 才行。

[root@yejr.run]# sysctl -w kernel.timer_migration=0

当然了，最好持久化写入到 /etc/sysctl.conf 文件中。


   
    
     
      
     
     
      
       [root@yejr.run]# cat /etc/sysctl.conf
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       kernel.timer_migration=
       0
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       #加载配置文件使之生效
      
     
    
     
      
     
     
      
       [root@yejr.run]# sysctl -p

再次用 mysql_random_load 工具进行压测就没事了。

下面是关于该bug的描述


   
    
     
      
     
     
      
       The bug is when linux os receive too many tcp packages, 
      
     
    
     
      
     
     
      
       and the tcp may add too many timer, but in
      
     
    
     
      
     
     
      
       get_target_base->get_nohz_timer_target it check current 
      
     
    
     
      
     
     
      
       cpu is idle, sometimes thouth the current core is very busy,
      
     
    
     
      
     
     
      
       but the idle_cpu result is 
       1, in this time
      
     
    
     
      
     
     
      
       if set kernel.timer_migration=
       1 ,the timer will be move to next cpu.

bug详情见：Bug 124661 - kernel.timer_migration=1 cause too many Rescheduling interrupts[4]

最后，值得一提的是，在云主机上修改该参数应该是不管用，除非修改物理机的。我在某云主机上运行 mysql_random_load 工具压测也遇到类似问题，修改内核参数后问题依旧。

文内链接

[1]:https://github.com/xelabs/benchyou
[2]:https://github.com/Percona-Lab/mysql_random_data_load
[3]:http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
[4]:https://bugzilla.kernel.org/show_bug.cgi?id=124661

全文完。

Enjoy Linux & MySQL :)

「MySQL核心优化」课已升级到MySQL 8.0，扫码开启MySQL修行之旅

转载：https://blog.csdn.net/n88Lpo/article/details/108957523

查看评论

飞道的博客

飞道的博客

个人资料

文章分类

文章存档

阅读排行

评论排行

推荐文章