记一次mogodb占用cpu高问题

公司服务器上安装了contly,是一个开源的node.js项目,用于统计手机app使用情况,后端数据储存使用的mongodb,使用的时候经常发现mongodb占用cpu非常高,打到了210%的爆表值

top - 13:42:39 up 308 days, 23:01,  2 users,  load average: 2.84, 2.96, 2.93
Tasks: 209 total,   1 running, 208 sleeping,   0 stopped,   0 zombie
%Cpu(s): 59.4 us,  2.7 sy,  0.0 ni, 36.9 id,  0.2 wa,  0.7 hi,  0.0 si,  0.0 st
KiB Mem:   8173524 total,  6095460 used,  2078064 free,   133752 buffers
KiB Swap:  1048572 total,   750536 used,   298036 free.  3480688 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16870 root      20   0 48.452g 2.643g 2.490g S 210.5 33.9  38607:41 mongod
28508 root      20   0  891316 110272   5976 S   5.0  1.3  10:07.08 nodejs
28589 root      20   0  893736 114044   5972 S   5.0  1.4  13:19.13 nodejs
28566 root      20   0  959004 113128   5976 S   4.0  1.4  11:16.49 nodejs
28670 root      20   0 1189932 383820   5980 S   4.0  4.7  26:00.97 nodejs
16367 www-data  20   0   86960   2952    996 S   1.0  0.0 237:47.74 nginx
16368 www-data  20   0   87028   3212    996 S   1.0  0.0 235:54.43 nginx
16370 www-data  20   0   86964   2952    996 S   1.0  0.0 245:46.74 nginx
16371 www-data  20   0   86960   2948    996 S   1.0  0.0 246:40.89 nginx
 1154 root      20   0   23792   1764   1144 R   0.7  0.0   0:00.23 top
    7 root      20   0       0      0      0 S   0.3  0.0   1053:48 rcu_sched
    8 root      20   0       0      0      0 S   0.3  0.0 229:02.41 rcuos/0
 6559 root      20   0 3957096 217820   5668 S   0.3  2.7 230:37.95 jsvc
    1 root      20   0   37348   3704    976 S   0.0  0.0   0:19.37 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:02.51 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0  68:42.78 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    9 root      20   0       0      0      0 S   0.0  0.0 225:56.62 rcuos/1
   10 root      20   0       0      0      0 S   0.0  0.0 217:26.86 rcuos/2
   11 root      20   0       0      0      0 S   0.0  0.0 228:57.90 rcuos/3
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/4
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/5
   14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/6
   15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/7
   16 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/8
   17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/9
   18 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/10
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/11
   20 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/12
   21 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/13
   22 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/14
   23 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/15
   24 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/16
   25 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/17
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/18
   27 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/19
   28 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/20
   29 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/21
   30 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/22
   31 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/23
   32 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/24
   33 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/25
   34 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/26
   35 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/27
   36 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/28
   37 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/29
   38 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/30
   39 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/31
   40 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
   41 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   42 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   43 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
   44 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/3
   45 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/4
   46 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/5
   47 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/6
   48 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/7
   49 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/8
   50 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/9
   51 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/10
   52 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/11   

仔细查看是中断很多,命令是pid -w 1

[email protected]:~# pidstat -w 1
Linux 3.13.0-24-generic (localhost)     07/14/2017      _x86_64_        (4 CPU)

01:44:41 PM   UID       PID   cswch/s nvcswch/s  Command
01:44:42 PM     0         7    260.40      0.00  rcu_sched
01:44:42 PM     0         8     78.22      0.00  rcuos/0
01:44:42 PM     0         9     49.50      0.00  rcuos/1
01:44:42 PM     0        10     88.12      0.00  rcuos/2
01:44:42 PM     0        11     96.04      0.00  rcuos/3
01:44:42 PM     0        74      0.99      0.00  watchdog/0
01:44:42 PM     0        75      0.99      0.00  watchdog/1
01:44:42 PM     0        77      7.92      0.00  ksoftirqd/1
01:44:42 PM     0        80      0.99      0.00  watchdog/2
01:44:42 PM     0        82     39.60      0.00  ksoftirqd/2
01:44:42 PM     0        85      0.99      0.00  watchdog/3
01:44:42 PM     0        87      9.90      0.00  ksoftirqd/3
01:44:42 PM     0      1103      7.92      0.00  kworker/u64:2
01:44:42 PM     0      1213      0.99      4.95  pidstat
01:44:42 PM     0      1265      0.99      0.00  supervisord
01:44:42 PM     0      1279      9.90      0.00  vmtoolsd
01:44:42 PM     0      1373      0.99      0.00  fail2ban-server
01:44:42 PM     0      1954     18.81      0.00  xfsaild/sdb1
01:44:42 PM     0      2132      9.90      0.00  kworker/1:1H
01:44:42 PM     0      6800      0.99      0.00  kworker/0:1
01:44:42 PM     0      6807      0.99      0.00  kworker/2:2
01:44:42 PM  1000     14269      0.99      0.00  zabbix_agentd
01:44:42 PM  1000     14273      0.99      0.00  zabbix_agentd
01:44:42 PM    33     16367    154.46      0.00  nginx
01:44:42 PM    33     16368    178.22      0.00  nginx
01:44:42 PM    33     16370    166.34      0.00  nginx
01:44:42 PM    33     16371    168.32      0.00  nginx
01:44:42 PM     0     16870    100.00      0.00  mongod
01:44:42 PM     0     28498      0.99      0.00  supervisord
01:44:42 PM     0     28508    126.73     39.60  nodejs
01:44:42 PM     0     28566    141.58     46.53  nodejs
01:44:42 PM     0     28589    137.62    110.89  nodejs
01:44:42 PM     0     28670    123.76     96.04  nodejs
01:44:42 PM     0     28716      6.93      0.00  kworker/3:1
01:44:42 PM     0     28732      0.99      0.00  kworker/1:0

vmstat

[email protected]:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1 750536 2089980 133864 3482948    0    0     8   738    0    0 14  4 81  1  0
 3  0 750536 2089932 133864 3482948    0    0     0  2062 6793 9336 67  5 28  0  0
 3  0 750536 2086820 133864 3482952    0    0     0  1767 6125 8760 72  5 23  0  0
 4  0 750536 2084232 133864 3482952    0    0     0  1611 6227 8686 60  4 37  0  0
 3  0 750536 2088264 133864 3482952    0    0     0   140 5974 8556 65  4 31  0  0
 4  0 750536 2083864 133864 3482956    0    0     0  2635 9100 10589 78  5 17  0  0
 2  0 750536 2085624 133864 3482916    0    0     0   168 8152 11602 63  5 33  0  0

中断和上下文切换多,是因为nodejs一直在被从sleep中唤醒,实际上是countly要接受手机端返回来的数据,不断被IO唤醒,每次唤醒都要上下文切换,所以表象是上下文频繁切换,实质上是counly负担比较重。

不过这个貌似和mongodb没有关系,只能说明countly的负担比较重,一直在写入mongodb,所以现在mongodb的负担也比较大。其实io算不上高,cpu的iowait也不多,硬盘写入并不是瓶颈,可能是mongodb本身的问题

查看mongostat

[email protected]:~# mongostat
connected to: 127.0.0.1
insert  query update delete getmore command flushes mapped  vsize    res faults     locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
    13    171    424      2       0    15|0       0    24g  48.5g  2.64g      0 countly:11.9%          0       0|0     3|0   102k    67k    12   13:51:04
     9    225   1358    105       0    13|0       0    24g  48.5g  2.65g      0  countly:6.5%          0       0|0     0|1   255k   177k    12   13:51:05
   218    967   3056     11       0    82|0       0    24g  48.5g  2.63g      0 countly:20.7%          0       0|0     2|0   807k   965k    12   13:51:06
    19    158    315      2       0    12|0       0    24g  48.5g  2.65g      0  countly:1.7%          0       0|0     1|0    88k    92k    12   13:51:07
    17    132    270      2       0     8|0       0    24g  48.5g  2.58g      0  countly:2.1%          0       0|0     2|0    77k   115k    12   13:51:08
    14    151    433      2       0    11|0       0    24g  48.5g  2.64g      0  countly:2.2%          0       0|0     3|0   106k    89k    12   13:51:09
    22    181    335      3       0    17|0       0    24g  48.5g  2.66g      0  countly:7.9%          0       0|0     3|0    96k   105k    12   13:51:10
    36    230    765      3       0     8|0       0    24g  48.5g  2.66g      0  countly:3.6%          0       1|0     0|1   184k   224k    12   13:51:11
   122    567   1580      1       0    56|0       0    24g  48.5g  2.67g      0 countly:16.4%          0       0|0     2|0   440k   559k    12   13:51:12
     8    124    286      4       0     8|0       0    24g  48.5g  2.65g      0  countly:2.0%          0       0|0     2|0    67k    61k    12   13:51:13
insert  query update delete getmore command flushes mapped  vsize    res faults     locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
    21    193    450      1       0    18|0       0    24g  48.5g  2.66g      0  countly:5.0%          0       0|0     3|0   119k   126k    12   13:51:14
   156    774    950      1       0    67|0       0    24g  48.5g  2.65g      0 countly:18.5%          0       0|0     3|0   414k   623k    12   13:51:15
     7     69    194      2       0    12|0       0    24g  48.5g  2.65g      0  countly:1.4%          0       0|0     3|0    49k    40k    12   13:51:16
    33    280    525      2       0    12|0       0    24g  48.5g  2.64g      0  countly:4.8%          0       0|0     3|0   145k   187k    12   13:51:17
    18    201    563      5       0    17|0       0    24g  48.5g  2.66g      0  countly:5.5%          0       0|0     3|0   141k   126k    12   13:51:18
    10    144    334      2       0    15|0       0    24g  48.5g  2.65g      0  countly:4.4%          0       0|0     4|0    84k   103k    12   13:51:19
    14    210    525      4       0    19|0       0    24g  48.5g  2.63g      0  countly:5.8%          0       0|0     1|0   125k    96k    12   13:51:20
   563   1959   2259      2       0   222|0       0    24g  48.5g  2.61g      0 countly:49.2%          0       0|0     1|0     1m     2m    12   13:51:21
   227    692   1326      2       0    60|0       0    24g  48.5g  2.68g      3 countly:14.3%          0       0|0     2|0   647k   652k    12   13:51:22
    17    146    315      1       0     8|0       0    24g  48.5g  2.65g      0  countly:3.5%          0       2|0     0|1    87k    95k    12   13:51:23
insert  query update delete getmore command flushes mapped  vsize    res faults     locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
    23    166    376      1       0    10|0       0    24g  48.5g  2.65g      0  countly:7.4%          0       0|0     2|0   104k   104k    12   13:51:24
    18    113    212     *0       0     7|0       0    24g  48.5g  2.64g      0 countly:12.6%          0       0|0     2|0    65k    75k    12   13:51:25
    13    177    375     *0       0    19|0       0    24g  48.5g  2.64g      0  countly:2.4%          0       3|0     0|1    93k   110k    12   13:51:26
    20    147    299      5       0     5|0       0    24g  48.5g  2.63g      0  countly:1.9%          0       0|0     2|0    85k   103k    12   13:51:27
    40    280    476      2       0    15|0       1    24g  48.5g  2.63g      0  countly:2.9%          0       0|0     1|0   157k   247k    12   13:51:28
     7    108    263      2       0     5|0       0    24g  48.5g  2.64g      0  countly:1.6%          0       0|0     2|0    62k    78k    12   13:51:29
    17    181    394     *0       0     8|0       0    24g  48.5g  2.59g      0  countly:5.4%          0       0|0     2|0   101k   147k    12   13:51:30
    17    152    350      1       0    12|0       0    24g  48.5g  2.63g      0  countly:3.7%          0       0|0     1|0    90k    80k    12   13:51:31
    14    123    261      4       0    11|0       0    24g  48.5g  2.58g      0  countly:4.0%          0       2|0     0|1    72k   108k    12   13:51:32
   184    741   1220     21       0    84|0       0    24g  48.5g  2.67g      0 countly:25.3%          0       0|0     4|0   492k   654k    12   13:51:33 

mongodb的锁很多,怀疑是因为写入慢,一直在等待,浪费了大量的cpu时间

进入到mongodb中查看

[email protected]:~# mongo
MongoDB shell version: 2.4.14
connecting to: test
> db.serverStatus()
{
        "host" : "localhost",
        "version" : "2.4.14",
        "process" : "mongod",
        "pid" : 16870,
        "uptime" : 1551490,
        "uptimeMillis" : NumberLong(1551489627),
        "uptimeEstimate" : 1537152,
        "localTime" : ISODate("2017-07-14T06:12:34.348Z"),
        "asserts" : {
                "regular" : 0,
                "warning" : 0,
                "msg" : 0,
                "user" : 1,
                "rollovers" : 0
        },
        "backgroundFlushing" : {
                "flushes" : 25858,
                "total_ms" : 6413133,
                "average_ms" : 248.01349679016164,
                "last_ms" : 111,
                "last_finished" : ISODate("2017-07-14T06:12:27.580Z")
        },
        "connections" : {
                "current" : 12,
                "available" : 19988,
                "totalCreated" : NumberLong(660)
        },
        "cursors" : {
                "totalOpen" : 2,
                "clientCursors_size" : 2,
                "timedOut" : 125,
                "totalNoTimeout" : 2
        },
        "dur" : {
                "commits" : 29,
                "journaledMB" : 0.28672,
                "writeToDataFilesMB" : 0.272729,
                "compression" : 0.9344831856907262,
                "commitsInWriteLock" : 0,
                "earlyCommits" : 0,
                "timeMs" : {
                        "dt" : 3071,
                        "prepLogBuffer" : 0,
                        "writeToJournal" : 49,
                        "writeToDataFiles" : 4,
                        "remapPrivateView" : 5
                }
        },
        "extra_info" : {
                "note" : "fields vary by platform",
                "heap_usage_bytes" : 174045352,
                "page_faults" : 37855
        },
        "globalLock" : {
                "totalTime" : NumberLong("1551489627000"),
                "lockTime" : NumberLong("28450005021"),
                "currentQueue" : {
                        "total" : 0,
                        "readers" : 0,
                        "writers" : 0
                },
                "activeClients" : {
                        "total" : 2,
                        "readers" : 2,
                        "writers" : 0
                }
        },
        "indexCounters" : {
                "accesses" : 2559023213,
                "hits" : 2559023356,
                "misses" : 0,
                "resets" : 0,
                "missRatio" : 0
        },
        "locks" : {
                "." : {
                        "timeLockedMicros" : {
                                "R" : NumberLong(1631854144),
                                "W" : NumberLong("28450005021")
                        },
                        "timeAcquiringMicros" : {
                                "R" : NumberLong(1839692044),
                                "W" : NumberLong(269051643)
                        }
                },
                "admin" : {
                        "timeLockedMicros" : {

                        },
                        "timeAcquiringMicros" : {

                        }
                },
                "local" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong(2235967),
                                "w" : NumberLong(0)
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong(856059),
                                "w" : NumberLong(0)
                        }
                },
                "countly" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong("4052844846798"),
                                "w" : NumberLong("77669514748")
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong("2069545770119"),
                                "w" : NumberLong("82155316804")
                        }
                },
                "countly_drill" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong(874145401),
                                "w" : NumberLong("19474885608")
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong(271347249),
                                "w" : NumberLong(1475646788)
                        }
                },
                "test" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong(68640),
                                "w" : NumberLong(79192)
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong(146),
                                "w" : NumberLong(26)
                        }
                }
        },
        "network" : {
                "bytesIn" : 233060269131,
                "bytesOut" : 258427384819,
                "numRequests" : 1384182586
        },
        "opcounters" : {
                "insert" : 43661297,
                "query" : 434433635,
                "update" : 877502004,
                "delete" : 4880964,
                "getmore" : 1023,
                "command" : 26378757
        },
        "opcountersRepl" : {
                "insert" : 0,
                "query" : 0,
                "update" : 0,
                "delete" : 0,
                "getmore" : 0,
                "command" : 0
        },
        "recordStats" : {
                "accessesNotInMemory" : 8466,
                "pageFaultExceptionsThrown" : 3696,
                "countly" : {
                        "accessesNotInMemory" : 1231,
                        "pageFaultExceptionsThrown" : 221
                },
                "countly_drill" : {
                        "accessesNotInMemory" : 7235,
                        "pageFaultExceptionsThrown" : 3475
                },
                "local" : {
                        "accessesNotInMemory" : 0,
                        "pageFaultExceptionsThrown" : 0
                },
                "test" : {
                        "accessesNotInMemory" : 0,
                        "pageFaultExceptionsThrown" : 0
                }
        },
        "writeBacksQueued" : false,
        "mem" : {
                "bits" : 64,
                "resident" : 2692,
                "virtual" : 49774,
                "supported" : true,
                "mapped" : 24630,
                "mappedWithJournal" : 49260
        },
        "metrics" : {
                "document" : {
                        "deleted" : NumberLong(4398459),
                        "inserted" : NumberLong(43661297),
                        "returned" : NumberLong(249187499),
                        "updated" : NumberLong(868225290)
                },
                "getLastError" : {
                        "wtime" : {
                                "num" : 1332687,
                                "totalMillis" : 154
                        },
                        "wtimeouts" : NumberLong(0)
                },
                "operation" : {
                        "fastmod" : NumberLong(846861873),
                        "idhack" : NumberLong(1085704909),
                        "scanAndOrder" : NumberLong(0)
                },
                "queryExecutor" : {
                        "scanned" : NumberLong(1095979673)
                },
                "record" : {
                        "moves" : NumberLong(1249911)
                },
                "repl" : {
                        "apply" : {
                                "batches" : {
                                        "num" : 0,
                                        "totalMillis" : 0
                                },
                                "ops" : NumberLong(0)
                        },
                        "buffer" : {
                                "count" : NumberLong(0),
                                "maxSizeBytes" : 268435456,
                                "sizeBytes" : NumberLong(0)
                        },
                        "network" : {
                                "bytes" : NumberLong(0),
                                "getmores" : {
                                        "num" : 0,
                                        "totalMillis" : 0
                                },
                                "ops" : NumberLong(0),
                                "readersCreated" : NumberLong(0)
                        },
                        "oplog" : {
                                "insert" : {
                                        "num" : 0,
                                        "totalMillis" : 0
                                },
                                "insertBytes" : NumberLong(0)
                        },
                        "preload" : {
                                "docs" : {
                                        "num" : 0,
                                        "totalMillis" : 0
                                },
                                "indexes" : {
                                        "num" : 0,
                                        "totalMillis" : 0
                                }
                        }
                },
                "ttl" : {
                        "deletedDocuments" : NumberLong(10),
                        "passes" : NumberLong(25857)
                }
        },
        "ok" : 1
}

锁确实比较多,查看mongotop

[email protected]:/mnt/mongodb/log# mongotop
connected to: 127.0.0.1

                                                                ns       total        read       write          2017-07-14T06:46:41
                      countly.online_users55d2e563292d6b2c7d1d132d      1669ms      1665ms         4ms
                      countly.online_users5770b587f4a7f02a6a2b9cce        53ms        52ms         1ms
                                                  countly.carriers        42ms         0ms        42ms
                      countly.online_users5506ab1979a3a67e5214cda6        40ms        38ms         2ms
                                                   countly.devices        16ms         0ms        16ms
                                                      countly.apps         9ms         9ms         0ms
                                                     countly.users         8ms         0ms         8ms
                                            countly.device_details         7ms         0ms         7ms
                         countly.app_users55d2e563292d6b2c7d1d132d         4ms         0ms         4ms

                                                                ns       total        read       write          2017-07-14T06:46:42
                      countly.online_users55d2e563292d6b2c7d1d132d      3671ms      3669ms         2ms
                      countly.online_users5770b587f4a7f02a6a2b9cce        64ms        63ms         1ms
                      countly.online_users5506ab1979a3a67e5214cda6        26ms        25ms         1ms
                                                   countly.devices        19ms         0ms        19ms
                                                  countly.carriers        16ms         0ms        16ms
                                                     countly.users         6ms         0ms         6ms
                                                      countly.apps         6ms         6ms         0ms
                         countly.app_users55d2e563292d6b2c7d1d132d         4ms         1ms         3ms
countly_drill.drill_eventse02cd05a335f76036692df25f94be2b3d5c20bd2         3ms         0ms         3ms

                                                                ns       total        read       write          2017-07-14T06:46:43
                      countly.online_users55d2e563292d6b2c7d1d132d      1709ms      1705ms         4ms
                      countly.online_users5506ab1979a3a67e5214cda6        33ms        31ms         2ms
                                                  countly.carriers        19ms         0ms        19ms
                                                   countly.devices        10ms         0ms        10ms
                                                     countly.users        10ms         0ms        10ms
                      countly.online_users54feb14879a3a67e52f6f7cd         7ms         7ms         0ms
                                                      countly.apps         5ms         5ms         0ms
                         countly.app_users55d2e563292d6b2c7d1d132d         4ms         0ms         4ms
countly_drill.drill_eventsbb446ae67767dec699789c496e6c067b32dee999         3ms         0ms         3ms

                                                                ns       total        read       write          2017-07-14T06:46:44
                      countly.online_users55d2e563292d6b2c7d1d132d      2301ms      2298ms         3ms
                      countly.online_users5770b587f4a7f02a6a2b9cce       262ms       261ms         1ms
                      countly.online_users5506ab1979a3a67e5214cda6        35ms        34ms         1ms
countly_drill.drill_eventse02cd05a335f76036692df25f94be2b3d5c20bd2        22ms         0ms        22ms
                      countly.online_users54feb14879a3a67e52f6f7cd         8ms         7ms         1ms
                                                     countly.users         7ms         0ms         7ms
countly_drill.drill_eventscc860280b73e64c3413b38f2e9959acb7babd3f9         6ms         0ms         6ms
                         countly.app_users55d2e563292d6b2c7d1d132d         5ms         2ms         3ms
                                                      countly.apps         4ms         4ms         0ms

是读的时间很长,这个怀疑是没有建立索引,另外mongodb本身有一个bug

https://docs.mongodb.com/manual/administration/production-notes/

解决方法如下所示

#numactl --interleave all command

原理如下

http://www.cnblogs.com/Lifehacker/p/database_swap_insanity_on_Linux.html

这么操作了好像也没什么效果,最后想到之前这个数据库因为磁盘爆满清空过,之前听说过mongodb迁移后会丢失索引,要重建索引,我查了一下索引

> db.online_users55d2e563292d6b2c7d1d132d.find()
{ "_id" : "4e5d3922-6553-430a-849f-8b1007373d7c", "la" : ISODate("2017-06-30T00:55:08.933Z"), "n" : 0 }
{ "_id" : "55d7d514-f75c-432a-b557-17c036437a32", "la" : ISODate("2017-06-29T16:38:12.863Z"), "n" : 0 }
{ "_id" : "0118022e-54d2-4fb9-998d-a544d9a76d42", "la" : ISODate("2017-07-05T17:17:44.307Z"), "n" : 0 }
{ "_id" : "2a160945-fcc1-42f5-8fd5-693f97c4c332", "la" : ISODate("2017-07-02T07:01:35.619Z"), "n" : 0 }
{ "_id" : "14cd09d4-fd09-43b8-9259-8a007fc08b45", "la" : ISODate("2017-06-28T01:23:29.400Z"), "n" : 0 }
{ "_id" : "b9fa7cc9-cd8c-4a1b-83fb-fc4c4f929715", "la" : ISODate("2017-06-26T13:38:32.887Z"), "n" : 1 }
{ "_id" : "1d968b85-2571-4cf7-9520-bfeb44d893bc", "la" : ISODate("2017-07-13T13:08:18.807Z"), "n" : 0 }
{ "_id" : "0b1a3d07-8e5d-4e78-a93f-d13811a81b86", "la" : ISODate("2017-07-13T03:10:39.927Z"), "n" : 0 }
{ "_id" : "18d8f9fb-dff3-46e6-a3c7-bb19db748c4a", "la" : ISODate("2017-07-08T17:59:37.518Z"), "n" : 0 }
{ "_id" : "a81349ac-9d83-4ef7-be0b-1b9bf64f867f", "la" : ISODate("2017-07-13T23:27:35.174Z"), "n" : 0 }
{ "_id" : "dd841258-36a6-45de-bc45-450441d9bf33", "la" : ISODate("2017-07-14T06:34:23.838Z"), "n" : 0 }
{ "_id" : "75720c9f-69af-4833-ae14-78a381cc31b6", "la" : ISODate("2017-07-13T15:57:28.700Z"), "n" : 0 }
{ "_id" : "ef37ff3d-979d-4275-9991-398ccb2d5576", "la" : ISODate("2017-07-14T04:45:38.872Z"), "n" : 0 }
{ "_id" : "465f923e-b0f1-4aa7-8f68-2d7db95973b2", "la" : ISODate("2017-07-07T11:47:53.029Z"), "n" : 0 }
{ "_id" : "e6b5fd38-bb56-4634-9232-7eeceb8cf46a", "la" : ISODate("2017-07-09T03:27:04.545Z"), "n" : 0 }
{ "_id" : "d4957c5d-154f-4c4f-8843-9792beab4817", "la" : ISODate("2017-07-11T21:49:35.417Z"), "n" : 0 }
{ "_id" : "08d0d626-b37b-4761-81b2-98b5ee426df8", "la" : ISODate("2017-06-29T13:56:14.974Z"), "n" : 0 }
{ "_id" : "6d2eef45-cb25-4ff6-b8cd-497ab3b938ae", "la" : ISODate("2017-07-07T02:19:16.154Z"), "n" : 0 }
{ "_id" : "6f444a2b-e2fc-4ab0-8b1d-d4334bf095e3", "la" : ISODate("2017-07-09T12:17:05.711Z"), "n" : 0 }
{ "_id" : "85e06ac1-7158-42fb-95ed-2cd39a7f0687", "la" : ISODate("2017-06-26T02:27:25.389Z"), "n" : 1 }
Type "it" for more
> db.online_users55d2e563292d6b2c7d1d132d.find({"_id" : "4e5d3922-6553-430a-849f-8b1007373d7c"}).explain()
{
        "cursor" : "BtreeCursor _id_",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 1,
        "nscanned" : 1,
        "nscannedObjectsAllPlans" : 1,
        "nscannedAllPlans" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "start" : {
                        "_id" : "4e5d3922-6553-430a-849f-8b1007373d7c"
                },
                "end" : {
                        "_id" : "4e5d3922-6553-430a-849f-8b1007373d7c"
                }
        },
        "server" : "localhost:27017"
}

明明是走索引的,可能是la这一列没有加索引(这个我当时没有看,但是从现在来看这个思路应该是正确的)

加上索引

db.online_users547c2a634dcc1eef3d000001.ensureIndex({la: 1}, {expireAfterSeconds: 180});

因为这个是cache性质频繁读写的表,要转成Capped Collection,自动老化移出,优化性能,并设置最大记录数

db.runCommand({"convertToCapped":"online_users547c2a634dcc1eef3d000001",size:60})

 db.runCommand({"convertToCapped":"live_data547c2a634dcc1eef3d000001",size:60})

这样之后结果就是好多了,再观察一段时间吧

时间: 2024-10-08 03:02:41

记一次mogodb占用cpu高问题的相关文章

关于csrss.exe和winlogon.exe进程多、占用CPU高的解决办法

原地址 http://blog.sina.com.cn/s/blog_912e77480101nuif.html 最近VPS的CPU一直处在100%左右,后台管理上去经常打不开,后来发现上远程都要好半天才反映过来,看到任务管理器有多个winlogon.exe进程和占用CPU高,百度查了下,说是有人在破解远程,最后按教程修改3389端口和改密码,重启后CPU直接降下来了,winlogon.exe进程也只有一个,一切恢复正常. 我太懒了,之前知道winlogon.exe进程多,但没这几天严重,今天查

查看JAVA占用CPU高的线程日志

# 1. 查看主进程占用cpu高 top # 1647 java # 2. 按照线程占用cpu由高到低进行排查: ps -mp 1647 -o THREAD,tid,time | sort -rn |head -6 # USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME # root 1.6 - - - - - - 00:02:07 # root 0.1 19 - futex_ - - 1767 00:00:11 # root 0.1 19 - futex

记一次w3wp占用CPU过高的解决过程(Dictionary和线程安全)

项目上线以来一直存在一个比较揪心的问题,和一个没有信心处理的BUG,那就是在应用程序启动时有可能会导致cpu跑满99%或持续在一个值如50%左右,这样一来对服务器的压力是非常大的,经常出现服务器无法远程的状态,唯有通过PowerShell杀掉对应的w3wp进程才可以解决这个问题. 为什么没有信心处理这个问题 原因非常简单,这个问题是间歇性的,不容易重现的,只会在项目启动时有一定的可能性会发生CPU跑满的问题. 所有可以重现的BUG的处理都不会太难,而类似这种无法重现的BUG是最让人头疼的,因为它

一次mysql占用cpu高的处理过程

今天早上在正式服部署了新代码,过了一段时间,服务器的负载告警,cpu使用率告警,登录服务器查询,发现是mysql导致cpu的使用率过高,于是show processlist查询了一下,看到有很多线程处于sending data和lock的状态中,都是select某个数据库的某张表的操作. 于是将sending data的那些sql语句复制执行了一遍,发现执行它们的时间太长,然后又explain分析了一下. 有条select的条件中没有主键和索引,由于查询条件中那个字段具有唯一性,所以和开发商量将

w3wp占用CPU过高

w3wp占用CPU过高 在此之前项目有发生过两次类似的状况,都得以解决,但最近又会发现偶尔CPU会跑满,虽然之前使用过WinDbg解决过两次问题但人的记忆是不可靠的,今天处理同样问题的时候还是遇到了一些障碍,这一次希望可以记录的更全面些. 上两次的博文链接:记一次w3wp占用CPU过高的解决过程(Dictionary和线程安全).EntityFramework中的线程安全,又是Dictionary. 首先请大家不要喷我,因为这一次还是关于Dictionary的一些低级错误,我自己看到都无语了..

tomcat7+java压测过程中占用CPU过高排查故障和解决办法

环架构境: 前端haproxy做为反向代理,后端N+1台tomcat+java服务 出现问题: 环境是新搭建的,本周在做压测刚开始的时候正常,随着量的上涨,导致CPU一直暴涨. 解决办法和思路: 1.)先通过top命令查看占用cpu高的PID # 根据top命令查看发现PID为2195和975的的进程占用CPU高达%200+,明显出现故障 2.)通过top -H -p pid命令查看,发现2275 3302 3375这几个进程占用CPU时间8分钟 3.)把线程pid转换为16进制,例如:上面的p

java web服务器cpu占用过高的处理

平时项目中有时遇到cpu过高的情况,在此基于自己有限的经验写个分享,此处的服务器都是基于linux平台. cpu的占有线程类型总的来说分为两种:us :用户空间占用CPU百分比sy :内核空间占用CPU百分比在linux下可以通过top命令查看详细,示例如下: 一般来讲CPU us高的解决方法:CPU us 高的原因主要是执行线程不需要任何挂起动作,且一直执行,导致CPU 没有机会去调度执行其他的线程.CPU sy高的解决方法:CPU sy 高的原因主要是线程的运行状态要经常切换,对于这种情况,

win10 解决 WMI Provider Host 占用CPU过高问题

真心懒得写Blog,但是之前遇到这个问题在网上查了一大圈,几乎一摸一样都是让关防火墙等服务的,然而对于我来说,并没有毛线用. 无奈,直接去微软社区查,还真有一篇问题解决方案.顺手翻译一下放在这里,希望能帮到大家. 参考链接:https://answers.microsoft.com/en-us/windows/forum/windows_10-other_settings/wmi-provider-host-high-cpu-usage-and-laptop/44cb3953-a883-404a

【经验之谈】mysql占用CPU利用率过高的问题

前段时间Zabbix上出现自告警,报Escalators进程压力过大,增大Escalators参数,等改到80-90才不告警了,就觉得不对劲. 到设备上top一下发现mysql把95%上的CPU都吃掉了 上mysql去show processlist; 发现有大量sleep进程,show status like 'Thread%':看到连接数只有138. 修改my.cnf文件,将max_connections降低到300, 发现没有改善,百度到可能是全局读写timeout时间没有优化的问题 修改