0%

基于HDP搭建的的Hbase调优实践-CMS GC调优

背景

环境

HDP : 2.6.5.0-292

Hbase : 1.1.2

Kylin : 2.6.2

Kylin使用hbase作为存储,每小时的第五分钟开始构建,有十几个Cube,平均构建时间十五分钟。采用构建集群和存储集群分离。

问题

最近几天观察Kylin的查询响应时间曲线图,发现每天的响应时间在1-3秒请求的在逐步增多,也就是说,kylin查询变得越来越慢,但是三秒以上的却很少,几乎没有,说明这个慢,是比较平稳的变慢,而不是一会儿很快,一会慢的很离谱(可能发生了Full GC,或者老年代GC时间过长),请看下图。

image

分析

由于访问量增加导致?

通过查看过去几天的访问量,发现并没有过多变化,非常平稳。所以这个原因排除 image

由于机器负载过高或者内存不足或者网络流量过高导致?

  1. 负载并无异常

image

  1. 内存并无异常

image

  1. 网络并无异常

image

Kylin JVM 异常

查看Vm args 信息

1
jinfo  -flags 98730

image

可以看出,Kylin使用的是CMS老年代垃圾收集器,搭配ParNew作为新生代垃圾回收器,最大晋升年龄为6。

查看GC基本信息

1
jstat -gcutil 98730  2000 100
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
5.05 0.00 52.12 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 54.85 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 56.57 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 60.09 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 62.48 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 67.98 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 71.43 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 77.33 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 84.86 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 93.40 22.66 66.71 44.97 930 78.303 8 1.574 79.877
5.05 0.00 97.94 22.66 66.71 44.97 930 78.303 8 1.574 79.877
0.00 5.72 6.87 22.66 66.67 45.00 931 78.362 8 1.574 79.936
0.00 5.72 10.84 22.66 66.67 45.00 931 78.362 8 1.574 79.936
0.00 5.72 19.11 22.66 66.67 45.00 931 78.362 8 1.574 79.936
0.00 5.72 24.05 22.66 66.67 45.00 931 78.362 8 1.574 79.936

从上面可以发现:

  1. 从运行至今,总共进行了8次Full GC,总用时不到两秒;

  2. YGC次数在20秒左右增加了一次,并且这一次Eden区减少到了 6.87%,S1区域使用5.72%,但是老年代却是没有增加加;仍然是66.67,说明Kylin虚拟机内存分布中,大都是朝生夕灭的短寿对象;

  3. Monitor GC时间也非常短。

image

结论

Kylin 的 JVM无异常现象,排除。

Hbase Regionserver的JVM问题

Hbase主要对外提供服务的进程是RegionServer,负责的任务非常多,比如读写缓存,storefile的合并等等。另外regionserver是长寿对象居多的工程,分为以下几种对象:

  1. RPC请求对象,短寿对象,随请求销毁而忘;
  2. Memstore 对象, 长寿对象 写入MemStore之后就一直存在,直到flush到hdfs,通常需要一个小时到几个小时。 一般都很大,有 2M左右;
  3. BlockCache对象,和MemStore一样,长寿对象,默认64K。

所以regionserver的jvm调优就变得非常重要。分析流程:

JVM vm args分析

1
2
3
VM Flags:
Non-default VM flags: -XX:CICompilerCount=15 -XX:CMSInitiatingOccupancyFraction=50 -XX:ErrorFile=null -XX:InitialHeapSize=13153337344 -XX:+ManagementServer -XX:MaxHeapSize=13153337344 -XX:MaxNewSize=2625634304 -XX:MaxTenuringThreshold=6 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=2625634304 -XX:OldPLABSize=16 -XX:OldSize=10527703040 -XX:OnOutOfMemoryError=null -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseFastUnorderedTimeStamps -XX:+UseParNewGC
Command line: -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Dhdp.version=2.6.5.0-292 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hbase/hs_err_pid%p.log -Djava.io.tmpdir=/tmp -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc.log-202005141054 -Xmn2504m -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -Xms12544m -Xmx12544m -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=10102 -javaagent:/data1/prometheus/jmx_prometheus_javaagent-0.12.0.jar=17001:/data1/prometheus/hbase_jmx_config.yaml -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-regionserver-ssd3.log -Dhbase.home.dir=/usr/hdp/current/hbase-regionserver/bin/.. -Dhbase.id.str=hbase -Dhbase.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhbase.security.logger=INFO,RFAS
  1. 使用CMS老年代垃圾收集器,搭配ParNew作为新生代垃圾回收器,最大晋升年龄为6。
  2. CMSInitiatingOccupancyFraction为50,也就是老年代在内存占用为50%的时候,就会触发CMS GC。

GC 情况

1
jstat -gcutil  109443   2000 100
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
100.00 0.00 35.92 70.33 71.54 42.51 219024 48865.108 237744 12510.026 61375.134
0.00 96.18 47.73 72.92 71.54 42.51 219025 48865.267 237744 12510.026 61375.293
0.00 96.18 95.20 72.92 71.54 42.51 219025 48865.267 237745 12510.064 61375.330
25.66 0.00 38.03 72.92 71.54 42.51 219026 48865.281 237745 12510.064 61375.345
25.66 0.00 58.60 68.28 71.54 42.51 219026 48865.281 237746 12510.131 61375.412
0.00 16.30 7.49 64.22 71.55 42.51 219027 48865.296 237747 12510.134 61375.430
0.00 16.30 68.66 64.21 71.55 42.51 219027 48865.296 237748 12510.230 61375.526
100.00 0.00 51.23 69.66 71.55 42.51 219028 48866.550 237748 12510.230 61376.779
0.00 87.40 79.47 72.36 71.55 42.51 219029 48866.728 237749 12510.246 61376.974
42.13 0.00 45.35 72.36 71.55 42.51 219030 48866.756 237749 12510.246 61377.002
42.13 0.00 56.93 69.08 71.55 42.51 219030 48866.756 237750 12510.330 61377.086
42.13 0.00 93.60 64.76 71.55 42.51 219030 48866.756 237750 12510.330 61377.086
0.00 49.08 6.81 63.74 71.55 42.51 219031 48866.784 237750 12510.330 61377.114
0.00 49.08 78.97 63.74 71.55 42.51 219031 48866.784 237751 12510.339 61377.123
100.00 0.00 60.26 68.84 71.55 42.51 219032 48867.879 237752 12510.447 61378.326
0.00 90.35 59.46 71.46 71.55 42.51 219033 48868.052 237753 12510.463 61378.515
41.10 0.00 59.31 71.83 71.55 42.51 219034 48868.160 237754 12510.562 61378.722

结果分析:

  1. YGC非常频繁,平均每两秒就会又一次YGC,总共进行二十多万次YGC,平均每一次200ms左右,时间过长;

  2. YCC每次清理的并不是很干净,比如上面第一次,Eden区从95%清理到38%;

  3. YGC发生后,老年代的使用率也增加较快,证明对象过早的进入了老年代;

  4. 老年代的GC也非常频繁,甚至比新生代的次数还要多,说明收集过于频繁。

下面是截取的一段具体的GC日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
2020-05-26T17:04:03.872+0800: 1058975.570: [CMS-concurrent-abortable-preclean: 1.554/1.863 secs] [Times: user=5.96 sys=0.00, real=1.86 secs]
2020-05-26T17:04:03.874+0800: 1058975.572: [GC (CMS Final Remark) [YG occupancy: 1437824 K (2307712 K)]2020-05-26T17:04:03.874+0800: 1058975.572: [Rescan (parallel) , 0.0891795 secs]2020-05-26T17:04:03.963+0800: 1058975.661: [weak refs processing, 0.0000377 secs]2020-05-26T17:04:03.963+0800: 1058975.661: [class unloading, 0.0188261 secs]2020-05-26T17:04:03.982+0800: 1058975.680: [scrub symbol table, 0.0064729 secs]2020-05-26T17:04:03.988+0800: 1058975.686: [scrub string table, 0.0007631 secs][1 CMS-remark: 7479514K(10280960K)] 8917339K(12588672K), 0.1154363 secs] [Times: user=2.06 sys=0.00, real=0.12 secs]
2020-05-26T17:04:03.989+0800: 1058975.687: [CMS-concurrent-sweep-start]
2020-05-26T17:04:05.636+0800: 1058977.334: [CMS-concurrent-sweep: 1.647/1.647 secs] [Times: user=2.90 sys=0.00, real=1.64 secs]
2020-05-26T17:04:05.641+0800: 1058977.340: [CMS-concurrent-reset-start]
2020-05-26T17:04:05.659+0800: 1058977.357: [CMS-concurrent-reset: 0.018/0.018 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
2020-05-26T17:04:07.660+0800: 1058979.359: [GC (CMS Initial Mark) [1 CMS-initial-mark: 6593887K(10280960K)] 8489670K(12588672K), 0.0829345 secs] [Times: user=1.60 sys=0.00, real=0.08 secs]
2020-05-26T17:04:07.743+0800: 1058979.442: [CMS-concurrent-mark-start]
2020-05-26T17:04:08.096+0800: 1058979.794: [CMS-concurrent-mark: 0.353/0.353 secs] [Times: user=2.18 sys=0.00, real=0.35 secs]
2020-05-26T17:04:08.096+0800: 1058979.795: [CMS-concurrent-preclean-start]
2020-05-26T17:04:08.117+0800: 1058979.815: [CMS-concurrent-preclean: 0.020/0.020 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2020-05-26T17:04:08.117+0800: 1058979.815: [CMS-concurrent-abortable-preclean-start]
2020-05-26T17:04:10.577+0800: 1058982.275: [GC (Allocation Failure) 2020-05-26T17:04:10.577+0800: 1058982.275: [ParNew: 2267483K->80578K(2307712K), 0.0291582 secs] 8861370K->6674466K(12588672K), 0.0293826 secs] [Times: user=0.58 sys=0.00, real=0.03 secs]
CMS: abort preclean due to time 2020-05-26T17:04:13.332+0800: 1058985.030: [CMS-concurrent-abortable-preclean: 4.169/5.215 secs] [Times: user=6.60 sys=0.00, real=5.22 secs]
2020-05-26T17:04:13.334+0800: 1058985.032: [GC (CMS Final Remark) [YG occupancy: 586392 K (2307712 K)]2020-05-26T17:04:13.334+0800: 1058985.032: [Rescan (parallel) , 0.0208641 secs]2020-05-26T17:04:13.355+0800: 1058985.053: [weak refs processing, 0.0000680 secs]2020-05-26T17:04:13.355+0800: 1058985.053: [class unloading, 0.0568393 secs]2020-05-26T17:04:13.412+0800: 1058985.110: [scrub symbol table, 0.0138110 secs]2020-05-26T17:04:13.425+0800: 1058985.124: [scrub string table, 0.0011695 secs][1 CMS-remark: 6593887K(10280960K)] 7180280K(12588672K), 0.0929308 secs] [Times: user=0.53 sys=0.00, real=0.09 secs]
2020-05-26T17:04:13.427+0800: 1058985.125: [CMS-concurrent-sweep-start]
2020-05-26T17:04:14.511+0800: 1058986.210: [CMS-concurrent-sweep: 1.080/1.085 secs] [Times: user=2.27 sys=0.00, real=1.09 secs]
2020-05-26T17:04:14.512+0800: 1058986.210: [CMS-concurrent-reset-start]
2020-05-26T17:04:14.538+0800: 1058986.236: [CMS-concurrent-reset: 0.026/0.026 secs] [Times: user=0.05 sys=0.00, real=0.03 secs]
2020-05-26T17:04:15.144+0800: 1058986.842: [GC (Allocation Failure) 2020-05-26T17:04:15.144+0800: 1058986.842: [ParNew: 2131906K->256384K(2307712K), 1.2043355 secs] 8710471K->7428897K(12588672K), 1.2045373 secs] [Times: user=22.59 sys=0.86, real=1.21 secs]
2020-05-26T17:04:16.350+0800: 1058988.048: [GC (CMS Initial Mark) [1 CMS-initial-mark: 7172513K(10280960K)] 7453650K(12588672K), 0.0183647 secs] [Times: user=0.34 sys=0.03, real=0.02 secs]
2020-05-26T17:04:16.368+0800: 1058988.066: [CMS-concurrent-mark-start]
2020-05-26T17:04:16.640+0800: 1058988.339: [CMS-concurrent-mark: 0.272/0.272 secs] [Times: user=1.89 sys=0.40, real=0.27 secs]
2020-05-26T17:04:16.641+0800: 1058988.339: [CMS-concurrent-preclean-start]
2020-05-26T17:04:16.685+0800: 1058988.383: [CMS-concurrent-preclean: 0.044/0.044 secs] [Times: user=0.08 sys=0.01, real=0.04 secs]
2020-05-26T17:04:16.685+0800: 1058988.383: [CMS-concurrent-abortable-preclean-start]
2020-05-26T17:04:17.742+0800: 1058989.441: [GC (Allocation Failure) 2020-05-26T17:04:17.743+0800: 1058989.441: [ParNew: 2307712K->215445K(2307712K), 0.1832540 secs] 9480225K->7666856K(12588672K), 0.1834614 secs] [Times: user=2.20 sys=0.29, real=0.18 secs]
2020-05-26T17:04:18.652+0800: 1058990.350: [CMS-concurrent-abortable-preclean: 1.673/1.967 secs] [Times: user=5.81 sys=0.97, real=1.97 secs]
2020-05-26T17:04:18.654+0800: 1058990.352: [GC (CMS Final Remark) [YG occupancy: 1800437 K (2307712 K)]2020-05-26T17:04:18.654+0800: 1058990.352: [Rescan (parallel) , 0.0794339 secs]2020-05-26T17:04:18.734+0800: 1058990.432: [weak refs processing, 0.0000383 secs]2020-05-26T17:04:18.734+0800: 1058990.432: [class unloading, 0.0239246 secs]2020-05-26T17:04:18.758+0800: 1058990.456: [scrub symbol table, 0.0074005 secs]2020-05-26T17:04:18.765+0800: 1058990.463: [scrub string table, 0.0008843 secs][1 CMS-remark: 7451411K(10280960K)] 9251848K(12588672K), 0.1118495 secs] [Times: user=1.69 sys=0.16, real=0.11 secs]
2020-05-26T17:04:18.766+0800: 1058990.464: [CMS-concurrent-sweep-start]
2020-05-26T17:04:19.791+0800: 1058991.489: [GC (GCLocker Initiated GC) 2020-05-26T17:04:19.791+0800: 1058991.490: [ParNew: 2271621K->108904K(2307712K), 0.0271725 secs] 9398478K->7235761K(12588672K), 0.0274050 secs] [Times: user=0.53 sys=0.06, real=0.03 secs]
2020-05-26T17:04:20.395+0800: 1058992.093: [CMS-concurrent-sweep: 1.597/1.629 secs] [Times: user=3.04 sys=0.55, real=1.63 secs]
2020-05-26T17:04:20.395+0800: 1058992.094: [CMS-concurrent-reset-start]
2020-05-26T17:04:20.413+0800: 1058992.111: [CMS-concurrent-reset: 0.018/0.018 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2020-05-26T17:04:22.414+0800: 1058994.112: [GC (CMS Initial Mark) [1 CMS-initial-mark: 6684475K(10280960K)] 7681507K(12588672K), 0.0121276 secs] [Times: user=0.21 sys=0.02, real=0.01 secs]
2020-05-26T17:04:22.427+0800: 1058994.125: [CMS-concurrent-mark-start]
2020-05-26T17:04:22.746+0800: 1058994.444: [CMS-concurrent-mark: 0.320/0.320 secs] [Times: user=1.73 sys=0.19, real=0.32 secs]
2020-05-26T17:04:22.746+0800: 1058994.444: [CMS-concurrent-preclean-start]
2020-05-26T17:04:22.780+0800: 1058994.479: [CMS-concurrent-preclean: 0.034/0.034 secs] [Times: user=0.03 sys=0.00, real=0.04 secs]
2020-05-26T17:04:22.780+0800: 1058994.479: [CMS-concurrent-abortable-preclean-start]
2020-05-26T17:04:26.688+0800: 1058998.386: [GC (Allocation Failure) 2020-05-26T17:04:26.688+0800: 1058998.386: [ParNew: 2159435K->111685K(2307712K), 0.0286155 secs] 8843911K->6796161K(12588672K), 0.0288247 secs] [Times: user=0.63 sys=0.00, real=0.02 secs]
CMS: abort preclean due to time 2020-05-26T17:04:28.055+0800: 1058999.753: [CMS-concurrent-abortable-preclean: 4.903/5.274 secs] [Times: user=7.02 sys=0.00, real=5.27 secs]
2020-05-26T17:04:28.057+0800: 1058999.755: [GC (CMS Final Remark) [YG occupancy: 370337 K (2307712 K)]2020-05-26T17:04:28.057+0800: 1058999.755: [Rescan (parallel) , 0.0147381 secs]2020-05-26T17:04:28.072+0800: 1058999.770: [weak refs processing, 0.0000485 secs]2020-05-26T17:04:28.072+0800: 1058999.770: [class unloading, 0.0330506 secs]2020-05-26T17:04:28.105+0800: 1058999.803: [scrub symbol table, 0.0099221 secs]2020-05-26T17:04:28.115+0800: 1058999.813: [scrub string table, 0.0011202 secs][1 CMS-remark: 6684475K(10280960K)] 7054813K(12588672K), 0.0590651 secs] [Times: user=0.36 sys=0.00, real=0.06 secs]
2020-05-26T17:04:28.116+0800: 1058999.814: [CMS-concurrent-sweep-start]
2020-05-26T17:04:29.244+0800: 1059000.942: [CMS-concurrent-sweep: 1.122/1.128 secs] [Times: user=2.61 sys=0.00, real=1.13 secs]
2020-05-26T17:04:29.244+0800: 1059000.942: [CMS-concurrent-reset-start]
2020-05-26T17:04:29.261+0800: 1059000.960: [CMS-concurrent-reset: 0.017/0.017 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2020-05-26T17:04:29.493+0800: 1059001.191: [GC (Allocation Failure) 2020-05-26T17:04:29.493+0800: 1059001.191: [ParNew: 2163013K->256384K(2307712K), 1.2014500 secs] 8713109K->7417968K(12588672K), 1.2018116 secs] [Times: user=24.18 sys=0.00, real=1.20 secs]
2020-05-26T17:04:30.698+0800: 1059002.396: [GC (CMS Initial Mark) [1 CMS-initial-mark: 7161584K(10280960K)] 7439893K(12588672K), 0.0217866 secs] [Times: user=0.39 sys=0.00, real=0.02 secs]
2020-05-26T17:04:30.720+0800: 1059002.418: [CMS-concurrent-mark-start]
2020-05-26T17:04:30.968+0800: 1059002.666: [CMS-concurrent-mark: 0.247/0.247 secs] [Times: user=2.05 sys=0.00, real=0.25 secs]
2020-05-26T17:04:30.968+0800: 1059002.666: [CMS-concurrent-preclean-start]
2020-05-26T17:04:30.995+0800: 1059002.693: [CMS-concurrent-preclean: 0.026/0.027 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
2020-05-26T17:04:30.995+0800: 1059002.693: [CMS-concurrent-abortable-preclean-start]
2020-05-26T17:04:32.373+0800: 1059004.072: [GC (Allocation Failure) 2020-05-26T17:04:32.374+0800: 1059004.072: [ParNew: 2307712K->211939K(2307712K), 0.1995155 secs] 9469296K->7651281K(12588672K), 0.1997741 secs] [Times: user=2.73 sys=0.00, real=0.20 secs]
2020-05-26T17:04:33.259+0800: 1059004.958: [CMS-concurrent-abortable-preclean: 1.949/2.265 secs] [Times: user=7.84 sys=0.12, real=2.27 secs]
2020-05-26T17:04:33.261+0800: 1059004.959: [GC (CMS Final Remark) [YG occupancy: 1716846 K (2307712 K)]2020-05-26T17:04:33.261+0800: 1059004.959: [Rescan (parallel) , 0.0738579 secs]2020-05-26T17:04:33.335+0800: 1059005.033: [weak refs processing, 0.0000354 secs]2020-05-26T17:04:33.335+0800: 1059005.033: [class unloading, 0.0726293 secs]2020-05-26T17:04:33.408+0800: 1059005.106: [scrub symbol table, 0.0163362 secs]2020-05-26T17:04:33.424+0800: 1059005.122: [scrub string table, 0.0007886 secs][1 CMS-remark: 7439341K(10280960K)] 9156188K(12588672K), 0.1638405 secs] [Times: user=1.56 sys=0.21, real=0.16 secs]
2020-05-26T17:04:33.425+0800: 1059005.124: [CMS-concurrent-sweep-start]
2020-05-26T17:04:33.807+0800: 1059005.505: [GC (Allocation Failure) 2020-05-26T17:04:33.807+0800: 1059005.506: [ParNew: 2263267K->107757K(2307712K), 0.0263471 secs] 9589492K->7433982K(12588672K), 0.0265882 secs] [Times: user=0.51 sys=0.07, real=0.02 secs]
2020-05-26T17:04:34.782+0800: 1059006.480: [CMS-concurrent-sweep: 1.328/1.356 secs] [Times: user=3.01 sys=0.53, real=1.36 secs]
2020-05-26T17:04:34.782+0800: 1059006.480: [CMS-concurrent-reset-start]
2020-05-26T17:04:34.825+0800: 1059006.523: [CMS-concurrent-reset: 0.044/0.044 secs] [Times: user=0.08 sys=0.02, real=0.04 secs]
2020-05-26T17:04:36.826+0800: 1059008.525: [GC (CMS Initial Mark) [1 CMS-initial-mark: 6676453K(10280960K)] 7611455K(12588672K), 0.0377263 secs] [Times: user=0.38 sys=0.13, real=0.04 secs]
2020-05-26T17:04:36.864+0800: 1059008.563: [CMS-concurrent-mark-start]
2020-05-26T17:04:37.175+0800: 1059008.873: [CMS-concurrent-mark: 0.311/0.311 secs] [Times: user=1.64 sys=0.22, real=0.31 secs]
2020-05-26T17:04:37.175+0800: 1059008.873: [CMS-concurrent-preclean-start]
2020-05-26T17:04:37.209+0800: 1059008.907: [CMS-concurrent-preclean: 0.033/0.033 secs] [Times: user=0.03 sys=0.00, real=0.04 secs]
2020-05-26T17:04:37.209+0800: 1059008.907: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2020-05-26T17:04:42.441+0800: 1059014.139: [CMS-concurrent-abortable-preclean: 4.531/5.232 secs] [Times: user=4.62 sys=0.84, real=5.23 secs]
2020-05-26T17:04:42.443+0800: 1059014.141: [GC (CMS Final Remark) [YG occupancy: 1641955 K (2307712 K)]2020-05-26T17:04:42.443+0800: 1059014.141: [Rescan (parallel) , 0.0463625 secs]2020-05-26T17:04:42.490+0800: 1059014.188: [weak refs processing, 0.0000413 secs]2020-05-26T17:04:42.490+0800: 1059014.188: [class unloading, 0.0212300 secs]2020-05-26T17:04:42.511+0800: 1059014.209: [scrub symbol table, 0.0059408 secs]2020-05-26T17:04:42.517+0800: 1059014.215: [scrub string table, 0.0008717 secs][1 CMS-remark: 6676453K(10280960K)] 8318408K(12588672K), 0.0746340 secs] [Times: user=0.99 sys=0.10, real=0.08 secs]
2020-05-26T17:04:42.518+0800: 1059014.216: [CMS-concurrent-sweep-start]
2020-05-26T17:04:43.531+0800: 1059015.229: [GC (Allocation Failure) 2020-05-26T17:04:43.531+0800: 1059015.229: [ParNew: 2159085K->256384K(2307712K), 0.1807710 secs] 8715907K->6887517K(12588672K), 0.1809489 secs] [Times: user=2.99 sys=0.47, real=0.18 secs]
2020-05-26T17:04:44.051+0800: 1059015.749: [CMS-concurrent-sweep: 1.345/1.533 secs] [Times: user=4.59 sys=1.22, real=1.53 secs]
2020-05-26T17:04:44.051+0800: 1059015.750: [CMS-concurrent-reset-start]
2020-05-26T17:04:44.087+0800: 1059015.785: [CMS-concurrent-reset: 0.036/0.036 secs] [Times: user=0.08 sys=0.10, real=0.04 secs]

调优目标

  1. 平均Monitor GC 时间尽可能的短,因为Monitor GC使用 ParNew GC,是并行垃圾处理器,需要STW,
  2. CMS GC 越少越好 时间越短越好。 频繁CMS产生内存碎片,严重时引起Full GC

具体优化方法

  1. 内存调整为16G
  • 以前为12G,有点小
  1. 修改 CMSInitiatingOccupancyFraction 为 60
  • 以前老年代过早进行回收,会增加老年代的GC频率
  1. 新增 -XX:MaxTenuringThreshold=15
  • 对象过早进入到了老年代,导致增加了老年代的回收次数
  1. 新增 -XX:+UseCMSCompactAtFullCollection
  • 防止垃圾碎片
  1. 新增 -XX:+PrintTenuringDistribution
  • 只有在添加参数-XX:+PrintTenuringDistribution才能打印对应日志,强烈建议线上集群开启该参数

优化结果

Hbase regionserver GC情况

1
2
3
4
5
6
7
8
9
10
11
S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
2.70 0.00 88.20 22.77 66.18 44.45 970 81.137 8 1.574 82.710
2.70 0.00 90.19 22.77 66.18 44.45 970 81.137 8 1.574 82.710
2.70 0.00 94.74 22.77 66.18 44.45 970 81.137 8 1.574 82.710
2.70 0.00 96.47 22.77 66.18 44.45 970 81.137 8 1.574 82.710
2.70 0.00 99.41 22.77 66.18 44.45 970 81.137 8 1.574 82.710
0.00 2.55 0.49 22.77 66.08 44.39 971 81.216 8 1.574 82.790
0.00 2.55 6.04 22.77 66.08 44.39 971 81.216 8 1.574 82.790
0.00 2.55 9.17 22.77 66.08 44.39 971 81.216 8 1.574 82.790
0.00 2.55 11.85 22.77 66.08 44.39 971 81.216 8 1.574 82.790
0.00 2.55 15.71 22.77 66.08 44.39 971 81.216 8 1.574 82.790

Eden区每次回收都非常的干净,基本上都是全部回收了,存活到Survivor区的数量也少,并且基本上没有晋升到老年代的对象,标识增大晋升年龄起到了作用。

查询请求的响应情况

查询响应速度大幅度提升,99.97%的请求都在1s以内响应

image