[讨论] 关于CMS GC的一些疑问

pulsar_lxl 2014-08-15

本人最近在学习OpenJDK1.7 的HotSpot VM中遇到了一些问题,麻烦哪位大侠不吝指点下迷津。

 

我写了一个触发promotion failed的例子,后来分析GC日志时,发现了一些问题,CMS GC并没有按照我设置的参数发生。

 

 

JVM 参数设置和程序如下:

 

-Xmx20M -Xms20M -Xmn10M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:SurvivorRatio=8
-verbose:gc -XX:+PrintHeapAtGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=10

 

public static void main(String[] args) throws Exception {
        byte[] alloc1,alloc2,alloc3,alloc4, alloc11, alloc22, alloc33, alloc44;
        alloc1 = new byte[1*1024*1024];
        alloc2 = new byte[1*1024*1024];
        alloc3 = new byte[1*1024*1024];

        alloc4 = new byte[1*1024*1024];

        alloc11 = new byte[1*1024*1024];
        Thread.sleep(1000);
        System.out.println("*********************1111" + new Date());
        alloc22 = new byte[2*1024*1024];

        Thread.sleep(1000);
        System.out.println("*********************2222" + new Date());
        alloc33 = new byte[1*1024*1024];
        alloc44 = new byte[1*1024*1024];


        byte[] alloc5,alloc6,alloc7,alloc8, alloc9;

        alloc5 = new byte[3*1024*1024];
        alloc1 = null;
        alloc4 = null;
        Thread.sleep(2000);
        System.out.println("*********************3333" + new Date());
        alloc6 = new byte[3*1024*1024];
        alloc33 = null;

        alloc7 = new byte[3*1024*1024];
        Thread.sleep(2000);
        System.out.println("*********************4444" + new Date());
        alloc8 = new byte[1*1024*1024];
        alloc4 = null;

    }

 打印出来的GC日志是

 

*********************1111Thu Aug 14 18:57:26 CST 2014
{Heap before GC invocations=0 (full 0):
 par new generation   total 9216K, used 6827K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  83% used [0x00000007f9a00000, 0x00000007fa0aad18, 0x00000007fa200000)
  from space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
  to   space 1024K,   0% used [0x00000007fa300000, 0x00000007fa300000, 0x00000007fa400000)
 concurrent mark-sweep generation total 10240K, used 0K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3632K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
2014-08-14T18:57:26.451-0800: [GC [ParNew: 6827K->685K(9216K), 0.0043720 secs] 6827K->5807K(19456K), 0.0044000 secs] [Times: user=0.03 sys=0.01, real=0.00 secs] 
Heap after GC invocations=1 (full 0):
 par new generation   total 9216K, used 685K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,   0% used [0x00000007f9a00000, 0x00000007f9a00000, 0x00000007fa200000)
  from space 1024K,  66% used [0x00000007fa300000, 0x00000007fa3ab770, 0x00000007fa400000)
  to   space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
 concurrent mark-sweep generation total 10240K, used 5122K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3632K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
}
*********************2222Thu Aug 14 18:57:27 CST 2014
2014-08-14T18:57:28.457-0800: [GC [1 CMS-initial-mark: 5122K(10240K)] 13239K(19456K), 0.0010080 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2014-08-14T18:57:28.460-0800: [CMS-concurrent-mark: 0.002/0.002 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2014-08-14T18:57:28.460-0800: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2014-08-14T18:57:28.460-0800: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2014-08-14T18:57:28.460-0800: [GC[YG occupancy: 8117 K (9216 K)][Rescan (parallel) , 0.0005060 secs][weak refs processing, 0.0000060 secs][scrub string table, 0.0000370 secs] [1 CMS-remark: 5122K(10240K)] 13239K(19456K), 0.0005850 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2014-08-14T18:57:28.461-0800: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2014-08-14T18:57:28.462-0800: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
*********************3333Thu Aug 14 18:57:29 CST 2014
{Heap before GC invocations=1 (full 1):
 par new generation   total 9216K, used 8117K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  90% used [0x00000007f9a00000, 0x00000007fa141eb8, 0x00000007fa200000)
  from space 1024K,  66% used [0x00000007fa300000, 0x00000007fa3ab770, 0x00000007fa400000)
  to   space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
 concurrent mark-sweep generation total 10240K, used 3074K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3632K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
2014-08-14T18:57:29.459-0800: [GC [ParNew (promotion failed): 8117K->7496K(9216K), 0.0051800 secs][CMS: 7765K->7748K(10240K), 0.0102620 secs] 11191K->10821K(19456K), [CMS Perm : 3632K->3630K(21248K)], 0.0154830 secs] [Times: user=0.04 sys=0.00, real=0.02 secs] 
Heap after GC invocations=2 (full 2):
 par new generation   total 9216K, used 3072K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  37% used [0x00000007f9a00000, 0x00000007f9d00158, 0x00000007fa200000)
  from space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
  to   space 1024K,   0% used [0x00000007fa300000, 0x00000007fa300000, 0x00000007fa400000)
 concurrent mark-sweep generation total 10240K, used 7748K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3630K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
}
2014-08-14T18:57:29.475-0800: [GC [1 CMS-initial-mark: 7748K(10240K)] 13893K(19456K), 0.0000680 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
{Heap before GC invocations=2 (full 3):
 par new generation   total 9216K, used 6228K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  76% used [0x00000007f9a00000, 0x00000007fa015060, 0x00000007fa200000)
  from space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
  to   space 1024K,   0% used [0x00000007fa300000, 0x00000007fa300000, 0x00000007fa400000)
 concurrent mark-sweep generation total 10240K, used 7748K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3630K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
2014-08-14T18:57:29.475-0800: [Full GC [CMS2014-08-14T18:57:29.478-0800: [CMS-concurrent-mark: 0.002/0.003 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
 (concurrent mode failure): 7748K->9797K(10240K), 0.0096760 secs] 13976K->12869K(19456K), [CMS Perm : 3630K->3630K(21248K)], 0.0096980 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
Heap after GC invocations=3 (full 4):
 par new generation   total 9216K, used 3072K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  37% used [0x00000007f9a00000, 0x00000007f9d00078, 0x00000007fa200000)
  from space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
  to   space 1024K,   0% used [0x00000007fa300000, 0x00000007fa300000, 0x00000007fa400000)
 concurrent mark-sweep generation total 10240K, used 9797K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3630K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
}
2014-08-14T18:57:31.486-0800: [GC [1 CMS-initial-mark: 9797K(10240K)] 16006K(19456K), 0.0002190 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
*********************4444Thu Aug 14 18:57:31 CST 2014
Heap
 par new generation   total 9216K, used 7397K [0x00000007f9a00000, 0x00000007fa400000, 0x00000007fa400000)
  eden space 8192K,  90% used [0x00000007f9a00000, 0x00000007fa1396a8, 0x00000007fa200000)
  from space 1024K,   0% used [0x00000007fa200000, 0x00000007fa200000, 0x00000007fa300000)
  to   space 1024K,   0% used [0x00000007fa300000, 0x00000007fa300000, 0x00000007fa400000)
 concurrent mark-sweep generation total 10240K, used 9797K [0x00000007fa400000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 3638K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)

 这个日志感觉很奇怪。

1、第一次CMS GC发生时,老年代并没有达到设定的80%,这是为什么啊?

2、promotion failed 发生的条件是什么,看gc日志,发生promotion failed时,老年代还有将近7M空间可用,也不够放青年代的8M对象啊,为什么就发生了promotion failed呢,我的理解应该是老年代空间够用,但是有碎片的时候会发生promotion failed?而且full gc后,只是一部分青年代的对象晋级老年代了,另一部分还是留在了青年代,这是什么原理?发生promotion failed时,会触发full gc, 老年代使用什么收集算法,CMS还是serial MSC?

3、from 区里面存放的是什么?因为我构建的时候N兆的大对象,怎么会有几百K的对象进入from区呢?

4、关于CMSFullGCsBeforeCompaction这个参数,我看R大在之前的帖(http://hllvm.group.iteye.com/group/topic/28854)中解释过,我在源码中找答案时,发现_full_gcs_since_conc_gc这个参数也是在CMSCollector::collect方法中自增的,所以发生concurrent mode failure时,还是使用的CMS算法收集老年代,而不是之前说的Serial old?

 

烦请哪些大牛给解答下,不胜感激!!

pulsar_lxl 2014-08-15
关于第一点疑问,我已经找到答案了,是由于CMS的悲观策略。
其他问题,还请大牛帮忙解答下啊!
nijiaben 2014-08-17
先试着回答一部分吧
引用
第一次CMS GC发生时,老年代并没有达到设定的80%,这是为什么啊?

cms线程是否会做cms gc判断标准不仅仅是根据你的CMSInitiatingOccupancyFraction和UseCMSInitiatingOccupancyOnly参数来的,比如说你新生代的to space不为空或者老生代剩余的空间不够容纳根据之前数据统计的平均每次ygc晋升到old的内存大小,也不够容纳目前新生代已经使用的空间(eden+from space)的时候也会进行cms,还有其他一些因素,对于你的demo来说属于后者,因此在未达到CMSInitiatingOccupancyFraction就进行了cms gc

引用
promotion failed 发生的条件是什么

promotion failed的发生时间是在做ygc过程中,不能被回收的对象,并且达到一定年龄了,需要被cp到old区时如果old不够空间了就会发生

引用
发生promotion failed时,会触发full gc, 老年代使用什么收集算法,CMS还是serial MSC?

当出现promotion failed的时候,会设置_incremental_collection_failed为true,那么vmThread在处理gc事件的时候会使用full gc的方式,年老代用什么算法还是取决于是否达到了压缩标准,UseCMSFullGCsBeforeCompaction表示是否开启cms过程中的压缩功能,但是这并不是唯一的因素,比如你提到的_full_gcs_since_conc_gc(在每次执行cms gc时会递增1,并不是说一定要Full GC才递增)这个递增值会和CMSFullGCsBeforeCompaction值进行比较看是否达到了(还有其他的可能,比如触发gc的原因是用户请求的,比如是否触发的System.gc()也会进行压缩),达到了就会做压缩算法,整个过程是vmThread完成的,因此比较耗时,不是常规的cms算法有些阶段分给并发线程去执行

引用
from 区里面存放的是什么?因为我构建的时候N兆的大对象,怎么会有几百K的对象进入from区呢?

其实在jvm启动的时候本身就会创建一些对象,比如某些异常对象,这些异常对象通常是备用的,比如oom,nullpoint等异常,在jvm运行过程中如果再vm内判断需要抛出这些异常就会抛出来,而这些都是在heap创建好之后在heap里分配的

引用
关于CMSFullGCsBeforeCompaction这个参数,我看R大在之前的帖(http://hllvm.group.iteye.com/group/topic/28854)中解释过,我在源码中找答案时,发现_full_gcs_since_conc_gc这个参数也是在CMSCollector::collect方法中自增的,所以发生concurrent mode failure时,还是使用的CMS算法收集老年代,而不是之前说的Serial old?

concurrent mode failure主要是发生在cms线程正在进行cms gc过程中有其他线程因为内存分配失败(new和old都不够分配的情况下)而请求必须进行jvm进行gc的情况下,该信息由vmThread线程执行gc任务时打印的,虽然打印了此信息,使用什么算法,主要是是否会做压缩,会不会做压缩上面已经提到了

最近在做gc日志分析的工具,所以对日志格式做了下研究,对上面的日志也稍微讲下
2014-08-14T18:57:29.475-0800: [Full GC [CMS2014-08-14T18:57:29.478-0800: [CMS-concurrent-mark: 0.002/0.003 secs] [Times: user=0.00 sys=0.00, real=0.00 secs](concurrent mode failure): 7748K->9797K(10240K), 0.0096760 secs] 13976K->12869K(19456K), [CMS Perm : 3630K->3630K(21248K)], 0.0096980 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 

注意到这条日志,其实是两个线程打印的, vmThread会打印到
2014-08-14T18:57:29.475-0800: [Full GC [CMS

的时候,通知前端的cms线程继续跑,然后通过前一条日志已经知道当前cms周期中vmThread线程已经执行完了CMS-initial-mark阶段,然后cms线程接着跑第二个阶段,即接下来的日志
2014-08-14T18:57:29.478-0800: [CMS-concurrent-mark: 0.002/0.003 secs] [Times: user=0.00 sys=0.00, real=0.00 secs](concurrent mode failure): 7748K->9797K(10240K), 0.0096760 secs]

是cms线程打印的,vmThread等这个执行完之后,然后继续往下跑,然后打印了剩下的
(concurrent mode failure): 7748K->9797K(10240K), 0.0096760 secs] 13976K->12869K(19456K), [CMS Perm : 3630K->3630K(21248K)], 0.0096980 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

这些信息,这里虽然有CMS Perm的信息,但是其实并没有对Perm进行gc


看下面两日志有啥区别呢
2014-08-14T18:57:26.451-0800: [GC [ParNew: 6827K->685K(9216K), 0.0043720 secs] 6827K->5807K(19456K), 0.0044000 secs] [Times: user=0.03 sys=0.01, real=0.00 secs]


2014-08-14T18:57:29.459-0800: [GC [ParNew (promotion failed): 8117K->7496K(9216K), 0.0051800 secs][CMS: 7765K->7748K(10240K), 0.0102620 secs] 11191K->10821K(19456K), [CMS Perm : 3632K->3630K(21248K)], 0.0154830 secs] [Times: user=0.04 sys=0.00, real=0.02 secs] 


先抛开(promotion failed)不说,后者比前者多了[CMS: 7765K->7748K(10240K), 0.0102620 secs] 11191K->10821K(19456K), [CMS Perm : 3632K->3630K(21248K)], 0.0154830 secs] 这些信息,这是为什么呢?
从字片上看都是GC,不是Full GC,因此会对新生代进行GC是肯定的,但是发现后者还对old也进行了GC,hotspot里有这样的逻辑,在对新生代gc完之后如果发现有足够的内存来分配要请求的内存了,那么就直接跳过,如果还不足够,那就还会对老生代也进行GC
注:这是ParNew+CMS的情况下的特定算法才有的
pulsar_lxl 2014-09-04
谢谢楼上的耐心回答
Global site tag (gtag.js) - Google Analytics