[讨论] ParNew 应用暂停时间偶尔会出现好几秒的情况。

moqiaoxp 2013-10-15
GC的配置:
java -Xmx7168m -Xms7168m -Xmn2048m -XX:PermSize=256m -XX:MaxPermSize=256m -Xnoclassgc -XX:+DisableExplicitGC -XX:SurvivorRatio=3 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -Xloggc:/usr/local/temp/gc.log a.java

GC日志:
2013-10-14T03:25:13.553+0000: 3641.564: [GC 3641.565: [ParNew: 1520950K->310708K(1677760K), 0.2646810 secs] 3010198K->1811594K(6920640K), 0.2667450 secs] [Times: user=1.02 sys=0.00, real=0.27 secs]
Total time for which application threads were stopped: 0.2726020 seconds
Total time for which application threads were stopped: 0.0044560 seconds
Total time for which application threads were stopped: 0.0041660 seconds
Total time for which application threads were stopped: 0.0048790 seconds
Total time for which application threads were stopped: 0.0042560 seconds
Total time for which application threads were stopped: 0.0055000 seconds
Total time for which application threads were stopped: 0.0054720 seconds
Total time for which application threads were stopped: 0.0051990 seconds
Total time for which application threads were stopped: 0.0050040 seconds
2013-10-14T03:25:17.593+0000: 3645.605: [GC 3645.606: [ParNew: 1569076K->252838K(1677760K), 0.2464030 secs] 3069962K->1763720K(6920640K), 0.2484550 secs] [Times: user=0.96 sys=0.01, real=0.25 secs]
Total time for which application threads were stopped: 0.2548290 seconds
Total time for which application threads were stopped: 0.0038900 seconds
Total time for which application threads were stopped: 0.0035850 seconds
Total time for which application threads were stopped: 0.0039930 seconds
Total time for which application threads were stopped: 0.0041000 seconds
Total time for which application threads were stopped: 0.0044230 seconds
Total time for which application threads were stopped: 0.0043610 seconds
Total time for which application threads were stopped: 0.0056040 seconds
Total time for which application threads were stopped: 0.0051020 seconds
Total time for which application threads were stopped: 8.2834300 seconds
Total time for which application threads were stopped: 0.0110790 seconds
Total time for which application threads were stopped: 0.0098720 seconds
2013-10-14T03:25:29.925+0000: 3657.936: [GC 3657.937: [ParNew: 1511206K->327272K(1677760K), 0.1897400 secs] 3022088K->1849291K(6920640K), 0.1912910 secs] [Times: user=0.74 sys=0.00, real=0.19 secs]
RednaxelaFX 2013-10-16
由于楼主标红的那行日志前面没有紧挨着GC日志,粗略可判断这个暂停不是由GC引起的。

"Total time for which application threads were stopped"这句日志并不总是跟GC相关;所有进入safepoint的事情做完了都会打这日志。

楼主需要这两个参数:
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

以便了解暂停时间很长的safepoint到底是什么类型的VM operation导致的。随便目测一下是RevokeBias⋯随便乱猜的,但确实见过这个操作花很长时间才完成。
moqiaoxp 2013-10-16
好的,已经加上这两个参数,在压测了。
moqiaoxp 2013-10-17
增加参数后,今天又出现了一次,日志如下:
2013-10-17T02:45:39.094+0000: 3625.598: [GC 3625.599: [ParNew: 1748265K->71606K(1887488K), 0.1102680 secs] 2908577K->1239698K(7130368K), 0.1123360 secs] [Times: user=0.42 sys=0.01, real=0.11 secs]
Total time for which application threads were stopped: 0.1165450 seconds
Total time for which application threads were stopped: 0.0043530 seconds
Total time for which application threads were stopped: 0.0040660 seconds
Total time for which application threads were stopped: 0.0038710 seconds
Total time for which application threads were stopped: 0.0035290 seconds
Total time for which application threads were stopped: 6.8893950 seconds
2013-10-17T02:45:45.306+0000: 3631.810: [GC 3631.811: [ParNew: 1749430K->67933K(1887488K), 0.1141310 secs] 2917522K->1245861K(7130368K), 0.1159620 secs] [Times: user=0.43 sys=0.00, real=0.12 secs]

参数输出的日志:
vmop [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
3616.381: HeapIterateOperation             [    1677          0              5    ]      [     0     0     0     2  6885    ]  0 
RednaxelaFX 2013-10-17
那问题就明了了。这次的问题肯定跟GC没有关系。

楼主的Java应用肯定有从外部插入JVMTI agent,里面调用了JVMTI的IterateThroughHeap()函数去遍历整个堆的所有对象(不管对象死活),而这个功能使用了名为HeapIterateOperation的VM operation,进入safepoint,把VM里的应用线程暂停了6秒多去遍历整个Java堆。

楼主找找看哪里配置了这样的agent?如果启动参数里没有的话,可能是通过attach机制插入的。
moqiaoxp 2013-10-17
恩。测试环境上面开了jprofiler在对JVM调优,先把jprofiler去掉试试,谢谢R大。
fxl545826 2013-10-17
如果在终端有人不懂调用了个jmap之类的是不是也会这样?
RednaxelaFX 2013-10-17
fxl545826 写道
如果在终端有人不懂调用了个jmap之类的是不是也会这样?

如果是jmap -histo:live或者jmap -heap:live的话,它会触发一次full GC然后再遍历整个Java堆,而这是在同一个safepoint内完成的。光看GC日志会看到有“[Full GC”字样。跟楼主所描述的情况从日志上看表现不一样,但同样有可能会带来意外的长时间暂停。
fxl545826 2013-10-17
RednaxelaFX 写道
fxl545826 写道
如果在终端有人不懂调用了个jmap之类的是不是也会这样?

如果是jmap -histo:live或者jmap -heap:live的话,它会触发一次full GC然后再遍历整个Java堆,而这是在同一个safepoint内完成的。光看GC日志会看到有“[Full GC”字样。跟楼主所描述的情况从日志上看表现不一样,但同样有可能会带来意外的长时间暂停。

懂了
Global site tag (gtag.js) - Google Analytics