[讨论] 压测又出问题。东西有点杂,大家来看看

xgj1988 2015-02-06
版本:
操作系统:windows7 旗舰版  64bit
JAVA:    1.7.0_51

昨天做压力测试,今天早上一来,发现服务器CPU 90%+,发现压的3000多个玩家,全部掉线,1台压测客户端有1000多个TCP状态为FINAL_WAIT_1。
由于是windows服务器,我用process explorer查看。有java进程下面有4个线程相当高。分别如下:

TID   CPU 
5672  23.59
7860  22.43
8044  20.19
5976  18.64

发现对应的JAVA 线程是下面4个线程:
"GC task thread#0 (ParallelGC)" prio=6 tid=0x000000000243d000 nid=0x1628 runnable 

"GC task thread#1 (ParallelGC)" prio=6 tid=0x000000000243f000 nid=0x1758 runnable 

"GC task thread#2 (ParallelGC)" prio=6 tid=0x0000000002440800 nid=0x1eb4 runnable 

"GC task thread#3 (ParallelGC)" prio=6 tid=0x0000000002442000 nid=0x1f6c runnable 


发现GC一直在运行,这应该是新生代一直在GC。所以我jmap -heap了一下内存信息,信息如下。

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   [color=red]MaxHeapSize      = 1042284544 (994.0MB)[/color]
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB  这个为什么这么恐怖
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 206045184 (196.5MB)
   used     = 206045184 (196.5MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 70254592 (67.0MB)
   used     = 0 (0.0MB)
   free     = 70254592 (67.0MB)
   0.0% used
To Space:
   capacity = 70778880 (67.5MB)
   used     = 0 (0.0MB)
   free     = 70778880 (67.5MB)
   0.0% used
PS Old Generation
   capacity = 694681600 (662.5MB)
   used     = 694555840 (662.3800659179688MB)
   free     = 125760 (0.11993408203125MB)
   99.98189674233491% used
PS Perm Generation
   capacity = 26214400 (25.0MB)
   used     = 26137024 (24.92620849609375MB)
   free     = 77376 (0.07379150390625MB)
   99.704833984375% used

11276 interned Strings occupying 1020136 bytes.



会发现一个问题,Eden 被占满,OLD GER和 PERM GEN 被占满,但是S0,S1却是0。为什么呢?
这个时候,我想到的就是把内存dump下来。看看是什么占用了内存,(其实我感觉程序没做什么,但是这个时候JAVA进程达到了1.15G)。

但是当我dump的时候
(jmap  -F -dump:format=b,file=D:\obj.bin 4192)。得到如下错误信息:
Exception in thread "main" java.lang.reflect.InvocationTargetException
        ...
Caused by: Caused by sun.jvm.hotspot.debugger.UnalignedAddressEwxception 6f006b
        at sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal$1.checkAlignment(WindbgDebuggerLocal.java:104)
        ...

然后jmap就关闭了。
之后查看生成的 bin 文件(1.2G)。

但是这个文件拿到MAT或者用jhat均不能解析完成。在MAT报以下错误。
The HPROF parser encountered a violation of the HPROF specification that it could not safely handle. This could be due to file truncation or a bug in the JVM. Please consider filing a bug at eclipse.org. To continue parsing the dump anyway, you can use -DhprofStrictnessWarning=true or set the strictness mode under Preferences > HPROF Parser > Parser Strictness. See the inner exception for details.
The HPROF parser encountered a violation of the HPROF specification that it could not safely handle. This could be due to file truncation or a bug in the JVM. Please consider filing a bug at eclipse.org. To continue parsing the dump anyway, you can use -DhprofStrictnessWarning=true or set the strictness mode under Preferences > HPROF Parser > Parser Strictness. See the inner exception for details.
(Possibly) Invalid HPROF file: Expected to read another 3,264,896,240 bytes, but only 1,215,271,719 bytes are available.
(Possibly) Invalid HPROF file: Expected to read another 3,264,896,240 bytes, but only 1,215,271,719 bytes are available.
'

最后我查到一篇文章,说去掉-F。其实-F没加之前我测试过,别说分析了。连dump都dump出东西(java cpu将近100%啊,不把java 挂起,根本不会响应其他请求吧,所以我就用了-F。但是出现上面错误)。

最后在写这篇文章的时候,我叫另外一个同事再运行一次(jmap  -F -dump:format=b,file=D:\obj.bin 4192),结果没有错误。信息如下:
C:\Users\Administrator>jmap  -F -dump:format=b,file=D:\obj.bin 4192
Attaching to process ID 4192, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Dumping heap to D:\obj.bin ...
Heap dump file created


写到这里,说一下我的的问题吧。

第一个问题:
Heap Configuration:
   ...
   MaxNewSize       = 17592186044415 MB  这个为什么这么恐怖
   ...
这里的MaxNewSize为什么多呢?,我内存都才4个G啊。这是jmap的bug吗?

第二个问题:
为什么Eden 被占满,OLD GEN和 PERM GEN 被占满,但是S0,S1却是0。为什么呢? 按照我对JVM的理解,我想的是OLD 被占满 应该S0,S1都要被占满吧。毕竟OLD GEN 的数据是从S1 过去的啊 .S1又是S0过去的啊。Eden -> S0 -> S1 -> Old Gen,这个顺序。现在给我的感觉是,直接Eden -> OldGen。


第三个问题:
为什么Jmap 加 -F 第一次报异常,导致生成的文件无效呢?产生的原因可能是什么呢?第二次同样的命令为什么没问题呢?(两次执行中间间隔了1个小时)。
Global site tag (gtag.js) - Google Analytics