JVM Crash, 求如何分析原因.[已分析,和提出了解决方案]

xiaoyu 2011-11-14
JDK 1.5.0_20
下面是Crash Log, 想知道详细问题, 和如何分析,调试:

(分析看2, 4, 6楼. 解决方案看6楼, 8楼有方案2的性能损耗测试)

#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xf796880c, pid=2862, tid=3368360848
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_20-b02 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x3ef80c] 
#

---------------  T H R E A D  ---------------

Current thread (0x08407b10):  JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870]

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000004 //出问题的指针

Registers:
EAX=0x00000000, EBX=0xf7b42810, ECX=0xc8c50320, EDX=0x00000000
ESP=0xc8c4fea0, EBP=0xc8c4fed8, ESI=0xc8c50330, EDI=0xbf2e0050
EIP=0xf796880c, CR2=0x00000004, EFLAGS=0x00010287

Top of Stack: (sp=0xc8c4fea0)
0xc8c4fea0:   00000800 00000000 c8c50330 c1192285
0xc8c4feb0:   c1192284 bf915f14 bf91a011 bf91a010
0xc8c4fec0:   c1192285 c8c50330 00000057 f7b42810
0xc8c4fed0:   0000034c 00000025 c8c4ff28 f796850f
0xc8c4fee0:   c8c50320 bf91a408 00000000 c1301620
0xc8c4fef0:   00000307 bf2e0050 bfb07900 9c032003
0xc8c4ff00:   00000010 00000004 00000004 00000012
0xc8c4ff10:   c1173f9c 00000025 bf91a408 f7b42810

Instructions: (pc=0xf796880c)
0xf79687fc:   00 00 00 00 8b 4d 08 8b 45 cc 8b 79 58 8b 14 87
0xf796880c:   8b 42 04 8b 00 85 c0 75 22 8b 4e 04 8b 52 1c 39

Stack: [0xc8bd1000,0xc8c52000),  sp=0xc8c4fea0,  free space=507k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3ef80c]
V  [libjvm.so+0x3ef50f]
V  [libjvm.so+0x3ed062]
V  [libjvm.so+0x1a530d]
V  [libjvm.so+0x1a24b2]
V  [libjvm.so+0x14ad83]
V  [libjvm.so+0x1aaa39]
V  [libjvm.so+0x1aa391]
V  [libjvm.so+0x4e1a96]
V  [libjvm.so+0x4dbdc3]
V  [libjvm.so+0x43bc78]
C  [libpthread.so.0+0x5832]

Current CompileTask: //出问题的原因, 但是怎么调试和分析具体原因?
opto:103% !   xxx.spring.xxx.xxxxxx.executeRequest(Lxxxx/spring/xxx/RemoteInvocationContext;)Lorg/springframework/remoting/
support/RemoteInvocationResult; @ 323 (865 bytes)


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x098746b8 JavaThread "pool-2-thread-6" [_thread_blocked, id=4822]
=>0x08407b10 JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870]


Other Threads:
  0x083f0bc8 VMThread [id=2863]
  0x0840b660 WatcherThread [id=2873]

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None


xiaoyu 2011-11-14
分析具体如下绿色高亮的地方
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xf796880c, pid=2862, tid=3368360848 // SIGSEGV  是指: 无效存储访问, PC是发生问题的指令地址
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_20-b02 mixed mode)
# Problematic frame:
# V  [libjvm.so+0x3ef80c]
#

---------------  T H R E A D  ---------------

Current thread (0x08407b10):  JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870]

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000004 //非法指针, 0x00000000-0x0000FFFF 是保护地址

Registers:
EAX=0x00000000, EBX=0xf7b42810, ECX=0xc8c50320, EDX=0x00000000
ESP=0xc8c4fea0, EBP=0xc8c4fed8, ESI=0xc8c50330, EDI=0xbf2e0050
EIP=0xf796880c, CR2=0x00000004, EFLAGS=0x00010287

Top of Stack: (sp=0xc8c4fea0)
0xc8c4fea0:   00000800 00000000 c8c50330 c1192285
0xc8c4feb0:   c1192284 bf915f14 bf91a011 bf91a010
0xc8c4fec0:   c1192285 c8c50330 00000057 f7b42810
0xc8c4fed0:   0000034c 00000025 c8c4ff28 f796850f
0xc8c4fee0:   c8c50320 bf91a408 00000000 c1301620
0xc8c4fef0:   00000307 bf2e0050 bfb07900 9c032003
0xc8c4ff00:   00000010 00000004 00000004 00000012
0xc8c4ff10:   c1173f9c 00000025 bf91a408 f7b42810

Instructions: (pc=0xf796880c)
0xf79687fc:   00 00 00 00 8b 4d 08 8b 45 cc 8b 79 58 8b 14 87
0xf796880c:   8b 42 04 8b 00 85 c0 75 22 8b 4e 04 8b 52 1c 39

//转成汇编指令(32bit, 因为JVM是32bit的):

0000000000000000 8b4204           mov eax, [edx+0x4]      //出问题的地方, 指向了0x00000004, 0x00000000-0x0000FFFF是保护地址/NULL指针保留地址, 所以…..不能被访问
0000000000000003 8b00             mov eax, [eax]          
0000000000000005 85c0             test eax, eax           
0000000000000007 7522             jnz 0x2b                
0000000000000009 8b4e04           mov ecx, [esi+0x4]      
000000000000000c 8b521c           mov edx, [edx+0x1c]  


Stack: [0xc8bd1000,0xc8c52000),  sp=0xc8c4fea0,  free space=507k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3ef80c]
V  [libjvm.so+0x3ef50f]
V  [libjvm.so+0x3ed062]
V  [libjvm.so+0x1a530d]
V  [libjvm.so+0x1a24b2]
V  [libjvm.so+0x14ad83]
V  [libjvm.so+0x1aaa39]
V  [libjvm.so+0x1aa391]
V  [libjvm.so+0x4e1a96]
V  [libjvm.so+0x4dbdc3]
V  [libjvm.so+0x43bc78]
C  [libpthread.so.0+0x5832]


Current CompileTask: //当前是一个编译任务
opto:103% !   xxx.spring.xxxx.xxx.executeRequest(Lxxxx/spring/xxxx/RemoteInvocationContext;)Lorg/springframework/remoting/
support/RemoteInvocationResult; @ 323 (865 bytes)


综上所诉, 这个是JVM在编译的时候, 指向了一个NULL指针,导致JVM Crash. 是JVM的一个内部Bug. 但是还不知道怎么重现. 和具体位置. (谢谢 RednaxelaFX 的帮助)


解决方案:

升级JDK 到6u29

PS: 一般请打开 ulimit -c unlimited (放在/etc/profile 中), 当JVM Crash的时候, linux 会生产 Core Dump,到时候可以使用GDB去分析(Core Dump也不是一定会产生, 有时候linux产生的时候会遇到问题).
Willam2004 2011-11-14
引用

转成汇编代指令(32bit, 因为JVM是32bit的): 
 
0000000000000000 8b4204           mov eax, [edx+0x4]      //出问题的地方, 指向了0x00000004, 0到64K是保护地址/NULL指针保留地址, 所以…..不能被访问 
0000000000000003 8b00             mov eax, [eax]           
0000000000000005 85c0             test eax, eax            
0000000000000007 7522             jnz 0x2b                 
0000000000000009 8b4e04           mov ecx, [esi+0x4]       
000000000000000c 8b521c           mov edx, [edx+0x1c]   


这个在哪里转换的?怎么转换的?
xiaoyu 2011-11-15
Willam2004 写道
引用

转成汇编代指令(32bit, 因为JVM是32bit的): 
 
0000000000000000 8b4204           mov eax, [edx+0x4]      //出问题的地方, 指向了0x00000004, 0到64K是保护地址/NULL指针保留地址, 所以…..不能被访问 
0000000000000003 8b00             mov eax, [eax]           
0000000000000005 85c0             test eax, eax            
0000000000000007 7522             jnz 0x2b                 
0000000000000009 8b4e04           mov ecx, [esi+0x4]       
000000000000000c 8b521c           mov edx, [edx+0x1c]   


这个在哪里转换的?怎么转换的?


有很多工具, VC也可以. 下面是udis (RednaxelaFX 介绍的工具), linux的工具, 简单好用, 下载地址:
http://udis86.sourceforge.net/

下载好后(我下载的是udis86-1.7.tar.gz), 放到/usr/local 目录下
$: tar zxvf udis86-1.7.tar.gz
$: cd udis86-1.7
$: ./configure
$: make
$: make install

就安装好了.

然后下面这样就能显示了.

$: echo "65 67 89 87 76 65 54 56 78 89 09 00 87" | udcli -32 -x 
337240552 2011-11-15
xiaoyu 2011-11-17
今天拿到了Core Dump,

当前JVM出错的地方 (绿色高亮地方, 而且往下看这个是C2的编译):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x0035adf0 in raise () from /lib/libc.so.6
#2  0x0035c701 in abort () from /lib/libc.so.6
#3  0xf7a13f75 in os::abort(int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#4  0xf7af2914 in VMError::report_and_die() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#5  0xf7a18c4a in JVM_handle_linux_signal () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#6  0xf7a160c4 in signalHandler(int, siginfo*, void*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#7  <signal handler called>
#8  0xf79cb80c in PhaseIdealLoop::build_loop_late_post(Node*, PhaseIdealLoop const*) () //就这里了 (这里的地址和上面不对应, 是因为这个Core Dump 和上面的那个crash log不是对应的, 其实和新的crash log的地址是一样的).
   from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#9  0xf79cb50f in PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&, PhaseIdealLoop const*) ()
   from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#10 0xf79c9062 in PhaseIdealLoop::PhaseIdealLoop(PhaseIterGVN&, PhaseIdealLoop const*, int) ()
   from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#11 0xf778130d in Compile::Optimize() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#12 0xf777e4b2 in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, int) ()
   from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#13 0xf7726d83 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so //C2 编译
#14 0xf7786a39 in CompileBroker::invoke_compiler_on_method(CompileTask*) ()
   from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#15 0xf7786391 in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#16 0xf7abda96 in compiler_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#17 0xf7ab7dc3 in JavaThread::run() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#18 0xf7a17c78 in _start(Thread*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so
#19 0x004a4832 in start_thread () from /lib/libpthread.so.0
#20 0x004040ae in clone () from /lib/libc.so.6


相关的URL (RednaxelaFX 纠正不是下面的bug导致的,看下面他的回复):
bug详细信息 (不确定一定是这个bug, 因为 PhaseIdealLoop::build_loop_late_post很多相关的Bug...太不靠谱了..): http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
6u29的修复: http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html
JDK 7 update 1也有修复这个bug。

JIT and Loop Bugs
Three bugs reported by various parties, including Apache Lucene developers, have been fixed in JDK 6 Update 29, in addition to a fourth related bug found by Oracle (7070134, 7068051, 7044738, 7077439).
查看了1.5.21-到1.5.22的release notes 没有发现修复任何与build_loop_late_post相关的bug.

两种解决方法:

1. 升级到6u29
2. 启动加入jvm参数, 不让C2编译这个方法 -XX:CompileCommand=exclude,xxxx/spring/xxx/XXXXRequestor,executeRequest  (PS:JDK 5不支持的参数太多了...) (参考: https://bugs.eclipse.org/bugs/show_bug.cgi?id=214092)

这样性能会有一定的影响(因为只能使用解释模式执行了. 下面有一个性能损耗测试).
RednaxelaFX 2011-11-17
之前在围脖上也说了,肯定不是Bug 7068051;那个是JDK6u23才开始有的bug。
但仍应是跟server compiler的循环优化相关的bug。
xiaoyu 2011-11-17
RednaxelaFX 写道
之前在围脖上也说了,肯定不是Bug 7068051;那个是JDK6u23才开始有的bug。
但仍应是跟server compiler的循环优化相关的bug。


呵呵. (微博上没仔细看 - 估计我是漏掉了, 哈). 谢谢再次指正 (要不然就要误导别人了...).
xiaoyu 2011-11-23

为解决方案2做了一个性能损耗测试 (解决方案2: 启动加入jvm参数, 不让C2编译这个方法 -XX:CompileCommand=exclude,xxxx/spring/xxx/XXXXRequestor,executeRequest):


性能损耗测试CASE:  简单的HelloWorld (远程调用 -- 这个出问题的地方是远程调用框架的代码)

 

 

单线程测试 次数(单位:万) 总时间(单位:纳秒) 次数(单位:万) 总时间(单位:纳秒)
C1 -XX:CompileCommand 20 89665306000 100 429607309000
C2 20 82385263000 100 417963168000
时间差(单位:纳秒) 7280043000 11644141000
平均每次损耗时间(单位:纳秒) 36400.21 11644

 

 

每次损耗百分比: 3% 到 9%

 


 

Global site tag (gtag.js) - Google Analytics