JVM Crash, 求如何分析原因.[已分析,和提出了解决方案]
xiaoyu
2011-11-14
JDK 1.5.0_20
下面是Crash Log, 想知道详细问题, 和如何分析,调试: (分析看2, 4, 6楼. 解决方案看6楼, 8楼有方案2的性能损耗测试) # # An unexpected error has been detected by HotSpot Virtual Machine: # # SIGSEGV (0xb) at pc=0xf796880c, pid=2862, tid=3368360848 # # Java VM: Java HotSpot(TM) Server VM (1.5.0_20-b02 mixed mode) # Problematic frame: # V [libjvm.so+0x3ef80c] # --------------- T H R E A D --------------- Current thread (0x08407b10): JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870] siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000004 //出问题的指针 Registers: EAX=0x00000000, EBX=0xf7b42810, ECX=0xc8c50320, EDX=0x00000000 ESP=0xc8c4fea0, EBP=0xc8c4fed8, ESI=0xc8c50330, EDI=0xbf2e0050 EIP=0xf796880c, CR2=0x00000004, EFLAGS=0x00010287 Top of Stack: (sp=0xc8c4fea0) 0xc8c4fea0: 00000800 00000000 c8c50330 c1192285 0xc8c4feb0: c1192284 bf915f14 bf91a011 bf91a010 0xc8c4fec0: c1192285 c8c50330 00000057 f7b42810 0xc8c4fed0: 0000034c 00000025 c8c4ff28 f796850f 0xc8c4fee0: c8c50320 bf91a408 00000000 c1301620 0xc8c4fef0: 00000307 bf2e0050 bfb07900 9c032003 0xc8c4ff00: 00000010 00000004 00000004 00000012 0xc8c4ff10: c1173f9c 00000025 bf91a408 f7b42810 Instructions: (pc=0xf796880c) 0xf79687fc: 00 00 00 00 8b 4d 08 8b 45 cc 8b 79 58 8b 14 87 0xf796880c: 8b 42 04 8b 00 85 c0 75 22 8b 4e 04 8b 52 1c 39 Stack: [0xc8bd1000,0xc8c52000), sp=0xc8c4fea0, free space=507k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x3ef80c] V [libjvm.so+0x3ef50f] V [libjvm.so+0x3ed062] V [libjvm.so+0x1a530d] V [libjvm.so+0x1a24b2] V [libjvm.so+0x14ad83] V [libjvm.so+0x1aaa39] V [libjvm.so+0x1aa391] V [libjvm.so+0x4e1a96] V [libjvm.so+0x4dbdc3] V [libjvm.so+0x43bc78] C [libpthread.so.0+0x5832] Current CompileTask: //出问题的原因, 但是怎么调试和分析具体原因? opto:103% ! xxx.spring.xxx.xxxxxx.executeRequest(Lxxxx/spring/xxx/RemoteInvocationContext;)Lorg/springframework/remoting/ support/RemoteInvocationResult; @ 323 (865 bytes) --------------- P R O C E S S --------------- Java Threads: ( => current thread ) 0x098746b8 JavaThread "pool-2-thread-6" [_thread_blocked, id=4822] =>0x08407b10 JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870] Other Threads: 0x083f0bc8 VMThread [id=2863] 0x0840b660 WatcherThread [id=2873] VM state:not at safepoint (normal execution) VM Mutex/Monitor currently owned by a thread: None |
|||||||||||||||||||||||||||||||
xiaoyu
2011-11-14
分析具体如下绿色高亮的地方
# # An unexpected error has been detected by HotSpot Virtual Machine: # # SIGSEGV (0xb) at pc=0xf796880c, pid=2862, tid=3368360848 // SIGSEGV 是指: 无效存储访问, PC是发生问题的指令地址 # # Java VM: Java HotSpot(TM) Server VM (1.5.0_20-b02 mixed mode) # Problematic frame: # V [libjvm.so+0x3ef80c] # --------------- T H R E A D --------------- Current thread (0x08407b10): JavaThread "CompilerThread0" daemon [_thread_in_native, id=2870] siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x00000004 //非法指针, 0x00000000-0x0000FFFF 是保护地址 Registers: EAX=0x00000000, EBX=0xf7b42810, ECX=0xc8c50320, EDX=0x00000000 ESP=0xc8c4fea0, EBP=0xc8c4fed8, ESI=0xc8c50330, EDI=0xbf2e0050 EIP=0xf796880c, CR2=0x00000004, EFLAGS=0x00010287 Top of Stack: (sp=0xc8c4fea0) 0xc8c4fea0: 00000800 00000000 c8c50330 c1192285 0xc8c4feb0: c1192284 bf915f14 bf91a011 bf91a010 0xc8c4fec0: c1192285 c8c50330 00000057 f7b42810 0xc8c4fed0: 0000034c 00000025 c8c4ff28 f796850f 0xc8c4fee0: c8c50320 bf91a408 00000000 c1301620 0xc8c4fef0: 00000307 bf2e0050 bfb07900 9c032003 0xc8c4ff00: 00000010 00000004 00000004 00000012 0xc8c4ff10: c1173f9c 00000025 bf91a408 f7b42810 Instructions: (pc=0xf796880c) 0xf79687fc: 00 00 00 00 8b 4d 08 8b 45 cc 8b 79 58 8b 14 87 0xf796880c: 8b 42 04 8b 00 85 c0 75 22 8b 4e 04 8b 52 1c 39 //转成汇编指令(32bit, 因为JVM是32bit的): 0000000000000000 8b4204 mov eax, [edx+0x4] //出问题的地方, 指向了0x00000004, 0x00000000-0x0000FFFF是保护地址/NULL指针保留地址, 所以…..不能被访问 0000000000000003 8b00 mov eax, [eax] 0000000000000005 85c0 test eax, eax 0000000000000007 7522 jnz 0x2b 0000000000000009 8b4e04 mov ecx, [esi+0x4] 000000000000000c 8b521c mov edx, [edx+0x1c] Stack: [0xc8bd1000,0xc8c52000), sp=0xc8c4fea0, free space=507k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x3ef80c] V [libjvm.so+0x3ef50f] V [libjvm.so+0x3ed062] V [libjvm.so+0x1a530d] V [libjvm.so+0x1a24b2] V [libjvm.so+0x14ad83] V [libjvm.so+0x1aaa39] V [libjvm.so+0x1aa391] V [libjvm.so+0x4e1a96] V [libjvm.so+0x4dbdc3] V [libjvm.so+0x43bc78] C [libpthread.so.0+0x5832] Current CompileTask: //当前是一个编译任务 opto:103% ! xxx.spring.xxxx.xxx.executeRequest(Lxxxx/spring/xxxx/RemoteInvocationContext;)Lorg/springframework/remoting/ support/RemoteInvocationResult; @ 323 (865 bytes) 综上所诉, 这个是JVM在编译的时候, 指向了一个NULL指针,导致JVM Crash. 是JVM的一个内部Bug. 但是还不知道怎么重现. 和具体位置. (谢谢 RednaxelaFX 的帮助) 解决方案: 升级JDK 到6u29 PS: 一般请打开 ulimit -c unlimited (放在/etc/profile 中), 当JVM Crash的时候, linux 会生产 Core Dump,到时候可以使用GDB去分析(Core Dump也不是一定会产生, 有时候linux产生的时候会遇到问题). |
|||||||||||||||||||||||||||||||
Willam2004
2011-11-14
引用 转成汇编代指令(32bit, 因为JVM是32bit的): 0000000000000000 8b4204 mov eax, [edx+0x4] //出问题的地方, 指向了0x00000004, 0到64K是保护地址/NULL指针保留地址, 所以…..不能被访问 0000000000000003 8b00 mov eax, [eax] 0000000000000005 85c0 test eax, eax 0000000000000007 7522 jnz 0x2b 0000000000000009 8b4e04 mov ecx, [esi+0x4] 000000000000000c 8b521c mov edx, [edx+0x1c] 这个在哪里转换的?怎么转换的? |
|||||||||||||||||||||||||||||||
xiaoyu
2011-11-15
Willam2004 写道 引用 转成汇编代指令(32bit, 因为JVM是32bit的): 0000000000000000 8b4204 mov eax, [edx+0x4] //出问题的地方, 指向了0x00000004, 0到64K是保护地址/NULL指针保留地址, 所以…..不能被访问 0000000000000003 8b00 mov eax, [eax] 0000000000000005 85c0 test eax, eax 0000000000000007 7522 jnz 0x2b 0000000000000009 8b4e04 mov ecx, [esi+0x4] 000000000000000c 8b521c mov edx, [edx+0x1c] 这个在哪里转换的?怎么转换的? 有很多工具, VC也可以. 下面是udis (RednaxelaFX 介绍的工具), linux的工具, 简单好用, 下载地址: http://udis86.sourceforge.net/ 下载好后(我下载的是udis86-1.7.tar.gz), 放到/usr/local 目录下 $: tar zxvf udis86-1.7.tar.gz $: cd udis86-1.7 $: ./configure $: make $: make install 就安装好了. 然后下面这样就能显示了. $: echo "65 67 89 87 76 65 54 56 78 89 09 00 87" | udcli -32 -x |
|||||||||||||||||||||||||||||||
337240552
2011-11-15
|
|||||||||||||||||||||||||||||||
xiaoyu
2011-11-17
今天拿到了Core Dump,
当前JVM出错的地方 (绿色高亮地方, 而且往下看这个是C2的编译): #0 0xffffe410 in __kernel_vsyscall () #1 0x0035adf0 in raise () from /lib/libc.so.6 #2 0x0035c701 in abort () from /lib/libc.so.6 #3 0xf7a13f75 in os::abort(int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #4 0xf7af2914 in VMError::report_and_die() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #5 0xf7a18c4a in JVM_handle_linux_signal () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #6 0xf7a160c4 in signalHandler(int, siginfo*, void*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #7 <signal handler called> #8 0xf79cb80c in PhaseIdealLoop::build_loop_late_post(Node*, PhaseIdealLoop const*) () //就这里了 (这里的地址和上面不对应, 是因为这个Core Dump 和上面的那个crash log不是对应的, 其实和新的crash log的地址是一样的). from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #9 0xf79cb50f in PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&, PhaseIdealLoop const*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #10 0xf79c9062 in PhaseIdealLoop::PhaseIdealLoop(PhaseIterGVN&, PhaseIdealLoop const*, int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #11 0xf778130d in Compile::Optimize() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #12 0xf777e4b2 in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #13 0xf7726d83 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so //C2 编译 #14 0xf7786a39 in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #15 0xf7786391 in CompileBroker::compiler_thread_loop() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #16 0xf7abda96 in compiler_thread_entry(JavaThread*, Thread*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #17 0xf7ab7dc3 in JavaThread::run() () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #18 0xf7a17c78 in _start(Thread*) () from /usr/local/jdk1.5.0_20/jre/lib/i386/server/libjvm.so #19 0x004a4832 in start_thread () from /lib/libpthread.so.0 #20 0x004040ae in clone () from /lib/libc.so.6 相关的URL (RednaxelaFX 纠正不是下面的bug导致的,看下面他的回复): bug详细信息 (不确定一定是这个bug, 因为 PhaseIdealLoop::build_loop_late_post很多相关的Bug...太不靠谱了..): http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051 6u29的修复: http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html JDK 7 update 1也有修复这个bug。 JIT and Loop Bugs Three bugs reported by various parties, including Apache Lucene developers, have been fixed in JDK 6 Update 29, in addition to a fourth related bug found by Oracle (7070134, 7068051, 7044738, 7077439). 查看了1.5.21-到1.5.22的release notes 没有发现修复任何与build_loop_late_post相关的bug. 两种解决方法: 1. 升级到6u29 2. 启动加入jvm参数, 不让C2编译这个方法 -XX:CompileCommand=exclude,xxxx/spring/xxx/XXXXRequestor,executeRequest (PS:JDK 5不支持的参数太多了...) (参考: https://bugs.eclipse.org/bugs/show_bug.cgi?id=214092) 这样性能会有一定的影响(因为只能使用解释模式执行了. 下面有一个性能损耗测试). |
|||||||||||||||||||||||||||||||
RednaxelaFX
2011-11-17
之前在围脖上也说了,肯定不是Bug 7068051;那个是JDK6u23才开始有的bug。
但仍应是跟server compiler的循环优化相关的bug。 |
|||||||||||||||||||||||||||||||
xiaoyu
2011-11-17
RednaxelaFX 写道 之前在围脖上也说了,肯定不是Bug 7068051;那个是JDK6u23才开始有的bug。
但仍应是跟server compiler的循环优化相关的bug。 呵呵. (微博上没仔细看 - 估计我是漏掉了, 哈). 谢谢再次指正 (要不然就要误导别人了...). |
|||||||||||||||||||||||||||||||
xiaoyu
2011-11-23
为解决方案2做了一个性能损耗测试 (解决方案2: 启动加入jvm参数, 不让C2编译这个方法 -XX:CompileCommand=exclude,xxxx/spring/xxx/XXXXRequestor,executeRequest):
每次损耗百分比: 3% 到 9%
|