Question for the experts: with JNI, using 1 MB of stack space triggers a core dump

piaoya 2013-05-06

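Our program uses libhdfs. Whenever it allocates about 1 MB on the stack, it dumps core, whereas 4 MB does not. Could anyone explain why? The JVM is 6u32. The test program is as follows: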

#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(int argc, char **argv) {

    /* roughly 1 MB allocated on the stack of the initial thread */
    char array[1024000] = {0};

    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Oops! Failed to connect to hdfs!\n");
        exit(-1);
    }

    hdfsDisconnect(fs);

    return 0;
}

The core file shows that the failure happens while the JVM thread is being initialized:
(gdb) where
#0 0x000000302af3f53e in vfprintf () from /lib64/tls/libc.so.6
#1 0x000000302af61f54 in vsnprintf () from /lib64/tls/libc.so.6
#2 0x000000302af48001 in snprintf () from /lib64/tls/libc.so.6
#3 0x0000002a95c6351b in os::dll_build_name () from ..//../java6/jre/lib/amd64/server/libjvm.so
#4 0x0000002a958caae2 in ClassLoader::load_zip_library () from ..//../java6/jre/lib/amd64/server/libjvm.so
#5 0x0000002a958cbe37 in ClassLoader::initialize () from ..//../java6/jre/lib/amd64/server/libjvm.so
#6 0x0000002a958cbfe9 in classLoader_init () from ..//../java6/jre/lib/amd64/server/libjvm.so
#7 0x0000002a95a03ef7 in init_globals () from ..//../java6/jre/lib/amd64/server/libjvm.so
#8 0x0000002a95d748b4 in Threads::create_vm () from ..//../java6/jre/lib/amd64/server/libjvm.so
#9 0x0000002a95a72180 in JNI_CreateJavaVM () from ..//../java6/jre/lib/amd64/server/libjvm.so
#10 0x0000002a9616b742 in getJNIEnv () at hdfsJniHelper.c:503
#11 0x0000002a96164fd7 in hdfsConnectAsUser (host=0x400900 "default", port=0, user=0x0, password=0x0) at hdfs.c:201
#12 0x00000000004007bf in main ()

Has anyone run into this?

RednaxelaFX 2013-05-07

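I have not seen a case like this before.

piaoya wrote:

Whenever it allocates about 1 MB on the stack, it dumps core, whereas 4 MB does not

Could you explain this sentence in more detail?

Do you mean that -Xss1m crashes while -Xss4m does not, or something else?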

RednaxelaFX 2013-05-09

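Is the original poster still around? A few open questions:

1. What are the following values in your environment?
What is the content of the LIBHDFS_OPTS environment variable? Does it set -Xss?
What does ulimit -s report?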
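
Both values can also be read from inside the process itself. The following is a minimal sketch in C, assuming a POSIX system; the function names are made up for illustration and belong to neither libhdfs nor HotSpot. getrlimit(RLIMIT_STACK) yields the soft limit that ulimit -s prints (ulimit shows it in KB).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Stores the soft stack limit (in bytes) in *out.
 * Returns 0 on success, -1 if getrlimit() fails. */
int soft_stack_limit(rlim_t *out) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    *out = rl.rlim_cur;
    return 0;
}

/* Prints the stack limit and the LIBHDFS_OPTS variable, if set. */
void print_stack_report(void) {
    rlim_t lim;
    const char *opts = getenv("LIBHDFS_OPTS");

    if (soft_stack_limit(&lim) != 0) {
        printf("stack limit: unknown\n");
    } else if (lim == RLIM_INFINITY) {
        printf("stack limit: unlimited\n");
    } else {
        printf("stack limit: %llu KB\n", (unsigned long long)(lim / 1024));
    }
    printf("LIBHDFS_OPTS: %s\n", opts ? opts : "(not set)");
}
```

Calling print_stack_report() at the top of main(), before hdfsConnect(), would show what the process actually sees.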

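2. The way your sample code uses the library actually creates the HotSpot VM instance on the process's initial thread (primordial thread); HotSpot VM then uses that thread as its Java main thread.

A JVM can create threads itself to use as Java threads; external threads can also attach to the JVM and become Java threads.
HotSpot VM cannot control the stack size of an already existing thread that attaches to it; it can only control the stack size of threads it creates itself.
The JNI API has two main functions through which an external thread attaches to the JVM and becomes a Java thread:
JNI_CreateJavaVM()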

Quote:

Creating the VM
The JNI_CreateJavaVM() function loads and initializes a Java VM and returns a pointer to the JNI interface pointer. The thread that called JNI_CreateJavaVM() is considered to be the main thread.

AttachCurrentThread()

Quote:

Attaching to the VM
The JNI interface pointer (JNIEnv) is valid only in the current thread. Should another thread need to access the Java VM, it must first call AttachCurrentThread() to attach itself to the VM and obtain a JNI interface pointer. Once attached to the VM, a native thread works just like an ordinary Java thread running inside a native method. The native thread remains attached to the VM until it calls DetachCurrentThread() to detach itself.

The attached thread should have enough stack space to perform a reasonable amount of work. The allocation of stack space per thread is operating system-specific. For example, using pthreads, the stack size can be specified in the pthread_attr_t argument to pthread_create.
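
One way to act on the quoted advice is to run the JNI work on a thread whose stack size the program chooses, rather than on the initial thread. Below is a minimal pthreads sketch. The names run_with_stack and touch_large_buffer are hypothetical; the worker only stands in for code that would call hdfsConnect(), and whether this avoids the crash reported in this thread has not been verified.

```c
#include <pthread.h>
#include <stddef.h>

/* Runs fn(arg) on a new thread with a stack of stack_bytes and waits for it.
 * Returns 0 on success, otherwise the pthread error code. */
int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg,
                   void **result) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        rc = pthread_create(&tid, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }
    return pthread_join(tid, result);
}

/* Hypothetical worker: touches a ~1 MB stack buffer, as the test program
 * does, and returns its argument unchanged. Real code would call
 * hdfsConnect() and friends here instead. */
void *touch_large_buffer(void *arg) {
    volatile char array[1024000] = {0};
    array[sizeof array - 1] = 1; /* force the far end of the buffer to be used */
    return arg;
}
```

For example, run_with_stack(4u * 1024 * 1024, touch_large_buffer, NULL, NULL) runs the worker on a 4 MB stack.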

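To detect stack overflow (mainly to meet Java's requirement that a StackOverflowError be thrown when the stack overflows), HotSpot VM sets up several inaccessible memory pages (rwx permissions ---), called guard pages, at the far limit of every Java thread's stack, that is, at the low-address end toward which the stack grows. They comprise yellow pages, which detect recoverable stack overflows, and red pages, which detect unrecoverable ones.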

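In HotSpot VM on Linux/AMD64, StackYellowPages and StackRedPages default to 2 and 1 respectively. That is, when HotSpot VM creates a Java thread itself, by default it sets up 2 pages of that thread's stack as yellow pages and 1 more page as a red page; it also sets glibc's own guard size to 0.
If the thread HotSpot VM is creating is not a Java thread, it does not set up the yellow and red pages, and it sets glibc's guard size to 1 page.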
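
The mechanism behind a guard page can be illustrated outside the JVM. The sketch below is not HotSpot code: it maps an anonymous region with mmap() and uses mprotect() with PROT_NONE to make the lowest pages inaccessible, which is the same kind of "---" protection described above. The function name map_with_guard is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps `pages` pages of read/write memory and makes the lowest
 * `guard_pages` of them inaccessible. Stores the page size in
 * *page_size_out. Returns the base of the mapping, or NULL on failure. */
void *map_with_guard(size_t pages, size_t guard_pages, size_t *page_size_out) {
    long ps = sysconf(_SC_PAGESIZE);
    size_t page;
    void *base;

    if (ps <= 0 || guard_pages > pages) {
        return NULL;
    }
    page = (size_t)ps;
    base = mmap(NULL, pages * page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    /* Any access to the first guard_pages pages now raises SIGSEGV. */
    if (mprotect(base, guard_pages * page, PROT_NONE) != 0) {
        munmap(base, pages * page);
        return NULL;
    }
    *page_size_out = page;
    return base;
}
```

With map_with_guard(8, 3, &page), the bytes from base + 3 * page upward remain usable, while touching anything below that address faults, much as a thread does when it runs into its guard pages.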

But how does HotSpot VM know where the "far limit" of a Java thread's stack is? Look at the implementation of JavaThread::create_stack_guard_pages():

void JavaThread::create_stack_guard_pages() {
  if (! os::uses_stack_guard_pages() || _stack_guard_state != stack_guard_unused) return;
  address low_addr = stack_base() - stack_size();
  size_t len = (StackYellowPages + StackRedPages) * os::vm_page_size();

  int allocate = os::allocate_stack_guard_pages();
  // warning("Guarding at " PTR_FORMAT " for len " SIZE_FORMAT "\n", low_addr, len);

  if (allocate && !os::create_stack_guard_pages((char *) low_addr, len)) {
    warning("Attempt to allocate stack guard pages failed.");
    return;
  }

  if (os::guard_memory((char *) low_addr, len)) {
    _stack_guard_state = stack_guard_enabled;
  } else {
    warning("Attempt to protect stack guard pages failed.");
    if (os::uncommit_memory((char *) low_addr, len)) {
      warning("Attempt to deallocate stack guard pages failed.");
    }
  }
}

So HotSpot VM computes the position of the stack limit from the stack_base() and stack_size() values that the JavaThread instance has remembered.
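
The arithmetic in create_stack_guard_pages() can be restated as a small standalone function. This is only a sketch with made-up names (guard_region, guard_region_for); the stack base used in the usage note is an arbitrary illustrative address, not a value taken from the crash above.

```c
#include <stddef.h>
#include <stdint.h>

/* Start address and length of the HotSpot guard region. */
struct guard_region {
    uintptr_t low_addr;
    size_t len;
};

/* Mirrors the two computations in create_stack_guard_pages():
 *   low_addr = stack_base - stack_size
 *   len      = (yellow_pages + red_pages) * page_size */
struct guard_region guard_region_for(uintptr_t stack_base, size_t stack_size,
                                     size_t page_size, size_t yellow_pages,
                                     size_t red_pages) {
    struct guard_region g;
    g.low_addr = stack_base - stack_size;
    g.len = (yellow_pages + red_pages) * page_size;
    return g;
}
```

For a stack base of 0x7fff00000000, an 8 MB stack, 4 KB pages, and the defaults of 2 yellow pages and 1 red page, this gives low_addr = 0x7ffeff800000 and len = 12288 bytes. If the recorded stack_size is smaller than the real stack, low_addr moves up into memory the thread may legitimately be using.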

Those two values are recorded earlier. The logic near the start of Threads::create_vm() is as follows:

  // Attach the main thread to this os thread
  JavaThread* main_thread = new JavaThread();
  main_thread->set_thread_state(_thread_in_vm);
  // must do this before set_active_handles and initialize_thread_local_storage
  // Note: on solaris initialize_thread_local_storage() will (indirectly)
  // change the stack size recorded here to one based on the java thread
  // stacksize. This adjusted size is what is used to figure the placement
  // of the guard pages.
  main_thread->record_stack_base_and_size();
  main_thread->initialize_thread_local_storage();

  main_thread->set_active_handles(JNIHandleBlock::allocate_block());

  if (!main_thread->set_as_starting_thread()) {
    vm_shutdown_during_initialization(
      "Failed necessary internal allocation. Out of swap space");
    delete main_thread;
    *canTryAgain = false; // don't let caller call JNI_CreateJavaVM again
    return JNI_ENOMEM;
  }

  // Enable guard page *after* os::create_main_thread(), otherwise it would
  // crash Linux VM, see notes in os_linux.cpp.
  main_thread->create_stack_guard_pages();

main_thread->record_stack_base_and_size(); is where the stack base and stack size are recorded.

void Thread::record_stack_base_and_size() {
  set_stack_base(os::current_stack_base());
  set_stack_size(os::current_stack_size());
}

// Java thread:
//
//   Low memory addresses
//    +------------------------+
//    |                        |\  JavaThread created by VM does not have glibc
//    |    glibc guard page    | - guard, attached Java thread usually has
//    |                        |/  1 page glibc guard.
// P1 +------------------------+ Thread::stack_base() - Thread::stack_size()
//    |                        |\
//    |  HotSpot Guard Pages   | - red and yellow pages
//    |                        |/
//    +------------------------+ JavaThread::stack_yellow_zone_base()
//    |                        |\
//    |      Normal Stack      | -
//    |                        |/
// P2 +------------------------+ Thread::stack_base()
//
// Non-Java thread:
//
//   Low memory addresses
//    +------------------------+
//    |                        |\
//    |  glibc guard page      | - usually 1 page
//    |                        |/
// P1 +------------------------+ Thread::stack_base() - Thread::stack_size()
//    |                        |\
//    |      Normal Stack      | -
//    |                        |/
// P2 +------------------------+ Thread::stack_base()
//
// ** P1 (aka bottom) and size ( P2 = P1 - size) are the address and stack size returned from
//    pthread_attr_getstack()

static void current_stack_region(address * bottom, size_t * size) {
  if (os::Linux::is_initial_thread()) {
     // initial thread needs special handling because pthread_getattr_np()
     // may return bogus value.
     *bottom = os::Linux::initial_thread_stack_bottom();
     *size   = os::Linux::initial_thread_stack_size();
  } else {
     pthread_attr_t attr;

     int rslt = pthread_getattr_np(pthread_self(), &attr);

     // JVM needs to know exact stack location, abort if it fails
     if (rslt != 0) {
       if (rslt == ENOMEM) {
         vm_exit_out_of_memory(0, "pthread_getattr_np");
       } else {
         fatal(err_msg("pthread_getattr_np failed with errno = %d", rslt));
       }
     }

     if (pthread_attr_getstack(&attr, (void **)bottom, size) != 0) {
         fatal("Can not locate current stack attributes!");
     }

     pthread_attr_destroy(&attr);

  }
  assert(os::current_stack_pointer() >= *bottom &&
         os::current_stack_pointer() < *bottom + *size, "just checking");
}

address os::current_stack_base() {
  address bottom;
  size_t size;
  current_stack_region(&bottom, &size);
  return (bottom + size);
}

size_t os::current_stack_size() {
  // stack size includes normal stack and HotSpot guard pages
  address bottom;
  size_t size;
  current_stack_region(&bottom, &size);
  return size;
}
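
The non-initial-thread branch of current_stack_region() can be tried in isolation. The following is a minimal sketch, assuming Linux with glibc (pthread_getattr_np() is a GNU extension). The helper names are hypothetical, and the containment check mirrors the assert at the end of the HotSpot function.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Fills *bottom (lowest address) and *size for the calling thread's stack.
 * Returns 0 on success, otherwise a pthread error code. */
int current_stack_region(void **bottom, size_t *size) {
    pthread_attr_t attr;
    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_getstack(&attr, bottom, size);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Returns 1 if addr lies in [bottom, bottom + size), else 0. */
int address_in_stack(const void *addr, const void *bottom, size_t size) {
    uintptr_t a = (uintptr_t)addr;
    uintptr_t lo = (uintptr_t)bottom;
    return a >= lo && a - lo < size;
}

/* Thread body: checks that a local variable is inside the reported region. */
static void *probe(void *out) {
    int local = 0;
    void *bottom = NULL;
    size_t size = 0;
    int *ok = out;
    *ok = current_stack_region(&bottom, &size) == 0 &&
          address_in_stack(&local, bottom, size);
    return NULL;
}

/* Runs the probe on a freshly created (non-initial) thread.
 * Returns 1 if the check passed, 0 otherwise. */
int probe_on_new_thread(void) {
    pthread_t tid;
    int ok = 0;
    if (pthread_create(&tid, NULL, probe, &ok) != 0) {
        return 0;
    }
    if (pthread_join(tid, NULL) != 0) {
        return 0;
    }
    return ok;
}
```

The stack base in HotSpot's sense is then bottom + size, as in os::current_stack_base() above.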

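As mentioned above, the way the original poster wrote the program makes the Linux process's initial thread the Java main thread of HotSpot VM. In that case, obtaining the stack base and stack size is delegated to this function: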

// Locate initial thread stack. This special handling of initial thread stack
// is needed because pthread_getattr_np() on most (all?) Linux distros returns
// bogus value for initial thread.
void os::Linux::capture_initial_stack(size_t max_size)

The implementation inside is extremely nasty... please read it yourself here: http://hg.openjdk.java.net/jdk6/jdk6/hotspot/file/tip/src/os/linux/vm/os_linux.cpp

=====================

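The stack trace the original poster provided shows that Threads::create_vm() had already reached its call to init_globals(). Before that point, main_thread->create_stack_guard_pages(); has already run and set up the guard pages described above for the Java main thread.
The default stack size on Linux is 8 MB, so HotSpot VM should create the HotSpot guard pages roughly 8 MB below the stack base. If everything went well, the original poster's program should not crash.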

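But in fact it did crash. So one possibility is that HotSpot VM did not create the guard pages at the right location; if the program is unlucky enough to touch those guard pages, it crashes.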

=====================

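If the crash produced an hs_err_pid<pid>.log crash log, please post its full contents so we can verify which address the segfault was accessing.
If the segfault is not on the guard pages at all, then it is a different problem.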
