Could someone explain how the StackMapTable attribute works? - Discussion - High-Level Language Virtual Machine


[Discussion] Could someone explain how the StackMapTable attribute works?

watertap 2011-06-30

I read the JVM specification but could not follow this part.

RednaxelaFX, could you explain it?

RednaxelaFX 2011-06-30

Let me dig up a private message I answered last year and see whether it helps:
Re: What are frame same and frame append in decompiled output? RednaxelaFX 2010-07-31

Quote

---------- Forwarded message ----------
From: RednaxelaFX
To: beneo
Subject: Re: What are frame same and frame append in decompiled output?

Hello ^_^

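Starting with Java 6, the JVM specification has an update document, JSR 202, which describes a new bytecode verification algorithm called "type checking"; before that, verification used a "type inference" algorithm. To support the new algorithm, Class files from version 50 onward gained a new attribute called StackMapTable, which records the types held in a method's operand stack and local variable area at certain specific positions. This update will be folded into the third edition of the JVM specification, which can be expected to ship this year together with JDK 7.

The frame append, frame same and so on in the post you mentioned refer to the stack map frames inside StackMapTable. Such a frame is not the same thing as a stack frame on the Java method call stack. It is a data structure that describes the state of a stack frame, and it is used only for verification when a Class file is loaded. Note that frame append and frame chop both talk about the stack frame of one and the same method. They do not mean "adding a frame", "removing a frame", "pushing a frame" or "popping a frame"; they mean that the number of items in the frame (local variables and the like) has changed.

Let me quote a short passage from the specification:

JSR 202 says: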

4.8.4 The StackMapTable Attribute
The stack map attribute is a variable-length attribute in the attributes table of a Code attribute. The name of the attribute is StackMapTable. This attribute is used during the process of verification by typechecking (§4.11.1).

A stack map attribute consists of zero or more stack map frames. Each stack map frame specifies (either explicitly or implicitly) a bytecode offset, the verification types (§4.11.1) for the local variables, and the verification types for the operand stack.

The type checker deals with and manipulates the expected types of a method’s local variables and operand stack. Throughout this section, a location refers to either a single local variable or to a single operand stack entry.

We will use the terms stack map frame and type state interchangeably to describe a mapping from locations in the operand stack and local variables of a method to verification types. We will usually use the term stack map frame when such a mapping is provided in the class file, and the term type state when the mapping is inferred by the type checker.

...

Each stack_map_frame structure specifies the type state at a particular bytecode offset. Each frame type specifies (explicitly or implicitly) a value, offset_delta, that is used to calculate the actual bytecode offset at which it applies. The bytecode offset at which the frame applies is given by adding 1 + offset_delta to the offset of the previous frame, unless the previous frame is the initial frame of the method, in which case the bytecode offset is offset_delta.
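The offset rule in the last quoted paragraph is easy to get wrong by one, so here is a minimal sketch of it. FrameOffsets and absoluteOffsets are hypothetical names made up for illustration, not part of any JDK API, and the deltas 5 and 0 in main are invented values; only the first delta, 10, comes from the example later in this message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a JDK class.
final class FrameOffsets {
    /** Turns the offset_delta of each explicit frame into an absolute bytecode offset. */
    static List<Integer> absoluteOffsets(int[] offsetDeltas) {
        List<Integer> offsets = new ArrayList<>();
        // Starting at -1 makes the first explicit frame land exactly at its offset_delta,
        // which is the special case for the frame that follows the implicit initial frame.
        int previous = -1;
        for (int delta : offsetDeltas) {
            previous = previous + 1 + delta;
            offsets.add(previous);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Assumed deltas: 10 (from the foo example), then 5 and 0 (made up).
        System.out.println(absoluteOffsets(new int[] {10, 5, 0})); // prints [10, 16, 17]
    }
}
```

The "+ 1" means two explicit frames can never apply to the same offset, even when offset_delta is 0.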

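Each stack map frame records a relative bytecode offset, a set of verification types for the local variables, and a set of verification types for the operand stack. It is defined as follows: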

union stack_map_frame {
    same_frame;
    same_locals_1_stack_item_frame;
    same_locals_1_stack_item_frame_extended;
    chop_frame;
    same_frame_extended;
    append_frame;
    full_frame;
}

same_frame {
    u1 frame_type = SAME; /* 0-63 */
}

same_locals_1_stack_item_frame {
    u1 frame_type = SAME_LOCALS_1_STACK_ITEM; /* 64-127 */
    verification_type_info stack[1];
}

same_locals_1_stack_item_frame_extended {
    u1 frame_type = SAME_LOCALS_1_STACK_ITEM_EXTENDED; /* 247 */
    u2 offset_delta;
    verification_type_info stack[1];
}

chop_frame {
    u1 frame_type = CHOP; /* 248-250 */
    u2 offset_delta;
}

same_frame_extended {
    u1 frame_type = SAME_FRAME_EXTENDED; /* 251 */
    u2 offset_delta;
}

append_frame {
    u1 frame_type = APPEND; /* 252-254 */
    u2 offset_delta;
    verification_type_info locals[frame_type - 251];
}

full_frame {
    u1 frame_type = FULL_FRAME; /* 255 */
    u2 offset_delta;
    u2 number_of_locals;
    verification_type_info locals[number_of_locals];
    u2 number_of_stack_items;
    verification_type_info stack[number_of_stack_items];
}

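Note that a so-called "frame same" simply means that the first byte of a stack map frame is in the range 0-63. It means the frame has exactly the same local variable types as the previous frame and an empty operand stack; the frame_type value itself doubles as offset_delta. frame append means that at the corresponding bytecode offset the operand stack is empty and the local variables match the previous frame position by position, except that k additional local variables are defined, where k = frame_type - 251. Where there is append there is also chop, which covers the case where the frame has fewer local variables than the previous one. I will write a blog post about the meaning of the other frame_type values when I have time ^_^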
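To make the frame_type ranges from the definitions above concrete, here is a small sketch that maps the first byte of a frame to its kind. FrameTypes and describe are hypothetical names for illustration. The ranges come from the structures quoted above; the facts that same_locals_1_stack_item_frame uses frame_type - 64 as its offset_delta, that chop removes 251 - frame_type locals, and that 128-246 are reserved are from the specification rather than from this message.

```java
// Hypothetical helper for illustration only; not a JDK class.
final class FrameTypes {
    /** Describes what the first byte (frame_type, 0-255) of a stack map frame means. */
    static String describe(int frameType) {
        if (frameType < 0 || frameType > 255) {
            throw new IllegalArgumentException("frame_type must fit in a u1: " + frameType);
        }
        if (frameType <= 63) {
            return "same_frame, offset_delta=" + frameType;
        }
        if (frameType <= 127) {
            return "same_locals_1_stack_item_frame, offset_delta=" + (frameType - 64);
        }
        if (frameType <= 246) {
            return "reserved";
        }
        if (frameType == 247) {
            return "same_locals_1_stack_item_frame_extended";
        }
        if (frameType <= 250) {
            return "chop_frame, removes " + (251 - frameType) + " local(s)";
        }
        if (frameType == 251) {
            return "same_frame_extended";
        }
        if (frameType <= 254) {
            return "append_frame, adds " + (frameType - 251) + " local(s)";
        }
        return "full_frame";
    }

    public static void main(String[] args) {
        System.out.println(describe(253)); // prints append_frame, adds 2 local(s)
    }
}
```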

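The "certain specific positions" mentioned earlier are the start of each basic block. A basic block is a maximal straight-line sequence of code within a method. "Straight-line" means the sequence contains no control-flow (jump) instruction anywhere except at its end.
A basic block can begin at the start of the method or at the target of some jump instruction;
a basic block can end at the end of the method or at some jump instruction (in Java bytecode that means goto, the if* family and so on; the invoke* family of method call instructions does not count as jumps). Suppose a method looks like this: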

public class Foo {
    public void foo() {
        // basic block 1 start
        int i = 0;
        int j = 0;
        if (i > 0) { // basic block 1 end
          // basic block 2 start
          int k = 0;
          // basic block 2 end
        }
        // basic block 3 start
        int l = 0;
        // basic block 3 end
    }
}

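Then you can see there are 3 basic blocks. In a Java Class file, however, StackMapTable only cares about type checking, and to shrink the table further it uses a looser definition of basic block than the usual one: the fall-through branch of a conditional jump belongs to the same basic block as the code before the jump. The previous example therefore becomes: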

public class Foo {
    public void foo() {
        // basic block 1 start
        int i = 0;
        int j = 0;
        if (i > 0) {
          int k = 0;
          // basic block 1 end
        }
        // basic block 2 start
        int l = 0;
        // basic block 2 end
    }
}

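This method will have one StackMapTable attribute containing a single stack map frame entry (there should really be two, but the first one is implicit and is not recorded in the attribute).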

public void foo();
  Code:
   Stack=1, Locals=4, Args_size=1
   /* basic block 1 start */
   0:   iconst_0
   1:   istore_1
   2:   iconst_0
   3:   istore_2
   4:   iload_1
   5:   ifle    10
   8:   iconst_0
   9:   istore_3
   /* basic block 1 end */
   /* basic block 2 start */
   10:  iconst_0 /* stack map frame 1 applies here */
   11:  istore_3
   12:  return
   /* basic block 2 end */

  LocalVariableTable:
   Start  Length  Slot  Name  Signature
   10     0       3     k     I
   0      13      0     this  LFoo;
   2      11      1     i     I
   4      9       2     j     I
   12     1       3     l     I

  StackMapTable: number_of_entries = 1
   frame_type = 253 /* append */
     offset_delta = 10
     locals = [ int, int ]
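The single frame at offset 10 follows from the relaxed basic block rule described earlier: a frame is needed where control can arrive by a jump, not where it merely falls through. Here is a rough sketch of that idea over a toy instruction model. FrameSites, Insn, fooExample and frameOffsets are all hypothetical names, and the sketch assumes a simplified world without exception handlers or tableswitch/lookupswitch, both of which would add more frame sites in a real verifier.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy model for illustration only; not how any real verifier is written.
final class FrameSites {
    /** A toy instruction: offset, mnemonic, and branch target (-1 if it does not branch). */
    static final class Insn {
        final int offset;
        final String op;
        final int target;

        Insn(int offset, String op, int target) {
            this.offset = offset;
            this.op = op;
            this.target = target;
        }
    }

    /** Returns the offsets that need an explicit stack map frame under the simplified rules. */
    static SortedSet<Integer> frameOffsets(Insn[] code) {
        SortedSet<Integer> sites = new TreeSet<>();
        for (int i = 0; i < code.length; i++) {
            Insn insn = code[i];
            if (insn.target >= 0) {
                sites.add(insn.target); // every branch target starts a new block
            }
            // Code after an unconditional transfer cannot be reached by falling through,
            // so it also needs its own frame. A conditional branch does not trigger this.
            boolean unconditional = insn.op.equals("goto")
                    || insn.op.endsWith("return")
                    || insn.op.equals("athrow");
            if (unconditional && i + 1 < code.length) {
                sites.add(code[i + 1].offset);
            }
        }
        return sites;
    }

    /** The bytecode of foo() from the listing above, transcribed by hand. */
    static Insn[] fooExample() {
        return new Insn[] {
            new Insn(0, "iconst_0", -1), new Insn(1, "istore_1", -1),
            new Insn(2, "iconst_0", -1), new Insn(3, "istore_2", -1),
            new Insn(4, "iload_1", -1), new Insn(5, "ifle", 10),
            new Insn(8, "iconst_0", -1), new Insn(9, "istore_3", -1),
            new Insn(10, "iconst_0", -1), new Insn(11, "istore_3", -1),
            new Insn(12, "return", -1),
        };
    }

    public static void main(String[] args) {
        System.out.println(frameOffsets(fooExample())); // prints [10]
    }
}
```

Offset 8, the fall-through side of ifle, gets no frame of its own, which is exactly the size saving the relaxed definition is after.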

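The stack map frame for the implicit first basic block is computed from the method signature. In this example foo is an instance method with no explicitly declared parameters, so its argument count is 1, namely the hidden parameter this. At bytecode offset 0, then, the operand stack is empty and
Locals: [ Foo ]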
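As a sketch of how that implicit frame could be derived, the following parses a method descriptor into the initial list of local variable types. InitialFrame and initialLocals are hypothetical names for illustration, and the sketch simplifies in two labeled ways: types are plain strings, and constructors are not handled (in an <init> method the verifier treats this as uninitializedThis rather than the class type).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a JDK class.
final class InitialFrame {
    /** Builds the locals of the implicit frame at offset 0 from owner class and descriptor. */
    static List<String> initialLocals(String owner, boolean isStatic, String descriptor) {
        List<String> locals = new ArrayList<>();
        if (!isStatic) {
            locals.add(owner); // the hidden "this" parameter occupies local 0
        }
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            switch (c) {
                case 'B': case 'C': case 'S': case 'Z': case 'I':
                    locals.add("int"); // sub-int types are verified as int
                    i++;
                    break;
                case 'F': locals.add("float"); i++; break;
                case 'J': locals.add("long"); i++; break;
                case 'D': locals.add("double"); i++; break;
                case 'L': {
                    int semi = descriptor.indexOf(';', i);
                    locals.add(descriptor.substring(i + 1, semi)); // internal class name
                    i = semi + 1;
                    break;
                }
                case '[': {
                    int j = i;
                    while (descriptor.charAt(j) == '[') j++;
                    if (descriptor.charAt(j) == 'L') j = descriptor.indexOf(';', j);
                    locals.add(descriptor.substring(i, j + 1)); // keep the array descriptor whole
                    i = j + 1;
                    break;
                }
                default:
                    throw new IllegalArgumentException("bad descriptor: " + descriptor);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        System.out.println(initialLocals("Foo", false, "()V")); // prints [Foo]
    }
}
```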

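The next basic block starts at bytecode offset 10. The variable k is already out of scope there, so the valid contents of the local variable area should be
Locals: [ Foo, int, int ]
This is 2 local variables more than the state at the start of the previous basic block, with types [ int, int ], which gives the StackMapTable entry shown above: 253 - 251 = 2.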
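That step from [ Foo ] to [ Foo, int, int ] can be written out as a tiny sketch of how a frame is applied to the previous frame's locals. ApplyFrame and apply are hypothetical names for illustration; types are plain strings, and only same, same_frame_extended, chop and append are handled, since those are the kinds that leave the operand stack empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a JDK class.
final class ApplyFrame {
    /** Computes the locals after applying a same, chop or append frame to the previous locals. */
    static List<String> apply(List<String> previousLocals, int frameType, String... appended) {
        List<String> locals = new ArrayList<>(previousLocals);
        if (frameType >= 252 && frameType <= 254) {
            int k = frameType - 251; // append: k extra locals
            if (appended.length != k) {
                throw new IllegalArgumentException("expected " + k + " appended locals");
            }
            locals.addAll(Arrays.asList(appended));
        } else if (frameType >= 248 && frameType <= 250) {
            int k = 251 - frameType; // chop: the last k locals disappear
            if (k > locals.size()) {
                throw new IllegalArgumentException("cannot chop " + k + " locals");
            }
            locals.subList(locals.size() - k, locals.size()).clear();
        } else if (!((frameType >= 0 && frameType <= 63) || frameType == 251)) {
            throw new UnsupportedOperationException("frame_type not covered here: " + frameType);
        }
        return locals; // same / same_frame_extended: locals unchanged
    }

    public static void main(String[] args) {
        List<String> atOffset0 = List.of("Foo"); // implicit initial frame of foo()
        System.out.println(apply(atOffset0, 253, "int", "int")); // prints [Foo, int, int]
    }
}
```

Applying a chop frame with frame_type 249 to [Foo, int, int] goes the other way and yields [Foo] again.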

Does this explanation answer your question?

With StackMapTable in place, the JVM can do type checking fairly conveniently when it verifies the bytecode of a Class file.