[Discussion] Free discussion / suggestions / supplementary materials / bookmarking thread (no content-free posts, please)

RednaxelaFX 2011-03-01
A draft of the new JVM specification is now available for download, here: http://jcp.org/aboutJava/communityprocess/maintenance/jsr924/index3.html

This new version aims to define the JVM specification for Java SE 7, although the current draft does not yet integrate the changes related to JSR 292 ("invokedynamic"). That integration work will be completed before Java SE 7 is officially released. In other words, it won't be long now.
william_ai 2011-03-02
The draft drops the "Java Programming Language Concepts" chapter that was in JVMS2.
Chapter 2.10 of the draft corresponds to 3.10 in JVMS2; the coverage of exceptions is much richer.
The draft drops JVMS2's section 3.11.10, "Implementing finally".
Chapter 2.11.10 "Synchronization" in the draft is more detailed.
The draft moves JVMS2's Chapter 7, "Compiling for the Java Virtual Machine", into Chapter 3.
Chapter 3 of the draft drops "Compiling Nested Classes and Interfaces" and adds "Annotations".
The draft describes many more attributes: JVMS2 had only 10, while the draft has a full 22.
RednaxelaFX 2011-03-16
IE9 was released yesterday. And I just saw an interesting tweet:
Craig Walker wrote:
@storminwalker
Wow. "There is no JIT compiler for 64bit IE9" "32bit IE9 runs script up to 4x as fast as 64bit IE9" http://goo.gl/aXuZp #ie9 (via @cheeaun)

That's pretty embarrassing = =

========================

Several other things were released yesterday as well:
JRuby 1.6.0 http://jruby.org/2011/03/15/jruby-1-6-0.html
Rubinius 1.2.3 http://rubini.us/releases/1.2.3/

========================

Whoa, there is now even a VBScript implementation on the DLR:
ASP Classic Compiler

========================

A proposed GC improvement for CRuby: parallel marking https://gist.github.com/875822

========================

IcedRobot's approach is to convert Dalvik's dex-format executables back into Java class-file format and then hand them to the JVM to load. A full round trip = = (a minimal sketch of the classloader idea follows after the quote)
IcedRobot/Daneel awesomeness
Quote
This is the first output from Daneel, as written by Remi Forax. It is running a HelloWorld implemented for Android, running on a Java VM. It's using Smali for parsing the Dex file and ASM for writing Java bytecode. The Dex file is loaded through a special Java classloader that does all the translation.

The most awesome thing now is that Michael Starzinger at the same time proposed his own DaneelClassLoader, which is not using Smali, but instead our own Dex parser (mostly written by Michael). This thing can also run HelloWorld:

zafena development: Running .dex files on top of OpenJDK using JamVM
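
Out of curiosity, here is a minimal sketch of what such a translating classloader could look like. This is not Daneel's actual code; the class name and the translateToClassFile() helper are hypothetical placeholders for the Smali/ASM (or custom Dex parser) based translation step described in the quote.

public class DexTranslatingClassLoader extends ClassLoader {
    private final byte[] dexBytes;   // raw contents of the .dex file

    public DexTranslatingClassLoader(ClassLoader parent, byte[] dexBytes) {
        super(parent);
        this.dexBytes = dexBytes;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] classFile = translateToClassFile(dexBytes, name);   // Dex -> .class bytes
        return defineClass(name, classFile, 0, classFile.length);  // hand the result to the JVM
    }

    // Hypothetical placeholder for the Dex-to-class-file translation step.
    private static byte[] translateToClassFile(byte[] dex, String className)
            throws ClassNotFoundException {
        throw new ClassNotFoundException("translation step omitted in this sketch: " + className);
    }
}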

========================

[Yarv-devel] Mr. Tony Printezis
Quote
K.Sasada ko1 @ namikilab.tuat.ac.jp
Wed, 18 Feb 2004 02:31:59 EST
Previous message: [Yarv-devel] Re: CFP: PRO-2004-1
Next message: [Yarv-devel] Mr. Tony Printezis
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
  shudo at computer.org wrote :
    [ Re: [Yarv-devel] Mr. Tony Printezis ]
    at Fri, 13 Feb 2004 14:33:45 +0900 (JST)

 This is Sasada.

> [Title]  On garbage collection in the Java virtual machine (HotSpot VM) (tentative)

Shudo-san's photos: http://www.shudo.net/photo/040216-Tony-Printezis/


 Of the people on this mailing list, I believe the attendees were the
organizer, Shudo-san, and the presenter, Maeda-san.

 Attendees received the presentation slides as a PDF by email, but I wonder
whether it would really be a problem to publish them. Maybe I should ask.
(Then again, maybe nobody even wants to see them :-X)

 The talk itself was, I believe, about the concurrent, parallel, generational
GC in Sun's HotSpot, but I didn't really follow the content. I fell off the
train somewhere around card tables and remembered sets.

 Does "fast allocation" in the end just mean there is a thread-local heap?
A thread-local eden, perhaps.

 I couldn't even follow my own question very well. I asked "How do you debug
the GC?", and as far as I recall the answer was that it comes down to the
steady, unglamorous work of tracking through the changes.



Maeda-san kindly explained various things in the comments of my diary.
http://www.namikilab.tuat.ac.jp/~sasada/diary/200402.html#cc17-1
----
In terms of GC technique, the content is similar to his papers and to
[[http://java.sun.com/docs/hotspot/gc1.4.2/]]
[[http://java.sun.com/docs/hotspot/gc1.4.2/faq.html]]
[[http://www-106.ibm.com/developerworks/java/library/j-jtp11253/]]
[[http://www-106.ibm.com/developerworks/library/j-jtp01274.html]]
and the like. Since it concerns a serious implementation that supports large-scale applications, it carries real weight.

Maeda's own talk was an introduction to
[[return barriers|http://www.ipsj.or.jp/members/Trans/Jpn/03/2000/4109/article009.html]],
[[its English version|http://ryujin.kuis.kyoto-u.ac.jp/~yuasa/ilc2002/]],
Yasugi-san's hierarchical grouping copying GC,
[[Nakamura-san's thread-local GC|http://www.ipsj.or.jp/members/Trans/Jpn/03/2003/4413/article002.html]]
[[Maeda's incremental old-generation GC|http://www.ipsj.or.jp/members/Trans/Jpn/03/2003/4413/article015.html]]
and the like.
----


--
// SASADA Koichi @ namikilab.tuat.ac.jp
//

// This year I want to become able to understand spoken English....

=============================

Found an old item: Sun's RTSJ implementation, Project Mackinac: Sun's Real-time Java Platform
Quote
Project Mackinac: Sun's Real-time Java Platform
Project Mackinac is the first commercial implementation by Sun Microsystems of JSR-01, the Real-time Specification for Java (RTSJ), which combines industrial-strength real-time functionality with the usability, abstraction, and standards-based advantages of Java technology. Project Mackinac supports both hard real-time and non-real-time functionality concurrently within a single JVM. Based on the Java Hotspot platform, Project Mackinac delivers performance that is competitive with compiled solutions such as real-time C++, while retaining the ease of use and high-level abstractions that make Java so popular among developers and system architects

The first release of Project Mackinac will address the needs of real-time systems that are integrated as part of a larger system solution, targeting applications that include monitoring, control, and management functions.

For more information, please refer to:

"Suns Real-time Java Platform", a Sun Microsystems executive brief
The Real-time Java Platform, a Sun Microsystems technical white paper
The Real Time Specification for Java website


=======================

InfoQ: IBM Releases New 64-bit Java SDK for z/OS
A new release in the J9 VM 2.6 series

Speaking of J9, I should remember to follow this aggregator in Google Reader later: http://planet-oti.muellerware.org/
OTI Smalltalk / ENVY ? MVS Smalltalk ? VisualAge Smalltalk ?
<< What are their relationships?
Wikipedia says:
Quote
VisualAge for Java was based on an extended Smalltalk virtual machine which executed both Smalltalk and Java byte codes. Java natives were actually implemented in Smalltalk.


OTI Smalltalk -> Smalltalk/X Java Extensions -> OTI UVM -> IBM J9 VM

Will It Go Round in Circles?
Quote
Enter Java

When Java “reared it’s head.” IBM, of course came out with VisualAge for Java. Again, the IDE was written in Smalltalk, although, in order to accomodate Java execution, OTI developed what was called the UVM (or Universal Virtual Machine) whcih could execute both Smalltalk and Java byte-codes, and used Smalltalk to implement the Java primitive functions which were implemented in C in the Sun JVM.

IBM eventually switched its focus from Smalltalk to Java. After considerable thought about this, and conversations with those who were in a position to know, I’m now convinced that this had nothing to do with the relative technical merits, and everything to do with a combined IBM/Sun platform play against Microsoft.

VisualAge for Java did eventually move away from the UVM for Java execution, primarily because the Java language had evolved to provide a standard C api for extensions. OTI began implementation of a new JVM, which was dubbed J9, which was built from the ground up to be usable in embedded systems.

OTI had a reputation of fostering the use of Smalltalk in embedded systems. When I first became aware of them in the early OOPSLA days, they were showing off products like an HP Network Analyser, and a TekTronix Digital Oscilloscope which used Smalltalk to implement their user interfaces. They were also involved with a large semiconductor company in Texas which was planning to use Smalltalk in controllers in an automated IC fabrication facility.

Around the time of the birth of J9, IBM was becoming quite interested in embedded software, under an initiative called Pervasive Computing (PVC). Because of this, OTI now an IBM Subsidiary, started working on VisualAge Micro Edition, which bundled J9 with an cross-platform IDE written, this time, in Java. About this time, I moved from IBM to OTI, and ended up working on the VAME IDE.

The Emergence of Eclipse

Work on VAME was distributed between various OTI labs. The lab in Phoenix acted as the liaison with the IBM PVC folks, the J9 VM work was done in Ottawa, Raleigh did some applications work, and some UI work, but the bulk of the VAME UI framework was done by Erich Gamma’s team at OTI Zurich. The IDE used Swing, as most Java desktop apps did in those days, and apparently that was a bit of a struggle. OTI used to have an annual all-hands internal developer conference in sunny early February Quebec, and I’ll never forget Erich, who is quite the Alpine skier, talking about the UI while showing a slide he took at some ski report showing a sign with a precariously oscillating chair-lift and a skier about to fall, with the text “Don’t Swing!”

This led to the desire to develop a new UI framework and what is now known as the standard widget set or SWT.

And the overall IDE/UI framework which replaced VAME was, of course Eclipse.

And now it seems we’ve gone from a Smalltalk IDE hosting Java development to the reverse!

Where Are They Now

Digitalk eventually merged with ParcPlace to form ParcPlace-Digitalk. For a while, both Smalltalk/V and VisualWorks/Objectworks (the ParcPlace Smalltalk-80 products) were carried, but eventually VisualWorks survived. It’s now shepherded by Cincom, which bought the rights.

Similarly, IBM eventually sold the rights to market and develop VisualAge Smalltalk (a.k.a. VAST) to Instantiations. My old friend and colleague from IBM John O’Keefe “retired” from IBM to lead the development team.

Several of the X3J20 guys went to or through Microsoft. George Bosworth worked for several years on a precursor to today’s Microsoft DLR. Davd Simmons and Allen Wirfs-Brock both work on Javascript for Microsoft.

And, of interest to the Ruby community, Gemstone is now working on Maglev, a Ruby implementation with persistent objects based on their mature Smalltalk product.

orz... so Microsoft has Smalltalk blood in it as well. Such a pity how that all turned out...
Quote
David Simmons is long time member of the Smalltalk community and has been professionally involved with languages and tools for nearly 30 years. He has been at Microsoft in the Develop Tools Division for three years involved in language tools and technologies. He is currently architect for the Microsoft JavaScript team which is headquartered in Hyderabad India. He designed and worked on high performance language engines as part of Smallscript Corporation, and Quasar Knowledge Systems Inc. His work on S# and subsequent activities at Microsoft have been focused on issues of human complexity and scalable performance, security, and interop in dynamic language systems.

Quote
John Duimovich, Java CTO and IBM distinguished engineer, has been the lead designer and implementer for the OTI/IBM virtual machine technology for the past twenty years. He has designed virtual machines for a wide range of platforms, from the implementations for embedded and real time systems to those for IBM mainframe systems. John has played a key role in the development of the IBM J9 Virtual Machine, ENVY/Smalltalk, VA/Micro Edition, and VA/Java Java IDEs. He also had a role in creating Eclipse, is the Tools PMC lead and a member of the technology PMC.

Mark Reinhold is chief architect of the Java Platform group at Oracle. His past contributions to the platform include character-stream readers and writers, reference objects, shutdown hooks, the NIO high-performance I/O APIs, library generification, and service loaders. Mark was the lead engineer for the 1.2 and 5.0 releases and the specification lead for Java SE 6. He is currently leading the Jigsaw and JDK 7 projects in the OpenJDK community. Mark holds a PhD in Computer Science from the Massachusetts Institute of Technology.


http://duimovich.blogspot.com/2006/09/performance-is-not-optional.html

Rick DeNatale: Performance Anxiety
Quote
Consider the source
In his day job, paraphrasing his self description John "works for IBM on Java virtual machines and is the lead on the Eclipse tools project management commitee."

But some of my readers might be interested in John's background. John was for a very long time, the lead of the Smalltalk and Java virtual machine team at Object Technology International (OTI) dating from before the time it was acquired by IBM. Among other things John was responsible for the development of embedded Smalltalk virtual machines from OTI, which spawned the VM used in Smalltalk/V Mac, IBM Smalltalk (used in IBM/VisualAge), the 'Universal' Virtual machine which implemented Java on an extended Smalltalk VM, and which was used for the early releases of IBM/VisualAge for Java, and the J9 Java VM. A good deal of what I know about implementing VMs comes from working, lunching, and bar-hopping with John.

John had become OTI's Chief Technology Officer before OTI got assimilated into the IBMborg.

John is a brilliant guy, with a great sense of humor. Two characteristics which seem to have been requirements for a job at OTI. I'm still not sure how I ended up spending several years there.


http://talklikeaduck.denhaven2.com/2009/04/01/if-ibm-buys-sun
Quote
When I was working at OTI, I was on the edges of the relationship between IBM and Sun which had been forged over Java.

A lot of the tensions were due to different goals. In the part of IBM/OTI where I worked, there was great interest in using Java for embedded applications. The IBM J9 VM came out of OTI’s legacy in building embeddable Smalltalk VMs. J9 provided a modular implementation, so the footprint could be controlled by choosing options and selecting one of a number of different core class libraries of different sizes. This was great for the embedded developer giving lots of control, but flew in the face of Sun’s “Write once, run anywhere” Java slogan. This led to a lot of friction with Sun trying to control things via the JCP. As I understand it, IBM and Sun had negotiated a special relationship over Java which gave IBM some freedom of action outside of normal Sun terms, but IBM always played by the Sun rules as much as possible.


http://talklikeaduck.denhaven2.com/2006/07/29/about-me
Quote
And IBM made the jump to Java. Most of us who had been working on Smalltalk were encouraged to start working on Java and Java tools.

About this time, I took an assignment to OTI, which had been acquired by IBM Canada as a wholly owned subsidiary.

OTI itself was making an investment in Java. The first step was to turn the IBM/Smalltalk virtual machine into what was called the UVM or Universal Virtual Machine. This could run both Java and Smalltalk bytecodes. The Java “natives,” the equivalent of extensions in Ruby were written in Smalltalk instead of C. The first VisualAge for Java was written in Smalltalk on this UVM.

Eventually, as Java evolved this approach became less tenable. So OTI started building a new IDE in Java. Our first use for this was for a system to develop embedded Java applications. IBM was getting very interested in embedded software, which they called “Pervasive Computing” and had formed a division called PvC to pursue it. Embedded applications (e.g. code inside an oscilloscope, or a cell phone) written in Smalltalk was an early focus of OTI predating the relationship with IBM. Java, which was actually first developed within Sun for such embedded applications, was a natural language to employ.

The resulting development was called VisualAge/Micro Environment or VAME. The IDE and UI design was done at the OTI Zurich lab by a team led by Erich Gamma of the “Gang of Four.” This code was almost completely reworked and combined with Jeff McAffer’s component run-time architecture to become the basis for Eclipse.

After working on VAME, I moved over to a group working with customers on embedded Java applications. We majored on automotive applications, since there were a lot of car companies and suppliers who were interested in telematics systems with sophisticated software. As a “standards guy” I represented IBM in standards organizations such as the “Vehicle Expert Group” within OSGi


http://talklikeaduck.denhaven2.com/2008/12/03/the-oti-brotherhood

http://talklikeaduck.denhaven2.com/2009/06/07/geek-mugshots
Quote

This mug goes into the dishwasher every night, and comes out every morning, it commemorates the J9 Java virtual machine from OTI. The logo is derived from the VisualAge logo.


Quote
J9 was started as clean-room Java VM implementation for the embedded (J2ME) space
Small, Hotswap debugging, JVMPI, TCK compliant
Garbage collection was single threaded generational solution

Fast forward…

J9 continues to be clean room, but is also targeted for the desktop and server (J2SE) space
Keep things small, but features add size
Scalable collection strategy (CPU + Memory)


Dave Thomas on Smalltalk history:
Travels With Smalltalk
You Can’t Do That With Smalltalk!




The new crop of Java virtual machines (panel)
CSET 4250 - Smalltalk

Understanding the Building Blocks of Trace Selection
The Potentials and Challenges of Trace Compilation: Lessons learned from building a trace-JIT on top of J9 JVM
Oh nice, a trace-based compiler built on top of J9/Testarossa

Under the Hood: Inside a High-Performance JVM™ Machine
This seems to be the same presentation as the one from JavaOne 2009 that couldn't be downloaded. Yay, finally found it!

Java JIT compiler and related publications

J9 was shipped alongside the Sovereign JVM at least as early as IBM DK for Java 1.3.1 SR5, where it was called the side-car VM (Sovereign being the classic VM)
http://publib.boulder.ibm.com/infocenter/ieduasst/v1r1m0/index.jsp?topic=/com.ibm.iea.was_v5/was/5.0.2/DevelopmentTools/ASTK/player.html

In early releases of the IBM DK for Java, the Sovereign VM was the only JVM, and it apparently registered its JIT compiler with the VM via -Djava.compiler. J9 was added later, but only for development and debugging, with Sovereign remaining the primary VM. It seems IBM's JDK 5 was the first to make J9 the primary VM; what is certain is that by IBM's JDK 6 only J9 was left.

In comparisons between the J9 in IBM JDK 5 and the older IBM JVM, you always see lines like these:
1.4.2: does OSR (MMI2JIT) / 1.5: does not do OSR
1.4.2: Java stack fused onto the native stack / 1.5: uses a separate Java stack

But judging from today's J9, it clearly fuses the Java stack with the native stack on x86 and x64 as well. Did something change here along the way?

developerWorks: Maximizing Java Performance on AIX: Part 5: References and Conclusion (2004)
Quote
However, we frequently receive queries about how to translate specific Sun Java command-line switches to equivalent IBM Java command-line switches. These switches almost always correspond to Garbage Collection, as a well-tuned GC is essential to any Java-based application's performance. A mapping between Sun and IBM switches is difficult because of the difference in JVM architecture. The IBM Java does not contain a Generational Garbage Collector, and does not understand any command-line switches that start with -XX. IBM Java "Sovereign" architecture is not based on Sun HotSpot architecture as well. The easiest, and in most cases the quickest, way is to throw away all Sun-specific settings when running your application on IBM Platforms, and carrying out the fine-tuning as needed. But if you are curious about how some Sun switches map to IBM switches, read on.


http://publib.boulder.ibm.com/infocenter/javasdk/v1r4m2/index.jsp?topic=/com.ibm.java.doc.diagnostics.142/html/introducingthemmi.html

===========================

http://article.gmane.org/gmane.comp.java.harmony.devel/1261/match=osr+on+stack+replacement

===========================

Applying Automated Learning to Improve Optimizing Compilers
Quote
Abstract
The compilation of complex programs for efficient execution in modern
computer architectures requires that many difficult optimization problems be addressed. Recently there has been great interest in the application of automated learning techniques to improve the compilation process. In this talk I report on our recent experience investigating the use of Support Vector Machines (SVMs) to improve compilation strategies in Testarossa, the commercial Just-in-Time compiler that is used in IBM's J9 Java Virtual Machine. This talk not only presents the results of applying SVMs to this learning task, but also discusses very important practical issues, such as the precise measurement of compilation and execution time and the necessary reduction in the volume of data used for training, that arise when
integrating a machine-learned model into a commercial compilation environment. The machine-learned plans outperform the original Testarossa for start-up performance, but not for throughput performance, for which Testarossa has been highly hand-tuned for many years.

Based on J9, applying machine learning with Support Vector Machines to improve the effectiveness of the JIT

===========================

c2.com: Java History
This thread is quite entertaining

http://c2.com/cgi/wiki?IbmSmalltalk
http://c2.com/cgi/wiki?HotSpotVm

===========================

Security on JIT

===========================

Isolating Failure-Inducing Thread Schedules
Quote
Jalapeño and Scheduling
Using implementation of the garbage collector to do scheduling.
Quasi-preemptive scheduling
Safe point - is a program location where the compiler that created the method
body is able to describe where all the live references exist.
Yield point - is a safe point located at a method prologue (such as function invocation or at a loop back-edge).
Thread switching takes place only when a running thread has reached a yield point.


=============================

IBM Java Technology (2003)


=============================

Collecting some dark history

http://www.linkedin.com/in/davidungar
Quote
David Ungar's Education

University of California, Berkeley
Ph.D., Computer Science
1980 – 1985
Participated in architectural design of SOAR, Smalltalk on a RISC microprocessor. Invented call-instruction format used today in SPARC. Lead graduate student on team building virtual machine for SOAR, Smalltalk on A RISC. On my own, modified simulator and virtual machine to benchmark effect of each architectural idea. On my own, designed, analyzed, implemented and measured the first two-generation garbage collector. Published the first performance numbers for generational collection. Variants now used by almost eery commercial Smalltalk and Java system today. Designed and built first Smalltalk system with 32-bit pointers and no time-wasting indirection (Berkeley Smalltalk).

A glimpse into the relationship between SOAR and SPARC through David Ungar's personal history

===========================

EclipseCon 2011: Sovereign: Migrating Java Threads to Improve Availability of Web Applications
http://www.so.in.tum.de/sovereign/index.php5/Category:Modified_VM
http://www.slideshare.net/mknauer23/sovereign-migrating-java-threads-to-improve-availability-of-web-applications

===========================

http://download.oracle.com/docs/cd/E13924_01/coh.340/e13818/appproductcheck.htm#CHHFADDF
Quote
In terms of Oracle Coherence versions:

Coherence 3.x versions are supported on the Sun JDK versions 1.4 and 1.5, and JVMs corresponding to those versions of the Sun JDK. Starting with Coherence 3.4 the 1.6 JVMs are also supported.

Coherence version 2.x (currently at the 2.5.1 release level) is supported on the Sun JDK versions 1.2, 1.3, 1.4 and 1.5, and JVMs corresponding to those versions of the Sun JDK.

Often the choice of JVM is dictated by other software. For example:

IBM only supports IBM WebSphere running on IBM JVMs. Most of the time, this is the IBM "Sovereign" or "J9" JVM, but when WebSphere runs on Sun Solaris/Sparc, IBM builds a JVM using the Sun JVM source code instead of its own.

BEA WebLogic typically includes a JVM which is intended to be used with it. On some platforms, this is the BEA WebLogic JRockit JVM.

Apple Mac OS X, HP-UX, IBM AIX and other operating systems only have one JVM vendor (Apple, HP and IBM respectively).

Certain software libraries and frameworks have minimum Java version requirements because they take advantage of relatively new Java features.

On commodity x86 servers running Linux or Windows, the Sun JVM is recommended. Generally speaking, the recent update versions are recommended. For example:

Oracle recommends testing and deploying using the latest supported Sun JVM based on your platform and Coherence version.


=========================

VEE04
http://www.research.ibm.com/vee04/

Quickly Reacquirable Locks
Patent 7814488
http://home.comcast.net/~pjbishop/Dave/QRL-OpLocks-BiasedLocking.pdf

Simple and fast biased locks (PACT '10, 2010)
http://portal.acm.org/citation.cfm?id=1854287

http://www.slideshare.net/drkrab/ruby-on-the-jvm

Tuning the Java Virtual Machine for Optimal Performance: Means and Methods
IBM WEBSPHERE TECHNICAL CONFERENCE 2007
Hmm, so in IBM's GC terminology, TLH = TLAB, and wilderness / Large Object Area = the CLR's LOH?
Is the gencon heap layout in IBM JDK 5: nursery = semispace/copying collection; tenured = mark-sweep-compact?

http://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/
Quote
JIT Compilation Strategy Overview in J9 JVM
Marius Pirvu, Derek Inglis, and Vijay Sundaresan
Presentation Slides: [PPT] [PDF]
Modern Java Virtual Machines (JVMs) employ Just-In-Time (JIT) compilers that optimize and generate native code at runtime in order to improve performance of Java programs. Because compilation is inherently part of the application running-time, minimizing compilation overhead is a major concern in JIT compilation. Short running programs and the start-up phase in large server or GUI-based applications are examples of scenarios where compilation time is a significant proportion of the overall execution time. To address these problems JIT developers have a few alternatives: (1) Develop cheaper optimization algorithms; (2) Restrict compilation activity to a subset of methods that are deemed to be "frequently executed"; (3) Attenuate the negative impact of compilation on the overall execution of the program. In this paper we will elaborate on solutions for the latter two options to reduce compilation overhead, solutions implemented in the TR JIT compiler developed for the J9 virtual machine at IBM.

Adaptive JIT compilers, like Sun's Hot Spot or Jikes RVM, start from the assumption that most of an application's time is spent in a small portion of the code. They dynamically identify the set of performance critical routines and optimize them heavily, while the rest of the code is either interpreted or compiled with minimal optimizations. Both J9 and the previous JVM offered by IBM (codename Sovereign) follow this principle. However, from the point of view of compilation strategy, J9 is different from Sovereign in the following respects: (1) it provides a higher optimization granularity by offering multiple optimization levels; (2) it monitors program execution continuously and may recompile methods at higher optimization levels. We will describe the compilation infrastructure implemented in the TR JIT compiler and present a comparison among different optimization strategies regarding compilation time and application's performance.

Some applications have a very flat execution profile with no clear hot-spots. A case in point is WebSphere Application Server where thousands of methods are, more or less, equally important. In these situations the JIT compiler cannot make discriminatory decisions and will have to either spend a significant amount of time compiling these methods or lower the optimization level at the expense of runtime performance. To cope with this scenario we have developed a mechanism that selectively lowers the optimization level for some parts of the execution. We will describe this mechanism and show how it enabled us to significantly reduce the start-up of WebSphere (up to 33%) without degrading the performance of J2EE applications running on top of it.

To attenuate the negative effects of compilation we have recently implemented an asynchronous compilation mechanism: All compilations are carried out by a separate compilation thread that runs asynchronously to the Java threads. This means that a Java thread will not have to wait for the compiled version of the method, but can instead go ahead and make progress. The major benefit of the asynchronous compilation is that it increases the application's parallelism when running on multiprocessors. In addition to detailing our implementation of asynchronous compilation, we will describe other work done to minimize the impact of a long-running compilation on overall performance. We will also present experimental data showing substantial improvements in start-up of WebSphere (up to 25%) and speedup of short running applications (up to 24%).


Evaluation of a just-in-time compiler retrofitted for PHP
J9/TR being used to run PHP?

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4211021
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5387112
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=779076

OOPSLA Java VM panel notes

http://software.intel.com/en-us/articles/identifying-jvm-vectorization-simd-usage-with-the-vtunetm-performance-analyzer/
http://software.intel.com/en-us/articles/optimizing-java-on-intel-server-platforms/
http://software.intel.com/en-us/articles/multi-threading-in-a-java-environment/
http://software.intel.com/en-us/articles/how-to-spice-up-java-with-the-vtunetm-performance-analyzer-1/

The Compaq Fast Virtual Machine, June 1999.
http://www.compaq.com/java/FastVM.html

===============

Java Just-In-Time Compiler and Virtual Machine Improvements for Server and Middleware Applications (2004)(VM04)
System.currentTimeMillis() is expensive, because:
1. there is an expensive call from JITted code to native code
2. the method returns a long value that must be held in a register (increasing register pressure) or returned via the stack (which may be slower to access).
Interesting: lock coarsening can be applied only to the small synchronized regions introduced by inlining synchronized methods, without disturbing hand-written synchronized blocks. It also seems worth analyzing, along the way, whether a method effectively ends up wrapped in one big synchronized region (which would make it roughly equivalent to declaring the method synchronized). (See the sketch right after these notes.)
PowerPC has an instruction that clears an entire cache line directly. Why has x86 never had one?
! J9's TR JIT compiler manages to recognize places in Java programs where what is really needed is an unsigned int but a long has to be used for the arithmetic. Impressive.
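
As a concrete illustration of the coarsening note above, here is a minimal Java sketch (the class is made up just for this thread): after the tiny synchronized method is inlined, each loop iteration contains a small lock/unlock pair, which a JIT may legally merge into one region spanning several iterations, while leaving hand-written synchronized blocks elsewhere untouched.

class Counter {
    private long value;

    // Tiny synchronized method; a JIT will typically inline it.
    private synchronized void increment() {
        value++;
    }

    void incrementMany(int n) {
        for (int i = 0; i < n; i++) {
            increment();   // after inlining: one small synchronized region per iteration,
                           // a natural candidate for lock coarsening
        }
    }
}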

==============

http://publib.boulder.ibm.com/infocenter/ieduasst/v1r1m0/index.jsp?topic=/com.ibm.iea.was_v7/was/7.0/Architecture/WASv7_JavaRuntime/player.html
orz... so J9 "dropped" OSR in IBM JDK 5, and then JDK 6 added "dynamic loop transfer". Is that just OSR coming back under a different name?
-Xjit:disableDynamicLoopTransfer

Multi-threading in Java - Performance and Scalability on Multi-core Systems (2010)
Quote
HotSpot JVM offers a similar approach called thread-local allocation buffer (TLAB). There are some differences between a TLAB and a TLH. The TLH approach maintains the invariant that objects within each TLH are local to a single thread. All objects may be allocated into the TLH, as long as they are evacuated before becoming non-local. It has the advantage that thread-local objects could be collected independently without stopping other threads. However, TLH is complicated because it requires the support of compilers, and its overhead such as a write barrier have to be introduced.

In contrast, the TLAB approach allows any object to be allocated locally in the allocation buffer belonging to the thread which creates the objects. Moreover, the implementation of TLAB approach is simpler and often leads to the similar performance as a TLH. Thus we focus on the TLAB instead of the TLH in this section.
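
To make the TLAB idea above concrete, here is a toy illustration (purely conceptual, not HotSpot or J9 code; the class is invented for this thread): each thread bump-allocates out of its own buffer with no locking, and only takes a synchronized slow path when the buffer is exhausted.

public final class ToyTlabAllocator {
    private static final int TLAB_SIZE = 64 * 1024;

    private static final class Tlab {
        final byte[] buffer = new byte[TLAB_SIZE];
        int top;                                   // bump pointer
    }

    private static final ThreadLocal<Tlab> CURRENT = new ThreadLocal<Tlab>() {
        @Override protected Tlab initialValue() { return new Tlab(); }
    };

    /** "Allocates" size bytes and returns the offset inside the calling thread's buffer. */
    public static int allocate(int size) {
        if (size > TLAB_SIZE) {
            throw new IllegalArgumentException("too large for a TLAB; a real VM would use the shared heap");
        }
        Tlab tlab = CURRENT.get();
        if (tlab.top + size > TLAB_SIZE) {
            tlab = refill();                       // slow path: grab a fresh buffer
        }
        int offset = tlab.top;                     // fast path: no lock, just bump the pointer
        tlab.top += size;
        return offset;
    }

    private static synchronized Tlab refill() {
        // A real VM would carve the new TLAB out of the shared heap here.
        Tlab fresh = new Tlab();
        CURRENT.set(fresh);
        return fresh;
    }
}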


Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance (2008)

S. Borman. Sensible sanitation – understanding the IBM Java garbage collector. http://www.ibm.com/developerworks/ibm/library/i-garbage1/, Aug. 2002.

Thread-Local Heaps for Java (2002)

http://xiao-feng.blogspot.com/2008/08/what-is-thread-local-data.html

================

http://www-01.ibm.com/support/docview.wss?uid=swg27014503&aid=1
Quote
IBM Java VM 1.1.2 through 1.4.2 has roots in Sun code
A new cleanroom Smalltalk VM was bought in with the OTI acquisition in 1996
The Smalltalk VM became the J9 Java VM
Common code running on palmtops and mainframes
Limited deployment at 1.3.1 and 1.4.2
Strategic direction for Java 5 and beyond


http://www.trl.ibm.com/people/kawatiya/pub/Ammons07vee.pdf
Quote
For example, in JVMs that use "safe-points" to support type-accurate garbage collection, whenever the JVM wants to initiate a garbage collection cycle it must roll forward all runnable threads to their next safe point. If instead the JVM and the operating system cooperated to ensure that thread context-switches only occurred at JVM-level safe points, the transition to garbage collection could be accomplished instantly. Although rolling forward is not a significant cost in stop-the-world collectors, it can impact the performance of incremental collectors such as Metronome [3].


Specialized execution environments (2008) http://portal.acm.org/citation.cfm?id=1341335

http://ti.tuwien.ac.at/rts/teaching/courses/jvmhw/files/gcslides.pdf
Quote
Root Scanning - GC Safe Points
Threads run through GC safe points
Threads update "root array" at each GC safe point
  - Only changes since the last GC point must be recorded
Threads are only stopped at GC safe points
  - Requires support from the scheduler
GC uses these root arrays as root set
  - GC can only start when all threads have passed a GC safe point

Real-time Java, Part 4: Real-time garbage collection
Quote
Cooperative suspend mechanism
Although Metronome uses a series of small incremental pauses to complete a GC cycle, it must still suspend the JVM for every quantum in a STW fashion. For each of these STW pauses, Metronome uses the cooperative suspend mechanism in the J9 virtual machine. This mechanism doesn't rely on any special native-thread capability for suspending threads. Rather, it uses an asynchronous-style messaging system to notify Java threads that they must release their access to internal JVM structures, including the heap, and sleep until they are signaled to resume processing. Java threads within the J9 virtual machine periodically check if a suspend request has been issued, and if so, they proceed as follows:
1. Release any held internal JVM structures.
2. Store any held object references in well-described locations.
3. Signal the central JVM suspend mechanism that it has reached a safe point.
4. Sleep and wait for a corresponding resume.
Upon resumption, threads reread object pointers and reacquire the JVM-related structures they previously held. The act of releasing JVM structures lets the GC thread process these structures in a safe fashion; reading and writing to partially updated structures can cause unexpected behavior and crashes. By storing and then reloading object pointers, the threads allow the GC the opportunity to update the object pointers during a GC quantum, which is necessary if the object is moved as part of any compaction-like operation.
Because the suspend mechanism cooperates with Java threads, it's important that the periodic checks in each thread be spaced apart with the shortest possible intervals. This is the responsibility of both the JVM and Just-in-time (JIT) compiler. Although checking for suspend requests introduces an overhead, it allows structures such as stacks to be well defined in terms of the GC's needs, letting it determine accurately whether or not values in stacks are pointers to objects.
This suspend mechanism is used only for threads currently participating in JVM-related activities; non-Java threads, or Java threads that are out in Java Native Interface (JNI) code and not using the JNI API, are not subject to being suspended. If these threads participate in any JVM activities, such as attaching to the JVM or calling the JNI API, they will cooperatively suspend until the GC quantum is complete. This is important because it lets threads that are associated with the Java process continue to be scheduled. And although thread priorities will be respected, perturbing the system in any noticeable way in these other threads can affect the GC's determinism.


harmony-dev: [arch] voluntary vs. preemptive suspension of Java threads 20050901
RednaxelaFX 2011-04-05
I've been playing with some Serviceability Agent features on JDK6u23 and found that clhsdb still doesn't support amd64. I got all excited about adding that support, only to discover, after pulling the latest HotSpot code, that someone had already added it... T T
It's this change: http://hg.openjdk.java.net/jdk7/jdk7/hotspot/diff/6da3527317ff/agent/src/share/classes/sun/jvm/hotspot/ui/classbrowser/HTMLGenerator.java
Fortunately it shouldn't be hard to patch this change into JDK6u23 ~ starting on it right away
<< Er, changing that one file alone isn't enough. Fine, I'll just port over the whole corresponding part from 6u25.

==========================================

In the 64-bit builds of JDK6u22 and 6u23, the name the server compiler (C2/opto) reports to JMX is "HotSpot 64-Bit Server Compiler" (retrievable via System.getProperty("sun.management.compiler"); a small query snippet follows these notes), not "HotSpot Tiered Compilers". Hmm, this just came to mind. Presumably because C1 wasn't yet fully working on 64-bit?
As expected, this topic had already been discussed on hotspot-dev: http://web.archiveorange.com/archive/v/gFrEISbbJxKIaqzs4GBc
It looks like C1 now runs fine on x64. Maybe 6u25 will make -XX:+TieredCompilation the default for server mode?

Well, judging from 6u25b03, -XX:+TieredCompilation is still not the default, but on amd64 the JIT compiler's name has indeed changed to "HotSpot 64-bit Tiered Compilers". Real progress.
Also, ergonomics now turns on -XX:+UseCompressedOops automatically when circumstances allow. Excellent!
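
For reference, here is a tiny snippet (written for this thread) that prints the JIT compiler name the VM reports, both through the standard CompilationMXBean and through the Sun-specific sun.management.compiler property mentioned above.

import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompilerName {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // null if the VM has no JIT (e.g. -Xint); otherwise e.g. "HotSpot 64-Bit Server Compiler"
        System.out.println(jit == null ? "no JIT compiler" : jit.getName());
        // Sun/Oracle-specific system property carrying the same information
        System.out.println(System.getProperty("sun.management.compiler"));
    }
}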

==========================================

Oracle’s New x86 Servers Demonstrate World Record 4 Processor Performance on Industry-Standard Benchmark April 5, 2011

John Rose: in the blue yonder: thinking about syntax support for invokedynamic

Correction: === CALL FOR SPEAKERS -- JVM LANGUAGE SUMMIT, July 2011 ===

Assembly output from JRuby 'fib'

inspecting hotspot code using gdb
Thomas Rodriguez wrote:
libjvm.so is also dynamically loaded so they
symbols won't be available until you've at least run it once.

Hmm, so that's how it is: once GDB has run the code once, it picks up the symbols.

Digging up an old post while I'm at it:
Gary Benson wrote:
From gbenson at redhat.com  Wed Nov 21 02:06:24 2007
From: gbenson at redhat.com (Gary Benson)
Date: Wed, 21 Nov 2007 10:06:24 +0000
Subject: 6559156: My favorite defect
In-Reply-To: <1195638574.4743ff2eabda4@web-mail.sjsu.edu>
References: <1195638574.4743ff2eabda4@web-mail.sjsu.edu>
Message-ID: <20071121100623.GB3788@redhat.com>

rgougol at email.sjsu.edu wrote:
> I wish I could trace the server compiler at this file using
> GDB. However, while trace virtual machine launcher of openjdk
> fastdebug with GDB, it produces a new process and then goes
> beyond the control.  Can you tell me how I may trace the C2
> optimizations with GDB and how to put the break point for the
> new process that is produced using GDB?

  (gdb) help set follow-fork-mode
  Set debugger response to a program call of fork or vfork.
  A fork or vfork creates a new process.  follow-fork-mode can be:
    parent  - the original process is debugged after a fork
    child   - the new process is debugged after a fork
  The unfollowed process will continue to run.
  By default, the debugger will follow the parent process.

Cheers,
Gary

And this one: http://mail.openjdk.java.net/pipermail/hotspot-dev/2007-October/000175.html
Tom Rodriguez wrote:
Tom Rodriguez Thomas.Rodriguez at Sun.COM
Thu Oct 18 18:09:40 PDT 2007
Previous message: inspecting hotspot code using gdb
Next message: inspecting hotspot code using gdb
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Using a debugger with the java launcher can be a little tricky since it normally
sets up the LD_LIBRARY_PATH and relaunches the VM to select the right libjvm.so
which makes debuggers unhappy.  libjvm.so is also dynamically loaded so they
symbols won't be available until you've at least run it once.  Your
LD_LIBRARY_PATH should be set to the value of the java.library.path property
from the VM you want to run.  gdb will get into various bad states if you don't
do this part.  Also by default you also want to suppress a lot of signals that
the JVM uses.

I attached the script I use for launching the VM under a debugger.  It works for
dbx and gdb and has a little Java class for getting the java.library.path
property from a VM.  You just prefix dbxr onto the full command line you want to
debug and it sets everything up for you.  The script is a little ugly and has
some special bits to deal with getting properties from a JVM which might be at
least a little broken.  Obviously if you are trying to debug a JVM which won't
even boot then it will be hard to get properties from it.  It's possible to
derive the proper LD_LIBRARY_PATH setting from the location of the JDK but that
was error prone enough that I switched to using a class to read it out.  Anyway,
hopefully it will be useful.

tom


Squeezing Performance from the Java HotSpot Virtual Machine
<< The original transcript of this online chat can no longer be found. Will have to try the archive later. Lead link:
http://www.webdigity.com/index.php/topic,1310.0.Chat+Transcript%3A+Squeezing+Performance+from+the+HotSpot+Virtual+Machine.html


conflict of HotSpot and dynamic languages
http://www.engineyard.com/blog/2009/j-is-for-jvm-why-the-j-in-jruby/
Fractal example, n=6

Markus Kohler (SAP JVM)
http://kohlerm.blogspot.com/

regional collector, SATB, Taiichi Yuasa << remember to write this in notes
Taiichi Yuasa. *Real-time garbage collection on general-purpose machines*
If Sovereign didn't support asynchronous compilation, how did it support multi-level compilation? Every compilation would have blocked.
Kenneth Ma (kenma@ca.ibm.com), Staff Software Developer, IBM Toronto Lab
Marius Lut (lut@ca.ibm.com), Staff Software Developer, IBM Toronto Lab
Real-time Java, Part 2: Comparing compilation techniques
http://webdocs.cs.ualberta.ca/~amaral/IBM-Blainey-talks/compilersurvey.pdf
Quote
Sun JDK-based JIT (Sovereign)
Best of breed JIT compiler for client and server applications
Based very loosely on Sun JDK


What is the relationship between joeq's Quad and Sovereign?

HotSpot also only introduced OSR in 1.4, right? Was compilation synchronous before that as well?
The dynamic loop transfer patent cites Sovereign as an example.
Jikes RVM supports asynchronous compilation

http://www.ibm.com/developerworks/java/library/j-nativememory-linux/
An anti-pattern: heavy use of reflection, with the Constructor and Method objects cached in a strongly-referenced cache map. This also keeps the generated invocation classes (GeneratedXXXAccessor) alive and can eat a lot of memory (HotSpot: PermGen; J9, JRockit: native memory). << Frameworks like Hibernate do this kind of caching, and so does Groovy. (A minimal sketch follows these notes.)
Large numbers of generated classes can quietly inflate memory usage. Besides reflection, other sources include JSPs, dynamic languages, and so on.
Class design itself also affects memory footprint. For example, take a class with a large number of vtable entries: if it has many subclasses, each overriding only one or two methods, the space consumed by the vtables becomes significant. C++ even has dedicated techniques for avoiding this; Java programmers rarely think at this level.
If you rely on finalize() to release the native memory referenced by a DirectByteBuffer, it may not be released quickly enough, which can also lead to a native OOM.
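
A minimal sketch of the reflection-caching anti-pattern above (the class is made up for this thread): Method objects held in a statically reachable, strongly-referenced map stay alive for the lifetime of the JVM, and so do the generated accessor classes behind them; frameworks typically mitigate this with soft/weak references or bounded caches.

import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public final class ReflectionCache {
    // Strong references: cached Method objects stay reachable until the JVM
    // exits, and after enough invoke() calls each drags along a generated
    // GeneratedMethodAccessor class as well.
    private static final Map<String, Method> CACHE = new HashMap<String, Method>();

    public static synchronized Method lookup(Class<?> type, String name)
            throws NoSuchMethodException {
        String key = type.getName() + "#" + name;
        Method m = CACHE.get(key);
        if (m == null) {
            m = type.getMethod(name);   // looked up once, then cached forever
            CACHE.put(key, m);
        }
        return m;
    }
}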

>> footprint issues: quite insidious; startup succeeds, but after running for a while you hit an OOM. It isn't really a leak, just plain resource exhaustion. In such cases moving to 64-bit is a good option

To tell which regions in pmap output are stacks, compare against the output of jstack and the like. Another interesting detail: just "above" each stack (toward lower addresses) there is a 4K region, the guard page, used to warn that a stack overflow is imminent

64-bit advantage:
bigger address space
more registers (in the case of x86_64)
faster (Java) long arithmetic

disadvantages:
consumes more memory
-> potentially slower because of memory bloat

http://www.slideshare.net/BillyNewport/initial-deck-on-websphere-extreme-scale-with-websphere-commerce-server
http://www.slideshare.net/dougqh/jvm-internals-nejug-nov-2010
http://www.slideshare.net/dougqh/jvm-internals-key-note
http://www.slideshare.net/pbertels/memory-management-in-a-hardwareaccelerated-jvm
http://www.slideshare.net/OracleMiddleJP/jvm-oracle-jrockit
http://www.slideshare.net/tcurdt/no-dark-magic-byte-code-engineering-in-the-real-world
http://www.slideshare.net/pascallouis/applying-compiler-techniques-to-iterate-at-blazing-speed
ftp://193.124.243.99/pub/FreeBSD/ports/java/jdk16/work/hotspot/agent/doc/transported_core.html
http://en.domotica.net/Jikes_RVM
http://www-01.ibm.com/support/docview.wss?uid=swg27012709&aid=1
http://www.slideshare.net/whitepapers/java-standard-edition-5-performance?src=related_normal&rel=1216444
http://hotspot.openjdk-rt.googlecode.com/hg/agent/doc/clhsdb.html
http://www.cs.williams.edu/~bailey/06brh.pdf
http://www.cs.technion.ac.il/~imaman/stuff/hopl.pdf
http://web.ict.kth.se/~matsbror/exjobb/msc_theses/josefinh_thesis_final.pdf

http://www.google.com/url?sa=t&source=web&cd=7&ved=0CEcQFjAG&url=http%3A%2F%2Fcodemate.wordpress.com%2F2007%2F11%2F12%2Falan-kay-on-history-of-computer-languages-java-and-squeak%2F&ei=pjvBTfjEK4u8vQOht52bBA&usg=AFQjCNHyROMSAn8vdDC7QIngNj7DxpM9IA

JRockit: Thread-local Area
-XXtlasize

Efficient implementation of the smalltalk-80 system, POPL 1984
<< first paper to mention JIT

http://blog.dynatrace.com/2011/04/20/the-top-java-memory-problems-part-1/
http://blog.dynatrace.com/2011/05/11/how-garbage-collection-differs-in-the-three-big-jvms/comment-page-1/#comment-45777

http://www.slideshare.net/guest83950e2/go-course-day1

A Study of Java Object Demographics
by R Jones - 2008
http://kar.kent.ac.uk/23979/1/JavaDemoJones.pdf

Selected Papers Having Made Impact on Advancement of Computer Memory Systems
http://www.cse.ohio-state.edu/~zhang/influential-papers.html

Using Virtualization to Solve Java’s GC Problem
http://communities.vmware.com/servlet/JiveServlet/previewBody/14995-102-1-18098/resolve11-virtual_gc.pdf

http://developer.apple.com/legacy/mac/library/documentation/Java/Conceptual/Java131Development/Java131Development.pdf
Apple JDK 1.3.1 TLE

http://www.atg.com/service/main.jsp?t=solutionTab&ft=searchTab&ps=solutionPanels&locale=en_US&_dyncharset=UTF-8&curResURL=%252Fservice%252Fmain.jsp%253Bjsessionid%253DDQUxAocOAKTvWYLRjne-ZA**.12%253Ft%253DsearchTab%2526SearchButton%253DFind%2526search.x%253D0%2526search.y%253D0%2526search%253Dsearch%2526useFocusTopic%253Dtrue%2526focusTopic%253D200191%2526searchstring%253DSimple%252525252520Web%252525252520Service%252525252520for%252525252520BEA%252525252520Workshop%2526sfield%253D%2526dosearch%253Dtrue%2526pn%253D1&solutionId=atg42282&isSrch=Yes

http://java.sun.com/docs/hotspot/threads/threads.html
<< This document also mentions the TLE used in JDK 1.3 server mode on SPARC

http://sourceware.org/systemtap/wiki/RH2010Webinar
<< IcedTea and SystemTap

http://olex.openlogic.com/wazi/2009/how-to-fix-memory-leaks-in-java/
http://www.slideshare.net/nayashkova/eclipse-memory-analyzer-presentation-763314
http://www.slideshare.net/AJohnson1/extending-eclipse-memory-analyzer

RNI in JRockit, what's the relationship of it with MS's Raw Native Interface? no relationship at all?
see jrockit.vm.RNI
what about RJNI?

code patching and synchronization
virtual stack and shadow stack <- relationship?
object layout and false sharing

JRockit JFR recording engine, thread local buffer -> global buffer

SYSTEM AND METHOD FOR OPTIMISTIC CREATION OF THREAD LOCAL OBJECTS IN A VIRTUAL MACHINE ENVIRONMENT
United States Patent Application 20080021939
http://www.freepatentsonline.com/y2008/0021939.html

Objective-C has thread local collection:
http://www.sealiesoftware.com/blog/archive/2009/08/28/objc_explain_Thread-local_garbage_collection.html


Österdahl, Henrik. A Thread-Local Heap Management System for a JVM using Read- and Write-Barriers. Master's Thesis, Stockholm, Sweden: Royal Institute of Technology, 2005.

An on-the-fly mark and sweep garbage collector based on sliding views (2003)
http://portal.acm.org/citation.cfm?id=949329

Garbage Collection and the IA64 Architecture (2000)
ftp://ftp.cs.umass.edu/pub/osl/papers/ismm00.pdf

Hierarchical real-time garbage collection (2007)
http://portal.acm.org/citation.cfm?id=1254784

http://www.sfjava.org/events/17314017/?eventId=17314017&action=detail
Quote

About Dr. Ian Rogers:

Ian Rogers was a Research Fellow in the University of Manchester's Advanced Processor Technology research group. His PhD research work into the Dynamite binary translator was exploited commercially and now forms part of many binary translator products, including Apple's Rosetta. His more recent research work has been in to programming language design, runtime and virtual machine environments - in particular how to allow them to automatically create and efficiently exploit parallelism. At Azul Systems, Ian works on virtual machine design for commodity servers to systems with close to 1,000 cores and terabytes of RAM.


http://blog.thematice.com/html/y2009/05/27/506/nginx-debug-updating.html

http://webdocs.cs.ualberta.ca/~amaral/IBM-Blainey-talks/javahpc.pdf

http://www.cse.ohio-state.edu/~mikebond/bond-diss-2008-2.pdf

http://golang.org/doc/GoCourseDay2.pdf
http://golang.org/doc/GoCourseDay3.pdf

http://www.cincomsmalltalk.com/main/2011/05/cincom-smalltalk-to-be-featured-at-the-apple-store-ginza/

exception handling models
http://lambda-the-ultimate.org/node/3896#comment-58374

A Comprehensive Evaluation of Object Scanning Techniques (2011)
http://users.cecs.anu.edu.au/~steveb/downloads/pdf/scan-ismm-2011.pdf

SableVM
http://www.usenix.org/event/jvm01/gagnon/gagnon.pdf

SableJIT
http://www.sable.mcgill.ca/~dbelan2/research/presentation03.pdf

metacircular JavaScript VM
http://www.sable.mcgill.ca/~clump/cdp2010/ChevalierCDP10.pdf

Leaky regions: linking reclamation hints to program structure
http://research.microsoft.com/en-us/um/people/tharris/papers/2006-leaky-tr.pdf
<< Experiments done in Bartok. Feels somewhat like TLH, except that it relies on manual annotations.

Wow!! This is an absolute antique, the old Oracle JVM:
Oracle's JIT Compiler 20030313145617

AUTOMATICALLY UTILIZING GRAPHICS PROCESSING UNITS FROM JAVA BYTECODE
http://pcpratts.mysite.syr.edu/research/pcpratts_thesis.pdf

Visualization of Dispatching Strategies
http://e-collection.ethbib.ethz.ch/eserv/eth:27709/eth-27709-01.pdf

Trace Fragment Selection within Method-based JVMs (2008)
http://www.cs.virginia.edu/kim/docs/vee08.pdf
<< The trace JIT in this paper is also built in an interpreter-less environment, so in that respect it predates SPUR. Of course, SPUR is fully trace-based, whereas this one mixes traces with whole methods

http://b.hatena.ne.jp/articles/201105/2729
<< The STMM (Self-Tuning Memory Manager) in DB2 described here sounds pretty amazing?
ftp://ftp.software.ibm.com/software/data/pubs/papers/stmm.pdf

http://www.kaffe.org/pipermail/kaffe/1998-October/170056.htm

CViMe: viewing conditionally compiled C/C++ sources through Java
http://portal.acm.org/citation.cfm?id=1176697

A Compiler for the Java HotSpot Virtual Machine
http://books.google.com.hk/books?id=QpL3gu-c_aoC&pg=PA133&lpg=PA133&dq=%22A+Compiler+for+the+Java+HotSpotTM+Virtual+Machine%22&source=bl&ots=d73fzOCEAM&sig=yRPn4-fkb2Q2gAspFyK5pyWf_dY&hl=zh-CN&ei=mZ_kTfHkOI-IvgPS58mSBw&sa=X&oi=book_result&ct=result&resnum=6&ved=0CE4Q6AEwBQ#v=onepage&q=%22A%20Compiler%20for%20the%20Java%20HotSpotTM%20Virtual%20Machine%22&f=false

CS453 Intro and PA1 1
http://www.cs.colostate.edu/~mstrout/CS453Spring07/Slides/070502-RegAllocNstartreview.pdf
<< register allocation for expression trees

Improvements on SSA-Based Register Allocation.
By Fernando Pereira, UCLA.
<< SSA-based LSRA work done on LLVM

@mattpodwysocki
JSIL, an MSIL to JS compiler from @antumbral http://bit.ly/ls8Dj7 part of his XNA demo: http://bit.ly/iQJyyO

@robilad
The HotSwap patch for HotSpot gets an appreciative write-up in JAXenter: http://bit.ly/m4qBah

Continuing to dig up old material. How to produce an OpenJDK webrev:
Quote
From John.Rose at Sun.COM  Mon Dec 10 20:06:30 2007
From: John.Rose at Sun.COM (John Rose)
Date: Mon, 10 Dec 2007 20:06:30 -0800
Subject: for review (S): 6610906: inexplicable IncompatibleClassChangeError
References: <C99961AF-962C-45C6-A93B-53D7794A131A@Sun.COM>
Message-ID: <7680C93F-F6D9-4D2E-94CC-81BE54B7D87F@sun.com>

It looks like we've switched now away from the internal aliases.

Here's a code review I recently sent internally, which really should 
have been external.

Vladimir, Tom, Steve:  Thanks for the reviews; I'll put you in the 
changeset comment as soon as I learn your OpenJDK names.

Non-Sun people:  Welcome to the party!

-- John

P.S.  My first try at sending this went (as spam) to compiler-
dev at openjdk.java.net,
which is the list for javac developers.  Oops.

Begin forwarded message:

From: John Rose <John.Rose at Sun.COM>
Date: December 5, 2007 6:46:46 PM PST
To: hs-compiler Compiler <hs-compiler at sun.com>
Cc: Mark Reinhold <mr at sun.com>
Subject: for review (S): 6610906: inexplicable 
IncompatibleClassChangeError

http://homepage.mac.com/rose00/work/webrev/6610906/

-- John

P.S.  This is my first stumble toward a mercurial-based putback....

To get a more "open" feel for this review, I put this webrev up on an 
external site.
The repo. is on my MacBook.  I think only the Bugster interactions 
have required SWAN access.

However, I'm sending the request to our internal mail alias, not the 
openjdk group alias.
Also, this putback can't go anywhere yet, since the parent repo. 
(hotspot-comp) is still the experimental one.
(My repo. is freshly cloned from http://hg.openjdk.java.net/jdk7/jdk7/
hotspot .)

I generated the webrev using this command:
   /java/devtools/share/bin/webrev  -ON -i 6610906-comments.html

The contents of the "author comments" file looks like this:

For putback to <a href="http://hg.openjdk.java.net/jdk7/hotspot-comp-
gate/hotspot">http://hg.openjdk.java.net/jdk7/hotspot-comp-gate/
hotspot</a>
<br><a href="http://bugs.sun.com/bugdatabase/view_bug.do?
bug_id=6610906">6610906</a>: inexplicable IncompatibleClassChangeError
<br>Summary: dependency check must treat polymorphic interfaces 
consistently
<br>Reviewed-by: ??

I'll strip the formatting (which webrev should have put in for me) 
and use it as a changeset comment.



Hmm... related to C2's calling conventions
Quote
From Steve.Goldman at Sun.COM  Mon Jan 14 14:37:10 2008
From: Steve.Goldman at Sun.COM (steve goldman)
Date: Mon, 14 Jan 2008 17:37:10 -0500
Subject: weird loop formation
In-Reply-To: <bcf6f6d40801141403n7361d759q25cf1fc31a5be997@mail.gmail.com>
References: <bcf6f6d40801071129r2db8cfa7i84e82dea0d4a3ccf@mail.gmail.com>
<47840F8C.2@Sun.COM>
<bcf6f6d40801141403n7361d759q25cf1fc31a5be997@mail.gmail.com>
Message-ID: <478BE416.4040905@sun.com>

Ben Cheng wrote:
> Hi,
>
> I have a follow-up question for the problem. After looking at the generated
> code harder it seems to me that there are no callee-saved registers
> described to the compiler. If I read the x86_64.ad file correctly all the
> registers are SOC as the register save type. I tried to convert r12 through
> r15 into SOE as the C convention save type but ran into the following
> assertion in test_gamma:

You must mean convert the Java calling convention to be SOE like the C
calling convention? If so give up. Complete support for callee save
registers was removed from c2 when frameless adapters went in. In order
to get this to work you need to modify some of the deopt code. This is
doable and there is a call to an empty hook method for this in the deopt
path for use on platforms that absolutely must have callee save
registers but it isn't a simple experiment.


--
Steve

From Thomas.Rodriguez at Sun.COM  Mon Jan 14 14:48:14 2008
From: Thomas.Rodriguez at Sun.COM (Tom Rodriguez)
Date: Mon, 14 Jan 2008 14:48:14 -0800
Subject: weird loop formation
In-Reply-To: <bcf6f6d40801141403n7361d759q25cf1fc31a5be997@mail.gmail.com>
References: <bcf6f6d40801071129r2db8cfa7i84e82dea0d4a3ccf@mail.gmail.com>
<47840F8C.2@Sun.COM>
<bcf6f6d40801141403n7361d759q25cf1fc31a5be997@mail.gmail.com>
Message-ID: <478BE6AE.8090301@sun.com>

A while back we made the decision not to support callee saved registers
for generated code.  The source changes for this occurred in 1.6 as part
of our switch to frameless adapters.  There are still some of the hooks
in place which would be needed for it to work but the interpreter
doesn't do any of the saving which would be needed for this work.  I
also don't think our stubs have the needed logic either.  So in general
arbitrarily switching registers to SOE in an ad file will not work.
RBP/EBP acts as a callee saved register in C2 but it is always saved and
restored it its natural location.

It might be interesting for you to try earlier versions of C2 that
supports callee saved registers and see if the performance is any
different.  1.5 was the last release to supports them.  1.6 b27 was the
last build of 1.6 that supported callee saved registers.

I have blindly poked the register allocator quite a few times and while
it has been instructive I haven't had much success.  ;)  As chuck said
earlier we know there are some issues with spill placement that
sometimes produce suboptimal code.  It's something we want to fix but
it's going to require significant investigation before we have a real
solution.  Personally I'd really like to get fix for this sometime this
year as it's creates some significant performance instability in C2.
Innocuous changes can kick the register allocator into bad places which
can make performance analysis painful.

Was your original loop representative of the problems you are seeing or
was it just an oddity your noticed during analysis?

tom

Ben Cheng wrote:
> Hi,
>
> I have a follow-up question for the problem. After looking at the
> generated code harder it seems to me that there are no callee-saved
> registers described to the compiler. If I read the x86_64.ad file
> correctly all the registers are SOC as the register save type. I tried
> to convert r12 through r15 into SOE as the C convention save type but
> ran into the following assertion in test_gamma:
>
> # To suppress the following error report, specify this argument
> # after -XX: or in .hotspotrc:  SuppressErrorAt=/nmethod.cpp:1717
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  Internal Error
> (<dir_home>/hotspot/src/share/vm/code/nmethod.cpp:1717), pid=17191,
> tid=46912512071360
> #  Error: guarantee(cont_offset != 0,"unhandled implicit exception in
> compiled code")
> #
> # Java VM: OpenJDK 64-Bit Server VM (12.0-b01-jvmg mixed mode linux-amd64)
>
> I was wondering if there are quick ways to fix it so that I can
> experiment with the JVM behavior when some registers are marked as SOE.
>
> The reason I want to conduct this experiment is because we have an
> in-house benchmark which has both C++ and Java implementations, where
> the C++ version is 25% faster. After looking at the hottest loop in both
> versions I saw less optimal loop formation and 2x more spiils for the
> Java version. I am currently blindly poking the register allocator to
> see if the amount of spills can reduced.
>
> Thanks,
> -Ben
>
>
>
> On Jan 8, 2008 4:04 PM, Chuck Rasbold <Chuck.Rasbold at sun.com
> <mailto:Chuck.Rasbold at sun.com>> wrote:
>
>     Ben -
>
>     Thanks for your comment, and we've shared your concern for a while.
>
>     While I don't have the perspective to give the full historical
>     background, the strategy within C2 has been first to fully populate
>     the Ideal graph with all regions.  Loop construction/optimization
>     occurs in the Ideal graph, then after code generation, the CFG
>     is formed and all MachNodes are assigned to basic blocks.  This is a
>     little different than other, more traditional compilers that I'm
>     familiar with.
>
>     As for your specific example, we see the path to code improvement in
>     cases like this one in two steps:
>
>     - Teach the register spiller to be more disinclined to place spills
>     along the back branches.  This is part of a bigger effort in the near
>     term to improve C2's spilling decisions.
>
>     - Augment the dead-block optimizer with a block layout pass in
>     addition to the dead-block and peephole tweaks that you've observed.
>     In the case where spill code is placed along the backbranch, the block
>     layout pass would rotate the loop such that basic blocks that end in
>     an unconditional branch would be moved to the top, eliminating the
>     branch-over on each iteration.  Of course, the compiler is likely to
>     generate a branch-over at loop entry in this case, but that is a
>     one-time cost.  This work is in progress.
>
>     For example, for loopBad, even with the extra spill, we'd want the
>     code to come out more like this:
>
>          B3: # B4 <- B5 Loop: B3-B5 inner  Freq: 31480.9
>            movl    RDX, [rsp + #8] # spill
>
>          B4: #     B7 B5 <- B2 B3  Freq: 31512.2
>            movl    [rsp + #8], RDX # spill
>            movq    RSI, [rsp + #0] # spill
>            xorl    RDX, RDX        # int
>            nop     # 1 bytes pad for loops and calls
>            call,static  SimpleLoop::incTripCount
>            # SimpleLoop::loopBad @ bci:10  L[0]=rsp + #0 L[1]=rsp + #8
>      L[2]=_
>     STK[0]=RBP
>            # AllocatedObj(0x0000000040803780)
>
>          B5: #     B6 B3 <- B4  Freq: 31511.6
>            # Block is sole successor of call
>            addl    RBP, RAX        # int
>            cmpl    RBP, [RSP + #8 (32-bit)]
>            jlt,s   B5  P=0.000973 C=133514.000000
>
>
>     -- Chuck
>
>     (We probably should move any further discussion to hotspot-compiler-dev)
>
>     Ben Cheng wrote:
>      > Hi Everyone,
>      >
>      > Happy New Year! In addition I'd like to greet you with a C2
>     question. :-)
>      >
>      > Recently I found a sub-optimal loop produced by the compiler. I can
>      > reproduce the problem with the following simple toy program on
>     X86_64:
>      >
>      > public class SimpleLoop {
>      >
>      >   public SimpleLoop() {
>      >   }
>      >
>      >   private int incTripCount(int v) {
>      >     return 1;
>      >   }
>      >
>      >   void loopGood(int len) {
>      >     for (int i = 0; i < len;) {
>      >       i += incTripCount(i);
>      >     }
>      >   }
>      >
>      >   void loopBad(int len) {
>      >     for (int i = 0; i < len;) {
>      >       i += incTripCount(0);
>      >     }
>      >   }
>      >
>      >   public static void main(String argv[]) {
>      >     SimpleLoop sl = new SimpleLoop();
>      >     for (int i = 0; i < 1024*1024; i++) {
>      >       sl.loopGood(1024);
>      >       sl.loopBad(1024);
>      >     }
>      >   }
>      > }
>      >
>      > The difference between loopGood and loopBad is register pressure,
>     where
>      > loopBad has spilled code but the other doesn't. For simplicity
>     reasons I
>      > have disabled inlining in the command line.
>      >
>      > For loopGood, the inner loop is all good and clean (B4 branches
>     back to
>      > B3 with jlt):
>      >
>      >
>      > 030   B3: #     B6 B4 <- B2 B4  Loop: B3-B4 inner  Freq: 754005
>      > 030     movq    RSI, [rsp + #0] # spill
>      > 034     movl    RDX, RBP        # spill
>      > 036     nop     # 1 bytes pad for loops and calls
>      > 037     call,static  SimpleLoop::incTripCount
>      >         # SimpleLoop::loopGood @ bci:10  L[0]=rsp + #0 L[1]=rsp + #8
>      > L[2]=_ STK[0]=RBP
>      >         # AllocatedObj(0x0000000040803880)
>      >
>      > 03c
>      > 03c   B4: #     B3 B5 <- B3  Freq: 753990
>      >         # Block is sole successor of call
>      > 03c     addl    RBP, RAX        # int
>      > 03e     cmpl    RBP, [RSP + #8 (32-bit)]
>      > 042     jlt,s   B3  P=0.999024 C=863065.000000
>      >
>      > For loopBad, however, the loop body contains one more block where a
>      > simple jlt is split into a jge and a jmp:
>      >
>      > 030   B3: #     B7 B4 <- B2 B5  Loop: B3-B5 inner  Freq: 31512.2
>      > 030     movl    [rsp + #8], RDX # spill
>      > 034     movq    RSI, [rsp + #0] # spill
>      > 038     xorl    RDX, RDX        # int
>      > 03a     nop     # 1 bytes pad for loops and calls
>      > 03b     call,static  SimpleLoop::incTripCount
>      >         # SimpleLoop::loopBad @ bci:10  L[0]=rsp + #0 L[1]=rsp + #8
>      > L[2]=_ STK[0]=RBP
>      >         # AllocatedObj(0x0000000040803780)
>      >
>      > 040
>      > 040   B4: #     B6 B5 <- B3  Freq: 31511.6
>      >         # Block is sole successor of call
>      > 040     addl    RBP, RAX        # int
>      > 042     cmpl    RBP, [RSP + #8 (32-bit)]
>      > 046     jge,s   B6  P=0.000973 C=133514.000000
>      > 046
>      > 048   B5: #     B3 <- B4  Freq: 31480.9
>      > 048     movl    RDX, [rsp + #8] # spill
>      > 04c     jmp,s   B3
>      >
>      > 04e   B6: #     N70 <- B4 B1  Freq: 30.6849
>      > 04e     addq    rsp, 80 # Destroy frame
>      >         popq    rbp
>      >         testl   rax, [rip + #offset_to_poll_page]       # Safepoint:
>      > poll for GC
>      >
>      > 059     ret
>      >
>      > I traced the compiler internals a bit and it seems that the
>     problem is
>      > caused by poor interaction between loop construction and register
>      > allocation. In the loopGood case, B5 is also created in the first
>     place.
>      > Since there is no spilling the dead block optimizer is able to
>     coalesce
>      > B4/B5 into a single one. However, for loopBad the instruction at
>     pseudo
>      > PC 048 is the result of refilling value into RDX. Its existence
>     makes B5
>      > non-dead thus cannot be merged with B4.
>      >
>      > It seems to me that when the loop is constructed it should avoid
>     using a
>      > branch-over followed by an unconditional branch to begin with. In that
>      > way even with spilled code the loop will still look natural and
>     won't
>      > use two branches to loop back. Are there any historical reasons
>     that it
>      > is more beneficial to form the loop this way? If not, I think we
>     want to
>      > fix it to save a couple cycles for each loop.
>      >
>      > Thanks,
>      > -Ben
>
>


IBM developerWorks: Thanks for the memory - Understanding how the JVM uses native memory on Windows and Linux, 2009

Tip: Secure your code against the finalizer vulnerability
RednaxelaFX 2011-05-24
A lot of recent news hasn't made it into this thread... Those of you subscribed to this group may not have noticed that the content of this whole thread changes all the time; there are plenty of interesting links and quotes in it.

Anyway, time for a new reply.

The first (old) piece of news is that JRockit's license has been changed to match that of the original Sun JDK, which means using JRockit in production is now free as well.
Using JRockit Mission Control in production is still a paid option, but the license can now be purchased separately instead of having to be bundled with WebLogic or other Oracle products.

The second is that the early access build of JDK 6 update 27 build 01 came out in the last couple of days. Download links: http://jdk6.java.net/6uNea.html / http://jdk6.java.net/download.html/
For those who can't be bothered to build a fastdebug JDK themselves and never got around to downloading one before, now's the time: grab this build.

The third is JsLinux, which many of you have probably already played with: http://bellard.org/jslinux/
It isn't a high-level language VM itself, but it shows that today's JavaScript engines are fast enough even for relatively compute-intensive work like this. Quite interesting.

The fourth is a talk Ian Rogers gave last month; the content is hardcore and well worth a look:
Learn about JVM internals - what does the JVM do?
The slides can probably be downloaded from Azul's site: http://www.azulsystems.com/resources/presentations

Next, a Japanese presentation that walks through the Dalvik GC:
dalvikのGCをのごいてみた
http://www.slideshare.net/akachochin/dalvikvm/

VisualVM can now be used to monitor the G1 heap in JDK 7; it requires JDK 7 b141 or later.
http://bugs.sun.com/view_bug.do?bug_id=6946417

A book dissecting G1GC in the HotSpot VM: 徹底解剖「G1GC」 アルゴリズム編

Under OpenJDK, the JDK 8 project has officially started, at http://openjdk.java.net/projects/jdk8/

7028071: Add two attributes to the OperatingSystemMXBean to provide CPU Load info
Summary: Add getProcessCpuLoad() and getSystemCpuLoad() to the OperatingSystemMXBean
http://hg.openjdk.java.net/jdk7/hotspot/jdk/rev/28269923b747
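
For reference, a minimal sketch (my own illustration, not part of the changeset) of reading the two new attributes through the com.sun.management extension of OperatingSystemMXBean on JDK 7+:

import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

public class CpuLoadProbe {
    public static void main(String[] args) throws Exception {
        // HotSpot's OperatingSystemMXBean implements the com.sun.management extension.
        OperatingSystemMXBean os = (OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        for (int i = 0; i < 5; i++) {
            // Both return a value in [0.0, 1.0], or a negative value if unavailable.
            System.out.printf("process=%.3f system=%.3f%n",
                    os.getProcessCpuLoad(), os.getSystemCpuLoad());
            Thread.sleep(1000);
        }
    }
}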

6941923: RFE: Handling large log files produced by long running Java Applications
Summary: supply optional flags to realize gc log rotation
http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/2a241e764894

iOS 5 for Developers
Quote
Automatic Reference Counting
Automatic Reference Counting (ARC) for Objective-C makes memory management the job of the compiler. By enabling ARC with the new Apple LLVM compiler, you will never need to type retain or release again, dramatically simplifying the development process, while reducing crashes and memory leaks. The compiler has a complete understanding of your objects, and releases each object the instant it is no longer used, so apps run as fast as ever, with predictable, smooth performance.

So the "GC" in iOS 5 is going to be implemented with reference counting. But the GC in the desktop Objective-C runtime is mark-and-sweep (generational, but with no compaction), isn't it? The trade-off is a bit puzzling; most likely throughput was sacrificed for low pause times.
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcEssentials.html#//apple_ref/doc/uid/TP40002452-SW1
http://clang.llvm.org/docs/AutomaticReferenceCounting.html

So back in May all of Maxine's top-level directory names were renamed; no wonder I couldn't find my way around anymore.
http://kenai.com/projects/maxine/lists/users/archive/2011-05/message/0
Quote
From: Douglas Simon <doug.simon@oracle.com>
To: users@maxine.kenai.com
Subject: Recent changes to Maxine code base
Date: Thu, 12 May 2011 10:24:56 +0200

1. Project names are changing to an OSGi-like format. For example, 'T1X' is
now 'com.oracle.max.t1x'. This is part of an ongoing effort to pull apart
existing projects into finer grained projects. Having a more qualified naming
scheme helps this. An impact of this is that any time a project name is
changed or a new project is added, you will need to (re)import the projects
in Eclipse. This is done via File -> Import... -> Existing Projects into
Workspace. Or if you are using the MercurialEclipse plugin, then File ->
Import -> Projects from Local Mercurial Repository.

2. There is a new 'max eclipseprojects' command that generates/updates the
Eclipse configuration files for each of the projects. This ensures that all
projects have the same compiler and Checkstyle settings. To have Eclipse
recognize any such changes, you simply need to select all the projects and
refresh whenever you pull from the Kenai repo (as you would anyway to get the
file changes).

-Doug


地獄のGC本スピンオフ
http://www.slideshare.net/authorNari/gc-8440412

Precise garbage collection for c
http://www.slideshare.net/miura1729/precise-garbagecollection-forc

Pick up performance with generational garbage collection
Use the appropriate Java HotSpot VM 1.3.1 parameters to improve throughput
By Ken Gottry, JavaWorld.com, 01/11/02
http://www.javaworld.com/javaworld/jw-01-2002/jw-0111-hotspotgc.html
Quote
Post-HotSpot JVM
The Exact VM (JVM 1.2.2) introduced exact garbage collection. Sun then improved the exact GC design in JVM 1.3 and renamed it generational GC.

...

Three types of collection algorithms
The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.

Copy/scavenge collection

Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.

Mark-compact collection

As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.

The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.

Incremental (train) collection

The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.

Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.

The -Xincgc and -Xnoincgc parameters control how you use incremental collection. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection since it will soon change.

HotSpot VM switched over to CMS back in JDK 1.4... goodbye, train algorithm.

Velocity 2011:
David Mandelin: Know Your Engines: How to Make Your JavaScript Fast

OSCON 2011
JVM for Dummies - OSCON 2011

InfoQ
Java without the GC Pauses: Keeping Up with Moore’s Law and Living in a Virtualized World

Memory Models: A Case for Rethinking Parallel Languages and Hardware - Solving the memory model problem will require an ambitious and cross-disciplinary research direction.

http://sockali.net/openjdk/
Quote
This site is all about the changes I made to the OpenJDK source based on Azul Systems Memory Management Module published at http://www.managedruntime.org/.


http://makelinux.net/kernel_map/

Garbage collection in WebSphere Application Server V8, Part 2: Balanced garbage collection as a new option

ijuma Ismael Juma
Loop unrolling in HotSpot seems to be a bit of a black art, so test your loop-heavy code against multiple JVM versions http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7054211

CGO 2010: Automated just-in-time compiler tuning

While I'm at it, a note on an old article:
Odds & Ends
Cliff Click wrote:
More C2 Goodies
John Cavazos wrote:
… One of the things I am currently looking at is determining the right phase-ordering of optimizations applied to a particular program being optimized.  I have some nice (un-published) results for JikesRVM,
but it would be nice to replicate the research for HotSpot…

I have strong arguments for nearly all orderings of our passes, so I’m curious to know about your phase-ordering results.
The only obvious transform we might be missing is PRE, but our peephole opts pretty nearly subsume PRE (they are not mathematically equivalent – I can write programs where either PRE or the peephole stuff makes progress against the other).  In practice, PRE will find essentially nothing once the peephole opts are done.  You’ll notice that we do the peephole pass a lot; it’s totally incremental and provably linear bounded.  In other words, if there’s nothing to be gained then there’s no cost in trying.  The peephole pass includes amongst other things all pessimistic forms of copy propagation, value equivalence, constant propagation (especially the null/not-null property), constant test folding, repeated test folding, dead code elimination, load-after-store opts (aliasing analysis is included for free during building of SSA form), algebraic properties, and lots more.
For HotSpot, the optimization ordering is:
1. Build an inline tree, including class hierarchy analysis.  This is the one pass I’d be willing to move, as it happens too early.
2. (a single unified pass:) parse bytecodes, inline, build SSA form (the IR remains in SSA form always), do peephole opts over SSA form.  This pass typically costs 40% of compile-time budget.
3. Fixed-point the peephole opts over SSA form, once all backedges are known after parsing.
4. Loop opts round 1: “beautify loops” (force polite nesting of ill-structured loops), build a loop-tree & dominator tree, split-if (zipper-peel CFG diamonds with common tests, plus some local cloning where I can prove progress), peel loops (required to remove loop-invariant null checks)
5. Fixed-point the peephole opts over SSA form
6. Loop opts round 2: “beautify loops” (force polite nesting of ill-structured loops), build a loop-tree & dominator tree, lock coarsening, split-if & peel – but if these don’t trigger because there’s nothing to gain, then do iteration-splitting for range-check-elimination & a 1st loop unrolling.
7. Fixed-point the peephole opts over SSA form
8. Conditional Constant Propagation (the optimistic kind, instead of the pessimistic kind done by the peephole pass)
9. Iterate loop (split-if, peel, lock coarsen – but these typically never trigger again and take very little time to check), unrolling & peephole passes, until loops are unrolled “enough”.  On last pass, insert prefetches.  Typically this iterates once or twice, unless this is a microbenchmark and then unrolling might happen 8 or 16 times.
10. Remove tail-duplication, and a bunch of other minor code-shaping optimizations e.g. absorb constant inputs into deoptimization-info in calls, or commuting Add ops so that 2-address machines can do update-in-place.
11. Convert “ideal” IR into machine code IR.
12. Build a real CFG for the 1st time, including a dominator tree, loop tree.  Populate with frequencies from earlier profiling.
13. Global latency-aware (loop-structure-aware) scheduling.
14. Replace null-checks with memory references where appropriate.
15. Register allocate.  Internally this has many passes.  This pass typically costs 40% of compile-time budget.
16. Sort basic blocks to get good control flow ordering (forward branches predicted not-taken, backwards predicted taken, etc)
17. Some last-minute peephole opts.
18. Emit code into a buffer, including OOP-maps & deoptimization info.


http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2011-October/002459.html
David Holmes wrote:
--- old/src/share/vm/runtime/vm_version.cpp Mon Oct 10 20:36:41 2011
+++ new/src/share/vm/runtime/vm_version.cpp Mon Oct 10 20:36:39 2011
@@ -128,7 +128,7 @@
  #ifndef HOTSPOT_VM_DISTRO
    #error HOTSPOT_VM_DISTRO must be defined
  #endif
-#define VMNAME HOTSPOT_VM_DISTRO " " VMLP VMTYPE " VM"
+#define VMNAME HOTSPOT_VM_DISTRO " " VMLP EMBEDDED_ONLY("Embedded ")
VMTYPE " VM"


This changes the value of the java.vm.name system property and so has to
be approved by the CCC process. Once that is done it will be pushed to
hsx23.

So changing the VM name has to go through the CCC process. Good to keep in mind.

OpenJDK patch cutting down GC pause duration up to 8 times

------------------

The 32-bit CLR in .NET 4.5 still uses out-of-line stubs for all kinds of things, such as object allocation and write barriers.
https://gist.github.com/2967637
mercyblitz 2011-05-24
RednaxelaFX wrote:
A lot of recent news hasn't made it into this thread... Those of you subscribed to this group may not have noticed that the content of this whole thread changes all the time; there are plenty of interesting links and quotes in it.

Anyway, time for a new reply.

The first (old) piece of news is that JRockit's license has been changed to match that of the original Sun JDK, which means using JRockit in production is now free as well.
Using JRockit Mission Control in production is still a paid option, but the license can now be purchased separately instead of having to be bundled with WebLogic or other Oracle products.

The second is that the early access build of JDK 6 update 27 build 01 came out in the last couple of days. Download links: http://jdk6.java.net/6uNea.html / http://jdk6.java.net/download.html/
For those who can't be bothered to build a fastdebug JDK themselves and never got around to downloading one before, now's the time: grab this build.


Just saw this yesterday, though I haven't looked carefully through the summary of changes in it:
http://www.java.net/download/jdk6/6u27/promoted/b01/changes/JDK6u27.list.html
RednaxelaFX 2011-10-11
Download location for JavaOne 2011 materials: https://oracleus.wingateweb.com/scheduler/eventcatalog/eventCatalogJavaOne.do
Audio/video recordings: http://www.parleys.com/#st=4&id=102979
The full strategy keynote is now available online:
http://medianetwork.oracle.com/video/player/1176404813001

http://gotocon.com/amsterdam-2011/schedule/index.jsp

http://t.co/4DQyISUK

HS23 will require >= 7u4 or 8 to be usable; it won't run on 7 GA or 7u2. That's a pity, and it also means HS23 can't be used with 6 either.
http://mail.openjdk.java.net/pipermail/mlvm-dev/2011-October/004043.html

John Duimovich on IBM & OpenJDK at JavaOne 2011
slides
video

Found that a Yale course has the JDK 5 source code posted among its materials, at http://zoo.cs.yale.edu/classes/cs433/programming/examples-java-socket/jdk1.5/
Note that this is under the JRL license.

Also, some archaeology: while a grad student at Rice, Cliff Click wrote implementations of several abstract data structures. Interestingly, the dig turned up his old work on CMU's site:
utils/port.h: (original link http://webcache.googleusercontent.com/search?q=cache:DTe-FyW3qxoJ:www.cs.cmu.edu/afs/cs.cmu.edu/project/cmcl/link.iwarp/ccom/afs/ccom/member/OldFiles/lcc/ali/cliffc/utils/port.h+&cd=2&hl=en&ct=clnk)
opto/loop.h: (original link http://webcache.googleusercontent.com/search?q=cache:-0Hng3JEp3YJ:www-2.cs.cmu.edu/afs/cs.cmu.edu/project/cmcl/OldFiles/OldFiles/link.iwarp/member/lcc/ali/cliffc/opto/loop.h+&cd=1&hl=en&ct=clnk)
Comparing the contents, you can see the shared lineage between this utils directory and HotSpot VM's libadt directory, and between this opto directory and HotSpot VM's opto directory.
RednaxelaFX 2012-01-03
Recordings of the JavaOne 2011 sessions I was mainly interested in

Full list:
http://www.parleys.com/#st=4&id=102979

(Placeholder for now; I'll copy in the individual links later. Hard to do without a mouse, orz)
RednaxelaFX 2013-02-25
Haven't replied to this thread in ages; adding a bookmark:
http://h10025.www1.hp.com/ewfrf/wc/document?cc=us&lc=en&dlc=en&docname=c03666252
Which goes to show how much a stack trace with symbol information can reveal: HP's HotSpot C3 compiler is sitting right there in plain sight.

http://shuklan.com/haskell/lec03.html#/0/7

Mentions the computed goto in the Python interpreter:
http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/

http://blogs.msdn.com/b/dotnet/archive/2013/05/01/net-crash-dump-and-live-process-inspection.aspx
Feels like a .NET version of the Serviceability Agent...

http://wingolog.org/archives/2012/06/27/inside-javascriptcores-low-level-interpreter#bf8e6cfb6c54396f66d75c475676b94b3b65cea7
John Yates wrote:
Having programmed for 40+ years I have written a number of interpreters...

How big did you say that L1 I-cache was? 64KB? 32KB? One time I had to fit the heart of a high performance x86 interpreter into 8KB :-) A total of 2K 4-byte RISC instructions. It was programmed entirely in hand scheduled Alpha assembly language. Due to a quirk of corporate history and vocabulary we called it an emulator:

http://www.hpl.hp.com/hpjournal/dtj/vol9num1/vol9num1art1.pdf

Interpreting the variable length x86 instruction set presents some interesting challenges. My main loop wrapped a dispatch to 8 byte aligned "case labels". The Alpha had enough integer registers to allow me to map the eight x86 integer registers, the PC, an 8 byte sliding instruction window, and a bunch of interpreter state permanently into registers.

I maintained the 8 bytes between PC and PC+7 in an Alpha register. The leading 2 bytes were used to fetch a 4 byte dispatch address from a table of 64K entries. Because my dispatch targets were 8 byte aligned (better use of Alpha dual instruction issue) I used the low order 3 bits to encode the instruction length. (Although x86 instructions can be up to 15 bytes the overwhelming number of executed instructions are 1 to 8 bytes long.)

I prefetched the instruction window refill data and the dispatch pointer for the next sequential instruction as part of the main dispatch. This approach overlapped those two memory accesses with the interpreter's computed transfer of control and instruction execution. Unless the current instruction altered sequential flow by the time I got back to the top of the loop I already had the new dispatch address, the length of the new instruction and data to replenish the instruction window. (I was actually part of the Alpha chip team at the time so it should come as no surprise that this pipelined approach to instruction execution is exactly what hardware CPUs do.)

The other crucial technique was "tail merging": jumping to common continuations. I often describe an interpreter as a big river with many, many tributaries. The dispatch takes you to the head of some tributary that then merges with some other tributaries which in turn merge with some others... Ultimately one reaches the main river for final bookkeeping and a new dispatch.
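
A toy transplant of the dispatch scheme John describes (purely illustrative, my own code, not his): the two leading opcode bytes index a 64K-entry table, the low 3 bits of each entry carry the decoded instruction length, and the remaining bits select a handler.

final class ToyDispatcher {
    interface Handler { void run(); }

    // Each entry packs (handlerIndex << 3) | instructionLength; populating it
    // (the "decode" step) is omitted here.
    private final int[] table = new int[64 * 1024];
    private final Handler[] handlers;
    private final byte[] code;
    private int pc;

    ToyDispatcher(byte[] code, Handler[] handlers) {
        this.code = code;
        this.handlers = handlers;
    }

    void step() {
        int idx = ((code[pc] & 0xFF) << 8) | (code[pc + 1] & 0xFF); // leading two bytes
        int entry = table[idx];
        int length = entry & 0x7;        // low 3 bits: instruction length (1..8 bytes)
        handlers[entry >>> 3].run();     // remaining bits: handler selector
        pc += length;                    // advance past the instruction just executed
    }
}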


http://dl.acm.org/citation.cfm?id=231402
Efficient and language-independent mobile programs
One of the authors is Steven Lucco. The paper is about OmniVM. The company, Colusa, was acquired by Microsoft. Steven later went on to architect Chakra.

Unofficial history:
http://c2.com/cgi/wiki?HistoryOfCsharp
Quote
Here's an incomplete history of the CsharpLanguage (C#) and the MicrosoftDotNet (.NET) CommonLanguageRuntime (CLR) that should help address some of the claims that C#/.NET is a direct Java knock-off. The results may be similar, but the paths taken to get there are definitely not the same.
Nonetheless, in a strategic business sense, CsharpLanguage is Microsoft's answer to Java, and that aspect should be mentioned here as well.
Note to the original author - you might consider just fixing this text rather than amending the corrections at the bottom
The .NET CLR is based on Colusa software's language-neutral OmniVM, a part of their Omniware product.
OmniVM was based on research carried out by Steven Lucco at Carnegie Mellon University. Steven co-founded Colusa Software in February 1994 in Berkeley, California. Omniware was released in August 1995. Colusa started working with Microsoft in February 1996. Microsoft acquired Colusa Software on March 12, 1996. Steven is currently a senior researcher at the Microsoft Bay Area Research Center.
OmniVM was appealing to Microsoft because Colusa had already created Visual Basic and C/C++ development environments for the VM. The VM was also claimed to be capable of running Java.
Microsoft took to calling the VM by the name of CVM, presumably for Colusa Virtual Machine. Or perhaps this is where the code name Cool came into being. Other names used at Microsoft include Universal Virtual Machine (UVM), and Intermediate Language (IL).
Perhaps the IL referred to the Common Executable Format. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnce30/html/cef2.asp
No, IL refers to Microsoft Intermediate Language (MSIL).
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconjitcompilation.asp

Microsoft denies rumors that C# went by the code name Cool, however there have been many apparent slip-ups by Microsoft employees where the word Cool is used in code snippets, file paths, etc.


Advanced Models and Programs - Bytecode and real machine code
Peter Sestoft, Wed 2011-02-23* and Mon 2011-02-28
http://www.itu.dk/people/sestoft/sasp/bytecode-machinecode.pdf

Virtual Machine Bytecode Translation: From The .Net CLI To Parrot
BA in Computer Science, Emmanuel College
May 16, 2006
http://www.jnthn.net/papers/2006-cam-net2pir-dissertation.pdf

System and method for creating target byte code
US 7707547 B2
http://www.google.com/patents/US7707547
Whoa, so even translating one bytecode into another bytecode is patented...

C++Now 2013 Presentations
https://github.com/boostcon/cppnow_presentations_2013
Julian Storer: The Projucer: Live coding with C++ and the LLVM JIT engine
http://www.rawmaterialsoftware.com/download/Projucer%20Demo.mp4  http://de.reddit.com/r/programming/comments/1195xe/live_coding_in_c_with_juce_llvmjit/

A Modular Approach to On-Stack Replacement in LLVM
http://labs.vmware.com/vee2013/docs/p143.pdf
VEE2013

Stanford CS143 compilers
http://www.stanford.edu/class/archive/cs/cs143/cs143.1128/

Klein
https://github.com/AdamSpitz/klein
https://github.com/AdamSpitz/klein/blob/master/doc/AspectsOfKlein.txt
http://kleinvm.sourceforge.net/
http://blog.selflanguage.org/2009/07/04/klein-and-squeak-vm/
http://blog.selflanguage.org/tag/klein/
Constructing a metacircular Virtual machine in an exploratory programming environment
Ramblings about Klein | Self, 2009


Self
Old Self Home Page, UCSB
Type feedback vs. concrete type inference: A comparison of optimization techniques for object-oriented languages (1995), Ole Agesen
The Cartesian Product Algorithm - Simple and Precise Type Inference of Parametric Polymorphism (1995), Ole Agesen
Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback
An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes (1991), Craig Chambers , David Ungar , Elgin Lee
Type Inference of SELF: Analysis of Objects with Dynamic and Multiple Inheritance (1993), Ole Agesen , Jens Palsberg , Michael I. Schwartzbach
A third-generation Self implementation: Reconciling responsiveness with performance (1994), Urs Hölzle , David Ungar
Optimizing dynamically-typed object-oriented languages with polymorphic inline caches (1991), Craig Chambers , David Ungar , Elgin Lee
Reconciling Responsiveness with Performance in Pure Object-Oriented Languages, (slides) Urs Hölzle, David Ungar
ADAPTIVE OPTIMIZATION FOR SELF: RECONCILING HIGH PERFORMANCE WITH EXPLORATORY PROGRAMMING, 1994, Urs Hölzle
Programming as an Experience: The Inspiration for Self, 1995, Randall B. Smith, David Ungar
Parents are Shared Parts of Objects: Inheritance and Encapsulation in SELF, 1991, CRAIG CHAMBERS, DAVID UNGAR, BAY-WEI CHANG, URS HÖLZLE
Making Pure Object-Oriented Languages Practical, 1991, Craig Chambers , David Ungar, OOPSLA 91
Do Object-Oriented Languages Need Special Hardware Support?, 1995, Urs Hölzle, David Ungar
The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages, (1992), Craig Chambers , John Hennessy , Mark Linton

Object Storage and Inheritance for SELF, a Prototype-Based Object-Oriented Programming Language. Engineer’s thesis (1988)
Efficient implementation of the smalltalk-80 system (1984)

http://dl.acm.org/ft_gateway.cfm?id=1238853&type=mov&coll=DL&dl=ACM&CFID=228518743&CFTOKEN=78973405

Optimization of Object-Oriented Programs using Static Class Hierarchy Analysis (1995), Jeffrey Dean , David Grove , Craig Chambers
Using JavaScript as a Real Programming Language (2007), Tommi Mikkonen, Antero Taivalsaari

Efficient Implementation of the Smalltalk-80 System, 1984, Peter Deutsch, Allan Schiffman

Dynamic Optimistic Interprocedural Analysis: a Framework and an Application, 2001
Igor Pechtchanski, Vivek Sarkar

Rambles around computer science, 2012
Maxine VM, Klein VM, etc
Implementing Pinocchio: a VM-less metacircular runtime library for dynamic languages, 2011, Olivier Fluckiger
How can a bachelor's thesis be this strong?

The evolution of the Erlang VM, Joe Armstrong, Robert Virding
ErLLVM, Kostis Sagonas, Chris Stavrakakis and Yiannis Tsiouris

An Approach to the Problem of Detranslation of Computer Programs, R. N. Horspool, N. Marovac

2012-01-17: Avik Chaudhuri
The V8 Myth: Why JavaScript is not a Worthy Competitor

2012-01-17: Avik Chaudhuri
Type Inference for ActionScript

The Ins and Outs of Gradual Type Inference, Aseem Rastogi, Avik Chaudhuri, Basil Hosmer
Gradual Type Inference in ActionScript 3

Region-Based Compilation,1990,1993,1996

2006-09-17: Abdulaziz Ghuloum
An Incremental Approach to Compiler Construction
About the implementation of Ikarus Scheme.

http://michaelrbernste.in/2013/06/03/real-time-garbage-collection-is-real.html

http://objectlayout.github.io/ObjectLayout/

http://try.ocamlpro.com/

2012-05-09: Pete de Zwart
CodeCache is full. Compiler has been disabled.

http://channel9.msdn.com/coding4fun/blog/Learning-how-to-write-a-3D-soft-engine-from-scratch-in-C-TypeScript-or-JavaScript

http://channel9.msdn.com/Blogs/Charles/Anders-Hejlsberg-Steve-Lucco-and-Luke-Hoban-Inside-TypeScript-09

http://channel9.msdn.com/Series/Windows-Store-apps-for-Absolute-Beginners-with-JavaScript

http://channel9.msdn.com/posts/YOW-2012-Brian-Beckman-and-Michael-Ficarra-CoffeeScript
no code or demo

http://channel9.msdn.com/Blogs/Charles/Erik-Meijer-Dave-Thomas-and-Pratap-Lakshman-Perspectives-on-JavaScript

http://channel9.msdn.com/Blogs/Charles/SPLASH-2011-Brendan-Eich-JavaScript-Today-and-Tomorrow
http://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Web-Programming-with-Flapjax

DLS2009
http://dl.acm.org/citation.cfm?id=1640134&coll=DL&dl=ACM&CFID=228518743&CFTOKEN=78973405

DLS2010
http://dl.acm.org/citation.cfm?id=1869631&coll=DL&dl=ACM&CFID=228518743&CFTOKEN=78973405

DLS2011
http://dl.acm.org/citation.cfm?id=2047849&coll=DL&dl=ACM&CFID=228518743&CFTOKEN=78973405

DLS2012
http://dl.acm.org/citation.cfm?id=2384577&coll=DL&dl=ACM&CFID=228518743&CFTOKEN=78973405


11th Compiler-Driven Performance Workshop
Ondrej Lhotak, Jose Nelson Amaral, Kit Barton, Martin Hirzel, J. Gregory Steffan
November 2012
CASCON '12: Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Publisher: IBM Corp.


As computer hardware continues to dramatically improve in transistor density and raw capability, the importance of compilers for bridging the gap between high-level programming languages and abundant hardware resources has never been greater. In the ...


Someone's been watching too much Naruto:

Konoha: implementing a static scripting language with dynamic behaviors
Kimio Kuramitsu

A 2009 undergraduate thesis from Nanjing University:
http://wenku.it168.com/d_000541345.shtml
Harmony虚拟机核心模块向SPARC的移植 (Porting the Harmony VM's core modules to SPARC)

The National University of Defense Technology's FT-1500, based on the SPARC V9 architecture

jvm开发笔记1---class文件解析器 (JVM development notes 1: a class file parser)

Reflecting on the JVM Class File Format

The Class File Lifestyle

2013-07-15: Monica Beckwith
G1: One Garbage Collector To Rule Them All, InfoQ

David Chase really is a heavyweight. An earlier paper of his:
Dynamic circular work-stealing deque
David Chase, Yossi Lev, 2005
Covers Arora, Blumofe, and Plaxton (ABP), Chase-Lev, and the like.
So RMO stands for relaxed memory ordering. SPARC RMO. Sounds scary just from the name.

mechanical-sympathy, 2013-07
Gil Tene wrote:
I find no use for PrintGCApplicationConcurrentTime, but PrintGCApplicationStoppedTime is probably the only number in GC logs that you should sort-of-semi-believe.

I usually start by telling people that they should not blindly believe anything a JVM claims about stopped times (and that includes Zing), and that everyone should use observation-based measurements to keep the logs "honest". This is [partly] what I built jHiccup for. If the GC (and other pause time) logs seem to correlate with your observation-based numbers, then the report is probably fairly accurate. But if/when they don't, believe the observations and not the "I think this is how long I was doing something while I had your application paused" point of view of the JVM.

As to what PrintGCApplicationStoppedTime means: It is supposed to report the time the JVM spent in a stop-the-world safepoint of any kind. In Zing we work hard to make this number encompass the entire safepoint time, end-to-end, inclusively. Specifically, this means the time from when the very first thread was asked to get to a safepoint and stay there, until the very last thread was allowed to leave the safepoint. I believe that OpenJDK and Oracle HotSpot do a decent job of reporting with the same rules for this specific metric.

However, when it comes to other "stop time" reports, you will see potentially huge gaps in how "honest" JVMs are. Our GC times in Zing are as inclusive as PrintGCApplicationStoppedTime, but some others (e.g. the Oracle HotSpot and OpenJDK GCs) often report only the time the GC actually did work while at a safepoint, without including the time it took to get to the safepoint or the time it took to leave it (during which the GC was stalled waiting for all the mutators to arrive at the safepoint).

This may seem like nit-picking, until you realize that in many cases Time To SafePoint (a term we use at Azul so much we abbreviate it as TTSP) can dominate actual observed pause times when those drop into the handful of milliseconds. Note that this is not a "usually", "mostly", or "on the average" statement. It's a "sometimes" statement. While most of your GC-reported pause times may tend to come very close to the associated PrintGCApplicationStoppedTime output, "every once in a while" you can see a huge gap. Those are times when GC dramatically under-reported pause times. It's typically a situation where all threads but one are stalled, waiting for a laggard thread to arrive at the safepoint.

There are many causes for long TTSP in JVMs, and I consider TTSP to be "your next worst enemy" once GC pause work has been eliminated or addressed (as it is with Zing). At Azul, we've spent the past 5+ years beating TTSP times down in the Zing JVM, so we know a lot about them. The simplest example I can give is an array copy. Array copies (in regular HotSpot) do not have a safepoint in the middle. A copy of a 200MB array represents a period of ~50-300msec (depending on HW) where a global, JVM-wide safepoint cannot be completed. If you are unlucky enough to need to get to a safepoint while one of those is going on (e.g. when GC is triggered, or if you load a class, or if a de-optimization happens), the TTSP can end up being 10-100x as big as the GC work itself, with practically all your application seeing that length as an observed pause. In such a case, guess what the GC will report, and what PrintGCApplicationStoppedTime will report?

...
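
The "observation-based measurements" Gil mentions are easy to approximate by hand. A crude sketch (my own, nowhere near jHiccup, but the same idea): a thread sleeps for a fixed interval and records how much longer the sleep actually took; any stop-the-world pause shows up as overshoot, regardless of what the GC log claims.

public class HiccupMeter {
    public static void main(String[] args) throws InterruptedException {
        final long intervalNs = 1_000_000L;        // we ask to sleep 1 ms
        long maxHiccupNs = 0;
        long lastReport = System.nanoTime();
        while (true) {
            long t0 = System.nanoTime();
            Thread.sleep(1);
            long overshoot = System.nanoTime() - t0 - intervalNs;
            if (overshoot > maxHiccupNs) maxHiccupNs = overshoot;
            if (System.nanoTime() - lastReport > 5_000_000_000L) {   // report every ~5 s
                System.out.printf("max observed hiccup: %.2f ms%n", maxHiccupNs / 1e6);
                maxHiccupNs = 0;
                lastReport = System.nanoTime();
            }
        }
    }
}

Running a loop like this as a background thread in the JVM under test and comparing its worst observed hiccups against what -XX:+PrintGCApplicationStoppedTime reports is the spirit of what Gil describes.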


新屋良磨 - 世界最速の正規表現JITエンジンの実装
http://sssslide.com/www.slideshare.net/sinya8282/ss-21538659

http://draenog.blogspot.com/2008/04/jamvm-road-to-nowhere.html

http://channel9.msdn.com/posts/Shawn-Farkas-WinRT-and-the-CLR
There will be automatic NGEN going forward. I wonder how general it is: NGEN only for Windows Runtime Metadata interop purposes, or fully general? Hmm.

https://archive.fosdem.org/2013/schedule/event/android_292/attachments/slides/207/export/events/attachments/android_292/slides/207/2013_02_03_Android_292.pdf

Comparing Linux/Arm JVMs Revisited
By Jim Connors on Sep 17, 2013
https://blogs.oracle.com/jtc/entry/comparing_arm_linux_jvms_revisited
evasnowind 2013-02-26
Open discussion:
I have a JVM question. I currently need to translate an intermediate language (an extension of a language based on the SECD abstract machine) into Java bytecode. The language has a foreachAt operation for loop traversal, which maintains a counter indicating the current iteration position (the equivalent of the i in for (i = 0; i < n; i++)). The question: for execution speed, where is this counter best kept? In the local variable table of the executing method's stack frame, or directly on the frame's operand stack?

My idea is to put the counter in the local variable area, and whenever it needs to be incremented do a load -> add 1 -> store (or just use the iinc instruction).

My advisor, however, suggests keeping the counter on the operand stack, on the assumption that the JVM stack is implemented with the hardware stack and would therefore be faster. But keeping the counter on the operand stack makes the increment awkward, and the translated JVM instructions would be more numerous.

Which of the two approaches would the OP recommend? I'm not sure whether the JVM stack is implemented on top of the hardware stack (specifically in Oracle's HotSpot VM). While reading the Chinese translation of the JVM specification (the latest, 7th-edition spec, at http://icyfenix.iteye.com/blog/1256329) I came across this note: "Translator's note: readers should take care not to confuse Stack/Heap with Java (VM) Stack/Java Heap. The JVM implementation is itself an application written in another language; data that, from the Java program's point of view, is allocated on the Java Stack may, from the VM implementation's point of view, be allocated on the heap." Does that mean the JVM stack is actually implemented on the heap? Not making full use of the hardware stack that's right there seems hard to justify, so I'd appreciate the OP's clarification. Many thanks.
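
For what it's worth, here is roughly how javac itself compiles a plain counted loop: the counter lives in a local variable slot and is bumped in place with iinc; there is no bytecode that increments a value sitting on the operand stack, so a stack-resident counter would need extra load/dup/store shuffling. (The listing is approximate javap -c output; the labels are mine.)

public class CounterDemo {
    static int sum(int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {   // i ends up in local slot 2
            s += i;
        }
        return s;
    }
}

// javap -c CounterDemo, roughly:
//       iconst_0
//       istore_1        // s = 0
//       iconst_0
//       istore_2        // i = 0
// loop: iload_2         // push i
//       iload_0         // push n
//       if_icmpge done  // exit when i >= n
//       iload_1
//       iload_2
//       iadd
//       istore_1        // s += i
//       iinc 2, 1       // increment local slot 2 in place, no stack traffic
//       goto loop
// done: iload_1
//       ireturn

Also, neither the operand stack nor the local variable table necessarily maps to the native hardware stack: in HotSpot, once a method is JIT-compiled, both locals and operand-stack values are assigned to machine registers or compiled-frame slots, so the difference between the two schemes mostly disappears after compilation.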