How do JIT interpreters handle variable names?

Let’s say I am to design a JIT interpreter that translates IL or bytecode to executable instructions at runtime. Every time a variable name is encountered in the code, the JIT interpreter has to translate that into the respective memory address, right?

What technique do JIT interpreters use in order to resolve variable references in a performant enough manner? Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?


Have a look at this example from Wikipedia:

outer:
for (int i = 2; i < 1000; i++) {
    for (int j = 2; j < i; j++) {
        if (i % j == 0)
            continue outer;  // not prime: move on to the next i
    }
    System.out.println(i);
}

which roughly translates into the following bytecode:

0:   iconst_2
1:   istore_1
2:   iload_1
3:   sipush          1000
6:   if_icmpge       44
9:   iconst_2
10:  istore_2
11:  iload_2
12:  iload_1
13:  if_icmpge       31
16:  iload_1
17:  iload_2
18:  irem
19:  ifne            25
22:  goto            38
25:  iinc            2, 1
28:  goto            11
31:  getstatic       #84;           // Field java/lang/System.out:Ljava/io/PrintStream;
34:  iload_1
35:  invokevirtual   #85;           // Method java/io/PrintStream.println:(I)V
38:  iinc            1, 1
41:  goto            2
44:  return

Note that it reads very much like assembly language: local variables live in numbered slots and are referred to directly by slot number (i is slot 1 and j is slot 2, hence istore_1 and iload_2). There is no trace of the original variable names.
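To make the name-to-slot step concrete, here is a minimal sketch of what a compiler front end might do before any bytecode exists. The class SlotAllocator and its method slotOf are hypothetical names invented for this illustration, not part of javac or any JVM API, and the assumption that slot 0 is already taken (by this or by a parameter) is mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: gives each local variable name a numbered slot at compile time. */
public class SlotAllocator {
    private final Map<String, Integer> slots = new LinkedHashMap<>();
    private int nextSlot;

    public SlotAllocator(int firstFreeSlot) {
        this.nextSlot = firstFreeSlot;
    }

    /** Returns the slot for a name, allocating a fresh one the first time it is seen. */
    public int slotOf(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            slot = nextSlot++;
            slots.put(name, slot);
        }
        return slot;
    }

    public static void main(String[] args) {
        // Assumption: slot 0 is already used, so locals start at 1.
        SlotAllocator alloc = new SlotAllocator(1);
        System.out.println("istore_" + alloc.slotOf("i")); // istore_1
        System.out.println("istore_" + alloc.slotOf("j")); // istore_2
        System.out.println("iload_" + alloc.slotOf("i"));  // iload_1
    }
}
```

The hash lookup happens once per name while compiling. The emitted instructions carry only the slot number, so nothing downstream (interpreter or JIT) ever needs to look up a name.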

To find out how Java bytecode works in excruciating detail, you can consult Oracle’s documentation.

Further Reading
The Java® Virtual Machine Specification.


Variables are mostly known at parsing time, and their binding and scope are relevant for parsing. JIT-compiling libraries don’t really handle variables (and don’t care much about their name, type, and perhaps scope).

  • libjit handles values (which include formals and locals, as locations)
  • GNU lightning deals with virtual registers in its instruction set
  • asmjit is tied to x86-64 and deals with registers and stack frame locations
  • GCCJIT deals with lvalues (including formals and locals, as locations)
  • LLVM’s internal language is mostly SSA

The main point is that a JIT deals with “locations” or “values”, not with “variables”. So your bytecode won’t know about “variables” (except perhaps through debugging-related metadata).
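As a rough illustration of what “locations, not variables” means in practice, here is a toy stack machine whose instructions address locals purely by index. The opcodes (PUSH, LOAD, STORE, ADD, HALT) and the class name SlotInterpreter are made up for this sketch and do not correspond to real JVM opcodes or to any of the libraries listed above.

```java
public class SlotInterpreter {
    // Hypothetical opcodes for this sketch; they are not real JVM opcode values.
    static final int PUSH = 0, LOAD = 1, STORE = 2, ADD = 3, HALT = 4;

    /** Runs the code against a fixed-size locals array and returns that array. */
    static int[] run(int[] code, int localCount) {
        int[] locals = new int[localCount]; // one location per local, addressed by index
        int[] stack = new int[16];          // small operand stack, enough for the demo
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH:  stack[sp++] = code[pc++]; break;
                case LOAD:  stack[sp++] = locals[code[pc++]]; break;
                case STORE: locals[code[pc++]] = stack[--sp]; break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case HALT:  return locals;
                default:    throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // Source-level view: x = 2; y = 40; z = x + y;  with x, y, z in slots 0, 1, 2.
        int[] code = {
            PUSH, 2,  STORE, 0,
            PUSH, 40, STORE, 1,
            LOAD, 0,  LOAD, 1,  ADD,  STORE, 2,
            HALT
        };
        System.out.println(run(code, 3)[2]); // prints 42
    }
}
```

The names x, y and z exist only in the comment. A JIT working from the same code would map each slot index to a register or a stack-frame offset instead of an array element, but it would still never see a name.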

If you are designing a JIT (and not designing and implementing your programming language), you should think in terms of locations and values, not variables. Perhaps you should think in terms of formal semantics (look at SECD for an example of an abstract VM).

If you know some Scheme or Lisp, I recommend reading Queinnec’s book Lisp in Small Pieces. It deals with the many ways to implement Lisp-like languages, including through bytecode.
