Figure 11.3
Objects handling. Since memory allocations are run-time dependent, all the shown addresses are arbitrary examples.
However, the compiler’s job is not done yet. Since the language allows different methods in different classes to have the same name, the compiler must ensure that the right method is applied to the right object. Further, due to the possibility of method overriding in a subclass, compilers of object-oriented languages must do this determination at run-time. When run-time typing is out of the picture, for example, in languages like Jack, this determination can be done at compile-time. Specifically, in each method call like x . m(y), the compiler must ensure that the called method m() belongs to the class from which the x object was derived.
11.1.2 Commands Translation
We now describe how high-level commands are translated into the target language. Since we have already discussed the handling of variables, objects, and arrays, there are only two more issues to consider: expression evaluation and flow control.
Evaluating Expressions
How should we generate code for evaluating high-level expressions like x+g(2,y,-z)*5? First, we must “understand” the syntactic structure of the expression, for example, convert it into a parse tree like the one depicted in figure 11.4. This parsing was already handled by the syntax analyzer described in chapter 10. Next, as seen in the figure, we can traverse the parse tree and generate from it the equivalent VM code.
The choice of the code generation algorithm depends on the target language into which we are translating. For a stack-based target platform, we simply need to print the tree in postfix notation, also known as Right Polish Notation (RPN). In RPN syntax, an operation like f(x, y) is expressed as x, y, f (or, in the VM language syntax, push x, push y, call f). Likewise, an operation like x + y, which is +(x, y) in prefix notation, is stated as x, y, + (i.e., push x, push y, add). The strategy for translating expressions into stack-based VM code is straightforward and is based on recursive post-order traversal of the underlying parse tree, as follows:
Figure 11.4
Code generation.
The reader can verify that when applied to the tree in figure 11.4, this algorithm generates the stack-machine code shown in the figure.
Translating Flow Control
High-level programming languages are equipped with a variety of control flow structures like if, while, for, switch, and so on. In contrast, low-level languages typically offer two basic control primitives: conditional goto and unconditional goto. Therefore, one of the challenges faced by the compiler writer is to translate structured code segments into target code utilizing these primitives only. As shown in figure 11.5, the translation logic is rather simple.
Two features of high-level languages make the compilation of control structures slightly more challenging than that shown in figure 11.5. First, a program normally contains multiple instances of if and while statements. The compiler can handle this multiplicity by generating and using unique label names. Second, control structures can be nested, for example, if within while within another while and so on. This complexity can be dealt with easily using a recursive compilation strategy.
11.2 Specification
Usage
The Jack compiler accepts a single command line parameter, as follows:
Figure 11.5
Compilation of control structures.
Where
source
is either a file name of the form Xxx . jack (the extension is mandatory) or a directory name containing one or more . jack files (in which case there is no extension). The compiler compiles each Xxx.jack file into a file named Xxx.vm, created in the same directory in which the source file is located. If source is a directory name, each . jack file located in it is compiled, creating a corresponding .vm file in the same directory.
11.2.1 Standard Mapping over the Virtual Machine
The compiler translates each .jack file into a .vm file containing one VM function for each constructor, function, and method found in the . jack file (see figure 7.8). In doing so, every Jack-to-VM compiler must follow the following code generation conventions.
File and Function Naming
Each . jack class file is compiled into a separate .vm file. The Jack subroutines (functions, methods, and constructors) are compiled into VM functions as follows:
■ A Jack subroutine xxx() in a Jack class Yyy is compiled into a VM function called Yyy . xxx.
■ A Jack
function or constructor
with k arguments is compiled into a VM function that operates on k arguments.
■ A Jack
method
with k arguments is compiled into a VM function that operates on
k
+ 1 arguments. The first argument (argument number 0) always refers to the this object.
Memory Allocation and Access
■ The
local variables
of a Jack subroutine are allocated to, and accessed via, the virtual local segment.
■ The
argument variables
of a Jack subroutine are allocated to, and accessed via, the virtual argument segment.
■ The
static variables
of a . jack class file are allocated to, and accessed via, the virtual static segment of the corresponding .vm file.
■ Within a VM function corresponding to a Jack method or a Jack constructor, access to the fields of the this object is obtained by first pointing the virtual this segment to the current object (using pointer 0) and then accessing individual fields via this index references, where index is an non-negative integer.
■ Within a VM function, access to array entries is obtained by first pointing the virtual that segment (using pointer 1) to the address of the desired array entry and then accessing the array entry via that 0 references.
Subroutine Calling
■ Before calling a VM function, the caller (itself a VM function) must push the function’s arguments onto the stack. If the called VM function corresponds to a Jack method, the first pushed argument must be a reference to the object on which the method is supposed to operate.
■ When compiling a Jack method into a VM function, the compiler must insert VM code that sets the base of the this segment properly. Similarly, when compiling a Jack constructor, the compiler must insert VM code that allocates a memory block for the new object and then sets the base of the this segment to point at its base.
Returning from Void Methods and Functions
High-level void subroutines don’t return values. This abstraction is handled as follows:
■ VM functions corresponding to
void
Jack methods and functions must return the constant 0 as their return value.
■ When translating a do sub statement where sub is a void method or function, the caller of the corresponding VM function must pop (and ignore) the returned value (which is always the constant 0).
Constants
■ null and false are mapped to the constant 0. True is mapped to the constant
-1 (this constant can be obtained via push constant 1 followed by neg).
Use of Operating System Services
The basic Jack OS is implemented as a set of VM files named Math . vm, Array . vm, Output . vm, Screen . vm, Keyboard . vm, Memory . vm, and Sys.vm (the API of these compiled class files was given in chapter 9). All these files must reside alongside the VM files generated by the compiler. This way, any VM function can call any OS VM function for its effect. In particular, when needed, the compiler should generate code that uses the following OS functions:
■ Multiplication and division are handled using the OS functions Math. multiply ( ) and Math . divide ( ).
■ String constants are created using the OS constructor String.new(length). String assignments like x=“cc...c” are handled using a series of calls to the OS routine String . appendChar (nextChar).
■ Constructors allocate space for new objects using the OS function Memory.alloc(size).
11.2.2 Compilation Example
Compiling a Jack program (one or more . jack class files) involves two main tasks: parsing the code using the compilation engine developed in the previous chapter, and generating code according to the guidelines and specifications given above. Figure 11.6 gives a “live example” of many of the code generation issues mentioned in this chapter.