The Elements of Computing Systems: Building a Modern Computer from First Principles

■ (Symbol): This pseudo-command binds the Symbol to the memory location into which the next command in the program will be stored. It is called a “pseudo-command” because it generates no machine code.
 
(The remaining conventions in this section pertain to assembly programs only.)
 
Constants and Symbols
Constants must be non-negative and are written in decimal notation. A user-defined symbol can be any sequence of letters, digits, underscore (_), dot (.), dollar sign ($), and colon (:) that does not begin with a digit.
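
The lexical rules above can be checked mechanically. The following is a minimal Python sketch rather than part of the book's API: the names is_constant and is_symbol are hypothetical, and the sketch assumes that "letters" and "digits" mean ASCII characters.

```python
import re

# Constants: one or more decimal digits (non-negative by construction).
_CONSTANT_RE = re.compile(r"[0-9]+")

# Symbols: letters, digits, '_', '.', '$', ':' with a non-digit first character.
# Assumption: only ASCII letters and digits are intended.
_SYMBOL_RE = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")


def is_constant(token: str) -> bool:
    """Return True if token is a non-negative decimal constant."""
    return _CONSTANT_RE.fullmatch(token) is not None


def is_symbol(token: str) -> bool:
    """Return True if token is a syntactically valid user-defined symbol."""
    return _SYMBOL_RE.fullmatch(token) is not None
```

A token such as 100 is accepted as a constant, sum and LOOP_1 are accepted as symbols, and 1abc is rejected by both checks.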
 
Comments
Text beginning with two slashes (//) and ending at the end of the line is considered a comment and is ignored.
 
White Space
Space characters are ignored. Empty lines are ignored.
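
The comment and white space rules amount to a small normalization step applied to every source line. A minimal sketch in Python follows; the function name clean_line is hypothetical.

```python
def clean_line(line: str) -> str:
    """Strip a trailing // comment and remove all white space from one line.

    Returns an empty string for blank or comment-only lines, which the
    caller can then skip.
    """
    code = line.split("//", 1)[0]   # drop everything from the first "//"
    return "".join(code.split())    # remove spaces, tabs, and newlines
```

For example, the line "  D = M   // load i" is reduced to "D=M", and a comment-only line is reduced to the empty string.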
 
Case Conventions
All the assembly mnemonics must be written in uppercase. The rest (user-defined labels and variable names) is case sensitive. The convention is to use uppercase for labels and lowercase for variable names.
6.2.2 Instructions
The Hack machine language consists of two instruction types called addressing instruction (A-instruction) and compute instruction (C-instruction). The instruction format is as follows.
The translation of each of the three fields comp, dest, jump to their binary forms is specified in the following three tables.
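
The format figure and the three tables are not reproduced in this excerpt. The sketch below, in Python, fills them in from the standard Hack specification as commonly published: an A-instruction is a 0 bit followed by a 15-bit value, and a C-instruction is 111 followed by the 7 comp bits, 3 dest bits, and 3 jump bits. The function names encode_a and encode_c are hypothetical, an empty string stands for a null dest or jump, and the bit codes should be checked against the book's own tables.

```python
# comp codes for the a=0 column (operands D and A); the a=1 column
# (operand M in place of A) is derived below.
_COMP_A = {
    "0": "101010", "1": "111111", "-1": "111010", "D": "001100",
    "A": "110000", "!D": "001101", "!A": "110001", "-D": "001111",
    "-A": "110011", "D+1": "011111", "A+1": "110111", "D-1": "001110",
    "A-1": "110010", "D+A": "000010", "D-A": "010011", "A-D": "000111",
    "D&A": "000000", "D|A": "010101",
}
COMP = {}
for _mnemonic, _bits in _COMP_A.items():
    COMP[_mnemonic] = "0" + _bits
    if "A" in _mnemonic:
        COMP[_mnemonic.replace("A", "M")] = "1" + _bits

DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}

JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def encode_a(value: int) -> str:
    """Encode @value as a 16-bit A-instruction (leading bit 0)."""
    if not 0 <= value < 2 ** 15:
        raise ValueError(f"address out of range: {value}")
    return format(value, "016b")


def encode_c(dest: str, comp: str, jump: str) -> str:
    """Encode dest=comp;jump as a 16-bit C-instruction."""
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]
```

Under these assumptions, @100 encodes to 0000000001100100 and M=D+M encodes to 1111000010001000.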
6.2.3 Symbols
Hack assembly commands can refer to memory locations (addresses) using either constants or symbols. Symbols in assembly programs arise from three sources.
 
Predefined Symbols
Any Hack assembly program is allowed to use the following predefined symbols.
Note that each one of the top five RAM locations can be referred to using two predefined symbols. For example, either R2 or ARG can be used to refer to RAM[2].
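
The table of predefined symbols is not reproduced in this excerpt. As a hedged illustration, the mapping below lists the predefined symbols of the standard Hack platform as commonly published (the text above confirms only that R2 and ARG both refer to RAM[2]); the name PREDEFINED is hypothetical.

```python
# Predefined symbols and their RAM addresses (standard Hack platform values;
# verify against the book's table).
PREDEFINED = {
    "SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
    "SCREEN": 16384, "KBD": 24576,
}
# R0..R15 name the first sixteen RAM locations.
PREDEFINED.update({f"R{i}": i for i in range(16)})
```

With this table, R0 through R4 share their addresses with SP, LCL, ARG, THIS, and THAT, which is the double naming described above.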
 
Label Symbols
The pseudo-command (Xxx) defines the symbol Xxx to refer to the instruction memory location holding the next command in the program. A label can be defined only once and can be used anywhere in the assembly program, even before the line in which it is defined.
 
Variable Symbols
Any symbol Xxx appearing in an assembly program that is not predefined and is not defined elsewhere using the (Xxx) command is treated as a variable. Variables are mapped to consecutive memory locations as they are first encountered, starting at RAM address 16 (0x0010).
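
Because a label may be used before the line that defines it, one straightforward way to resolve symbols is to scan the program twice: first to record labels, then to allocate variables. The Python sketch below illustrates this under stated assumptions: the function name build_symbol_table is hypothetical, the input is a list of commands already stripped of comments and white space, and a label that clashes with a predefined symbol is treated as an error.

```python
def build_symbol_table(commands, predefined=None):
    """Map every symbol in the program to its numeric address."""
    table = dict(predefined or {})

    # Pass 1: a label (Xxx) refers to the ROM address of the next command.
    rom_address = 0
    for cmd in commands:
        if cmd.startswith("(") and cmd.endswith(")"):
            label = cmd[1:-1]
            if label in table:
                raise ValueError(f"symbol defined more than once: {label}")
            table[label] = rom_address
        else:
            rom_address += 1  # only real instructions occupy ROM

    # Pass 2: any other new symbol is a variable, allocated from RAM[16] up
    # in order of first appearance.
    next_variable = 16
    for cmd in commands:
        if cmd.startswith("@"):
            symbol = cmd[1:]
            if not symbol.isdigit() and symbol not in table:
                table[symbol] = next_variable
                next_variable += 1
    return table
```

For the commands @i, M=1, (LOOP), @LOOP, 0;JMP, @sum, this sketch maps LOOP to ROM address 2, i to RAM address 16, and sum to RAM address 17.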
6.2.4 Example
Chapter 4 presented a program that sums up the integers 1 to 100. Figure 6.2 repeats this example, showing both its assembly and binary versions.
Figure 6.2 Assembly and binary representations of the same program.
 
6.3 Implementation
The Hack assembler reads as input a text file named Prog.asm, containing a Hack assembly program, and produces as output a text file named Prog.hack, containing the translated Hack machine code. The name of the input file is supplied to the assembler as a command line argument.
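
The naming convention (Prog.asm in, Prog.hack out) can be derived from the command line argument alone. The following Python sketch shows one way to do so; the function name output_path is hypothetical, and keeping the output file in the same directory as the input is an assumption.

```python
from pathlib import Path


def output_path(asm_path: str) -> Path:
    """Return the .hack path that corresponds to a given .asm path."""
    source = Path(asm_path)
    if source.suffix != ".asm":
        raise ValueError(f"expected a .asm file, got: {asm_path}")
    # Same directory and base name, with the extension replaced.
    return source.with_suffix(".hack")
```

A driver program would typically pass its first command line argument to this function before opening the output file.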
The translation of each individual assembly command to its equivalent binary instruction is direct and one-to-one. Each command is translated separately. In particular, each mnemonic component (field) of the assembly command is translated into its corresponding bit code according to the tables in section 6.2.2, and each symbol in the command is resolved to its numeric address as specified in section 6.2.3.
We propose an assembler implementation based on four modules: a Parser module that parses the input, a Code module that provides the binary codes of all the assembly mnemonics, a SymbolTable module that handles symbols, and a main program that drives the entire translation process.
 
A Note about API Notation
The assembler development is the first in a series of five software construction projects that build our hierarchy of translators (assembler, virtual machine, and compiler). Since readers can develop these projects in the programming language of their choice, we base our proposed implementation guidelines on language independent APIs. A typical project API describes several modules, each containing one or more routines. In object-oriented languages like Java, C++, and C#, a module usually corresponds to a class, and a routine usually corresponds to a method. In procedural languages, routines correspond to functions, subroutines, or procedures, and modules correspond to collections of routines that handle related data. In some languages (e.g., Modula-2) a module may be expressed explicitly, in others implicitly (e.g., a file in the C language), and in others (e.g., Pascal) it will have no corresponding language construct, and will just be a conceptual grouping of routines.
6.3.1 The Parser Module
The main function of the parser is to break each assembly command into its underlying components (fields and symbols). The API is as follows.
Parser:
Encapsulates access to the input code. Reads an assembly language command, parses it, and provides convenient access to the command’s components (fields and symbols). In addition, removes all white space and comments.
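
The API table itself is not reproduced in this excerpt. As an illustration of the description above, here is a minimal Python sketch of such a parser. The method names and the command-type strings are assumptions chosen for this sketch, and the parser takes an iterable of source lines rather than a file name to keep the example self-contained.

```python
class Parser:
    """Gives field-level access to the commands of a Hack assembly program."""

    def __init__(self, lines):
        # Remove comments and white space; keep only non-empty commands.
        self._commands = []
        for line in lines:
            code = "".join(line.split("//", 1)[0].split())
            if code:
                self._commands.append(code)
        self._pos = -1  # no current command until advance() is called

    def has_more_commands(self) -> bool:
        return self._pos + 1 < len(self._commands)

    def advance(self) -> None:
        self._pos += 1

    def _current(self) -> str:
        return self._commands[self._pos]

    def command_type(self) -> str:
        cmd = self._current()
        if cmd.startswith("@"):
            return "A_COMMAND"
        if cmd.startswith("("):
            return "L_COMMAND"  # label pseudo-command
        return "C_COMMAND"

    def symbol(self) -> str:
        """Symbol or decimal constant of @Xxx or (Xxx)."""
        return self._current().strip("@()")

    def dest(self) -> str:
        cmd = self._current()
        return cmd.split("=", 1)[0] if "=" in cmd else ""

    def comp(self) -> str:
        cmd = self._current()
        if "=" in cmd:
            cmd = cmd.split("=", 1)[1]
        return cmd.split(";", 1)[0]

    def jump(self) -> str:
        cmd = self._current()
        return cmd.split(";", 1)[1] if ";" in cmd else ""
```

A caller loops while has_more_commands() is true, calls advance(), and then queries command_type() and the relevant field accessors for the current command.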
6.3.2 The Code Module
