今日の英語 - An Incremental Approach to Compiler Construction その2

速読の練習。速読はしない。文の構造を意識しつつ読み進める習慣をつける。方法論はTOEFLテスト速読・速聴大特訓 基礎編で。

ルールは以下の通り。今日から別の論文。もしあわないようであればやめる。

  • トピックセンテンスは {{ }} で囲む
  • シグナルワードは == ==
  • パラグラフのトピックは必ず書く。
  • サポート情報のスタイルを書く
  • 列挙には(1)(2)(3)と振る
  • 対比には (A)(a) のように目印
  • 例示は何に対するサポートかを一言書き添える
  • 原因・結果には => を

3. Writing a Compiler in 24 Small Steps

small steps

{{Now that we described the development methodology, we turn our
attention to the actual steps taken in constructing a compiler.}} This
section is a brief description of 24 incremental stages: ==the first ==is a
small language composed only of small integers, and ==the last== covers
most of the requirements of R5RS. A more detailed presentation of
these stages is in the accompanying extended tutorial.

3.1 Integers

Let's write small compiler that returns fixnum

The simplest language that we can compile and test is composed
of the fixed-size integers, ==or(言い換え)== fixnums. {{Let’s write a small compiler
that takes a fixnum as input and produces a program in assembly
that returns that fixnum.}} ==Since== we don’t know yet how to do that,
we ==ask== for some help from another compiler that does know: gcc.
Let’s write a small C function that returns an integer:

int scheme_entry(){
  return 42;
}

Let’s compile it using gcc

{{Let’s compile it using gcc -O3 --omit-frame-pointer -S
test.c and see the output.}} The most relevant lines of the output
file are the following:

1. .text
2. .p2align 4,,15
3. .globl scheme_entry
4. .type scheme_entry, @function
5. scheme_entry:
6. movl $42, %eax
7. ret

see the output lines

(列挙)==Line 1== starts a text segment, where code is located. ==Line 2=== aligns
the beginning of the procedure at 4-byte boundaries (not important
at this point). ==Line 3== informs the assembler that the scheme entry
label is global so that it becomes visible to the linker. Line 4
says that scheme entry is a function. ==Line 5== denotes the start of
the scheme entry procedure. ==Line 6== sets the value of the %eax
register to 42. ==Line 7== returns control to the caller, which expects
the received value to be in the %eax register.

トピックセンテンスはなし。あえて言うなら全体。

generating code from scheme

{{Generating this file from Scheme is straightforward.}} Our compiler
takes an integer as input and prints the given assembly with
the input substituted in for the value to be returned.

(define (compile-program x)
  (emit "movl $~a, %eax" x)
  (emit "ret"))

run-time system for test

{{To test our implementation, we write a small C run-time system
that calls our scheme entry and prints the value it returns:}}

/* a simple driver for scheme_entry */
#include <stdio.h>
int main(int argc, char** argv){
  printf("%d\n", scheme_entry());
  return 0;
}

3.2 Immediate Constants

Other immediate values

{{Values in Scheme are not limited to the fixnum integers.}} (列挙)Booleans,
characters, and the empty list form a collection of immediate values.
Immediate values are those that can be stored directly in
a machine word and ==therefore== do not require additional storage.
The types of the immediate objects in Scheme are disjoint, ==consequently,==
the implementation cannot use fixnums to denote booleans
or characters. The types must also be available at run time to allow
the driver to print the values appropriately and to allow us to
provide the type predicates (discussed in the next step).

using tag for type information and value

{{==One== way of encoding the type information is by dedicating some
of the lower bits of the machine word for type information and
using the rest of the machine word for storing the value.}} Every type
of value is defined by a mask and a tag. The mask defines which bits
of the integer are used for the type information and the tag defines
the value of these bits.

detail of tags

{{For fixnums, the lower two bits (mask = 11b) must be 0
(tag = 00b).}} This leaves 30 bits to hold the value of a fixnum.
Characters are tagged with 8 bits (tag = 00001111b) leaving 24
bits for the value (7 of which are actually used to encode the ASCII
characters). Booleans are given a 7-bit tag (tag = 0011111b), and
1-bit value. The empty list is given the value 00101111b.

extend compiler to handle the immediate types

{{We extend our compiler to handle the immediate types appropriately.}}
The code generator must convert the different immediate
values to the corresponding machine integer values.

(define (compile-program x)
  (define (immediate-rep x)
    (cond
    ((integer? x) (shift x fixnum-shift))
    ...))
  (emit "movl $~a, %eax" (immediate-rep x))
  (emit "ret"))

extension ofr driver

{{The driver must also be extended to handle the newly-added
values.}} The following code illustrates the concept:
#include
#define fixnum_mask 3
#define fixnum_tag 0
#define fixnum_shift 2
...
int main(int argc, char** argv){
int val = scheme_entry();
if((val & fixnum_mask) == fixnum_tag){
printf("%d\n", val >> fixnum_shift);
} else if(val == empty_list){
printf("()\n");
} ...
return 0;
}

3.3 Unary Primitives

simple call primitives

{{We extend the language now to include calls to primitives that accept
one argument.}} We start with the simplest of these primitives:
add1 and sub1. ==To== compile an expression in the form (add1 e),
we first emit the code for e. That code would evaluate e placing its
value in the %eax register. What remains to be done is incrementing
the value of the %eax register by 4 (the shifted value of 1). The machine
instruction that performs addition/subtraction is addl/subl.
(define (emit-expr x)
(cond
*1
(emit "addl $~a, %eax" (immediate-rep 1)))
...))
(else ...)))

integer->char, char->integer

{{The primitives integer->char and char->integer can be
added next.}} To convert an integer (assuming it's in the proper
range) to a character, the integer (already shifted by 2 bits) is
shifted further 6 bits to make up a total of char-shift, the result is
then tagged with the char-tag. Converting a character to a fixnum
requires a shift to the right by 6 bits. The choice of tags for the
fixnums and characters is ==important== for realizing this concise and
potentially fast conversion.

tag の choice か。

null?, zero? and not.

{{We implement the predicates null?, zero?, and not next.}}
There are many possible ways of implementing each of these predicates.
The following sequence works for zero? (assuming the
value of the operand is in %eax):
1. cmpl $0, %eax
2. movl $0, %eax
3. sete %al
4. sall $7, %eax
5. orl $63, %eax

detail of zero?

Line 1 compares the value of %eax to 0. Line 2 zeros the value
of %eax. Line 3 sets %al, the low byte of %eax, to 1 if the two
compared values were equal, and to 0 otherwise. Lines 4 and 5
construct the appropriate boolean value from the one bit in %eax.
The predicates integer? and boolean? are handled similarly
with the exception that the tag of the value must be extracted (using
andl) before it is compared to the fixnum/boolean tag.

3.4 Binary Primitives

using stack for save intermediate values

Calls to binary, and higher-arity, primitives cannot in general be
evaluated using a single register since evaluating one subexpression
may overwrite the value computed for the other subexpression. {{==To==
implement binary primitives (such as +, *, char

stack

{{The stack is arranged as a contiguous array of memory locations.}}
A pointer to the base of the stack is in the %esp register. The
base of the stack, 0(%esp), contains the return-point. The returnpoint
is an address in memory where we return after computing the
value and therefore should not be modified. We are free to use the
memory locations above the return-point (-4(%esp), -8(%esp),

  • 12(%esp), etc.) to hold our intermediate values.

using stack index

{{==In order to== guarantee never overwriting any value that will be
needed after the evaluation of an expression, we arrange the code
generator to maintain the value of the stack index.}} The stack index
is a negative number that points to the first stack location that
is free. The value of the stack index is initialized to −4 and is
decremented by 4 (the word-size, 4 bytes) every time a new value is
saved on the stack. The following segment of code illustrates how
the primitive + is implemented:
(define (emit-primitive-call x si)
(case (primcall-op x)
*2
(emit "addl ~a(%esp), %eax" si))
...))
The other primitives (-, *, =, <, char=?, etc.) can be easily
implemented by what we know so far.

3.5 Local Variables

local variables

Now that we have a stack, implementing let and local variables
is straightforward. {{All local variables will be saved on the stack
and an environment mapping variables to stack locations is maintained.}}
When the code generator encounters a let-expression, it
=-first== evaluates the right-hand-side expressions, one by one, saving
the value of each in a specific stack location. ==Once== all the righthand-
sides are evaluated, the environment is extended to associate
the new variables with their locations, and code for the body of the
let is generated in the new extended environment. When a reference
to a variable is encountered, the code generator locates the
variable in the environment, and emits a load from that location.

(define (emit-expr x si env)
(cond
*3
*4
(define (emit-let bindings body si env)
(let f *5
(cond
*6
(else
(let ((b (car b*)))
(emit-expr (rhs b) si env)
(emit "movl %eax, ~a(%esp)" si)
(f (cdr b*)
(extend-env (lhs b) si new-env)
(- si wordsize)))))))

*1:immediate? x) (emit "movl $~a, %eax" (immediate-rep x))) ((primcall? x) (case (primcall-op x) ((add1) (emit-expr (primcall-operand1 x

*2:add1) ...) ((+) (emit-expr (primcall-operand2 x) si) (emit "movl %eax, ~a(%esp)" si) (emit-expr (primcall-operand1 x) (- si wordsize

*3:immediate? x) ...) ((variable? x) (emit "movl ~a(%esp), %eax" (lookup x env))) ((let? x) (emit-let (bindings x) (body x) si env

*4:primcall? x) ...) ...

*5:b* bindings) (new-env env) (si si

*6:null? b*) (emit-expr body si new-env