今日の英語 - An Incremental Approach to Compiler Construction その3

速読の練習。速読はしない。文の構造を意識しつつ読み進める習慣をつける。方法論はTOEFLテスト速読・速聴大特訓 基礎編で。

ルールは以下の通り。

  • トピックセンテンスは {{ }} で囲む
  • シグナルワードは == ==
  • パラグラフのトピックは必ず書く。
  • サポート情報のスタイルを書く
  • 列挙には(1)(2)(3)と振る
  • 対比には (A)(a) のように目印
  • 例示は何に対するサポートかを一言書き添える
  • 原因・結果には => を

3.6 Conditional Expressions

{{Conditional evaluation is simple at the assembly-level.}} (例示:簡単)The simplest
implementation of (if test conseq altern) is:

(define (emit-if test conseq altern si env)
   (let ((L0 (unique-label)) (L1 (unique-label)))
   (emit-expr test si env)
   (emit-cmpl (immediate-rep #f) eax)
   (emit-je L0)
   (emit-expr conseq si env)
   (emit-jmp L1)
   (emit-label L0)
   (emit-expr altern si env)
   (emit-label L1)))

The code above first evaluates the test expression and compares
the result to the false value. Control is transferred to the alternate
code if the value of the test was false, otherwise, it falls through to
the consequent.

3.7 Heap Allocation

heap

{{Scheme's pairs, vector, strings, etc. do not fit in one machine word
and must be allocated in memory.}} We allocate all objects from
one contiguous area of memory. The heap is preallocated at the
start of the program and its size is chosen to be large enough to
accommodate our current needs. A pointer to the beginning of the
heap is passed to scheme entry to serve as the allocation pointer.
We dedicate one register, %esi, to hold the allocation pointer. Every
time an object is constructed, the value of %esi is incremented
according to the size of the object.

mechanism for distinguishable heap objects

{{The types of the objects must ==also== be distinguishable from
one another.}} We use a tagging scheme similar to the one used
for fixnums, booleans, and characters. Every pointer to a heapallocated
object is tagged by a 3-bit tag (列挙(1)001b for pairs, (2)010b for
vectors, (3)011b for strings, 101b for symbols, and (4)110b for closures;
(5)000b, 100b and 111b were already used for fixnums and the other
immediate objects). ==For== this tagging scheme to work, we need to
guarantee that the lowest three bits of every heap-allocated object
is 000b so that the tag and the value of the pointer do not interfere.
==This== is achieved by always allocating objects at double-word (or
8-byte) boundaries.

example of pairs

{{Let's consider how pairs are implemented first.}} A pair requires
two words of memory to hold its car and cdr fields. A call to
(cons 10 20) can be translated to:
movl $40, 0(%esi) # set the car
movl $80, 4(%esi) # set the cdr
movl %esi, %eax # eax = esi | 1
orl $1, %eax
addl $8, %esi # bump esi

car/cdr

{{The primitives car and cdr are simple; we only need to remember
that the pointer to the pair is its address incremented by 1.}}
Consequently, the car and cdr fields are located at −1 and 3 from
the pointer. For example, the primitive caddr translates to:
movl 3(%eax), %eax # cdr
movl 3(%eax), %eax # cddr
movl -1(%eax), %eax # caddr

Vectors and strings

{{Vectors and strings are different from pairs in that they vary in
length.}} This has ==two==(列挙) implications: (1) we must reserve one extra
memory location in the vector/string to hold the length, and (2)
after allocating the object, the allocation pointer must be aligned to
the next double-word boundary (allocating pairs was fine because
their size is a multiple of 8). For example, a call to the primitive
make-vector translates to:
movl %eax, 0(%esi) # set the length
movl %eax, %ebx # save the length
movl %esi, %eax # eax = esi | 2
orl $2, %eax
addl $11, %ebx # align size to next
andl $-8, %ebx # object boundary
addl %ebx, %esi # advance alloc ptr

Strings

{{Strings are implemented similarly except that the size of a string
is smaller than the size of a vector of the same length.}} The primitive
string-ref (and string-set!) must also take care of converting
a byte value to a character (and vise versa).

3.8 Procedure Calls

difficulty of lambda

{{The implementation of procedures and procedure calls are perhaps
the hardest aspect of constructing our compiler.}} ==The reason == for its
difficulty is that Scheme's lambda form performs more than one
task and the compiler must tease these tasks apart. (列挙 more than one task)==First==, a lambda
expression closes over the variables that occur free in its body so we
must perform some analysis to determine the set of variables that
are referenced, but not defined, in the body of a lambda. ==Second==,
lambda constructs a closure object that can be passed around.
==Third==, the notion of procedure calls and parameter-passing must
be introduced at the same point. We'll handle these issues one at a
time starting with procedure calls and forgetting all about the other
issues surrounding lambda.

labeles

{{We extend the language accepted by our code generator to contain
top-level labels (each bound to a code expression containing a
list of formal parameters and a body expression) and label calls.}}
::= (labels ((lvar ) ...) )
::= (code (var ...) )
::= immediate

var
(if )
(let ((var ) ...) )
(primcall prim-name ...)
(labelcall lvar ...)

Code generation for the new forms is as follows:
• For the labels form, a new set of unique labels are created and
the initial environment is constructed to map each of the lvars
to its corresponding label.
• For each code expression, the label is first emitted, followed
by the code of the body. The environment used for the body
contains, in addition to the lvars, a mapping of each of the
formal parameters to the first set of stack locations (−4, −8,
etc.). The stack index used for evaluating the body starts above
the last index used for the formals.
• For a (labelcall lvar e ...), the arguments are evaluated
and their values are saved in consecutive stack locations, skipping
one location to be used for the return-point. Once all of the
arguments are evaluated, the value of the stack-pointer, %esp
is incremented to point to one word below the return-point. A
call to the label associated with the lvar is issued. A call
instruction decrements the value of %esp by 4 and saves the address
of the next instruction in the appropriate return-point slot.
Once the called procedure returns (with a value in %eax), the
stack pointer is adjusted back to its initial position.
Figure 2 illustrates the view of the stack from the caller and callee
perspective.

3.9 Closures

add closure

{{Implementing closures on top of what we have so far should be
straightforward.}} ==First==, we modify the language accepted by our
code generator as follows:
• The form (closure lvar var ...) is added to the language.
This form is responsible for constructing closures. The
first cell of a closure contains the label of a procedure, and the
remaining cells contain the values of the free variables.
• The code form is extended to contain a list of the free variables
in addition to the existing formal parameters.
• The labelcall is replaced by a funcall form that takes an
arbitrary expression as a first argument instead of an lvar.

structure of closure

{{The closure form is similar to a call to vector.}} The label
associated with the lvar is stored at 0(%esi) and the values of
the variables are stored in the next locations. The value of %esi is
tagged to get the value of the closure, and %esi is bumped by the
required amount.

closure pointer

{{The code form, in addition to associating the formals with the
corresponding stack locations, associates each of the free variables
with their displacement form the closure pointer %edi.}}

funcall

{{The funcall evaluated all the arguments as before but skips
not one but two stack locations}}: ==one ==to be used to save the current
value of the closure pointer, and ==one== for the return point. ==After== the
arguments are evaluated and saved, the operator is evaluated, and
its value is moved to %edi (whose value must be saved to its stack
location). The value of %esp is adjusted and an indirect call through
the first cell of the closure pointer is issued. Upon return from the
call, the value of %esp is adjusted back and the value of %edi is
restored from the location at which it was saved.

gap between Scheme and our language

One additional problem needs to be solved. {{The source language
that our compiler accepts has a lambda form, and none of
the labels, code, closure forms.}} So, Scheme input must be converted
to this form before our code generator can accept it. The
conversion is easy to do in two steps:
1. Free-variable analysis is performed. Every lambda expression
appearing in the source program is annotated with the set of
variables that are referenced but not defined in the body of the
lambda. For example,
(let *1
(lambda (y) (lambda () (+ x y))))
is transformed to:
(let *2
(lambda (y) (x) (lambda () (x y) (+ x y))))
2. The lambda forms are transformed into closure forms and the
codes are collected at the top. The previous example yields:
(labels *3 (closure f1 x)))

*1:x 5

*2:x 5

*3:f0 (code () (x y) (+ x y))) (f1 (code (y) (x) (closure f0 x y)))) (let ((x 5