今日の英語 - An Incremental Approach to Compiler Construction その1

速読の練習。速読はしない。文の構造を意識しつつ読み進める習慣をつける。方法論はTOEFLテスト速読・速聴大特訓 基礎編で。

ルールは以下の通り。今日から別の論文。もしあわないようであればやめる。

  • トピックセンテンスは {{ }} で囲む
  • シグナルワードは == ==
  • パラグラフのトピックは必ず書く。
  • サポート情報のスタイルを書く
  • 列挙には(1)(2)(3)と振る
  • 対比には (A)(a) のように目印
  • 例示は何に対するサポートかを一言書き添える
  • 原因・結果には => を

Abstract

Compiler seems to be difficult

{{Compilers are perceived to be magical artifacts, carefully crafted
by the wizards, and unfathomable by the mere mortals.}} Books on
compilers are better described as wizard-talk: written by and for
a clique of all-knowing practitioners. Real-life compilers are too
complex to serve as an educational tool. And the gap between
real-life compilers and the educational toy compilers is too wide.
The novice compiler writer stands puzzled facing an impenetrable
barrier, "better write an interpreter instead."

goal of this paper

{{The goal of this paper is to break that barrier.}} We show that
building a compiler can be as easy as building an interpreter. The
compiler we construct accepts a large subset of the Scheme programming
language and produces assembly code for the Intel-x86
architecture, the dominant architecture of personal computing. The
development of the compiler is broken into many small incremental
steps. Every step yields a fully working compiler for a progressively
expanding subset of Scheme. Every compiler step produces
real assembly code that can be assembled then executed directly
by the hardware. We assume that the reader is familiar with the
basic computer architecture: its components and execution model.
Detailed knowledge of the Intel-x86 architecture is not required.

our hope

The development of the compiler is described in detail in an
extended tutorial. Supporting material for the tutorial such as an
automated testing facility coupled with a comprehensive test suite
are provided with the tutorial. {{It is our hope that current and future
implementors of Scheme find in this paper the motivation for developing
high-performance compilers and the means for achieving
that goal.}}

この章

メインパラグラフは2つめ。コンパイラは難しいから脱却したい。

1. Introduction

Compilers have traditionally been regarded as complex

{{Compilers have traditionally been regarded as complex pieces of
software.}} The perception of complexity ==stems mainly from(複雑さの原因)== traditional
methods of teaching compilers as well as the lack of available
examples of small and functional compilers for real languages.

There is no gradual progress into the field of compiler writing.

Compiler books are polarized into ==two extremes==. Some of them
focus on (1つめ)"educational" toy compilers while ==the others== focus on
(2つめ) "industrial-strength" optimizing compilers. The toy compilers are
too simplistic and do not prepare the novice compiler writer to
construct a useful compiler. The source language of these compilers
often lacks depth and the target machine is often fictitious. Niklaus
Wirth states that "to keep both the resulting compiler reasonably
simple and the development clear of details that are of relevance
only for a specific machine and its idiosyncrasies, we postulate
an architecture according to our own choice"[20]. ==On the other==
extreme, advanced books focus mainly on optimization techniques,
and thus target people who are already well-versed in the topic.
{{There is no gradual progress into the field of compiler writing.}}

The probrem of the usual approach to introducing compilers

{{The ==usual== approach to introducing compilers is by describing
the structure and organization of a finalized and polished compiler.}}
The sequencing of the material as presented in these books mirrors
the passes of the compilers. Many of the issues that a compiler
writer has to be aware of are solved beforehand and only the final
solution is presented. The reader is not engaged in the process of
developing the compiler.

loss of foucus on the big picture and too much focus on individual passes

{{In these books, the sequential presentation of compiler implementation
leads to =(こちらはloss)loss== of focus on the big picture. ==(こちらはmuch)Too much== focus
is placed on the individual passes of the compiler;}} ==thus == the reader
is not actively aware of the relevance of a single pass to the other
passes and where it fits in the whole picture. Andrew Appel states
that "a student who implements all the phases described in Part I of
the book will have a working compiler"[2]. Part I of Appel's book
concludes with a 6-page chapter on "Putting it all together" after
presenting 11 chapters on the different passes of Tiger.

practical topis are ommitted of placed on appendix

{{==Moreover==, practical topics ==such as== (列挙1)code generation for a real
machine, (列挙2)interfacing to the operating system or to other languages,
(列挙3)heap allocation and (列挙4)garbage collection, and (列挙5)the issues surrounding
dynamic languages are either omitted completely or placed
in an appendix}}. Muchnick states that "most of the compiler material
in this book is devoted to languages that are well suited for
compilation: languages that have static, compile-time type systems,
that do not allow the user to incrementally change the code, and
that typically make much heavier use of stack storage than heap
storage"[13].

この章

2つめのパラグラフがメインアイディア。コンパイラの本には初心者向けかつ現実的なものがない。

2. Preliminary Issues

To develop a compiler, there are a few decisions to be made.

{{To develop a compiler, there are a few decisions to be made.}}
(decisions列挙)(列挙1)The source language, (列挙2)the implementation language, and (列挙3)the target
architecture must be selected. (列挙4)The development time frame must
be set. (列挙5)The development methodology and the final goal must be
decided. For the purpose of our tutorial, we made the decisions
presented == below.==

2.1 Our Target Audience

{{We do ==not assume == that the reader knows anything about assembly
language beyond the knowledge of the computer organization,
memory, and data structures.}} The reader ==is assumed== to have very
limited or no experience in writing compilers. Some experience
with writing simple interpreters is helpful, but not required.
We == assume == that the reader has basic knowledge of C and the C
standard library (e.g. malloc, printf, etc.). Although our compiler
will produce assembly code, some functionality is easier to
implement in C; implementing it directly as assembly routines distracts
the reader from the more important tasks.

2.2 The Source Language

{{In our tutorial, we choose a subset of Scheme as the source programming
language.}} The simple and uniform syntax of Scheme
obviates the need for a lengthy discussion of scanners and parsers.
The execution model of Scheme, with strict call-by-value evaluation,
simplifies the implementation. ==Moreover==, all of the Scheme
primitives in the subset can be implemented in short sequences of
assembly instructions. ==Although== not all of Scheme is implemented
in the first compiler, all the major compiler-related issues are tackled.
The implementation is a middle-ground between a full Scheme
compiler and a toy compiler.
In choosing a specific source language, we gain the advantage
that the presentation is more concrete and eliminates the burden
of making the connection from the abstract concepts to the actual
language.

2.3 The Implementation Language

{{Scheme is chosen as the implementation language of the compiler.}}
Scheme’s data structures are simple and most Scheme programmers
are familiar with the basic tasks such as constructing and processing
lists and trees. The ability to manipulate Scheme programs
as Scheme data structures greatly simplifies the first steps of constructing
a compiler, ==since== the issue of reading the input program is
solved. Implementing a lexical-scanner and a parser are pushed to
the end of the tutorial.
Choosing Scheme as the implementation language ==also== eliminates
the need for sophisticated and specialized tools. These tools
add a considerable overhead to the initial learning process and distracts
the reader from acquiring the essential concepts.

2.4 Choosing The Target Architecture

{{We choose the Intel-x86 architecture as our target platform.}} The
x86 architecture is the dominant architecture on personal computers
and thus is widely available.
Talking about compilers that are detached from a particular
architecture puts the burden on the reader to make the connection
from the abstract ideas to the concrete machine. Novice compiler
writers are unlikely to be able to derive the connection on their own.
==Additionally==, the compiler we develop is small enough to be easily
portable to other architectures, and the majority of the compiler
passes are platform independent.

2.5 Development Time Frame

{{The development of the compiler must proceed in small steps
where every step can be implemented and tested in one sitting.}} Features
that require many sittings to complete are broken down into
smaller steps. The result of completing every step is a fully working
compiler. The compiler writer, ==therefore==, achieves progress in every
step in the development. This is ==in contrast== with the traditional development
strategies that advocate developing the compiler as a series
of passes only the last of which gives the sense of accomplishment.
With our approach of incremental development, where every
step results in a fully working compiler for some subset of Scheme,
the risk of not "completing" the compiler is minimized. This approach
is useful for people learning about compilers on their own,
where the amount of time they can dedicate constantly changes. It
is also useful in time-limited settings such as an academic semester.

2.6 Development Methodology

{{We advocate the following iterative development methodology:}}
(列挙)
1. Choose a small subset of the source language that we can compile directly to assembly.
2. Write as many test cases as necessary to cover the chosen subset of the language.
3. Write a compiler that accepts an expression (in the chosen subset
of the source language) and outputs the equivalent sequence of assembly instructions.
4. Ensure that the compiler is functional, i.e. it passes all the tests
that are written beforehand.
5. Refactor the compiler, if necessary, making sure that none of
the tests are broken due to incorrect refactoring.
6. Enlarge the subset of the language in a very small step and repeat
the cycle by writing more tests and extending the compiler
to meet the newly-added requirements.

A fully working compiler for the given subset of the language
is available at every step in the development cycle starting from
the first day of development. The test cases are written to help ensure
that the implementation meets the specifications and to guard
against bugs that may be introduced during the refactoring steps.
Knowledge of compilation techniques as well as the target machine
is built incrementally. The initial overhead of learning the assembly
instructions of the target machine is eliminated instructions are
introduced only when they are needed. The compiler starts small
and is well focused on translating the source language to assembly,
and every incremental step reinforces that focus.

2.7 Testing Infrastructure

{{The interface to the compiler is defined by one Scheme procedure,
compile-program, that takes as input an s-expression representing
a Scheme program.}} The output assembly is emitted using an
emit form that routes the output of the compiler to an assembly
file.
Defining the compiler as a Scheme procedure ==allows us== to develop
and debug the compiler interactively by inspecting the output
assembly code. It also allows us to utilize an automated testing facility.
There are ==two==(列挙) core components of the testing infrastructure:
(1)the test-cases and (2)the test-driver.
The test cases are made of sample programs and their expected
output. For example, the test cases for the primitive + may be
defined as follows:

(test-section "Simple Addition")
(test-case (+ 10 15) "25")
(test-case (+ -10 15) "5")
...

The test-driver iterates over the test cases performing ==the following==(列挙)
actions: (1) The input expression is passed to compile-program
to produce assembly code. (2) The assembly code and a minimal
run-time system (to support printing) are assembled and linked to
form an executable. (3) The executable is run and the output is
compared to the expected output string. An error is signaled if any
of the previous steps fails.

2.8 The End Goal

{{For the purpose of this paper, we define the end goal to be writing
a compiler powerful enough to compile an interactive evaluator.}}
Building such a compiler forces us to solve many interesting problems.
A large subset of Scheme's core forms (lambda, quote, set!,
etc) and extended forms (cond, case, letrec, internal define
etc.) must be supported by the compiler. Although most of these
forms are not essential, their presence allows us to write our programs
in a more natural way. In implementing the extended forms,
we show how a large number of syntactic forms can be added without
changing the core language that the compiler supports.
A large collection of primitives (cons, car, vector?, etc.)
and library procedures (map, apply, list->vector, etc.) need
to be implemented. Some of these library procedures can be implemented
directly, while others require some added support from
the compiler. ==For example==, some of the primitives cannot be implemented
without supporting variable-arity procedures, and others
require the presence of apply. Implementing a writer and a reader
requires adding a way to communicate with an external run-time
system.

この章

1つめのパラグラフ。