Today in 362: Steps in Compilation

Administrivia
  ECA is still screwy
  Fill out the surveys

Today
  The steps in compilation
  Sample Pascal program
  "Compile" it (primarily early stages)

A compiler translates from one language (high-level) to another
language (low-level).

Goals of compilation:
  (1) Relatively fast
  (2) Generates "good" code

You can make your compiler somewhat faster by breaking it up into
components/phases.

    { Here is a sample program. }
    program sample(input,output);
    var
      val: integer;
    { Procedure: helper.  Helps }
    procedure helper(v: integer);
    begin
    end; { helper }
    begin { main }
    end. { main }

Phase 1: Lexical Analysis or Tokenizing
  Turns program into a series of "tokens"
    Drop whitespace.
    Drop comments.
    Group characters
  PROGRAM ID(sample) OPENPAREN ID(input) COMMA ID(output) RIGHTPAREN SEMICOLON
  A good lexical analyzer is O(#characters).

Phase 2: Syntactic Analysis or Parsing
  Represent program according to grammatical structure
  A simple sentence has the form noun-phrase verb-phrase period.
    The strange person in the blue jacket typed awkwardly.

    sentence
      noun-phrase "The strange person in the blue jacket"
      verb-phrase
        verb "typed"
        adverb-list "awkwardly"
      period

  This type of analysis is helpful in compilation.  Why?
    * Gives a uniform methodology
    * Adds some meaning (interpretable from the structure)
    * Eases translation
    * Resolves ambiguity
        3 + 4 * 5   (which operator is applied first: precedence)
        3 - 4 - 5   (which subtraction is applied first: associativity)

    exp
      exp
        INT(3)
      OP(+)
      exp
        exp
          INT(4)
        OP(*)
        exp
          CHAR('a')

Phase 3: Clean up the parse tree
  Removes the tokens that are useful for parsing and disambiguating,
  but not much else

Phase 4: Semantic analysis
  Make sure that the structure makes sense (e.g., that variables are
  declared and that types are correct).
  Some interesting theoretical results which I don't recall.
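Going back to Phase 1, here is a minimal sketch of a tokenizer for the opening of the sample program. It is an illustration in Python, not the course's implementation. The token names PROGRAM, ID, OPENPAREN, COMMA, RIGHTPAREN, and PERIOD follow the notes. SEMICOLON, COLON, INT, the keyword set, and the function name tokenize are assumptions made for this example. The header is written here with the semicolon that standard Pascal requires.

```python
import re

# Keyword set is an assumption: just the reserved words used in the sample program.
KEYWORDS = {"program", "var", "procedure", "begin", "end"}

# One (name, regex) pair per token class. COMMENT and WHITESPACE are
# recognized so that they can be dropped.
TOKEN_SPEC = [
    ("COMMENT",    r"\{[^}]*\}"),
    ("WHITESPACE", r"\s+"),
    ("INT",        r"\d+"),
    ("ID",         r"[A-Za-z][A-Za-z0-9]*"),
    ("OPENPAREN",  r"\("),
    ("RIGHTPAREN", r"\)"),
    ("COMMA",      r","),
    ("SEMICOLON",  r";"),
    ("COLON",      r":"),
    ("PERIOD",     r"\."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan source left to right and return a list of token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue                        # drop comments and whitespace
        if kind == "ID" and text.lower() in KEYWORDS:
            tokens.append(text.upper())     # keywords become their own token
        elif kind in ("ID", "INT"):
            tokens.append(f"{kind}({text})")  # keep the grouped characters
        else:
            tokens.append(kind)
    return tokens


if __name__ == "__main__":
    header = "{ Here is a sample program. }\nprogram sample(input,output);"
    print(" ".join(tokenize(header)))
```

The scanner moves forward through the text once and never backs up past a token it has already emitted, which is the behavior the O(#characters) remark is after.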
Phase 5: Translate to simple assembly language (SAL)
  Goal: easy to translate into
  Restrict or enlarge the operations
  Typically no restrictions on registers

Phases 1 through 5 are typically called the "front end"
  (from source text to generic intermediate representation)
The "back end" translates the SAL into machine code

The "front end/back end" model permits lots of reuse
  Translate lots of languages to the SAL: M front ends
  Translate the SAL to lots of real assembly languages: N back ends
  M front ends plus N back ends give M*N compilers
The "front end/back end" model makes intuitive sense:
  High-level processing vs. low-level processing
  Modularity is good

Back to the back end
  Register allocation
    A register is a small memory unit close to the processor
  Optimization
    Improve the code
  Translation from simple assembly code to real assembly code
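To make Phase 5 a little more concrete, here is a rough sketch of translating an expression tree for 3 + 4 * 5 into a simple assembly language with no restrictions on registers. Everything specific in it is an assumption for illustration: the nested-tuple tree format, the instruction names LOADI, ADD, SUB, and MUL, the register names r1, r2, ..., and the function name to_sal. The notes do not define the actual SAL.

```python
import itertools


def to_sal(tree):
    """Translate a nested-tuple expression tree into SAL-like instructions.

    A tree is either an int or a tuple (op, left, right) with op in "+-*".
    Each result gets a fresh register, since the SAL places no limit on
    how many registers may be used.
    """
    code = []
    counter = itertools.count(1)
    mnemonics = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical opcodes

    def emit(node):
        if isinstance(node, int):
            reg = f"r{next(counter)}"
            code.append(f"LOADI {reg}, {node}")   # load an integer constant
            return reg
        op, left, right = node
        left_reg = emit(left)                     # code for the left operand
        right_reg = emit(right)                   # code for the right operand
        reg = f"r{next(counter)}"
        code.append(f"{mnemonics[op]} {reg}, {left_reg}, {right_reg}")
        return reg

    emit(tree)
    return code


if __name__ == "__main__":
    # 3 + 4 * 5, with the multiplication nested deeper, as in the parse tree
    for line in to_sal(("+", 3, ("*", 4, 5))):
        print(line)
```

The sketch uses five registers for this expression. Cutting that down to what a real machine provides is the job of register allocation in the back end, which is why the front end can afford not to care.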