CSC362 2004S.L01 (Class 41): Code Improvement

Admin:
* Custom course evaluation forms will be available outside my office this afternoon. Please pick one up and bring a *filled out* form to class on Friday.
* Class Friday meets at Dairy Barn. Get your order to Ananta asap.
* You have now reached the top of my grading heap.
* Final questions on the translator or PAL or whatever?
* Constants are considered variables because it makes polymorphism easier.
* To access something in memory, you use new MemLoc(variable that gives address)
* Can I dereference a MemLoc? You can now.

Overview:
* Improvement basics
* Goals
* Places to improve
* Side note: Loops
* Detour: Flow Graphs and Dataflow
* From intermediate to actual code (code generation)
* Favorite peephole optimizations

/Improvement Basics/

* Since the first high-level language (Fortran), efficiency of generated code has been a focus of compiler writers.
  * The aim: compiled code (nearly) equivalent in efficiency to hand-written code.
* Some idiot called the process of improving code efficiency "optimization"
* Basic goal: Make more efficient use of computing resources:
  * Less memory
  * Less processor time <- The main focus
  * (Less code space)
* These days, good compilers "write" better assembly code than all but the best assembly language programmers
* Improvement can be done at many stages in the compilation process:
  * Source code:
    * Choose better algorithms [The best source of improvement]
    * Use integers rather than floats
    * Turn tail-recursive functions into loops
    * Use smaller size types (floats rather than doubles, ints rather than longs)
  * Intermediate code:
    * Peephole optimizations (look at small sequences of instructions and improve)
    * Global optimizations (code duplication and propagation, code inlining, etc.)
  * Code generation
    * Take advantage of machine architecture
    * Generate alternatives, analyze expected cost
        ADD MEM[r1] MEM[r2] -> MEM[r3]
    * Assume two kinds of adds on the target architecture
        ADD ADDR reg0   -> Add something stored in memory to reg0, store in reg0
        ADD reg1 reg0   -> Add reg1 to reg0, store in reg0
    * Translate:
        MOVE MEM[r1] r4
        MOVE MEM[r2] r5
        ADD r4 r5
        MOVE r5 MEM[r3]
    * Alt. translate
        MOVE MEM[r1] r4
        ADD MEM[r2] r4
        MOVE r4 MEM[r3]
  * Target code:
    * Peephole optimizations (look at small sequences of instructions and improve)

/Example: Simple loop/

    for i := 1 to 5 do
      writeln(i);

        MOVE $1 t
    LOOP:
        ARG t
        CALL writeln
        JEQ t $5 END_LOOP
        ADD $1 t -> t
        JUMP LOOP
    END_LOOP:

Unrolled

        ARG $1
        CALL writeln
        ARG $2
        CALL writeln
        ARG $3
        CALL writeln
        ARG $4
        CALL writeln
        ARG $5
        CALL writeln

/Example: Nested loops/

    type
      matrix = array[1..N,1..N] of integer;
    var
      a, b: matrix;
    begin
      ....
      { Copy a to b }
      for i := 1 to N do
        for j := 1 to N do
          b[i,j] := a[i,j]
    end

vs.

      for j := 1 to N do
        for i := 1 to N do
          b[i,j] := a[i,j]

If the arrays are big enough, depending on how the arrays are stored, one of the two is likely to be *much slower* than the other.

How to store a in memory

    (row major)     a[1,1], a[1,2], a[1,3] ... a[1,n], a[2,1], ...
  vs.
    (column major)  a[1,1], a[2,1], a[3,1] ... a[n,1], a[1,2], ...

If you make big jumps through memory, what happens? PAGING! (Consecutive accesses can land on different pages, so the program keeps bringing pages in.)

* Solutions?
  * Hack it! Implement your own multi-dimensional arrays so you know the ordering.
  * The compiler can analyze the patterns of array usage and figure out what is best.
  * ?

Common peephole optimizations:
* Copy propagation:
  * Given: MOV x y
  * Replace all instances of y by x in the following code until x or y changes
  * Advantages: Can potentially eliminate the MOV, can save temporaries
* Common subexpression elimination
  * Given: ADD x y -> q; ADD x y -> r; ...
  * Replace the second ADD by MOVE q r
  * Replace all copies of r by q in the following code (copy prop)
  * Given:
        ADD x y -> q
        ...
        JUMP FOO
        ...
        ADD x y -> r
        ...
        JUMP FOO
    FOO:
        ADD x y -> s
  * Use
        ADD x y -> t
        MOV t q
        ...
        JUMP FOO
        ...
        ADD x y -> t
        MOV t r
        ...
        JUMP FOO
    FOO:
        MOV t s
* Arithmetic improvements
  * From: MUL a $2 -> b
  * To: ADD a a -> b
  * From: MUL a $16 -> b
  * To: LSHIFT a 4 -> b
  * Other arithmetic improvements require code analysis (in loops)
        ADD $1 i -> i
        MUL i $4 -> j
        ...
  * Replace by
        ADD $1 i -> i
        ADD $4 j -> j

    for q := 1 to N do
      a[q] := q+1;

  To compute the address of a[q] we must add (q - base-index) * size-of-elt to base-of-a

Friday: Wrapup!

Have a good day.
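
Addendum: the loop arithmetic improvement above (replacing MUL i $4 -> j with ADD $4 j -> j) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names SIZE_OF_ELT, BASE_INDEX, offsets_with_multiply, and offsets_with_add are hypothetical and do not come from the translator or PAL, and the element size of 4 is assumed to match the $4 in the notes.

```python
# Hypothetical sketch: names and the 4-byte element size are assumptions.
SIZE_OF_ELT = 4   # assumed size of one array element, in bytes
BASE_INDEX = 1    # arrays in the notes are indexed from 1


def offsets_with_multiply(n):
    """Offset of a[q] from base-of-a for q = 1..n, using a multiply each time."""
    offsets = []
    for q in range(BASE_INDEX, n + 1):
        offsets.append((q - BASE_INDEX) * SIZE_OF_ELT)  # like MUL i $4 -> j
    return offsets


def offsets_with_add(n):
    """Same offsets, but the multiply is replaced by a running addition."""
    offsets = []
    offset = 0  # offset of a[BASE_INDEX] from base-of-a
    for _ in range(BASE_INDEX, n + 1):
        offsets.append(offset)
        offset += SIZE_OF_ELT  # like ADD $4 j -> j
    return offsets


print(offsets_with_add(5))  # [0, 4, 8, 12, 16]
```

Both functions produce the same list. The second one works only because the offset is carried from one iteration to the next, which is why this improvement needs loop analysis rather than a look at a single instruction.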