Exercises

Exercise 12

More from the textbook:

(a) Exercises 8.10.1 and 8.10.2 (pages 572 and 573).

(b) Exercises 7.7.1 and 7.7.2 (pages 493 and 494).

Exercise 11

More from the textbook:

(a) Exercise 8.2.5 (page 517).

(b) Exercise 8.4.1 (pages 531 and 532).

(c) Exercises 8.6.1, 8.6.3, and 8.6.4 (pages 548 and 549).

Exercise 10

Once again, I have selected exercises from the textbook:

(a) Exercise 7.2.3 (page 440).

(b) Exercises 7.3.1 and 7.3.2 (page 451). If you find the ML function difficult to follow because of the unfamiliar notation, you can work instead with the equivalent Scheme procedure:

(define main
  (lambda ()
    (letrec ((fib0 (lambda (n)
                      (letrec ((fib1 (lambda (n)
                                        (letrec ((fib2 (lambda (n)
                                                          (+ (fib1 (- n 1))
                                                             (fib1 (- n 2))))))
                                          (if (>= n 4)
                                              (fib2 n)
                                              (+ (fib0 (- n 1))
                                                 (fib0 (- n 2))))))))
                        (if (>= n 2)
                            (fib1 n)
                            1)))))
      (fib0 4))))

Exercise 9

Our exercise 9 is selected from some of the textbook's exercises:

(a) Exercise 5.4.4 (page 337).

(b) Exercise 5.4.5 (page 337).

(c) Any two of exercises 5.5.1 through 5.5.5 (page 352), with the condition that you must use at least two different general parsing strategies. (That is: you may not choose to do only 5.5.1 and 5.5.2, because they both use recursive-descent parsing, and you may not choose to do only 5.5.3 and 5.5.4, because they both use LL parsing.)

In the implementation, you'll have to generate dummy code for instances of the nonterminals C and L. You may assume for purposes of this exercise that every instance of C results in the generation of the single three-address-code statement t = 0 and every instance of L results in the generation of the single three-address-code statement u = 1.

Please put the files containing your solution into a tarball and e-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 9".

Exercise 8

(a) List the tokens of Dilua and write a C type definition of an enumeration type for the token tags, making sure that none of the values in the enumeration is 0.

(b) Write a C type definition for nodes in abstract syntax trees for Dilua (and also define a type for pointers to such nodes).

(c) Write a specification of the grammar of Dilua for the bison parser generator. All of Dilua's binary operators are left-associative and have equal precedence, and all of Dilua's unary operators have higher precedence (i.e., bind more tightly) than any of the binary operators; use these facts, if necessary, to resolve ambiguities in the given grammar.

(d) Add appropriate actions to the bison specification, so that it builds an abstract syntax tree for a Dilua chunk, assuming that the yylex function has been defined so that, each time it is invoked, it returns the tag of the next token in the source file (cast to type int), or 0 when the end of the source file is reached. In addition, each action should print what it does to standard output, to facilitate testing and debugging.
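The reason for keeping 0 out of the enumeration in part (a) is the yylex contract in part (d): 0 is reserved to mean "end of input." Here is a minimal sketch of that contract. The five tags are the generic tokens named in exercise 7 below; the keyword and symbol tokens are omitted, and the names TOK_..., demo_stream, and demo_yylex are hypothetical. The stand-in lexer walks a fixed array instead of reading a source file.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative tags only: the generic tokens of Dilua, without the
   keyword and symbol tokens.  Starting the enumeration at 1 keeps 0
   free to mean "end of input". */
enum token_tag {
    TOK_NAME = 1,
    TOK_NUMBER,
    TOK_STRING,
    TOK_BINOP,
    TOK_UNOP
};

/* A stand-in token stream: a fixed array instead of a real scanner. */
static const enum token_tag demo_stream[] = { TOK_NAME, TOK_BINOP, TOK_NUMBER };
static size_t demo_pos = 0;

/* Follows the yylex contract described above: returns the tag of the
   next token, cast to int, or 0 once the input is exhausted. */
int demo_yylex(void)
{
    size_t count = sizeof demo_stream / sizeof demo_stream[0];
    if (demo_pos >= count)
        return 0;
    int tag = (int) demo_stream[demo_pos++];
    assert(tag != 0);   /* a zero tag would be mistaken for end of input */
    return tag;
}
```

Calling demo_yylex repeatedly yields the three tags in order and then 0 on every later call.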

Please put the files containing your solution into a tarball and e-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 8".

Exercise 7

The file /home/stone/courses/compilers/code/dilua.bnf contains the syntax for a small programming language, expressed in Backus-Naur form.

Each symbol or keyword token is enclosed in apostrophes. Generic tokens for identifiers (NAME), numerals (NUMBER), string literals (STRING), binary operators (BINOP), and unary operators (UNOP) are fully capitalized. Nonterminals of the grammar are written in lower case with no enclosing quotation marks.

(a) If this grammar is ambiguous, try to recast it to remove the ambiguities. Document your way of resolving each one.

(b) If the grammar resulting from part (a) contains any instances of left recursion, eliminate them, using the techniques described in section 4.3.3 of the text.

(c) Use left factoring, if necessary, to modify the grammar resulting from part (b), making it more suitable for LL parsing.

(d) Determine which nonterminals of the grammar resulting from part (c) are nullable.

(e) Compute the FIRST function for every grammar symbol in the language, using the techniques described in section 4.4.2 of the text.

(f) Compute the FOLLOW function for every nonterminal, using the techniques described in section 4.4.2 of the text.

(g) Construct a predictive parsing table for the language, using the techniques described in section 4.4.3 of the text.
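If you would like to check your hand computation for part (d) mechanically, the usual approach is a fixed-point iteration: keep marking nonterminals as nullable until a full pass over the productions changes nothing. The following is only a sketch on a made-up toy grammar, not on Dilua. The representation (single upper-case letters for nonterminals, lower-case letters for terminals, an empty string for an empty right-hand side) and the names production, toy_grammar, and compute_nullable are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One production: a nonterminal 'A'..'Z' and its right-hand side.
   Lower-case letters in rhs are terminals; "" is the empty string. */
struct production {
    char lhs;
    const char *rhs;
};

/* A made-up toy grammar (not Dilua):
   S -> A B     A -> a | (empty)     B -> b B | (empty)     C -> c S */
static const struct production toy_grammar[] = {
    { 'S', "AB" },
    { 'A', "a"  },
    { 'A', ""   },
    { 'B', "bB" },
    { 'B', ""   },
    { 'C', "cS" }
};

/* Sets nullable[X - 'A'] for every nonterminal X that can derive the
   empty string.  Assumes the letters 'A'..'Z' are contiguous (ASCII). */
void compute_nullable(const struct production *g, size_t n, bool nullable[26])
{
    for (int i = 0; i < 26; i++)
        nullable[i] = false;

    bool changed = true;
    while (changed) {               /* repeat until a pass adds nothing */
        changed = false;
        for (size_t p = 0; p < n; p++) {
            int lhs = g[p].lhs - 'A';
            assert(lhs >= 0 && lhs < 26);
            if (nullable[lhs])
                continue;
            /* The left side is nullable if every right-side symbol is
               a nullable nonterminal (vacuously true for ""). */
            bool all_nullable = true;
            for (const char *s = g[p].rhs; *s != '\0'; s++) {
                bool is_nonterminal = (*s >= 'A' && *s <= 'Z');
                if (!is_nonterminal || !nullable[*s - 'A']) {
                    all_nullable = false;
                    break;
                }
            }
            if (all_nullable) {
                nullable[lhs] = true;
                changed = true;
            }
        }
    }
}
```

On the toy grammar, S, A, and B come out nullable and C does not, because every string derived from C begins with the terminal c.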

Please submit the result of each part, (a) through (g), electronically, either as a single long text file or as a tarball containing a file for each result. E-mail the text file or tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 7".

Exercise 6

Our exercise 6 will be the textbook's exercise 4.2.8 (pages 208 and 209). I prefer electronic submission, but will also accept legible hard copy.

Exercise 5

Design, write, compile, and test a Flex input file that generates a lexical analyzer for the ABC programming language, as described in exercise 4 below.

In class, we discussed some unexpectedly difficult tokenization problems that ABC presents. Here are my recommendations for dealing with these:

      1..5            <NUMERAL_INT, 1>  <RANGE>  <NUMERAL_INT, 5>
      1..foo          <NUMERAL_INT, 1>  <RANGE>  <NAME, foo>
      foo..5          <NAME, foo>  <RANGE>  <NUMERAL_INT, 5>
      5e7             <NUMERAL_FLOAT, 50000000.0>
      5 e7            <NUMERAL_INT, 5>  <NAME, e7>
      5e*             (syntax error at asterisk)
      5 e*            <NUMERAL_INT, 5>  <NAME, e>  <STAR>
      5e7'foo'        <NUMERAL_FLOAT, 50000000.0>  <TEXT_DISPLAY, "foo">
      5 e7'foo'       <NUMERAL_INT, 5>  <NAME, e7'foo'>
      5e'foo'         (syntax error at first apostrophe)
      5 e'foo'        <NUMERAL_INT, 5>  <NAME, e'foo'>

Ideally, the lexical analyzer that Flex generates should follow these recommendations.
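To make the intent of the table concrete, here is a hand-written C sketch (not Flex input) of the decision it implies for a lexeme that begins with a digit. It covers only the cases shown: a run of digits is an integer numeral unless an e follows immediately, in which case at least one digit must follow the e. Fractional parts and signed exponents do not appear in the table, so the sketch deliberately ignores them. The names numeral_kind and scan_numeral are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum numeral_kind { KIND_INT, KIND_FLOAT, KIND_ERROR };

/* Classifies the numeral at the start of s, which must begin with a
   digit.  On success, *len is the number of characters in the numeral;
   on KIND_ERROR, *len is the offset of the offending character. */
enum numeral_kind scan_numeral(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    assert(isdigit((unsigned char) s[0]));

    size_t i = 0;
    while (isdigit((unsigned char) s[i]))
        i++;

    /* No 'e' directly after the digits: an integer numeral.  This is
       what leaves the ".." of "1..5" for the RANGE token. */
    if (s[i] != 'e') {
        *len = i;
        return KIND_INT;
    }

    /* An 'e' directly after the digits commits us to an exponent,
       which needs at least one digit. */
    size_t j = i + 1;
    if (!isdigit((unsigned char) s[j])) {
        *len = j;
        return KIND_ERROR;
    }
    while (isdigit((unsigned char) s[j]))
        j++;
    *len = j;
    return KIND_FLOAT;
}
```

For 5e7'foo' the sketch reports a float numeral of length 3, leaving 'foo' for the next token; for 5e* it reports an error at offset 2, the asterisk.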

Please submit a tarball containing (1) the Flex input file, (2) the C program that it generates, (3) a set of input files containing tests, (4) the command lines that you used to construct the test runs, and (5) one or more output files showing the results of those tests. (Use the shell facilities and/or the tee utility to collect these output files.) E-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 5".

Exercise 4

Programs in the ABC programming language are sequences of names, keywords, numerals, text displays, symbols, and newlines:

A comment in ABC begins with a backslash character and ends at the end of the line (i.e., on GNU or Unix systems, at the next newline character).

In addition to its value field, each token of ABC must also contain a string field indicating the file in which the token was found and an unsigned int field indicating the line on which it appeared. Note that, for a newline token, the line is the one beginning with the counted spaces, not any preceding empty or comment line.
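One possible shape for such a token is sketched below. It is only an illustration: it assumes that the value field can be kept as the text of the lexeme, whereas your design may prefer a tag plus a union. The names token, make_token, free_token, and format_token, and the file:line: value display format, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct token {
    char *value;        /* text of the token; owned by the token */
    const char *file;   /* name of the source file; borrowed, not copied */
    unsigned int line;  /* line on which the token appeared */
};

/* Allocates a token holding a private copy of value.  Returns a null
   pointer if memory runs out. */
struct token *make_token(const char *value, const char *file, unsigned int line)
{
    assert(value != NULL && file != NULL);
    struct token *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->value = malloc(strlen(value) + 1);
    if (t->value == NULL) {
        free(t);
        return NULL;
    }
    strcpy(t->value, value);
    t->file = file;
    t->line = line;
    return t;
}

/* Releases a token and its copy of the value text. */
void free_token(struct token *t)
{
    if (t != NULL) {
        free(t->value);
        free(t);
    }
}

/* Writes a one-line, human-readable rendering of t into buf, with all
   three fields visible.  Returns what snprintf returns. */
int format_token(char *buf, size_t size, const struct token *t)
{
    assert(buf != NULL && t != NULL);
    return snprintf(buf, size, "%s:%u: %s", t->file, t->line, t->value);
}
```

Because the file name is borrowed rather than copied, every token from one file can share a single string, which must then outlive the tokens.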

The assignment is to design, write, and test a lexical analyzer for ABC. The main program should take one or more command-line arguments, each of which should be a file containing ABC code, and should recover and write to standard output all of the tokens from each of those files in turn, in the order named on the command line. Each token should be written on a separate line. You may choose your own (human-readable) format for displaying tokens, provided that you make all of the fields of the token visible in the output.

I strongly recommend implementing a function that reads from a given FILE * (assumed to be open for input) and either constructs and returns the next token from that file or an otherwise out-of-range value, such as a null pointer, indicating that the file contains no more tokens. The main program can then deal with the mechanics of parsing the command line, opening and closing the source files at the right times, and so on.

Please submit a tarball containing (1) the C program implementing your lexical analyzer, which may consist of several files, (2) a set of input files containing tests, (3) the command lines that you used to construct the test runs, and (4) one or more output files showing the results of those tests. (Use the shell facilities and/or the tee utility to collect these output files.) E-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 4".

Exercise 3

Let's extend the prefix notation for Boolean expressions that we introduced in exercise 2 to include a conditional expression analogous to the ternary conditional expressions of C and Java, with the syntax

    boolean_expression -> 'C' boolean_expression boolean_expression boolean_expression

When a conditional expression is evaluated, its first subexpression (the test) is evaluated first. If the value of the test is true, then the second subexpression (the consequent) is evaluated, and its value becomes the value of the entire conditional expression. If the value of the test is false, then the third subexpression (the alternative) is evaluated, and its value becomes the value of the entire conditional expression. The subexpression that is not selected is not evaluated.

Let's also add an assignment operation, with the syntax

    boolean_expression -> 'G' variable boolean_expression

Executing an assignment causes the value of the boolean subexpression to be stored into the variable, overwriting any previous value. The value of an assignment expression is the value of its boolean subexpression, as in C or Java.

Since assignment has a side effect, we'll now also need to stipulate an evaluation order for the binary Boolean operators. Let's say that each of them must always evaluate both operands fully and that, in each case, the left operand must be fully evaluated before the right operand.
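Note that this rule differs from the behavior of C's own && and || operators, which skip the right operand whenever the left one settles the result. The sketch below contrasts the two behaviors by recording which operands actually get evaluated. Everything in it (the trace buffer and the functions operand, and_short_circuit, and and_full) is a hypothetical illustration, not part of the exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A record of which operands were evaluated, in order. */
static char trace[16];
static size_t trace_len = 0;

void reset_trace(void)
{
    trace_len = 0;
    trace[0] = '\0';
}

const char *get_trace(void)
{
    return trace;
}

/* Simulates an operand with a side effect: evaluating it appends its
   name to the trace. */
static bool operand(char name, bool value)
{
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = name;
    trace[trace_len] = '\0';
    return value;
}

/* C's && stops as soon as the left operand is false, so the right
   operand's side effect never happens. */
bool and_short_circuit(void)
{
    return operand('l', false) && operand('r', true);
}

/* The rule stipulated above: evaluate the left operand fully, then
   the right operand fully, and only then combine the two values. */
bool and_full(void)
{
    bool left = operand('l', false);
    bool right = operand('r', true);
    return left && right;
}
```

Both functions return false, but after and_short_circuit the trace is "l", while after and_full it is "lr".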

Finally, it will be useful to have a neutral binary operator that simply orders the side effects of its subexpressions, as the comma operator does in C:

    boolean_expression -> 'Z' boolean_expression boolean_expression

In other words, Z evaluates both of its subexpressions, like the other binary operators, from left to right. The value of the second subexpression becomes the value of the entire Z-expression; the first subexpression is evaluated only for its side effect.

The exercise is to write and test a front end for this extended language of Boolean expressions -- a program that reads in a Boolean expression from a file designated by the user (as a command-line argument) and writes out, to standard output, three-address code that would compute the value of the Boolean expression.

The grammar for the three-address code that is emitted should be this:

    program -> statement '\n' statements

    statement -> optional_label unlabelled_statement

    optional_label -> label ':'
        | 

    unlabelled_statement -> variable '=' expression
        | 'ifFalse' vorc 'goto' label
        | 'ifTrue' vorc 'goto' label
        | 'goto' label
        | 'stop'

    vorc -> variable
        | 'true'
        | 'false'

    expression -> vorc
        | 'not' vorc
        | vorc 'and' vorc
        | vorc 'or' vorc
        | vorc 'eqv' vorc
        | vorc 'xor' vorc

    statements -> statement '\n' statements
        |

Lexically, a label should be a capital letter L followed by the decimal numeral for a positive integer, and a variable should either be a lower-case letter (for variables occurring in the input expression) or the whorl character @ followed by the decimal numeral for a non-negative integer (for variables added by the code generator). We'll adopt the convention that the value stored in @0 when the three-address-code program halts is that program's output.
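A code generator typically needs a supply of fresh names that fit these lexical rules. Here is one minimal way to produce them, offered as a sketch rather than a required design. The names TAC_NAME_SIZE, new_label, and new_temp are hypothetical, and the choice to start generated variables at @1 (leaving @0 for the final result) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Room for a letter or '@', the digits of an unsigned int, and '\0'. */
#define TAC_NAME_SIZE 16

static unsigned int label_count = 0;  /* labels are numbered from 1 */
static unsigned int temp_count = 0;   /* temporaries start at @1 */

/* Stores a fresh label name (L1, L2, ...) in buf. */
void new_label(char buf[TAC_NAME_SIZE])
{
    assert(buf != NULL);
    snprintf(buf, TAC_NAME_SIZE, "L%u", ++label_count);
}

/* Stores a fresh generated-variable name (@1, @2, ...) in buf.
   @0 is never handed out here, since it is reserved for the result. */
void new_temp(char buf[TAC_NAME_SIZE])
{
    assert(buf != NULL);
    snprintf(buf, TAC_NAME_SIZE, "@%u", ++temp_count);
}
```

Successive calls to new_label yield L1, L2, and so on; successive calls to new_temp yield @1, @2, and so on.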

The '\n' notation in the first and last rules is meant to say that every statement is terminated by a newline.

Executing the stop statement halts the execution of the three-address-code program (no matter where it appears in the sequence of statements).

For instance, the expression ZGpTZGqFZGrTCrNEAFqKTXrNqAqr might yield the three-address code

        p = true
        q = false
        r = true
        ifFalse r goto L1
        @1 = false or q
        @2 = not q
        @3 = r xor @2
        @4 = true and @3
        @5 = @1 eqv @4
        @6 = not @5
        @0 = @6
        goto L2
L1:     @7 = q or r
        @0 = @7
L2:     stop

Other, equivalent code sequences are also possible, and there are some fairly obvious optimizations that one can make. In the extreme case, since this language has no input mechanism, one might even incorporate an interpreter right into the front end and reduce every program that is semantically correct (i.e., never uses an uninitialized variable and always assigns a value to @0) to either

        @0 = true
        stop

or

        @0 = false
        stop

However, the exercise will be of greater relevance to compiler construction if you postpone the actual evaluation of the expression and concentrate on the mechanics of producing correct three-address code.

As usual, I recommend starting by developing test cases -- simple examples of what your program should do with the values it reads in. Then write the most straightforward parser and code generator you can that will handle your test cases correctly. Then develop more test cases and adapt your program to handle them. Repeat until satisfied.

You may again assume that no expression contains more than 1023 characters. However, as several of you discovered in exercise 2, the prefix notation can be parsed with no lookahead, reading in each character of the expression only when it is needed, without ever having to store it in a buffer. This makes it possible to remove the upper bound on expression size.

Please submit a tarball containing (1) the C program implementing your front end, which may consist of several files, (2) a set of matched input and output files, each input file holding one of your test expressions and each output file holding the output that your program produced for that test expression, and (3) a Makefile directing the compilation of your program. You may want to add a tests target to the Makefile that will, when executed, run your program on each input file and generate a new output file. E-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 3".

Exercise 2

Boolean expressions like those that we discussed as an extended example in the class session of January 29 can also be written in prefix notation. Here's a BNF grammar for them, using T and F as Boolean literals (for "true" and "false," respectively), N for the negation operator, K for conjunction, A for disjunction, E for equivalence, and X for non-equivalence ("exclusive or"):

    boolean_expression -> 'T'
        | 'F'
        | variable
        | 'N' boolean_expression
        | 'K' boolean_expression boolean_expression
        | 'A' boolean_expression boolean_expression
        | 'E' boolean_expression boolean_expression
        | 'X' boolean_expression boolean_expression

We'll let any lower-case letter of the English alphabet be a variable.

No parentheses are needed, because the arity of each operation is known, and no operator precedence or associativity indicators are needed, either, because the structure of the expression unambiguously determines the order of operations.
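One way to see why the arities suffice: scan the string from left to right while counting how many subexpressions are still owed. Each symbol pays off one pending subexpression and, if it is an operator, creates as many new ones as its arity. The string is well formed exactly when the count first reaches zero at the final character. The sketch below checks well-formedness this way; it does no translation, and the names arity and is_well_formed are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the number of operands that the symbol c takes, or -1 if c
   is not a symbol of the prefix grammar.  Assumes the lower-case
   letters are contiguous, as in ASCII. */
int arity(char c)
{
    switch (c) {
    case 'T': case 'F':
        return 0;
    case 'N':
        return 1;
    case 'K': case 'A': case 'E': case 'X':
        return 2;
    default:
        return (c >= 'a' && c <= 'z') ? 0 : -1;
    }
}

/* Reports whether s is exactly one Boolean expression in prefix form. */
bool is_well_formed(const char *s)
{
    assert(s != NULL);
    size_t needed = 1;              /* subexpressions still owed */
    for (; *s != '\0'; s++) {
        int k = arity(*s);
        if (k < 0 || needed == 0)   /* bad symbol, or trailing text */
            return false;
        needed = needed - 1 + (size_t) k;
    }
    return needed == 0;
}
```

For example, NEAFqKTXrNq is accepted, while KT (an operand is missing) and TT (an extra symbol follows a complete expression) are rejected.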

The goal of the exercise is to write a translator that will read in a Boolean expression in this prefix notation and write it back out again in the infix notation that we specified in class, which used this grammar:

    boolean_expression -> disjunction 'iff' boolean_expression
        | disjunction 'xor' boolean_expression
        | disjunction
    disjunction -> conjunction 'or' disjunction
        | conjunction
    conjunction -> negation 'and' conjunction
        | negation
    negation -> 'not' negation
        | basic_expression
    basic_expression -> 'true'
        | 'false'
        | variable
        | '(' boolean_expression ')'

For clarity, we set off these Boolean operators from their operands with spaces. So, for example, NEAFqKTXrNq might be translated into not ((false or q) iff (true and (r xor not q))).

(a) Write a set of test cases for the translator -- Boolean expressions in prefix notation and their infix equivalents.

(b) Convert the prefix grammar shown above into a syntax-directed translation scheme by adding actions to it.

(c) Adapt the translation scheme, if necessary, to facilitate parsing.

(d) Write the C fragments for a syntax-directed predictive parser for the prefix grammar.

(e) Write, in C, and test a syntax-directed translator that converts an expression of that grammar into an equivalent expression of the infix grammar. Document any simplifications or optimizations that you make in the parser fragments (as the authors do in section 2.5.4 of the textbook), briefly explaining how they preserve the intended results of translation. Provide a user interface that reads Boolean expressions in prefix form from standard input, one per line, and outputs the translations, one per line, to standard output, stopping at end of file or when a syntactically incorrect input is encountered. You may assume that no input line will contain more than 1023 characters.

Part (e) is more difficult if the translation algorithm is forbidden to introduce parentheses that are theoretically superfluous (such as the parentheses around "false or q" and those around "true and (r xor not q)" in the example above). For purposes of this exercise, a program need not constrain the translation in this way in order to receive full credit.

Please submit a tarball containing (1) a text file containing your syntax-directed translation scheme, with notes about any adaptations that you made in order to simplify parsing; (2) the C program implementing your translator, which may consist of several files; and (3) an input file containing the test cases you devised in part (a), and an output file containing the corresponding translations. E-mail it to me as an attachment to a message with the subject line "[CSC 362] Exercise 2".

Exercise 1

The header file /home/stone/courses/compilers/code/lists.h supplies a data structure for singly-linked lists and defines an interface to it, supporting a vaguely Scheme-like set of primitive operations. The exercise is to document, write, and test an efficient implementation of this interface, in a file named lists.c. You can use the standard C99 libraries, but no others, and you (or your team) must write all the code; you may not copy or adapt it from other sources.

You should submit your lists.c file and any supporting files that you'd like me to see or use in evaluating your implementation, such as a test program that checks that your implementation is correct. Send an e-mail to stone@cs.grinnell.edu with the subject line "[CSC 362] Exercise 1" and attach your file(s) to it; if there are three or more such files, however, create a tarball (an archive file in .tar or .tgz format) or a ZIP archive file and attach that instead.

The exercises in this course are adapted to a programming style in which you write documentation first and revise it as needed after each change to the code, and in which you create test cases for each function before coding it. You are not required to adopt this style, but it's likely that I'll bear down on any errors in your code that appear to have resulted from unwisely rejecting it.