More from the textbook:
(a) Exercises 8.10.1 and 8.10.2 (pages 572 and 573).
(b) Exercises 7.7.1 and 7.7.2 (pages 493 and 494).
More from the textbook:
(a) Exercise 8.2.5 (page 517).
(b) Exercise 8.4.1 (pages 531 and 532).
(c) Exercises 8.6.1, 8.6.3, and 8.6.4 (pages 548 and 549).
Once again, I have selected exercises from the textbook:
(a) Exercise 7.2.3 (page 440).
(b) Exercises 7.3.1 and 7.3.2 (page 451). If you find the ML function difficult to follow because of the unfamiliar notation, you can work instead with the equivalent Scheme procedure:
(define main
  (lambda ()
    (letrec ((fib0 (lambda (n)
                     (letrec ((fib1 (lambda (n)
                                      (letrec ((fib2 (lambda (n)
                                                       (+ (fib1 (- n 1))
                                                          (fib1 (- n 2))))))
                                        (if (>= n 4)
                                            (fib2 n)
                                            (+ (fib0 (- n 1))
                                               (fib0 (- n 2))))))))
                       (if (>= n 2)
                           (fib1 n)
                           1)))))
      (fib0 4))))
Our exercise 9 is selected from some of the textbook's exercises:
(a) Exercise 5.4.4 (page 337).
(b) Exercise 5.4.5 (page 337).
(c) Any two of exercises 5.5.1 through 5.5.5 (page 352), with the condition that you must use at least two different general parsing strategies. (That is: you may not choose to do only 5.5.1 and 5.5.2, because they both use recursive-descent parsing, and you may not choose to do only 5.5.3 and 5.5.4, because they both use LL parsing.)
In the implementation, you'll have to generate dummy code for instances of
the nonterminals C and L. You may assume for
purposes of this exercise that every instance of C results in
the generation of the single three-address-code statement t = 0 and
every instance of L results in the generation of the single
three-address-code statement u = 1.
Please put the files containing your solution into a tarball and e-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 9".
(a) List the tokens of Dilua and write a C type definition of an enumeration type for the token tags, making sure that none of the values in the enumeration is 0.
(b) Write a C type definition for nodes in abstract syntax trees for Dilua (and also define a type for pointers to such nodes).
(c) Write a specification of the grammar of Dilua for the bison parser generator. All of Dilua's binary operators are left-associative and have equal precedence, and all of Dilua's unary operators have higher precedence (i.e., bind more tightly) than any of the binary operators; use these facts, if necessary, to resolve ambiguities in the given grammar.
(d) Add appropriate actions to the bison specification, so that
it builds an abstract syntax tree for a Dilua chunk, assuming that the
yylex function has been defined so that, each time it is invoked, it
returns the tag of the next token in the source file (cast to type int), or 0 when the end of the source file is reached. In
addition, each action should print what it does to standard output, to
facilitate testing and debugging.
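For part (a), the requirement that no tag be 0 can be met by giving the first enumerator an explicit nonzero value. The following is only a sketch: the names `token_tag`, `TOK_NAME`, and so on are hypothetical, and only the five generic token classes that the grammar file mentions are shown. Tags for keywords and symbols would follow the same pattern.

```c
#include <assert.h>

/* Hypothetical tag names; keyword and symbol tags are omitted here. */
typedef enum {
    TOK_NAME = 1,   /* explicit 1, so that 0 remains free to mean "end of file" */
    TOK_NUMBER,
    TOK_STRING,
    TOK_BINOP,
    TOK_UNOP
} token_tag;

/* Returns a printable name for a tag, which is handy in the tracing
   output that part (d) asks each action to produce. */
const char *token_tag_name(token_tag tag)
{
    assert(tag != 0);   /* 0 is reserved for end of file */
    switch (tag) {
    case TOK_NAME:   return "NAME";
    case TOK_NUMBER: return "NUMBER";
    case TOK_STRING: return "STRING";
    case TOK_BINOP:  return "BINOP";
    case TOK_UNOP:   return "UNOP";
    }
    return "(unknown)";
}
```

Because the later enumerators count upward from the first, none of them can collide with the end-of-file value that yylex returns.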
Please put the files containing your solution into a tarball and e-mail the tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 8".
The file /home/stone/courses/compilers/code/dilua.bnf contains the syntax for a small programming language, expressed in Backus-Naur form.
Each symbol or keyword token is enclosed in apostrophes. Generic tokens
for identifiers (NAME), numerals (NUMBER), string literals
(STRING), binary operators (BINOP), and unary operators
(UNOP) are fully capitalized. Nonterminals of the grammar are
written in lower case with no enclosing quotation marks.
(a) If this grammar is ambiguous, try to recast it to remove the ambiguities. Document your way of resolving each one.
(b) If the grammar resulting from part (a) contains any instances of left recursion, eliminate them, using the techniques described in section 4.3.3 of the text.
(c) Use left factoring, if necessary, to modify the grammar resulting from part (b), making it more suitable for LL parsing.
(d) Determine which nonterminals of the grammar resulting from part (c) are nullable.
(e) Compute the FIRST function for every grammar symbol in the language, using the techniques described in section 4.4.2 of the text.
(f) Compute the FOLLOW function for every nonterminal, using the techniques described in section 4.4.2 of the text.
(g) Construct a predictive parsing table for the language, using the techniques described in section 4.4.3 of the text.
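Parts (d) and (e) are both fixed-point computations, and it can help to see the iteration written out before doing it by hand. The sketch below works on a small toy grammar (E -> T R, R -> '+' T R | empty, T -> 'n'), not on Dilua. The single-character encoding of symbols (upper-case letters for nonterminals, anything else for terminals) and all of the names are assumptions made for the illustration. Unlike the textbook's presentation, it records nullability in a separate array instead of placing the empty string in the FIRST sets.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { char lhs; const char *rhs; } production;

/* Toy grammar: "" stands for an empty right-hand side. */
static const production toy_grammar[] = {
    { 'E', "TR" }, { 'R', "+TR" }, { 'R', "" }, { 'T', "n" }
};
#define TOY_COUNT (sizeof toy_grammar / sizeof toy_grammar[0])

bool nullable[26];      /* indexed by nonterminal letter - 'A' */
bool first[26][128];    /* first[X][c]: ASCII terminal c can begin X */

/* Repeats passes over the productions until no pass adds a nonterminal. */
void compute_nullable(const production *g, size_t count)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < count; i++) {
            const char *s = g[i].rhs;
            assert(isupper((unsigned char) g[i].lhs));
            /* Step over the leading run of nullable nonterminals. */
            while (isupper((unsigned char) *s) && nullable[*s - 'A'])
                s++;
            if (*s == '\0' && !nullable[g[i].lhs - 'A']) {
                nullable[g[i].lhs - 'A'] = true;
                changed = true;
            }
        }
    }
}

/* Must be called after compute_nullable. */
void compute_first(const production *g, size_t count)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < count; i++) {
            bool *target = first[g[i].lhs - 'A'];
            for (const char *s = g[i].rhs; *s != '\0'; s++) {
                unsigned char c = (unsigned char) *s;
                if (!isupper(c)) {          /* a terminal ends the scan */
                    if (!target[c]) { target[c] = true; changed = true; }
                    break;
                }
                for (int t = 0; t < 128; t++)   /* add FIRST of the nonterminal */
                    if (first[c - 'A'][t] && !target[t]) {
                        target[t] = true;
                        changed = true;
                    }
                if (!nullable[c - 'A'])     /* later symbols cannot contribute */
                    break;
            }
        }
    }
}
```

Calling `compute_nullable(toy_grammar, TOY_COUNT)` and then `compute_first(toy_grammar, TOY_COUNT)` fills both arrays for the toy grammar. FOLLOW can be computed by a similar loop that scans each right-hand side for the symbols that come after a nonterminal.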
Please submit the result of each part, (a) through (g), electronically, either as a single long text file or as a tarball containing a file for each result. E-mail the text file or tarball to me as an attachment to a message with the subject line "[CSC 362] Exercise 7".
Our exercise 6 will be the textbook's exercise 4.2.8 (pages 208 and 209). I prefer electronic submission, but will also accept legible hard copy.
Design, write, compile, and test a Flex input file that generates a lexical analyzer for the ABC programming language, as described in exercise 4 below.
In class, we discussed some unexpectedly difficult tokenization problems that ABC presents. Here are my recommendations for dealing with these:
Source text   Tokens
1..5          <NUMERAL_INT, 1> <RANGE> <NUMERAL_INT, 5>
1..foo        <NUMERAL_INT, 1> <RANGE> <NAME, foo>
foo..5        <NAME, foo> <RANGE> <NUMERAL_INT, 5>
5e7           <NUMERAL_FLOAT, 50000000.0>
5 e7          <NUMERAL_INT, 5> <NAME, e7>
5e*           (syntax error at asterisk)
5 e*          <NUMERAL_INT, 5> <NAME, e> <STAR>
5e7'foo'      <NUMERAL_FLOAT, 50000000.0> <TEXT_DISPLAY, "foo">
5 e7'foo'     <NUMERAL_INT, 5> <NAME, e7'foo'>
5e'foo'       (syntax error at first apostrophe)
5 e'foo'      <NUMERAL_INT, 5> <NAME, e'foo'>
Ideally, the lexical analyzer that Flex generates should follow these recommendations.
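The decisions behind these recommendations can also be written as a small hand-coded C routine, which may help in choosing Flex patterns that behave the same way. This is a sketch under assumptions: the names `scan_numeral` and `numeral_kind` are hypothetical, a numeral is assumed to begin with a digit, and a point is assumed to be allowed even when no digits follow it. The two rules it encodes are that a point immediately followed by another point belongs to the range symbol, and that an e directly after the digits commits the scanner to an exponent part.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { NUM_INT, NUM_FLOAT, NUM_ERROR } numeral_kind;

/* Scans one numeral at the start of s.  On success, *length is the number
   of characters in the numeral; on NUM_ERROR, it is the offset of the
   offending character. */
numeral_kind scan_numeral(const char *s, size_t *length)
{
    size_t i = 0;
    numeral_kind kind = NUM_INT;

    assert(isdigit((unsigned char) s[0]));
    while (isdigit((unsigned char) s[i]))
        i++;

    /* A point is part of the numeral only if it does not begin "..". */
    if (s[i] == '.' && s[i + 1] != '.') {
        kind = NUM_FLOAT;
        i++;
        while (isdigit((unsigned char) s[i]))
            i++;
    }

    /* An e here must introduce an exponent: optional sign, then digits. */
    if (s[i] == 'e') {
        kind = NUM_FLOAT;
        i++;
        if (s[i] == '+' || s[i] == '-')
            i++;
        if (!isdigit((unsigned char) s[i])) {
            *length = i;
            return NUM_ERROR;
        }
        while (isdigit((unsigned char) s[i]))
            i++;
    }

    *length = i;
    return kind;
}
```

On the inputs above, the routine stops after the 1 in 1..5, consumes all of 5e7, and reports an error at offset 2 for both 5e* and 5e'foo'.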
Please submit a tarball containing (1) the Flex input file, (2) the C
program that it generates, (3) a set of input files containing tests, (4)
the command lines that you used to construct the test runs, and (5) one or
more output files showing the results of those tests. (Use the shell
facilities and/or the tee utility to collect these output files.)
E-mail the tarball to me as an attachment to a message with the subject
line "[CSC 362] Exercise 5".
Programs in the ABC programming language are sequences of names, keywords, numerals, text displays, symbols, and newlines:
e, followed by one or more decimal digits; a sign (+ or -) can optionally appear between e and the digit sequence. A
numeral token must have a field in which the value of the numeral is
stored. This value should be a C double if the numeral contains either a
point or an exponent part and a C unsigned long otherwise. (You should use
tag values to distinguish numerals with values of different types.)
:, ,,
@, |, [, ], =,
{, }, ;, .., <, <=, >=,
>, <>, ~, +, -, *, /,
**, */, /*, (, ), ^, ^^,
#, <<, ><, and >>. You should use a different
tag value for each of these.
A comment in ABC begins with a backslash character and ends at the end of the line (i.e., on GNU or Unix systems, with the next following newline character).
In addition to its value field, each token of ABC must also contain a string field indicating the file in which the token was found and an unsigned int field indicating the line on which it appeared. Note that, for a newline token, the line is the one beginning with the counted spaces, not any preceding empty or comment line.
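One possible layout for such a token is a tagged structure whose value field is a union. The sketch below is an illustration, not a required design: the type name `token`, the field names, the `TAG_` prefix, and the output format are all assumptions, and only three tags are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tags; a full analyzer needs one per keyword and symbol too. */
typedef enum { TAG_NUMERAL_INT = 1, TAG_NUMERAL_FLOAT, TAG_NAME } abc_tag;

typedef struct {
    abc_tag tag;            /* selects the live member of value */
    union {
        unsigned long integer;  /* TAG_NUMERAL_INT */
        double real;            /* TAG_NUMERAL_FLOAT */
        const char *text;       /* TAG_NAME */
    } value;
    const char *file;       /* file in which the token was found */
    unsigned int line;      /* line on which it appeared */
} token;

/* Writes a one-line, human-readable rendering of t into buf and returns
   the number of characters that the full rendering requires. */
int format_token(char *buf, size_t size, const token *t)
{
    assert(t != NULL);
    switch (t->tag) {
    case TAG_NUMERAL_INT:
        return snprintf(buf, size, "%s:%u: <NUMERAL_INT, %lu>",
                        t->file, t->line, t->value.integer);
    case TAG_NUMERAL_FLOAT:
        return snprintf(buf, size, "%s:%u: <NUMERAL_FLOAT, %g>",
                        t->file, t->line, t->value.real);
    case TAG_NAME:
        return snprintf(buf, size, "%s:%u: <NAME, %s>",
                        t->file, t->line, t->value.text);
    }
    return snprintf(buf, size, "%s:%u: <tag %d>",
                    t->file, t->line, (int) t->tag);
}
```

Keeping the file and line in every token makes all of the required fields visible in the output with a single formatting routine.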
The assignment is to design, write, and test a lexical analyzer for ABC. The main program should take one or more command-line arguments, each of which should be a file containing ABC code, and should recover and write to standard output all of the tokens from each of those files in turn, in the order named on the command line. Each token should be written on a separate line. You may choose your own (human-readable) format for displaying tokens, provided that you make all of the fields of the token visible in the output.
I strongly recommend implementing a function that reads from a given FILE * (assumed to be open for input) and returns either
the next token from that file, newly constructed, or an otherwise out-of-range value, such as a
null pointer, indicating that the file contains no more tokens. The main
program can then deal with the mechanics of parsing the command line,
opening and closing the source files at the right times, and so on.
Please submit a tarball containing (1) the C program implementing your
lexical analyzer, which may consist of several files, (2) a set of input
files containing tests, (3) the command lines that you used to construct
the test runs, and (4) one or more output files showing the results of
those tests. (Use the shell facilities and/or the tee utility to
collect these output files.) E-mail the tarball to me as an attachment to
a message with the subject line "[CSC 362] Exercise 4".
Let's extend the prefix notation for Boolean expressions that we introduced in exercise 2 to include a conditional expression analogous to the ternary conditional expressions of C and Java, with the syntax
boolean_expression -> 'C' boolean_expression boolean_expression boolean_expression
When a conditional expression is evaluated, its first subexpression (the test) is evaluated first. If the value of the test is true, then the second subexpression (the consequent) is evaluated, and its value becomes the value of the entire conditional expression. If the value of the test is false, then the third subexpression (the alternative) is evaluated, and its value becomes the value of the entire conditional expression. The subexpression that is not selected is not evaluated.
Let's also add an assignment operation, with the syntax
boolean_expression -> 'G' variable boolean_expression
Executing an assignment causes the value of the boolean subexpression to be stored into the variable, overwriting any previous value. The value of an assignment expression is the value of its boolean subexpression, as in C or Java.
Since assignment has a side effect, we'll now also need to stipulate an evaluation order for the binary Boolean operators. Let's say that each of them must always evaluate both operands fully and that, in each case, the left operand must be fully evaluated before the right operand.
Finally, it will be useful to have a neutral binary operator that simply orders the side effects of its subexpressions, as the comma operator does in C or Java:
boolean_expression -> 'Z' boolean_expression boolean_expression
In other words, Z evaluates both of its subexpressions, like the
other binary operators, from left to right. The value of the second
subexpression becomes the value of the entire Z-expression; the
first subexpression is evaluated only for its side effect.
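The evaluation rules just described can be pinned down with a tiny reference evaluator, which is useful mainly for working out what the right answer is when writing test cases. It is a sketch, not part of the required front end: the names `eval`, `skip`, `eval_string`, and `vars` are hypothetical, and the input is assumed to be well formed, so there is no error handling. Note that the branch of a conditional that is not selected must still be skipped over, so that parsing resumes at the right place, but it must not be evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

bool vars[26];   /* current values of the variables a through z */

/* Advances *p past one expression without evaluating it. */
static void skip(const char **p)
{
    switch (*(*p)++) {
    case 'N': skip(p); break;
    case 'K': case 'A': case 'E': case 'X': case 'Z':
        skip(p); skip(p); break;
    case 'C': skip(p); skip(p); skip(p); break;
    case 'G': (*p)++; skip(p); break;   /* variable, then expression */
    default: break;                     /* T, F, or a variable */
    }
}

/* Evaluates one expression starting at *p, leaving *p just after it. */
bool eval(const char **p)
{
    bool left, right;
    char c;

    assert(p != NULL && *p != NULL);
    c = *(*p)++;
    switch (c) {
    case 'T': return true;
    case 'F': return false;
    case 'N': return !eval(p);
    /* Binary operators: both operands, always, left before right. */
    case 'K': left = eval(p); right = eval(p); return left && right;
    case 'A': left = eval(p); right = eval(p); return left || right;
    case 'E': left = eval(p); right = eval(p); return left == right;
    case 'X': left = eval(p); right = eval(p); return left != right;
    case 'Z': eval(p); return eval(p);
    case 'G': {
        char name = *(*p)++;
        bool value = eval(p);
        vars[name - 'a'] = value;
        return value;
    }
    case 'C':
        if (eval(p)) {
            bool value = eval(p);   /* consequent */
            skip(p);                /* alternative is not evaluated */
            return value;
        }
        skip(p);                    /* consequent is not evaluated */
        return eval(p);
    default:
        assert(c >= 'a' && c <= 'z');
        return vars[c - 'a'];
    }
}

bool eval_string(const char *s) { return eval(&s); }
```

For example, `eval_string("CTGaTGaF")` stores true in a and leaves it there, because the assignment in the unselected alternative is skipped.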
The exercise is to write and test a front end for this extended language of Boolean expressions -- a program that reads in a Boolean expression from a file designated by the user (as a command-line argument) and writes out, to standard output, three-address code that would compute the value of the Boolean expression.
The grammar for the three-address code that is emitted should be this:
program              -> statement '\n' statements
statement            -> optional_label unlabelled_statement
optional_label       -> label ':'
                      |
unlabelled_statement -> variable '=' expression
                      | 'ifFalse' vorc 'goto' label
                      | 'ifTrue' vorc 'goto' label
                      | 'goto' label
                      | 'stop'
vorc                 -> variable
                      | 'true'
                      | 'false'
expression           -> vorc
                      | 'not' vorc
                      | vorc 'and' vorc
                      | vorc 'or' vorc
                      | vorc 'eqv' vorc
                      | vorc 'xor' vorc
statements           -> statement '\n' statements
                      |
Lexically, a label should be a capital letter L followed by the
decimal numeral for a positive integer, and a variable should either be a
lower-case letter (for variables occurring in the input expression) or the
whorl character @ followed by the decimal numeral for a non-negative
integer (for variables added by the code generator). We'll adopt the
convention that the value stored in @0 when the three-address-code program
halts will be the output of that program.
The '\n' notation in the first and last rules is meant to say that
every statement is terminated by a newline.
Executing the stop statement halts the execution of the
three-address-code program (no matter where it appears in the sequence of
statements).
For instance, the expression ZGpTZGqFZGrTCrNEAFqKTXrNqAqr might
yield the three-address code
    p = true
    q = false
    r = true
    ifFalse r goto L1
    @1 = false or q
    @2 = not q
    @3 = r xor @2
    @4 = true and @3
    @5 = @1 eqv @4
    @6 = not @5
    @0 = @6
    goto L2
L1: @7 = q or r
    @0 = @7
L2: stop
Other, equivalent code sequences are also possible, and there are some
fairly obvious optimizations that one can make. In the extreme case, since
this language has no input mechanism, one might even incorporate an
interpreter right into the front end and reduce every program that is
semantically correct (i.e., never uses an uninitialized variable and always
assigns a value to @0) to either
@0 = true
stop
or
@0 = false
stop
However, the exercise will be of greater relevance to compiler construction if you postpone the actual evaluation of the expression and concentrate on the mechanics of producing correct three-address code.
As usual, I recommend starting by developing test cases -- simple examples of what your program should do with the values it reads in. Then write the most straightforward parser and code generator you can that will handle your test cases correctly. Then develop more test cases and adapt your program to handle them. Repeat until satisfied.
You may again assume that no expression contains more than 1023 characters. However, as several of you discovered in exercise 2, the prefix notation can be parsed with no lookahead, reading in each character of the expression only when it is needed, without ever having to store it in a buffer. This makes it possible to remove the upper bound on expression size.
Please submit a tarball containing (1) the C program implementing your
front end, which may consist of several files, (2) a set of matched input
and output files, each input file holding one of your test expressions and
each output file holding the output that your program produced for that
test expression, and (3) a Makefile directing the compilation of
your program. You may want to add a tests target to the Makefile that will, when executed, run your program on each input file and
generate a new output file. E-mail the tarball to me as an attachment to a
message with the subject line "[CSC 362] Exercise 3".
Boolean expressions like those that we discussed as an extended example in
the class session of January 29 can also be written in prefix notation.
Here's a BNF grammar for them, using T and F as Boolean
literals (for "true" and "false," respectively), N for the negation
operator, K for conjunction, A for disjunction, E for
equivalence, and X for non-equivalence ("exclusive or"):
boolean_expression -> 'T'
                    | 'F'
                    | variable
                    | 'N' boolean_expression
                    | 'K' boolean_expression boolean_expression
                    | 'A' boolean_expression boolean_expression
                    | 'E' boolean_expression boolean_expression
                    | 'X' boolean_expression boolean_expression
We'll let any lower-case letter of the English alphabet be a variable.
No parentheses are needed, because the arity of each operation is known, and no operator precedence or associativity indicators are needed, either, because the structure of the expression unambiguously determines the order of operations.
The goal of the exercise is to write a translator that will read in a Boolean expression in this prefix notation and write it back out again in the infix notation that we specified in class, which used this grammar:
boolean_expression -> disjunction 'iff' boolean_expression
                    | disjunction 'xor' boolean_expression
                    | disjunction
disjunction        -> conjunction 'or' disjunction
                    | conjunction
conjunction        -> negation 'and' conjunction
                    | negation
negation           -> 'not' negation
                    | basic_expression
basic_expression   -> 'true'
                    | 'false'
                    | variable
                    | '(' boolean_expression ')'
For clarity, we set off these Boolean operators from their operands with
spaces. So, for example, NEAFqKTXrNq might be translated into
not ((false or q) iff (true and (r xor not q))).
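The translation in this example follows a simple rule: an operand is wrapped in parentheses exactly when it is itself a binary operation. The sketch below applies that rule with one recursive function per grammar construct. It is only an illustration under assumptions: the names `prefix_to_infix`, `translator`, and the helper functions are hypothetical, the input is assumed to be syntactically correct (the exercise itself also requires detecting errors), and output that does not fit in the buffer is silently truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *in;   /* next unread character of the prefix expression */
    char *out;        /* NUL-terminated output built so far */
    size_t size;      /* capacity of out */
} translator;

static int is_binary(char c)
{
    return c == 'K' || c == 'A' || c == 'E' || c == 'X';
}

/* Appends text to the output buffer. */
static void put(translator *t, const char *text)
{
    size_t used = strlen(t->out);
    snprintf(t->out + used, t->size - used, "%s", text);
}

static void translate(translator *t);

/* Translates one operand, parenthesizing it if it is a binary operation. */
static void operand(translator *t)
{
    int wrap = is_binary(*t->in);
    if (wrap) put(t, "(");
    translate(t);
    if (wrap) put(t, ")");
}

static void translate(translator *t)
{
    char c = *t->in++;
    const char *word;

    switch (c) {
    case 'T': put(t, "true"); return;
    case 'F': put(t, "false"); return;
    case 'N': put(t, "not "); operand(t); return;
    case 'K': word = " and "; break;
    case 'A': word = " or "; break;
    case 'E': word = " iff "; break;
    case 'X': word = " xor "; break;
    default: {                      /* a variable */
        char name[2] = { c, '\0' };
        put(t, name);
        return;
    }
    }
    operand(t);                     /* left operand */
    put(t, word);
    operand(t);                     /* right operand */
}

void prefix_to_infix(const char *prefix, char *out, size_t size)
{
    translator t = { prefix, out, size };
    assert(size > 0);
    out[0] = '\0';
    translate(&t);
}
```

Given NEAFqKTXrNq, this produces the translation shown above. Some of the parentheses it emits are superfluous under the infix grammar; removing them requires comparing the precedence of an operand with that of its context.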
(a) Write a set of test cases for the translator -- Boolean expressions in prefix notation and their infix equivalents.
(b) Convert the prefix grammar shown above into a syntax-directed translation scheme by adding actions to it.
(c) Adapt the translation scheme, if necessary, to facilitate parsing.
(d) Write the C fragments for a syntax-directed predictive parser for the prefix grammar.
(e) Write, in C, and test a syntax-directed translator that converts an expression of that grammar into an equivalent expression of the infix grammar. Document any simplifications or optimizations that you make in the parser fragments (as the authors do in section 2.5.4 of the textbook), briefly explaining how they preserve the intended results of translation. Provide a user interface that reads Boolean expressions in prefix form from standard input, one per line, and outputs the translations, one per line, to standard output, stopping at end of file or when a syntactically incorrect input is encountered. You may assume that no input line will contain more than 1023 characters.
Part (e) is more difficult if the translation algorithm is forbidden to
introduce parentheses that are theoretically superfluous (such as the
parentheses around "false or q" and those around "true and
(r xor not q)" in the example above). For purposes of this exercise, a
program need not constrain the translation in this way in order to receive
full credit.
Please submit a tarball containing (1) a text file containing your syntax-directed translation scheme, with notes about any adaptations that you made in order to simplify parsing; (2) the C program implementing your translator, which may consist of several files; and (3) an input file containing the test cases you devised in part (a), and an output file containing the corresponding translations. E-mail it to me as an attachment to a message with the subject line "[CSC 362] Exercise 2".
The header file /home/stone/courses/compilers/code/lists.h supplies a data structure for singly-linked lists and defines an interface to it, supporting a vaguely Scheme-like set of primitive operations. The exercise is to document, write, and test an efficient implementation of this interface, in a file named lists.c. You can use the standard C99 libraries, but no others, and you (or your team) must write all the code; you may not copy or adapt it from other sources.
You should submit your lists.c file and any supporting files that you'd like me to see or use in evaluating your implementation, such as a test program that checks that your implementation is correct. Send an e-mail to stone@cs.grinnell.edu with the subject line "[CSC 362] Exercise 1" and attach your file(s) to it; if there are three or more such files, however, create a tarball (an archive file in .tar or .tgz format) or a ZIP archive file and attach that instead.
The exercises in this course are adapted to a programming style in which you write documentation first and revise it as needed after each change to the code, and in which you create test cases for each function before coding it. You are not required to adopt this style, but it's likely that I'll bear down on any errors in your code that appear to have resulted from unwisely rejecting it.