Lab exercise #10: Parsing parenthesized expressions

The parenthesis-mating problem

One of the traditional difficulties for novice Scheme programmers is making sure that the parentheses in each expression are correctly balanced and nested. Today we'll write a Java class that provides several methods that can help with this problem.

A correctly balanced and nested expression either contains no parentheses at all, or else consists of a left parenthesis, zero or more correctly balanced and nested expressions, and a right parenthesis, in that order. For the purpose of parenthesis-mating, we'll ignore everything except the parentheses; our objective is simply to make sure that every left parenthesis precedes and mates to a right parenthesis, and vice versa, and that everything that is enclosed in a mated pair of parentheses is similarly balanced.

Using a stack of unmated left parentheses

We can test this property with the help of a stack, as follows. We start with an empty stack. Given a string -- the expression to be checked -- we walk through the characters of the string from left to right, looking at each one. Whenever we find a left parenthesis, we push it onto the stack. Whenever we find a right parenthesis, we pop the stack. We ignore all other characters. If we never have to pop an empty stack and, at the end of this process, the stack is once more empty, then we encountered exactly the same number of left and right parentheses; moreover, each right parenthesis can be mated to the left parenthesis that we popped off the stack when we encountered it.

A stack is the appropriate data structure to use here, because of its "last in, first out" discipline: Any right parenthesis that we encounter should be mated to the most recently encountered left parenthesis that has not already been mated (and hence removed from the stack).

What if the nesting of the parentheses is incorrect, as in the string "())())((()", say? Having equal numbers of left and right parentheses doesn't guarantee that they can be mated up correctly. Using a stack takes care of that problem, too: In processing an expression that has as many right parentheses as left ones but nests them incorrectly, we'll always encounter a point at which we try to pop an empty stack. In that case, we'll just catch the resulting exception and report that the expression isn't correctly nested and balanced.
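
The walk described above can be sketched roughly as follows. This is only one possible shape for the method, not the expected solution: the class name BalanceSketch is made up for illustration (the lab's own class is Parser in the paren package), and the sketch assumes java.util.Stack, whose pop method throws EmptyStackException when the stack is empty.

```java
import java.util.EmptyStackException;
import java.util.Stack;

// Hypothetical class name, used only for this illustration.
class BalanceSketch {
    // Returns true if the parentheses in text are balanced and correctly nested.
    static boolean isBalanced(String text) {
        Stack<Character> unmated = new Stack<>();
        try {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == '(') {
                    unmated.push(c);   // remember an unmated left parenthesis
                } else if (c == ')') {
                    unmated.pop();     // mate it; throws if nothing is left to mate
                }
                // every other character is ignored
            }
        } catch (EmptyStackException e) {
            // a right parenthesis appeared with no unmated left parenthesis before it
            return false;
        }
        // leftover entries are left parentheses that were never mated
        return unmated.isEmpty();
    }
}
```

With this sketch, BalanceSketch.isBalanced("(())") yields true, while the string "())())((()" from the paragraph above yields false as soon as the second right parenthesis is reached.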

  1. Create a paren package, and within that package a Parser class. Write a method isBalanced for the Parser class that takes a String as argument and returns a boolean value indicating whether the parentheses in that string are correctly nested and balanced, using the stack strategy described above. The stack can be a local variable of the method. You can use java.util.Stack, java.util.LinkedList, or a Stack class of your own design and construction, as you prefer.
  2. Add a second method, closeOff, that takes a String as argument and returns a similar String, but with as many right parentheses added at the end as are required to mate all the unmated left parentheses in the given string. So, for instance, if the argument is "((() ()) (()", closeOff should return "((() ()) (()))".
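
One way closeOff might look is sketched below, again under assumptions: the class name CloseOffSketch is hypothetical, java.util.Stack is only one of the permitted stack choices, and a right parenthesis that has nothing to mate with is simply left in place, since the exercise does not say what should happen to it.

```java
import java.util.Stack;

// Hypothetical class name, used only for this illustration.
class CloseOffSketch {
    // Returns text followed by one ')' for each unmated '('.
    static String closeOff(String text) {
        Stack<Character> unmated = new Stack<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                unmated.push(c);
            } else if (c == ')' && !unmated.isEmpty()) {
                unmated.pop();
            }
            // Assumption: a ')' met on an empty stack is ignored, not repaired.
        }
        StringBuilder result = new StringBuilder(text);
        while (!unmated.isEmpty()) {
            unmated.pop();
            result.append(')');   // one closer per left parenthesis still on the stack
        }
        return result.toString();
    }
}
```

For the example in the exercise, two left parentheses remain on the stack at the end of the walk, so two right parentheses are appended.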

Different bracketing symbols

Traditionally, grouping in mathematical expressions is indicated not only by parentheses, but also by square brackets, curly braces, angle brackets, and even more exotic symbol pairs. Usually, such expressions aren't considered well-formed unless the symbol that is placed at the left end of a group mates to a symbol at the right end that is its mirror image in shape -- a left square bracket to a right square bracket, a less-than character to a greater-than character, and so on.

  1. Adapt the isBalanced method so that it pushes any of the left-end characters (left parenthesis, left square bracket, left curly brace, less-than character) onto the stack when it is encountered, and pops the stack when any of the right-end characters (right parenthesis, right square bracket, right curly brace, greater-than character) is encountered, but returns false if it discovers that the left-end character popped in this way is not the mirror-image character of the right-end character that has just been encountered. (So, for instance, giving it the argument "({<}>)" should result in false.)
  2. Adapt the closeOff method to finish off an unbalanced String, as in the previous section, but have it supply the correct right-end mates for the unmated left-end characters in the given String.
  3. Give the Parser class a constructor that takes two String values as arguments. The two strings should be of equal length. A Parser should treat the characters in the first string as left-end characters and those in the second string as the corresponding right-end characters. Adapt the isBalanced and closeOff methods to treat these characters as the grouping symbols. (So new Parser("([{<", ")]}>") should reproduce the behavior described in this section, and new Parser("(", ")") the behavior from the previous section.)
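
A rough sketch of the configurable version follows. It is not the only reasonable design: the class name ParserSketch and the field names are hypothetical, java.util.ArrayDeque stands in for the stack classes named earlier, and the sketch assumes that no character appears twice in either string or in both strings. Instead of pushing the left-end character itself, it pushes the right-end character that is expected to mate with it, which makes both the mirror-image check and closeOff short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name, used only for this illustration.
class ParserSketch {
    private final String lefts;   // left-end characters
    private final String rights;  // rights.charAt(k) mates with lefts.charAt(k)

    ParserSketch(String lefts, String rights) {
        if (lefts.length() != rights.length()) {
            throw new IllegalArgumentException("strings must be of equal length");
        }
        this.lefts = lefts;
        this.rights = rights;
    }

    boolean isBalanced(String text) {
        Deque<Character> expected = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int k = lefts.indexOf(c);
            if (k >= 0) {
                expected.push(rights.charAt(k));   // remember the mate we need
            } else if (rights.indexOf(c) >= 0) {
                // fail on a stray right-end character or a wrong mate
                if (expected.isEmpty() || expected.pop() != c) {
                    return false;
                }
            }
        }
        return expected.isEmpty();
    }

    String closeOff(String text) {
        Deque<Character> expected = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int k = lefts.indexOf(c);
            if (k >= 0) {
                expected.push(rights.charAt(k));
            } else if (rights.indexOf(c) >= 0 && !expected.isEmpty()) {
                expected.pop();   // assumption: mismatches are not repaired here
            }
        }
        StringBuilder result = new StringBuilder(text);
        while (!expected.isEmpty()) {
            result.append(expected.pop());   // innermost mate first
        }
        return result.toString();
    }
}
```

Under these assumptions, new ParserSketch("([{<", ")]}>").isBalanced("({<}>)") is false, because the right curly brace arrives while a greater-than character is expected, and closeOff("([{") on the same object produces "([{}])".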

Extras

  1. Add to Parser a method that takes a correctly balanced string and returns a list containing every substring of that string that begins with a left-end character and ends with the mated right-end character. (Hint: When you push the left-end character onto the stack, store its position in the string along with it.)
  2. The methods that we have so far implemented for Parser will work pretty well on actual Scheme code, provided that none of the comments, string literals, or character literals that occur in the code contains any of the symbols used for grouping. If we can just arrange for isBalanced and closeOff to ignore comments, string literals, and character literals, we have a working utility. Write a static method that takes any String as argument and returns a similar string, but with the Scheme comments, character literals, and string literals removed. (Note: Any semicolon that occurs in the text of a Scheme expression but is not part of a string or character literal marks the beginning of a comment that ends at the next newline character. Any occurrence of the sequence #\ that is not part of a comment or string literal marks the beginning of a character literal. The rest of the character literal, after the #\, is either a single character or one of the character names space and newline. And any occurrence of a double quotation mark that is not part of a comment or character literal indicates the beginning of a string literal that ends at the next unescaped double quotation mark. As one traverses the interior of the string literal from left to right, any unescaped occurrence of a backslash causes the character that follows it to be escaped.)
  3. Revise isBalanced and closeOff so that they strip out all Scheme comments, character literals, and string literals before attempting to mate the parentheses. Make sure that the string that closeOff returns retains all of these comments, character literals, and string literals, however.
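
The rules in the note for the stripping method can be sketched as a single left-to-right scan. This is a tentative illustration only: the class and method names are hypothetical, the newline that ends a comment is kept in the result (the exercise does not say whether it should be), and a character literal whose text begins with space or newline is always read as the named character.

```java
// Hypothetical class and method names, used only for this illustration.
class SchemeStripSketch {
    // Returns code with Scheme comments, character literals, and string literals removed.
    static String stripLiteralsAndComments(String code) {
        StringBuilder out = new StringBuilder();
        int n = code.length();
        int i = 0;
        while (i < n) {
            char c = code.charAt(i);
            if (c == ';') {
                // Comment: skip up to, but not including, the next newline.
                while (i < n && code.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '#' && i + 1 < n && code.charAt(i + 1) == '\\') {
                // Character literal: #\ followed by a name or a single character.
                i += 2;
                if (code.startsWith("space", i)) {
                    i += 5;
                } else if (code.startsWith("newline", i)) {
                    i += 7;
                } else if (i < n) {
                    i++;
                }
            } else if (c == '"') {
                // String literal: skip to the next unescaped double quotation mark.
                i++;
                while (i < n && code.charAt(i) != '"') {
                    if (code.charAt(i) == '\\') {
                        i++;   // the backslash escapes the character after it
                    }
                    i++;
                }
                i++;   // step past the closing quotation mark
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Because closeOff must return the original text with its comments and literals intact, one option is to run the stack walk over the stripped copy and append the resulting closers to the unstripped argument.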