Computational linguistics · Exercises
Fall, 2010 · Department of Computer Science · Grinnell College

Exercise 2

Some text-messaging systems designed to work on mobile phones that provide only a numeric keypad, not a full typewriter keyboard, use a set of interface conventions collectively called T9 to process the user's keystrokes. Most implementations of T9 map letters to digits as telephones in the United States have done for seventy years, with extensions to accommodate the letters Q and Z and the most common punctuation marks:

    1   common punctuation marks
    2   A B C
    3   D E F
    4   G H I
    5   J K L
    6   M N O
    7   P Q R S
    8   T U V
    9   W X Y Z

The sender of the message inputs the digits to which the letters of the message are mapped (for instance, 43556196753 for Hello,world). The phone's input software guesses which of the possible letter combinations was intended, with the help of a stored dictionary of common words. In cases where the software cannot determine which of several possibilities is correct (e.g., 668437 might be mother, motifs, or movies), it provides the message sender with a menu of choices. In cases where the software cannot find any plausible choice in its dictionary, it can fall back on an alternative mechanism in which the user selects each letter from a menu or by repeated taps on the same key.
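The letter-to-digit direction of this mapping can be illustrated with a minimal sketch, written here in Python rather than as a transducer. The names KEYPAD, CHAR_TO_DIGIT, and to_digits are hypothetical, and the set of punctuation marks assigned to key 1 is an assumption: only the comma is confirmed by the example above.

```python
# Letter groups follow the standard North American telephone keypad.
# ASSUMPTION: the punctuation marks on key 1 are chosen for illustration;
# the text above confirms only that the comma maps to 1.
KEYPAD = {
    "1": ".,?!",
    "2": "abc",
    "3": "def",
    "4": "ghi",
    "5": "jkl",
    "6": "mno",
    "7": "pqrs",
    "8": "tuv",
    "9": "wxyz",
}

# Invert the keypad so that each character maps to exactly one digit.
CHAR_TO_DIGIT = {ch: digit for digit, chars in KEYPAD.items() for ch in chars}


def to_digits(message):
    """Return the digit string for message, ignoring letter case.

    Raises ValueError for any character that has no key (a space, for example).
    """
    digits = []
    for ch in message.lower():
        if ch not in CHAR_TO_DIGIT:
            raise ValueError(f"no key for character {ch!r}")
        digits.append(CHAR_TO_DIGIT[ch])
    return "".join(digits)


print(to_digits("Hello,world"))  # 43556196753
print(to_digits("mother"), to_digits("motifs"), to_digits("movies"))
```

The second line of output shows the ambiguity described above: all three words produce 668437.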

Part (a) of the exercise is to write a Stuttgart finite-state transducer that, in generation mode, maps any sequence of letters and common punctuation marks to the corresponding digit string.

In analysis mode, the transducer you write will input any digit string (containing only the digits 1 through 9) and output all the sequences of letters and common punctuation marks that map to it. There will be many of these, and almost none of them will be English words.
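To get a feeling for how many outputs analysis mode produces, the following sketch enumerates them in Python. It does not use the transducer itself; the names KEYPAD, candidates, and candidate_count are hypothetical, and the punctuation marks on key 1 are again an assumption made only for illustration.

```python
from itertools import product

# ASSUMPTION: the punctuation on key 1 is illustrative; the letter groups
# are those of the standard North American telephone keypad.
KEYPAD = {
    "1": ".,?!",
    "2": "abc",
    "3": "def",
    "4": "ghi",
    "5": "jkl",
    "6": "mno",
    "7": "pqrs",
    "8": "tuv",
    "9": "wxyz",
}


def candidates(digits):
    """Yield every character sequence that the keypad maps to digits."""
    for combo in product(*(KEYPAD[d] for d in digits)):
        yield "".join(combo)


def candidate_count(digits):
    """Return the number of candidates without generating them."""
    count = 1
    for d in digits:
        count *= len(KEYPAD[d])
    return count


print(candidate_count("668437"))              # 972
print("mother" in set(candidates("668437")))  # True
```

Five of the six keys in 668437 carry three letters and one carries four, so there are 3 × 3 × 3 × 3 × 3 × 4 = 972 candidates, of which only a handful are English words. The count grows exponentially with the length of the digit string.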

Part (b) of the exercise is to design, write, and test a Scheme procedure that takes as argument a string made up of digits 1 through 9 and returns as value a list of the words that map to that string and occur in the file /usr/share/dict/words.
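One possible strategy for part (b), sketched in Python rather than Scheme, is to encode each dictionary word and compare the result with the given digit string, instead of generating every candidate and looking each one up. The names encode, matching_words, and load_words are hypothetical, as is the punctuation assigned to key 1. This is an outline of the idea under those assumptions, not a model solution.

```python
# ASSUMPTION: the punctuation on key 1 is illustrative only.
KEYPAD = {
    "1": ".,?!",
    "2": "abc",
    "3": "def",
    "4": "ghi",
    "5": "jkl",
    "6": "mno",
    "7": "pqrs",
    "8": "tuv",
    "9": "wxyz",
}
CHAR_TO_DIGIT = {ch: digit for digit, chars in KEYPAD.items() for ch in chars}


def encode(word):
    """Return the digit string for word, or None if some character has no key."""
    digits = []
    for ch in word.lower():
        digit = CHAR_TO_DIGIT.get(ch)
        if digit is None:
            return None
        digits.append(digit)
    return "".join(digits)


def matching_words(digits, words):
    """Return the words, in their original order, whose encoding equals digits."""
    return [word for word in words if encode(word) == digits]


def load_words(path="/usr/share/dict/words"):
    """Read one word per line from the word list named in the exercise."""
    with open(path, encoding="utf-8", errors="replace") as source:
        return [line.strip() for line in source if line.strip()]


# A small in-memory list stands in for the full dictionary file.
sample = ["mother", "motifs", "movies", "father", "hello"]
print(matching_words("668437", sample))  # ['mother', 'motifs', 'movies']
```

With the real file, the call would be matching_words("668437", load_words()). Words containing characters that have no key under the assumed keypad (an apostrophe, for instance) are skipped.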

This exercise will be due on Friday, October 1.

Exercise 1

To indicate that an English noun has plural number, one usually adds /əz/ (in some dialects, /ɨz/) if the noun ends in /s/, /z/, /ʃ/, /ʒ/, /tʃ/ or /dʒ/, /s/ if the noun ends in any other unvoiced consonant, or /z/ if the noun ends in any other voiced consonant or in a vowel.
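The rule just stated can be written down as a three-way choice on the final sound of the noun. The sketch below is in Python and assumes that the caller supplies that final phoneme as an IPA string; the names SIBILANTS, OTHER_UNVOICED, and plural_suffix are hypothetical.

```python
# The six final sounds that take the syllabic ending.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}

# The remaining unvoiced consonants that can end an English noun.
OTHER_UNVOICED = {"p", "t", "k", "f", "θ"}


def plural_suffix(final_phoneme):
    """Return the spoken plural ending for a noun ending in final_phoneme."""
    if final_phoneme in SIBILANTS:
        return "əz"
    if final_phoneme in OTHER_UNVOICED:
        return "s"
    # Every other voiced consonant, and every vowel, takes /z/.
    return "z"


for noun, final in [("bus", "s"), ("judge", "dʒ"), ("cat", "t"),
                    ("dog", "g"), ("bee", "i")]:
    print(noun, "+", plural_suffix(final))
```

The order of the tests matters: /s/ and /ʃ/ are themselves unvoiced, so the sibilant case has to be checked first.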

This phonological regularity is not reflected in English spelling, however, which notoriously does not correspond exactly to pronunciation. Is there a comparable regularity for text? In other words, can one deduce the written plural form of a regularly inflected English noun from its (singular) reference form?

Write a Scheme procedure that takes as its argument a string representing the reference form of an English noun and returns its plural form. Suggest at least one way to test this procedure, and implement the test. Create an R6RS Scheme library that contains and exports both the pluralizing procedure and a procedure that automatically tests it.
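As a starting point for thinking about the spelling rules and about how to test them, here is a deliberately incomplete sketch in Python rather than Scheme. It covers only three familiar spelling patterns. The names pluralize and failed_cases are hypothetical, and the table of test cases is a small hand-picked sample, not a systematic test.

```python
VOWELS = "aeiou"


def pluralize(noun):
    """Return a plural spelling for noun, using three common patterns only."""
    # Nouns spelled with a final sibilant usually add -es.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # A final y after a consonant letter usually becomes -ies.
    if len(noun) >= 2 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Otherwise, add -s.
    return noun + "s"


def failed_cases():
    """Return (noun, actual, expected) for every test case that fails."""
    cases = {
        "cat": "cats",
        "bus": "buses",
        "box": "boxes",
        "church": "churches",
        "dish": "dishes",
        "city": "cities",
        "day": "days",
        "judge": "judges",
    }
    return [(noun, pluralize(noun), expected)
            for noun, expected in cases.items()
            if pluralize(noun) != expected]


print(failed_cases())  # []
```

The sketch is wrong for many nouns: it spells the plural of stomach as stomaches and that of quiz as quizes, and it says nothing useful about nouns ending in o, such as potato and piano. Deciding how far the regularity really extends is the point of the exercise.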

To submit the source code for your library and any supporting files that you'd like me to see or use in evaluating your implementation, send an e-mail to stone@cs.grinnell.edu with the subject line "[CSC 205] Exercise 1" and attach your file(s) to it.

This exercise will be due on Monday, September 13.

I am indebted to David Rosen for calling my attention to errors in an earlier statement of this exercise.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License.

This text is available on the World Wide Web as

http://www.cs.grinnell.edu/~stone/courses/computational-linguistics/exercises.html


John David Stone · stone@cs.grinnell.edu