Multi-Processing Application: Counting Sentences and Words in a File
Goals: This lab ties together several components of this course,
including multi-processing, communication between processes using shared
memory, access of files, character processing, and the use of loop
invariants to develop programs.
Problem: Write a program that reads a text file and computes the
following basic analysis of sentences and words:
- the number of sentences in the file,
- the number of words in each sentence,
- the total number of words in the file, and
- the average number of words per sentence.
Definitions: While a careful analysis of words and
sentences might require complex rules on sentence structure, work for this
lab depends on the following simple rules.
Words: A word is defined as any of the following:
- a sequence of (one or more) letters, or
- a sequence of letters, followed by a hyphen or an apostrophe, followed by a sequence of letters.
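The word definition amounts to a small pattern: letters, then optionally one hyphen or apostrophe followed by more letters. A hypothetical checker, `is_word` (not part of read-write-2.c), makes the rule concrete for a whole token. The Reader itself sees one character at a time, so this is a sketch for understanding and testing the definition, not a drop-in part of the solution.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if s matches the lab's word rule,
   i.e. letters, optionally followed by ONE hyphen or apostrophe and a
   second run of letters. */
bool is_word(const char *s)
{
    int i = 0;
    int joiners = 0;          /* hyphens/apostrophes seen so far (0 or 1) */
    bool need_letter = true;  /* the next character must be a letter */

    /* Invariant: s[0..i-1] is a valid prefix of a word, joiners counts the
       hyphens/apostrophes in it, and need_letter is true exactly when the
       prefix is empty or ends in a hyphen/apostrophe. */
    while (s[i] != '\0') {
        unsigned char c = (unsigned char) s[i];
        if (isalpha(c)) {
            need_letter = false;
        } else if ((c == '-' || c == '\'') && !need_letter && joiners == 0) {
            joiners++;
            need_letter = true;
        } else {
            return false;     /* any other character breaks the pattern */
        }
        i++;
    }
    return !need_letter;      /* rejects "" and tokens ending in - or ' */
}
```

Under this reading, "O'Hare" and "Readers-Writers" are single words, while "a-b-c" is not one word because the rule allows only one hyphen or apostrophe.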
Sentences: A sentence is defined as
- any sequence of words, followed by a period, question mark, or exclamation point, EXCEPT that
- a period following a capital letter is not considered to be the end of a sentence.
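Because the exception depends only on the character just before the period, the sentence rule can be checked with the current character and the previous one. The function name `ends_sentence` and the convention of passing a space as `prev` at the start of the file are assumptions made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: does character c end a sentence, given that prev
   was the character read immediately before it? */
bool ends_sentence(char prev, char c)
{
    if (c == '?' || c == '!')
        return true;
    if (c == '.')
        return !isupper((unsigned char) prev);  /* "M." is not an ending */
    return false;
}
```

Since the rule is purely local, a sentence such as "Grinnell is in the USA." would not end at its final period, because "A" precedes it; that is a case worth including in testing.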
Examples:
Here are three sentences, each with exactly eight words:
- Henry M. Walker teaches three courses this semester.
- Travel through O'Hare airport slows greatly during rain!
- The Readers-Writers Problem involves independent processes that communicate.
Work for this Lab: Program ~walker/c/concurrency-linux/read-write-2.c contains
version 2 of the Readers-Writers problem from the class concurrency
handout, with minor modifications for Linux and for output.
Using this program as a base, this lab asks you to make the following
modifications:
- Change the buffer size and/or access as needed so that the Writer places character data in the buffer rather than integers, and so that the Reader extracts this character data. (The in and out pointers will still need to specify logical integer addresses, but buffer will need to be an array of characters.)
- Modify the Writer process so that it reads a file name from standard input, then reads successive characters from this file and places those characters in the buffer. When the file has been completely read, the Writer should place an EOF character in the buffer before it terminates.
- Modify the Reader process so that it reads successive characters from the buffer and processes each character as needed. Two types of output are required:
  - When a sentence is completely read, the Reader should print the sentence number and the word count for that sentence.
  - When all sentences are read, the Reader should print the total number of sentences and the average number of words per sentence.
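For the buffer change, one possible layout is a circular array of `char` indexed by `in` and `out`. The sketch below is a single-process illustration under assumed names (`struct shared_buf`, `BUF_SIZE`, `buf_put`, and `buf_get` are hypothetical, not taken from read-write-2.c). In the real program the structure would live in the shared-memory segment, and the waiting logic already present in read-write-2.c would still have to guard each put and get; the assertions here only mark where that waiting belongs.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SIZE 8   /* hypothetical size; the lab leaves the choice to you */

/* Hypothetical layout for the shared-memory region. */
struct shared_buf {
    char buffer[BUF_SIZE];  /* character data instead of integers */
    int in;                 /* next slot the Writer fills */
    int out;                /* next slot the Reader empties */
};

/* One slot is left unused, so in == out means "empty" and
   (in + 1) % BUF_SIZE == out means "full". */
bool buf_full(const struct shared_buf *b)  { return (b->in + 1) % BUF_SIZE == b->out; }
bool buf_empty(const struct shared_buf *b) { return b->in == b->out; }

void buf_put(struct shared_buf *b, char c)
{
    assert(!buf_full(b));             /* the Writer must wait while full */
    b->buffer[b->in] = c;
    b->in = (b->in + 1) % BUF_SIZE;
}

char buf_get(struct shared_buf *b)
{
    assert(!buf_empty(b));            /* the Reader must wait while empty */
    char c = b->buffer[b->out];
    b->out = (b->out + 1) % BUF_SIZE;
    return c;
}
```

Only the element type of buffer changes from the integer version; in and out remain integer indices, which matches the note in the first modification.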
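For the Writer modification, a single loop driven by `getc` is enough, since `getc` returns an `int` that keeps `EOF` distinct from every data character. In this sketch `put_char` is a hypothetical stand-in that appends to an ordinary array instead of the shared buffer, and `writer` and `writer_main` are assumed names. One caution: once `EOF` is stored in a `char` buffer, the Reader should compare against `(char) EOF` rather than `EOF`, because on platforms where `char` is unsigned, `(char) EOF` is 255 and never equals `EOF` (-1).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for placing one character in the shared buffer;
   this sketch appends to an ordinary array instead. */
static char sink[256];
static int sink_len = 0;

static void put_char(char c)
{
    assert(sink_len < (int) sizeof sink);
    sink[sink_len++] = c;
}

/* Writer body: copy every character of fp into the buffer, then the sentinel. */
void writer(FILE *fp)
{
    int ch;  /* int, not char, so that EOF is distinct from every data byte */

    /* Invariant: every character read from fp so far has been placed in the
       buffer, in file order, and nothing else has been placed there. */
    while ((ch = getc(fp)) != EOF)
        put_char((char) ch);

    put_char((char) EOF);  /* tells the Reader that the file is finished */
}

/* Reads a file name from standard input and runs writer() on that file.
   Returns 0 on success, -1 if no name is read or the file cannot be opened. */
int writer_main(void)
{
    char name[256];
    if (scanf("%255s", name) != 1)
        return -1;

    FILE *fp = fopen(name, "r");
    if (fp == NULL) {
        perror(name);
        return -1;
    }
    writer(fp);
    fclose(fp);
    return 0;
}
```

The `%255s` conversion stops at whitespace, so a file name containing spaces would need a different input routine. A file byte with value 0xFF would also be indistinguishable from the stored sentinel, which is harmless for ordinary text files.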
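For the Reader's two kinds of output, one way to keep the bookkeeping separate from character processing is a small set of running totals. The names `struct stats`, `report_sentence`, `average_words`, and `report_totals` are hypothetical. The cast to `double` matters: dividing two `int` values truncates, so 24 words over 5 sentences would give 4 instead of 4.8. The zero check covers a file that contains no complete sentence.

```c
#include <assert.h>
#include <stdio.h>

/* Running totals kept by the Reader (hypothetical names). */
struct stats {
    int sentences;    /* sentences completed so far */
    int total_words;  /* words in all completed sentences */
};

/* Call when a sentence ends: record it and print its number and word count. */
void report_sentence(struct stats *st, int words_in_sentence)
{
    st->sentences++;
    st->total_words += words_in_sentence;
    printf("Sentence %d: %d words\n", st->sentences, words_in_sentence);
}

/* Average words per sentence; 0.0 when no sentence has been completed. */
double average_words(const struct stats *st)
{
    if (st->sentences == 0)
        return 0.0;
    return (double) st->total_words / st->sentences;
}

/* Call once, after the EOF sentinel has been read. */
void report_totals(const struct stats *st)
{
    printf("Total sentences: %d\n", st->sentences);
    printf("Average words per sentence: %.2f\n", average_words(st));
}
```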
Notes:
- Processing in the Reader and in the Writer should each involve only one (non-nested) loop, based on the processing of a single character. (No other loops are allowed in this program.)
- In the Reader process, one or more additional variables may be needed to keep track of the type of the previous character read.
- In your code, you must explicitly state a loop invariant for each loop.
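Following these notes, the Reader's single loop can keep a "previous character type" variable and state its invariant in a comment. This sketch scans an ordinary array ending in the `(char) EOF` sentinel rather than the shared buffer; `analyze`, `enum char_type`, and the `counts` array are hypothetical names. It counts a word when its first letter appears, lets one hyphen or apostrophe join two runs of letters, and ignores a period whose preceding character is a capital letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

enum char_type { OTHER, LETTER, JOINER };  /* type of the previous character */

/* Scans text up to the (char) EOF sentinel with a single loop. Stores each
   sentence's word count in counts[] (at most max entries), sets *total_words,
   and returns the number of sentences. */
int analyze(const char *text, int counts[], int max, int *total_words)
{
    enum char_type prev_type = OTHER;
    char prev = ' ';       /* previous character; a space before the start */
    bool joined = false;   /* current word already contains - or ' */
    int in_sentence = 0;   /* words begun since the last sentence ended */
    int sentences = 0;
    int words = 0;
    int i = 0;

    /* Invariant: for the prefix text[0..i-1], words is the number of words
       begun, sentences is the number of completed sentences (the first
       min(sentences, max) recorded in counts[]), in_sentence is the number of
       words begun since the last sentence ended, and prev and prev_type
       describe text[i-1]. */
    while (text[i] != (char) EOF) {
        char c = text[i];
        assert(in_sentence <= words);  /* a cheap partial check of the invariant */
        if (isalpha((unsigned char) c)) {
            if (prev_type == OTHER) {          /* first letter of a new word */
                words++;
                in_sentence++;
                joined = false;
            } else if (prev_type == JOINER) {  /* letters after - or ' */
                joined = true;
            }
            prev_type = LETTER;
        } else if ((c == '-' || c == '\'') && prev_type == LETTER && !joined) {
            prev_type = JOINER;
        } else {
            bool end = c == '?' || c == '!' ||
                       (c == '.' && !isupper((unsigned char) prev));
            if (end && in_sentence > 0) {
                if (sentences < max)
                    counts[sentences] = in_sentence;
                sentences++;
                in_sentence = 0;
            }
            prev_type = OTHER;
        }
        prev = c;
        i++;
    }
    *total_words = words;
    return sentences;
}
```

Two choices here are assumptions the lab does not settle: a terminator that follows no words (the second "!" in "Stop!!", or the extra dots of "...") is not counted as a sentence, and words after the last terminator count toward the total but belong to no sentence. Both are worth covering in testing.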
Work to turn in:
- Use the specified format for submitting assignments to list and run the code.
- Write a paragraph or two about what cases need to be checked as part of a complete package of testing, explain how your testing includes these cases, and comment on the correctness of your results.
- Testing of the program must include processing of the file ~walker/195/course-info.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~walker/courses/195.fa01/lab.multi-processing.html