I have a text file that contains the raw data collected in an imaginary political poll of somewhere between 1200 and 1500 Iowa residents. Each person's responses occupy one line of the file, and the line is supposed to be in the following format:
480669729 D X-- YNYY-YYNYNNY-YYY 08 79
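As a rough illustration of what checking a line against the format might look like, here is a minimal Scheme sketch. It assumes only what the sample line above shows: a 38-character line made of nine digits, one letter, a three-character field, sixteen characters drawn from `Y`, `N`, and `-`, and two two-digit codes, each separated by a single space. The names `all-chars?`, `answer-char?`, `space-char?`, and `plausible-line?` are hypothetical. The exact rules for the letter and the three-character field are not stated here, so the sketch does not try to check them fully.

```scheme
;; Hypothetical helper: does every character of s at indices
;; start (inclusive) through end (exclusive) satisfy pred?
(define (all-chars? pred s start end)
  (let loop ((i start))
    (cond ((>= i end) #t)
          ((pred (string-ref s i)) (loop (+ i 1)))
          (else #f))))

;; The three characters seen in the sixteen-character answer field.
(define (answer-char? ch)
  (or (char=? ch #\Y) (char=? ch #\N) (char=? ch #\-)))

(define (space-char? ch)
  (char=? ch #\space))

;; Assumption: column positions are inferred from the sample line only.
;; The length test comes first, so the later string-ref calls are in range.
(define (plausible-line? s)
  (and (= (string-length s) 38)
       (all-chars? char-numeric? s 0 9)       ; nine-digit number
       (space-char? (string-ref s 9))
       (char-alphabetic? (string-ref s 10))   ; one-letter code
       (space-char? (string-ref s 11))
       ;; columns 12-14: three-character field, contents not checked here
       (space-char? (string-ref s 15))
       (all-chars? answer-char? s 16 32)      ; sixteen answers
       (space-char? (string-ref s 32))
       (all-chars? char-numeric? s 33 35)     ; first two-digit code
       (space-char? (string-ref s 35))
       (all-chars? char-numeric? s 36 38)))   ; second two-digit code
```

A real solution would tighten these checks to match the full format rules, for example by restricting which letters and which three-character patterns are acceptable.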
The assignment is to write a program that will determine and print out the answers to the following questions:
Do all the lines of the data file actually follow the prescribed format? Reprint each non-conforming line, labelled with its line number in the source file, and do not use any information recovered from that line in answering subsequent questions.
Was any individual polled more than once? Print out the Social Security number of anyone who was polled more than once and discard all of that individual's responses; do not use any information provided by such individuals in answering subsequent questions.
How many Democrats, Republicans, independents, and "other" persons participated in the poll? How many residents of Poweshiek County (county 79)? How many non-residents of Poweshiek?
Among the Democratic participants, what percentage voted for Clinton? for Dole? for Perot? What percentage did not vote for any of these three?
For each of the sixteen issue questions: Of the Democratic voters who answered either "yes" or "no," what percentage answered "yes"? What about Republicans? independents? "others"? Clinton voters? Dole voters? Perot voters? residents of Poweshiek County? non-residents of Poweshiek County? (It may happen that all the members of one of these categories give the "don't know" answer to a given issue question. In such a case, your program should print out two stars (**) instead of trying to compute a percentage.)
For each interviewer code xx, how many persons did interviewer xx poll?
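The percentage questions above all share one wrinkle: the denominator counts only the "yes" and "no" answers, and it may be zero. The following Scheme sketch shows one way that case could be handled. The names `tally-issue` and `yes-percentage` are hypothetical, and the sketch assumes that the sixteen answers begin at character index 16 of each line (as in the sample line), that `Y` means "yes" and `N` means "no", and that `lines` holds only lines that have already passed the format and duplicate checks.

```scheme
;; Hypothetical helper: count the yes and no answers to issue k
;; (numbered 0 through 15) over a list of data lines.
;; Returns a pair (yes-count . no-count); any other answer is skipped.
(define (tally-issue lines k)
  (let loop ((rest lines) (yes 0) (no 0))
    (if (null? rest)
        (cons yes no)
        (let ((ch (string-ref (car rest) (+ 16 k))))
          (loop (cdr rest)
                (if (char=? ch #\Y) (+ yes 1) yes)
                (if (char=? ch #\N) (+ no 1) no))))))

;; Returns the percentage of yes answers among the decided answers,
;; or the string "**" when nobody in the category answered yes or no.
(define (yes-percentage yes-count no-count)
  (let ((decided (+ yes-count no-count)))
    (if (= decided 0)
        "**"
        (/ (* 100.0 yes-count) decided))))
```

Because `display` accepts both numbers and strings, the result of `yes-percentage` can be printed directly in either case. To report on one category, such as Democratic participants, the caller would first select that category's lines and then pass them to `tally-issue`.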
The file /u2/stone/courses/scheme/html/exercise-8.dat contains sample data that you can use as a test run. (The file /u2/stone/courses/scheme/html/exercise-8.output shows the results of that test run, in one possible format.) However, I'll be running your programs on one or more other data sets, so don't tailor the code to the sample data.
This document is available on the World Wide Web as
http://www.math.grin.edu/~stone/courses/scheme/exercise-8.html
created November 11, 1997
last revised November 11, 1997