Exam 2 – Expanding Your Scheme Skills

Assigned
Wednesday, Oct 10, 2018
Due
Wednesday, Oct 17, 2018 by 10:30pm
Collaboration
You may not receive help from any person other than the instructor for this exam. See the exam procedures for a detailed description of the resources you can and cannot use.
Submitting
Upload your completed exam file to http://gradescope.com. To upload your exam, click on CSC151-01 on your gradescope dashboard, click Exam 2, and then drag your exam file to the upload window. If you want to update your exam you can resubmit the exam file; I will only grade the last version you upload before the deadline.

Please read the exam procedures page for policies, turn-in procedures, and grading details. If you have any questions about the exam, check the Q&A at the bottom of this page. We will generally spend some time in class every day on questions and answers while the exam is in progress.

While the exam is out, please check back periodically to see if we have reported any new errata.

Complete the exam using the exam2.rkt starter source code.

Problem 1: Finding the Median of Three Values

Topics: Booleans and Predicates, Conditionals

We have seen a number of different ways to conditionally execute Scheme code in this course. For this problem, you will use three different techniques to implement a procedure that finds the median of three values. You will first write documentation for a median3 procedure that should apply to all of your implementations. Consider the following example uses of median3:

> (median3 1 2 3)
2
> (median3 -100 5 6)
5
> (median3 3 3 4)
3
> (median3 15.6 19.2 128)
19.2
> (median3 19 2 145)
19

You will implement this procedure three different ways: using if, using cond, and finally using short-circuit evaluation. Your implementations may only use numeric constants, lambda, and, or, not, <, <=, =, >=, >, +, and - unless otherwise specified below.

a. Write 6P-style documentation for the procedure (median3 x y z). This procedure should return the median value of the three numeric parameters.

b. Write, but do not document, the procedure (median3-if x y z), which should find the median of three values using at least one if. In addition to the if expression(s), you may only use the procedures and syntax listed above.

c. Write, but do not document, the procedure (median3-cond x y z), which should find the median of three values using cond. In addition to the cond expression(s), you may only use the procedures and syntax listed above.

d. Write, but do not document, the procedure (median3-short-circuit x y z), which should find the median of three values using the short-circuit evaluation of and and or. You may only use the procedures and syntax listed at the top of this problem.

Problem 2: Selecting Columns from a Table

Topics: Tables, Heterogeneous Lists

Many of the datasets we will look at in this class contain far more information than what we need for specific tasks. Sometimes this information is contained in extra rows of a table that we can filter out, but other times a table will contain unnecessary columns. Write and document a procedure (select-columns table column-indices) that takes a table as input along with a list of numbers called column-indices. This procedure should then produce a table that contains only the requested column indices.

The following examples all use the US ZIP Code dataset from the reading data from files lab. The examples assume you have the file us-zip-codes.csv on your desktop.

> (define zips (read-csv-file "~/Desktop/us-zip-codes.csv"))
> (take (select-columns zips (list 0 1 2)) 3)
'(("00501" 40.922326 -72.637078)
  ("00544" 40.922326 -72.637078)
  ("00601" 18.165273 -66.722583))
> (take (select-columns zips (list 0 3 4)) 3)
'(("00501" "Holtsville" "NY")
  ("00544" "Holtsville" "NY")
  ("00601" "Adjuntas" "PR"))
> (take (select-columns zips null) 3)
'(() () ())
> (take (select-columns zips (list 5)) 3)
'(("Suffolk") ("Suffolk") ("Adjuntas"))

Do not forget to test your procedure carefully. You are required to show at least three additional example uses of your procedure to receive full credit on this problem.

Reminder: List and column indices start at zero in Scheme.

Request: Please do not include your username in the examples. You can replace your username with “student” after running your examples, or use the “~” character to refer to your home directory without using your username as in the examples above.

Problem 3: Whatzitdo?

Topics: lists, local bindings, code reading, documentation

Consider the following interesting procedure definition.

(define s (lambda (l) (let ([a (iota (length l))]) (let ([a (map
even? a)]) (let ([a cadr] [b (map list l a)]) (let* ( [c (filter a b)])
(list (map car c) (map car (filter (negate cadr) b))))))))) 

Determine what this procedure does and then do the following.

a. Reformat the procedure so that it follows reasonable standards. (You need not show this version; we only want to see the version in part b.)

b. Rename the procedure, parameters, and local variables so that they will make sense to the reader.

c. Write 6P-style documentation for the renamed procedure.

d. Explain how the procedure achieves its goal.

Problem 4: Plotting Census Data

Topics: Reading Data from Files, Heterogeneous Lists, Tables, Predicates, Conditionals, Displaying Data

We collect large datasets for different purposes and use them to predict some possibilities or generate new information. For this experiment, we will be using a dataset extracted by Barry Becker from the 1994 Census database, originally obtained from the UCI Machine Learning Repository. The dataset is available in the file uci-census.csv.

The original use for this dataset was to build a model that predicts whether an individual will make over $50,000 a year based on a collection of attributes. You can read about the dataset and see a list of attributes in the dataset’s codebook uci-census-codebook.txt.

Prepare a Data Table

First, write code to load the dataset into a table. You do not need to put the loading or selecting code into a procedure; you can just define a variable census that will hold the data from this dataset.

Count Individuals in Groups

In preparation for plotting our dataset, write Scheme code that allows you to count the number of individuals listed as Male or Female in one of the other categories. For example, you could report the number of Male and Female individuals in each working class, education level, marital status, or race. You will likely want to write a procedure to help with this process. If you do so, please document this procedure with 4Ps. Please include the results produced by your counting procedure in your example output for this problem.

Prepare Results for Plotting

You will be plotting the results from your counting procedure using a histogram. Your counts will need to be in a specific format to support plotting. For example, this output shows the expected counts for each workclass, broken down by sex; in this case, the first number is the count of male individuals in the group, and the second count is the number of female individuals in the group. Your solution must produce data in this form automatically; a solution that requires you to manually copy values into a data format will not receive full credit.

'(("State-gov" (809 489))
  ("Self-emp-not-inc" (2142 399))
  ("Private" (14944 7752))
  ("Federal-gov" (645 315))
  ("Local-gov" (1258 835))
  ("?" (997 839))
  ("Self-emp-inc" (981 135))
  ("Without-pay" (9 5))
  ("Never-worked" (5 2)))

Plot Your Results

Present your results with a stacked histogram plot. You should include the code to generate the plot from the original dataset in your submission; I will re-run the code to generate the plot. The example output below shows the plot for the example data in the previous part.

Plot of individuals grouped by employment type and sex

You may find it useful to adjust the plot width using the optional #:width parameter to plot, especially if you choose a dataset variable with many possible values.

Discuss Your Results

Finally, explain in 4–5 sentences what this plot tells you about the dataset, the individuals in the dataset, or the types of observations in the dataset.

Problem 5: Counting List Values

Topics: Lists, Recursion, Checking Preconditions

We have seen procedures to check if a list contains a particular value, or to find the location of a value. We might also like to know how many times a particular value appears in a list. You can imagine that this procedure would be a useful building block for tally-all.

Write and document the procedure (count-occurrences lst val), which returns the number of times val appears in lst. Your implementation must check all preconditions in addition to specifying them in your 6P-style documentation.

Your procedure must produce the following example output. Do not forget to come up with at least three of your own examples to check your implementation.

> (count-occurrences (list 1 2 1 3 1 4 1 5 1 6) 1)
5
> (count-occurrences (iota 5) -1)
0
> (count-occurrences "hello" #\l)
Error! Error: count-occurrences expected a list for the first parameter.
> (count-occurrences (string->list "hello") #\l)
2
> (count-occurrences (list 3 3.0 (+ 1 2)) 3)
2
> (count-occurrences (list "hello" "world" "hello" "class" "hello" "exam2") "hello")
3

Hint: While the (list-contains lst val) procedure we saw in class used recursion, you do not necessarily have to use recursion in your solution.

Questions and Answers

We will post answers to questions of general interest here while the exam is in progress. Please check here before emailing questions!

Problem 1

Will the inputs for median3-_ always be in ascending order?
No. I have added an exmaple that shows the correct behavior for other input orders.
How should we handle cases with duplicated values?
There are three possible situations with duplicate values:
  1. The larger value is duplicated. In this case the median and the max are the same.
  2. The smaller value is duplicated. In this case the median and the min are the same.
  3. All three values are the same. In this case the median is the same as both the min and max.

Problem 2

Does our select-columns procedure have to work for any kind of data?
You can assume the input will be a table (a list of lists).

Problem 3

The code may have some redundant or unused expressions. Should we rewrite it to clean up the code?
You can remove code you think is unused. If there are redundant expressions you can eliminate them by using let to bind that expression with a name. You do not need to remove anything to receive full credit on this problem if you can make the code readable.
Should we rename the variables that are set in let expressions?
Yes!
Can we restructure the code to simplify it?
As long as the procedure works in more or less the same way that’s fine.

Problem 4

Will we receive full credit if our code to tally entries in each category is dataset-specific?
No, but the penalty will be small. You can write code that extracts counts of Male and Female respondents for each category by repeating your code to earn 90% on the problem, but I will give full credit only to answers that use a more general-purpose solution that could work for any of the discrete columns in the dataset.
Does it matter whether “Male” or “Female” counts come first in our processed data?
No, as long as your legend is correct.
The CSV file gives us values with spaces around them. What gives?
Welcome to the real world! You only need to clean up values when it makes it hard for you to do your work.
Do we have to document procedures for this problem with 6Ps?
No. 4Ps is fine.
Is there a minimum number of stacked bars we need?
Not really, as long as there’s a bar for every column in the table.
Do we need to write generic code for all parts of this problem, including setting axis labels on the plot?
No. The only part that needs to be generic to receive full credit is the part that wrangles the data into the appropriate form for plotting with a stacked histogram.
What do we need to show in our examples for this problem?
You can just give one long example. This will be all the steps you run to get from a table to your plot.

General Questions

Are we allowed or required to use recursion?
You do not have to use recursion, but you can.
Do we need to document locally-defined procedures?
No.
Why can’t I write a recursive local procedure?
You have to use letrec. You can read ahead and use this if you like.
Are we allowed to talk to others about general issues from the first few weeks of class, such as the syntax of section?
No. We’ve had too many problems with “slippery slope” issues.
Questions should go to the instructors.
What is a general question?
A question that is about the exam in general, not a particular problem.
Do all the sections have the same exam?
Yes.
Can we still invoke the “There’s more to life” clause if we spend more than five hours on the exam?
Yes. However, we really do recommend that you stop at five hours unless you are very close to finishing. It’s not worth your time or stress to spend more effort on the exam. It is, however, worth your time to come talk to us, and perhaps to get a mentor or more help (not on this exam, but on the class). There’s likely some concept you’re missing, and we can help figure that out.
What do you mean by “implement?”
Write a procedure or procedures that accomplish the given task.
Do we have to make our code concise?
You should strive for readable and correct code. If you can make it concise, that’s a plus, but concision is secondary to readability and correctness. Long or muddled code is likely to lose points, even if it is correct.
Much of your sample 6P-style documentation has incomplete sentences. Can we follow that model? That is, can we use incomplete sentences in our 6P-style documentation?
Yes, you can use incomplete sentences in 6P-style documentation.
You tell us to start the exam early, but then you add corrections and questions and answers. Isn’t that contradictory? Aren’t we better off waiting until you’ve answered the questions and corrected any errors?
We think you’re better able to get your questions answered early if you start early. Later questions will generally receive a response of “See the notes on the exam.”
How do we know what our random number is?
You should have received instructions on how to generate your random number on the day the exam was distributed. If you don’t have a number, ask your professor for one before submitting your exam.
To show we’ve tested the code informally, would you just like us to just post the inputs we used to test the procedure? If so, how should we list those?
Copy and paste the interactions pane into the appropriate place in the definitions pane. Put a #| before the pasted material. Put a |# after the pasted material.
Should we cite our partner from a past lab or assignment if we use code from a past lab or assignment?
You should cite both yourself and your partner, although you should do so as anonymously as possible. For example “Ideas taken from the solution to problem 7 on assignment 2 written by student 641321 and partner.”
If we write a broken procedure and replace it later, should we keep the previous one?
Yes! This will help us give you partial credit if your final procedure isn’t quite right. But make sure to comment out the old one using #| and |# or semicolons. You might also add a note or two as to what you were trying to do.
Do I need to document helper procedures?
You must document helpers using the 4Ps.
Can I use procedures that we have not covered in class?
Probably not. Ask about individual procedures if you’re not sure.
Can I use list-ref? I’d swear we’ve used it, but I can’t find where.
You may certainly use list-ref.
I discovered a way to define procedures without using lambda that looks like (define (proc params) body). Can I use that?
No.
Is extra credit available on this exam?
No explicit extra credit is available. However, we may find that you’ve come up with such wonderful answers that we cannot help but give you extra credit. (Don’t laugh. It happens almost every time.)
DrRacket crashed and ate my exam. What do I do?
Send your instructor the exam file and any backups that DrRacket seems to have made. (Those tend to have a similar file name, but with pound signs or tildes added at the front or back.)
How do I know when you’ve added a new question or answer?
We try to add a date stamp to the questions and answers.
Should we cite readings from the CSC 151 Web site? If so, how much information should we give?
Yes. The title and URL should suffice.
Can I use other websites as a reference for the exam?
As the exam policies page states, “This examination is open book, open notes, open mind, open computer, open Web.” Note that it also states “ If you find ideas in a book or on the Web, be sure to cite them appropriately.”
Do we have to cite the exam itself?
No.
How do I generate a random six digit number?
(random 1000000). If you end up with a number less than six digits, add zeros to the left side of the number until you have six digits.
Some of the problems are based on a recent homework assignment. Should I cite my work on that assignment?
If you refer to your work on that assignment, you should cite it.
How do I cite my work on an assignment?
Something like “Student 123456 and partner. Homework Assignment 3. CSC 151.01. 6 September 2018.”
I see the comment ; STUB on some of the procedures. What does that mean?
We use the term “STUB” for a procedure that gives some default value rather than the correct one. Stub procedures are useful when you need the procedure to exist during development, but have not yet had time to implement it. You should remove these comments as you replace stub procedures with real implementations.
How exhaustive do our postconditions need to be when writing 6P-style documentation?
Your postconditions are not a test suite, but they should capture all of the procedure’s behavior. You do not need to give specific input/output examples, but describe what is true about the output as much as you can in a few concise lines.
Are there any requirements for informal tests?
Sort of. They should be good tests, though they need not be exhaustive. Your informal tests only count if they are different from the examples on the exam.
Do we need to show examples of running our helper procedures?
No, but you should be checking them carefully. You don’t need to include these examples in your exam.

Errata

Please check back periodically in case we find any new issues.

Problem 2

  • The starter code included a documentation block, but had filled in median3 as the procedure instead of select-columns. [+1 point]

Problem 4

  • The description of the dataset was missing the word “by”. [+1/2 point]
  • The word “number” was misspelled as “numer”. [+1/2 point]
  • The word “output” was misspelled as “ouptut”. [+1/2 point]
  • The word “heterogeneous” was misspelled as “heterogenous”. [+1/2 point]

Citations

Some of the problems on this exam are based on—and at times copied from—problems on previous exams for the course. Those exams were written by Charlie Curtsinger, Janet Davis, Rhys Price Jones, Titus Klinge, Samuel A. Rebelsky, John David Stone, Henry Walker, and Jerod Weinman. Many were written collaboratively, or were themselves based upon prior examinations, so precise credit is difficult, if not impossible.

Some problems on this exam were inspired by conversations with our students and by correct and incorrect student solutions on a variety of problems. We thank our students for that inspiration. Usually, a combination of questions or discussions inspired a problem, so it is difficult and inappropriate to credit individual students.