Due: You will do this lab in class on Wednesday, October 31. You are not required to turn in your work; however, if you don't finish the lab in class, you may wish to finish it on your own.
Collaboration: Please work in groups of 2-3.
Open a terminal window and start an interactive python session.
As you work, be mindful of good pair programming practices: The navigator should be actively engaged in planning and reviewing the code, but should let the driver drive. The navigator and driver should periodically switch roles, so that everyone has a chance to drive.
In the exercises before class, you may have written a regular expression that matches the word 'spam', but not 'SPAM', 'Spam', or 'SpaM'. Let's rectify that by writing some expressions that are case-insensitive (that is, they treat alphabetic characters as the same whether they are upper or lower case).
The documentation of the re module includes a list of flags that can be used to alter the interpretation of regular expressions. The VERBOSE (or X) flag should be familiar from the reading. The IGNORECASE (or I) flag looks like it should be helpful here.
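Flags can be combined with the | operator. Here is a minimal sketch of that idea; the pattern and the name spam_pattern are my own illustration, not something from the reading.

```python
import re

# Illustrative pattern: VERBOSE lets us add whitespace and comments
# inside the expression, and IGNORECASE makes the letters match in
# either case.
spam_pattern = re.compile(r"""
    spam    # the word we are looking for
    """, re.VERBOSE | re.IGNORECASE)

matches = spam_pattern.findall('I Love Spam! Eat spam at Spamalot!')
print(matches)
```

Because the pattern has no parentheses, findall() returns the matched text itself rather than a list of groups.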
(a) Here is a pair of examples, one without and one with the re.IGNORECASE flag. Try them and see what happens.
re.findall(r'(spam)', 'I Love Spam! Eat spam at Spamalot!')
re.findall(r'(spam)', 'I Love Spam! Eat spam at Spamalot!', re.IGNORECASE)
(b) You might want to modify the expression you wrote before class so that it replaces all instances of 'spam' regardless of case. Unfortunately, in versions of Python before 2.7, the re.sub() procedure does not have a flags parameter!
Do a Google search for "python case insensitive re.sub" and try to find a solution. (Ask me if you don't find it within a couple of minutes.) Then modify the following expression so that 'Spam' and 'spam' are both replaced with 'SPAM (TM)'.
re.sub(r'spam', 'SPAM (TM)', 'I Love Spam! Eat spam at Spamalot!')
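If your search comes up empty, here is a sketch of two approaches that should work in any version of Python. These are possible solutions rather than the only ones, and the variable names are my own.

```python
import re

text = 'I Love Spam! Eat spam at Spamalot!'

# Approach 1: compile the pattern with the flag, then call the
# compiled pattern's own sub() method.
pattern = re.compile(r'spam', re.IGNORECASE)
result_compiled = pattern.sub('SPAM (TM)', text)

# Approach 2: embed the flag in the pattern itself with (?i).
result_inline = re.sub(r'(?i)spam', 'SPAM (TM)', text)

print(result_compiled)
print(result_inline)
```

Note that both approaches also replace the 'Spam' inside 'Spamalot', since nothing in the pattern requires a word boundary.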
I've developed an expression that matches course numbers such as CSC223 and ANT104. Each match tuple contains the department (e.g., CSC) and the number (e.g., 223). Departments must be exactly three capital letters. Course numbers must be exactly three digits, starting with 1, 2, or 3.
>>> pattern = re.compile(r'\b([A-Z]{3,3})([123]\d\d)\b')
>>> pattern.match('CSC223').groups()
('CSC', '223')
>>> pattern.match('ANT104').groups()
('ANT', '104')
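To see the match tuples in a longer string, you can use findall() with the same pattern. The following is a small sketch; the sentence and the failing course number CSC523 are made up for illustration.

```python
import re

pattern = re.compile(r'\b([A-Z]{3,3})([123]\d\d)\b')

# findall() returns one (department, number) tuple per match.
courses = pattern.findall('I took CSC223 and ANT104 last year.')
print(courses)

# match() returns None when the string does not fit the pattern,
# so check the result before calling groups(). CSC523 fails because
# the number must start with 1, 2, or 3.
no_match = pattern.match('CSC523')
print(no_match)
```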
(a) Try using this pattern to match the course numbers of some courses you have taken.
(b) Some people put a space between the department and the number. Modify the regular expression so that it permits, but does not require, a space between the two parts.
(c) View the registrar's official list of course offerings. How are the 'official' course numbers different from the informal ones I've described here?
(d) You probably noticed that the official course numbers have a dash between the department and the number. Modify your expression so that it permits either a space or a hyphen between the two parts, but not both.
(e) The official course numbers also include a section number. Modify your regular expression so that it can parse course numbers of the form CSC-151-01 or TUT-100-02 and report the section number along with the department and course number.
(f) FINALLY, note that some courses have lab sections, such as CSC-211L-01. Modify your regular expression so that it can parse course numbers with the optional L designation.
Switch navigator and driver if you haven't already!
Consider breakfast orders of the following form.
"""EGGS: Over easy
MEAT: SPAM (TM)
BREAD: English muffin
BEVERAGE: coffee, OJ"""
"""EGGS: scrambled
MEAT: None
BREAD: Whole wheat, very dark
BEVERAGE: tea, Earl Grey, hot"""
The labels are completely predictable, but the part we're interested in---what the diner wishes to eat---is not.
(a) Write an expression to parse the EGGS line, e.g.,
re.search(r'YOUR REGEXP HERE', 'EGGS: Over easy').groups()
(b) Write an expression to take a string containing a complete order of the form above and produce a tuple consisting of (eggs, meat, bread, beverage). For example:
('Over easy', 'SPAM (TM)', 'English muffin', 'coffee, OJ')
An important hint: Recall that . matches any single character except a newline, and \n matches the newline character at the end of a line. It might also help you to know that \s matches any single whitespace character (see PPR, p. 116).
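The sketch below shows how these pieces behave on a two-line string. The string and variable names are made up for illustration, so you will still need to adapt the idea to the breakfast orders yourself.

```python
import re

# Made-up two-line string for illustration.
text = 'first line\nsecond line'

# By default, . does not match a newline, so .* stops at the end of
# the first line.
first = re.match(r'.*', text).group()
print(first)

# \s matches the newline (as well as spaces and tabs), so it can be
# used to step across the line break between two groups.
both = re.match(r'(.*)\s(.*)', text).groups()
print(both)
```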
Many of you have discovered that text editors sometimes use tabs, and sometimes use spaces. This can make a file difficult to read when it is opened in a text editor using different tab stops. One obvious solution to this problem is to write a script that replaces each tab character with 8 spaces.
(a) Write an expression that uses re.sub() to replace each instance of the tab character (which is represented by \t) with eight spaces (which, incidentally, you can write as ' '*8). You might consider the following strings as examples: '\tspam', 'coffee\ttea\tjuice', ' Look ma, no tabs!'
(b) Write an expression that does the same thing using the replace() method of string objects (which is tried and abandoned in section 7.2 of the reading).
(c) Which approach do you think is more efficient? Why? Can you think of any other reasons to use one approach rather than the other?
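One way to test your intuition about efficiency is to time both approaches with the timeit module. This is only a rough sketch: the helper names with_re and with_replace are my own, and the timings will vary from machine to machine.

```python
import re
import timeit

sample = 'coffee\ttea\tjuice'
spaces = ' ' * 8

def with_re(s):
    """Replace tabs using a regular expression."""
    return re.sub(r'\t', spaces, s)

def with_replace(s):
    """Replace tabs using the string method."""
    return s.replace('\t', spaces)

# Both approaches should produce the same result.
assert with_re(sample) == with_replace(sample)

# Time 10000 calls of each approach (results are in seconds).
re_time = timeit.timeit(lambda: with_re(sample), number=10000)
replace_time = timeit.timeit(lambda: with_replace(sample), number=10000)
print('re.sub: ', re_time)
print('replace:', replace_time)
```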
(d) Open another terminal window and open a file named tabs-to-spaces.py. Paste the text below into the file and write the file.
import sys
lines = sys.stdin.readlines() # Read all lines from standard input to a list of lines
for line in lines:
    sys.stdout.write(line) # Write each line back to standard output
(e) Invoke your program (on itself!) as follows:
python tabs-to-spaces.py < tabs-to-spaces.py
The < character in the command says "read standard input from the following file."
You'll notice the output is really boring: It just prints the contents of the file, like the cat program does.
(f) Modify this script so that it actually does what the name says: Replaces tabs with spaces before writing the line back out.
You can redirect the output to a new file with a command such as the following:
python tabs-to-spaces.py < tabs-to-spaces.py > tabless-tabs-to-spaces.py
who command
(a) In a new terminal window, run the who command to list all active logins on your workstation. Do your best to understand the structure of the output.
(b) In your interactive python session, cut, paste, and test the following procedure, which allows you to obtain the output of the who command as a list of lines.
import os
def who():
    f = os.popen('who') # Open a pipe to read the results of the who command
    lines = f.readlines()
    f.close()
    return lines
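In recent versions of Python (3.7 and later), the subprocess module offers another way to capture a command's output. The sketch below is an alternative to the procedure above, not a replacement for it; the name command_lines is my own invention.

```python
import subprocess

def command_lines(args):
    """Run a command and return its standard output as a list of lines."""
    # check=True raises an exception if the command exits with an error.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    # keepends=True keeps the trailing newline on each line, just as
    # readlines() does.
    return result.stdout.splitlines(keepends=True)

# On a system that has the who command, you could call:
# lines = command_lines(['who'])
```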
(c) Write a new procedure, currentUsers(), that calls the who() procedure and returns a list containing only the usernames of those who are currently logged in.
For example,
>>> currentUsers()
['davisjan', 'davisjan', 'davisjan']
(d) If you wrote your procedure using regular expressions, rewrite it using the split() method of the string class instead.
(e) If you have extra time, figure out how to remove duplicate usernames from the result of currentUsers().
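If you get stuck on removing duplicates, here is a sketch of two possibilities. The list of usernames is made up (the user 'guest' is hypothetical), and you would apply the same idea to the result of your own currentUsers().

```python
# Made-up sample; 'guest' is a hypothetical username.
users = ['guest', 'davisjan', 'guest']

# A set discards duplicates but also forgets the original order,
# so sort the result to get a predictable list.
unique_sorted = sorted(set(users))
print(unique_sorted)

# Dictionary keys are unique and (in Python 3.7 and later) keep
# insertion order, so this preserves the order of first appearance.
unique_in_order = list(dict.fromkeys(users))
print(unique_in_order)
```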
Janet Davis (davisjan@cs.grinnell.edu)
Created October 23, 2007