CSC 153 | Grinnell College | Spring, 2009
Computer Science Fundamentals
This lab continues the discussion of Linux from the introductory Linux lab and also provides examples of how C programs can be used within the Linux environment.
In the GNU/Linux system, there are three standard "files" that are always open and available for use, called stdin, stdout, and stderr. (I've put quotation marks around "files" because, while Linux treats them like files behind the scenes, they do not seem much like files to the user. For example, they cannot be found in the file system tree.)
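By now, we should have discussed these files in class, but for a brief review:
stdin (standard input) is where a program reads its input; by default it is connected to the keyboard.
stdout (standard output) is where a program writes its normal output; by default it is connected to the screen.
stderr (standard error) is where a program writes its error messages; by default it is also connected to the screen.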
However, we can use "I/O re-direction" to re-assign any of these files (also called channels) elsewhere.
Many Linux commands send their output to stdout, so by default their output is sent to the screen. However, with output re-direction we can send the output to a file instead.
Give the command
ls -l > dirlist.txt
(Think of the "greater than" symbol as an arrow in this context, re-directing the output from ls to the file dirlist.txt.)
Now view the file dirlist.txt with the command
more dirlist.txt
Give the command
ls >> output.txt
What is the difference between what the two symbols > and >> do? (You may want to try both with the same output file a few times to be sure.)
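One way to check your answer is to watch a small file change after each re-direction. The following is only a sketch: the directory redirect-demo and the file demo.txt are hypothetical names chosen for this illustration, not part of the lab materials.

```shell
# Work in a scratch directory (hypothetical name) so nothing is overwritten.
mkdir -p redirect-demo
cd redirect-demo

echo "first"  > demo.txt    # creates demo.txt, or replaces its contents
echo "second" > demo.txt    # replaces the contents again
cat demo.txt                # what is left of "first"?

echo "third" >> demo.txt    # adds to the end of demo.txt
cat demo.txt
```

Compare the two listings printed by cat with your prediction before moving on.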
To explore the meaning of stdin, let's consider the Linux utility cat one more time. Like many Linux utilities that accept input from a file, specifying an input file for cat is actually optional. If we invoke cat without a file name, it takes its input from stdin (i.e., from the keyboard) instead. This comes in handy when you want to create short text files, again because you can do so without firing up a more substantial text editor.
Type "cat > temp.txt", and then follow that with some text you want to insert into the file temp.txt. For example, your session could look something like the following:
$ cat > temp.txt
Here is the text that I want to insert.
It is ok to include multiple lines.
How do we stop? Just type ctrl-d.
<ctrl-d>
Recall from the previous Linux lab that ctrl-d is used in Linux as the "end of input" character. Thus, typing ctrl-d at the end of your session indicates to cat that it has received all the input you are going to send it, which causes the cat command to end.
Review the file you have just created in an editor, such as emacs. Has the final ctrl-d character been stored in the file? Do you understand why (or why not)?
As a final note regarding cat, here is the synopsis for cat from the man pages: "cat [OPTION] [FILE]...". The fact that FILE is enclosed in square brackets tells us the input file for cat is optional. (You should also be able to see, from the ellipsis that follows FILE, that cat can be given multiple input files.)
Just as we can re-direct stdout to a file, we can also re-direct stdin to come from a file with the < symbol. Thus, if a program expects to receive its input from stdin, we can cause the input to come from a regular file instead.
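As a small sketch of input re-direction, the commands below run wc -l (a line counter that appears in the filter table later in this lab) on the same file in two ways. The directory stdin-demo and the file lines.txt are hypothetical names used only for this illustration.

```shell
mkdir -p stdin-demo
cd stdin-demo
printf 'one\ntwo\nthree\n' > lines.txt

# wc opens lines.txt itself, so it knows the name and prints it.
wc -l lines.txt

# The shell opens lines.txt and connects it to stdin; wc sees only
# a stream of characters, so it prints the count without a name.
wc -l < lines.txt
```

Both commands count the same three lines. The difference in what they print shows which one was told the file name and which one simply read from stdin.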
Program ~walker/c/examples/count-chars.c reads input from the keyboard (i.e., stdin) and counts the number of times each letter occurs together with the number of non-letters. The program uses character input (to be discussed in a few weeks), and the approach is reasonably straightforward. The program uses an array count[0]..count[26] to count the number of times various characters arise in the input. In particular, count[0] gives the number of a's, count[1] gives the number of b's, ..., and count[25] gives the number of z's. The last array element count[26] is used to count all non-letters.
Copy count-chars.c to your account and compile it.
Read through the code, and write a paragraph explaining how the code works.
Hint: EOF is a special value (not an actual character) that C input functions such as getchar return to signal the end of the input.
Run count-chars in a terminal window, using ctrl-d to end the input. (For technical reasons related to the terminal window, you may need to hit ctrl-d twice to end the input stream.) Check that the program works correctly.
Rather than read material from the keyboard, use count-chars to count the number of letters in a file by redirecting the input. For example, to determine the frequency count of characters in the program count-chars.c itself, use the command:
count-chars < count-chars.c
A final type of I/O redirection within Linux is called a "pipe" (represented by the vertical bar "|", which on many keyboards is located on the same key as the backslash). A pipe allows you to connect utilities in a "pipeline." A pipe causes the data sent to stdout from one utility to be re-directed into stdin of another utility. We say that the output from the first utility is "piped" to the input of the next utility.
Use the following command, which allows you to page through a long directory listing.
ls -l /bin | less
The fact that many Linux utilities can accept their input from one file, multiple files, or stdin makes them very versatile. Pipes take this versatility one step further: any command or utility that can accept input from stdin can also accept input from a pipe. Any command or utility that sends output to stdout can also send output through a pipe.
To combine more than one program, we can string multiple pipes together to create longer pipelines, like so:
command | command | command
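To make the pattern concrete, here is a minimal sketch of a longer pipeline. The input lines are invented for the illustration, and the filters sort, uniq, and wc are described in the table that follows.

```shell
# Stage 1 (printf) writes five lines to stdout.
# Stage 2 (sort) reads them from stdin and writes them in sorted order.
# Stage 3 (uniq) reads the sorted lines and drops adjacent duplicates.
# Stage 4 (wc -l) counts the lines that remain.
printf 'pear\napple\npear\nfig\napple\n' | sort | uniq | wc -l
```

The pipeline prints 3, since the input contains three distinct lines. No intermediate file is created at any stage.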
Many Linux utilities that accept input from stdin and send output to stdout are known as "filters" because, in one way or another, they filter their input to produce output. These utilities are especially apt for creating useful pipelines. In the first laboratory on Basic Linux Commands, you worked with several filters, including less, tail, and cat.
The following table lists some additional filters. All of these filters can accept their input from regular files, stdin, or a pipe. None of them modify their original input; rather, they generate new output that reflects a modified version of the input.
Utility | Description | Example usage |
wc | "word count" - counts characters, words, and lines in input | wc -l ~/.bashrc |
sort | sorts lines in input | sort -k2 ~coahranm/share/csc201/sciencefac.txt |
uniq | "unique" - removes (or reports) duplicate lines in input | uniq ~coahranm/share/csc201/duplicates.txt |
grep | searches for a target string in input | grep li ~coahranm/share/csc201/duplicates.txt |
cut | removes parts of lines from input | cut -d' ' -f2 ~coahranm/share/csc201/sciencefac.txt |
diff | reports the differences between two files | diff ~walker/public_html/courses/153.sp09/labs/lab-linux-c.shtml ~walker/public_html/courses/153.sp09/labs/lab-linux-c.shtml~ |
Try each of the examples given in the table, looking up the commands as needed, to be sure you understand them.
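If the files in the table are not available to you, the filters can still be tried on a small file of your own. The sketch below uses invented data in an invented format (first name, last name, department); it is not the format of the real faculty files, and the names filter-demo and people.txt are hypothetical.

```shell
mkdir -p filter-demo
cd filter-demo

# Invented sample data: first name, last name, department.
printf '%s\n' \
  'Ada Young Biology' \
  'Max Brown Physics' \
  'Lee Adams Biology' \
  'Max Brown Physics' > people.txt

wc -l people.txt               # number of lines in the file
sort -k2 people.txt            # sort using the second field onward
sort people.txt | uniq         # sorted, with the duplicate line removed
grep Physics people.txt        # only lines containing "Physics"
cut -d' ' -f2 people.txt       # only the second field of each line
```

Note that uniq only removes duplicates that are adjacent, which is why it is usually preceded by sort.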
Consider the following piped command:
count-chars < count-chars.c | grep ":" | sort -n -r -k2
You already used the first step, count-chars < count-chars.c, in step 5 of this lab.
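A useful habit when reading a pipeline like this is to run it one stage at a time. The sketch below does so with a stand-in for count-chars: the function sample and the lines it prints are invented for this illustration, and the real program's output format may differ.

```shell
# Invented stand-in for the output of count-chars: a heading line
# followed by "letter: count" lines.
sample() {
  printf '%s\n' 'Letter frequencies' 'a: 12' 'b: 3' 'e: 40' 't: 25'
}

sample                                   # stage 1: raw output
sample | grep ":"                        # stage 2: keep only lines with a colon
sample | grep ":" | sort -n -r -k2       # stage 3: largest count first
```

With this sample input, the last command lists the line for e first and the line for b last.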
Use the filters given above (and other utilities if needed) to perform the following tasks. Note that for some of the tasks you may need to combine filters with pipes.
Count the lines of source code in a program you wrote earlier this semester for this course.
Determine the number of user accounts on the MathLAN. Recall that each account has a directory in /home.
Print a list (in the terminal window, not on a printer) of faculty in Grinnell College's Science Division, sorted by last name. The file ~coahranm/share/csc201/sciencefac.txt contains the data you need.
Print a list of faculty in the Biology Department. Your list should not include faculty in any other department.
Consider one of the programs you wrote earlier this semester. Take a quick look at it, using less to remind yourself of a variable name that is used in several places in the file. Now use grep to print a listing of the lines that include that variable. Get grep to print the line number (in the source file) for each line of output as well.
Note that it can be very useful to use grep in this way when you return to a project after taking a long break from it. For example, you might want to find every instance of a given class -- in any source file in the project -- as part of re-acquainting yourself with your code.
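As a sketch of this kind of project-wide search, the commands below look for a name in one file, in several files, and in a whole directory. The directory grep-demo, the two tiny source files, and the variable name total are all invented for the illustration.

```shell
mkdir -p grep-demo
cd grep-demo

# Two tiny invented source files that both use the name "total".
printf 'int total = 0;\nint i;\ntotal = total + i;\n' > sum.c
printf 'int count;\nprintf("%%d", total);\n' > report.c

grep -n total sum.c        # -n prefixes each match with its line number
grep -n total *.c          # several files: file name, line number, line
grep -rn total .           # -r searches every file under a directory
```

When grep is given more than one file, it labels each match with the file it came from, which makes it easy to jump to the right place in an editor.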
Print a list of all Grinnell faculty named David.
Hint: To do this, it would be helpful to create a single list that combines all the entries in the three faculty lists I have provided. But instead of generating a separate combined file, you can do this on the fly using cat as shown below. This is where cat gets its name -- from its ability to concatenate multiple files.
cat ~coahranm/share/csc201/socialfac.txt ~coahranm/share/csc201/humanfac.txt ~coahranm/share/csc201/sciencefac.txt
Print a unique list of departments in the Humanities Division.
The names provided in ~coahranm/share/csc201/ give the faculty for the various departments, and the Chair of each department is denoted by an asterisk. Use grep to output a list of Department Chairs in one (or all) of the divisions. It might also be nice to sort your list by last name.
In this final section of the lab, we briefly describe a few useful capabilities. In each case, these options can be worthwhile as you work within the Linux environment, but in the interests of time we provide few experiments here.
You are probably already familiar with the idea of "wildcards" in filenames. For example, you can use the command "ls *.ss" to get a listing of all the files with the extension .ss.
What allows this to work? The shell parses your input, discovers the asterisk in it, and "expands" the command to include all files that match the given pattern. This ability to expand commands based on special characters in the input is also called globbing.
Further, the asterisk is not the only special character used for globbing. Here are some more.
Special Character | Is replaced by... | Example(s) |
* | matches any string (including zero characters) | cat ~coahranm/share/csc201/*fac.txt | less |
? | matches any single character | ls -ld /usr/bin/gc? and ls /usr/lib/lib?.a |
[...] | matches any single character inside the brackets | ls /usr/lib/lib[xX]*.a |
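Because the shell performs the expansion before the command runs, echo is a convenient way to see what a pattern turns into. The following sketch assumes bash in its default configuration; the directory glob-demo and the file names in it are hypothetical.

```shell
mkdir -p glob-demo
cd glob-demo
touch lab1.ss lab2.ss lab10.ss notes.txt

# echo prints its arguments, so it shows what the shell expanded
# each pattern into before running the command.
echo *.ss          # every name ending in .ss
echo lab?.ss       # exactly one character between "lab" and ".ss"
echo lab[12].ss    # that one character must be 1 or 2
echo *.c           # nothing matches, so the pattern is passed unchanged
```

Note that lab10.ss matches the first pattern but not the second or third, since ? and [...] each stand for exactly one character.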
On occasion, we want to keep the shell from treating special characters specially.
There is a file with the following goofy name in my share directory:
~coahranm/share/csc201/goofy file name
What do you expect will happen if we try to list its contents using the following command? Give it a try to be sure.
cat ~coahranm/share/csc201/goofy file name
You may have an idea how to work around this problem. If so, try it to make sure it works.
In fact, there are two ways to deal with it.
You can quote the file name, as follows. The single quotation marks keep the shell from treating characters inside them specially. In this case, it keeps the shell from parsing the line into four separate tokens.
cat 'goofy file name'
Alternatively, you can escape individual characters with a backslash. Again this keeps the shell from interpreting these characters in their usual (specialized) way. Be sure to try this one as well.
cat goofy\ file\ name
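If you cannot reach the shared directory, you can try both approaches on a file of your own. In this sketch the directory quote-demo and the file's contents are invented. The double-quoted form is an addition that is not discussed above; it also keeps the spaces together.

```shell
mkdir -p quote-demo
cd quote-demo
echo "hello" > 'goofy file name'    # invented stand-in for the lab's file

cat 'goofy file name'     # single quotes: one argument
cat "goofy file name"     # double quotes also keep the spaces together
cat goofy\ file\ name     # each space escaped individually

# Single quotes also switch off globbing and variable expansion.
echo '*.txt and $HOME'
```

Each of the three cat commands receives a single argument, so each prints the contents of the same file.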
Command substitution allows you to embed one command inside another, using "backquotes" to delimit the nested command. (You should be able to find the backquote character in the upper-left of the keyboard, with the tilde.)
When you do this, the shell first executes the backquoted command, then substitutes the result that the command output to stdout in place of the command itself. Finally, the shell interprets and runs the resulting command string.
Try these, looking up any of the commands you are not familiar with already.
echo There are `ls | wc -l` files in my current working directory.
echo Today is `date +%A`. It is now `date +%r`.
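The shell also accepts a second notation for command substitution, $(...), which behaves the same way as backquotes. The sketch below shows both forms side by side; the variable name count and the sample input are invented for the illustration.

```shell
# Backquote form, as in the lab text.
echo "Today is `date +%A`."

# $(...) is an equivalent form that is easier to read and to nest.
echo "Today is $(date +%A)."

# The output can also be stored in a variable for later use.
count=`printf 'a\nb\nc\n' | wc -l`
echo "The sample input has $count lines."
```

The first two commands print the same sentence. Which form to use is largely a matter of taste, although $(...) is easier to spot in a long command.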
When we begin writing shell scripts later in the semester, you will find this ability useful for creating informative output messages.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~walker/courses/153.sp09/labs/lab-linux-c.html
created January 2007 by Marge Coahran
revised January 2008 by Marge Coahran
last revised 16 February 2009 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.