CSC 153 Grinnell College Spring, 2009
 
Computer Science Fundamentals
 

Linux and C

Goal

This lab continues the discussion of Linux from the introductory Linux lab and also provides examples of how C programs can be used within the Linux environment.

Standard I/O Files and I/O Re-direction

In the GNU/Linux system, there are three standard "files" that are always open and available for use, called stdin, stdout, and stderr. (I've put quotation marks around "files" because, while Linux treats them like files behind the scenes, they do not seem much like files to the user. For example, they cannot be found in the file system tree.)

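By now, we should have discussed these files in class, but for a brief review:

  * stdin ("standard input") is where a program reads its input. By default, it is connected to the keyboard.
  * stdout ("standard output") is where a program writes its normal output. By default, it is connected to the screen (the terminal window).
  * stderr ("standard error") is where a program writes its error messages. By default, it is also connected to the screen.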

However, we can use "I/O re-direction" to re-assign any of these files (also called channels) elsewhere.

  1. Many Linux commands send their output to stdout, so by default their output is sent to the screen. However, with output re-direction we can send the output to a file instead.

    1. Give the command

         ls -l > dirlist.txt
      

      (Think of the "greater than" symbol as an arrow in this context, re-directing the output from ls to the file dirlist.txt.)

      Now view the file dirlist.txt with the command

         more dirlist.txt
      
    2. Give the command

         ls >> output.txt
      
    3. What is the difference between what the two symbols > and >> do? (You may want to try both with the same output file a few times to be sure.)
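
One way to run the experiment suggested above is sketched below. It is only an illustration: the scratch directory made by mktemp and the file name demo.txt are assumptions of this sketch, not part of the lab. Compare what cat shows after each pair of commands to answer the question for yourself.

```shell
# Work in a scratch directory so none of your own files are touched.
workdir=$(mktemp -d)

# Two commands in a row using > on the same (hypothetical) file.
echo "first"  > "$workdir/demo.txt"
echo "second" > "$workdir/demo.txt"
cat "$workdir/demo.txt"      # which of the two lines are in the file now?

# Two more commands using >> on that file.
echo "third"  >> "$workdir/demo.txt"
echo "fourth" >> "$workdir/demo.txt"
cat "$workdir/demo.txt"      # and which lines are there now?
```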

To explore the meaning of stdin, let's consider the Linux utility cat one more time. Like many Linux utilities that accept input from a file, specifying an input file for cat is actually optional. If we invoke cat without a file name, it takes its input from stdin (i.e., from the keyboard) instead. This comes in handy when you want to create short text files, again because you can do so without firing up a more substantial text editor.

  1. Type "cat > temp.txt", and then follow that with some text you want to insert into the file temp.txt. For example, your session could look something like the following:

       $ cat > temp.txt
       Here is the text that I want to insert.
       It is ok to include multiple lines.
       How do we stop? Just type ctrl-d.
       <ctrl-d>
    

    Recall from the previous Linux lab that ctrl-d is used in Linux as the "end of input" character. Thus, typing ctrl-d at the end of your session indicates to cat that it has received all the input you are going to send it, which causes the cat command to end.

    Review the file you have just created in an editor, such as emacs. Has the final ctrl-d character been stored in the file? Do you understand why (or why not)?

    As a final note regarding cat, here is the synopsis for cat from the man pages: "cat [OPTION] [FILE]...". The fact that FILE is enclosed in square brackets tells us the input file for cat is optional. (The trailing "..." tells us that cat can also be given more than one input file.)

Just as we can re-direct stdout to a file, we can also re-direct stdin to come from a file with the < symbol. Thus, if a program expects to receive its input from stdin, we can cause the input to come from a regular file instead.
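
As a small sketch of input re-direction, the commands below feed a file to two utilities through stdin. The scratch directory and the file names input.txt and upper.txt are assumptions made for this illustration. The utility tr (which translates characters) is not otherwise used in this lab; it is chosen here because it never takes a file name and reads only from stdin.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)
printf 'alpha\nbeta\ngamma\n' > "$workdir/input.txt"

# cat with no file argument reads stdin; < connects stdin to the file.
cat < "$workdir/input.txt"

# tr accepts no file arguments, so re-direction is how it reads a file.
# Input and output re-direction can be combined on one command line.
tr 'a-z' 'A-Z' < "$workdir/input.txt" > "$workdir/upper.txt"
cat "$workdir/upper.txt"
```

Note that the program being run never sees the name input.txt; the shell opens the file and hands it to the program as stdin.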

I/O Redirection and C Programs

Program ~walker/c/examples/count-chars.c reads input from the keyboard (i.e., stdin) and counts the number of times each letter occurs together with the number of non-letters. The program uses character input (to be discussed in a few weeks), and the approach is reasonably straightforward. The program uses an array count[0]..count[26] to count the number of times various characters arise in the input. In particular, count[0] gives the number of a's, count[1] gives the number of b's, ..., and count[25] gives the number of z's. The last array element count[26] is used to count all non-letters.

  1. Copy count-chars.c to your account and compile it. Read through the code, and write a paragraph explaining how the code works.
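    Hint: EOF is not an actual character. It is a special value that C's input functions return to signal that there is no more input (for example, at the end of a file, or after ctrl-d at the keyboard).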

  2. Run count-chars in a terminal window, using ctrl-d to end the input. (For technical reasons related to the terminal window, you may need to hit ctrl-d twice to end the input stream.) Check that the program works correctly.

  3. Rather than read material from the keyboard, use count-chars to count the number of letters in a file by redirecting the input. For example, to determine the frequency count of characters in the program count-chars.c itself, use the command:

       count-chars < count-chars.c
    

Pipes and Filters

A final type of I/O redirection within Linux is called a "pipe" (represented by the vertical bar "|", which on many keyboards is located on the same key as the backslash). A pipe allows you to connect utilities in a "pipeline." A pipe causes the data sent to stdout from one utility to be re-directed into stdin of another utility. We say that the output from the first utility is "piped" to the input of the next utility.

  1. Use the following command, which allows you to page through a long directory listing.

       ls -l /bin | less
    

The fact that many Linux utilities can accept their input from one file, multiple files, or stdin makes them very versatile. Pipes take this versatility one step further: any command or utility that can accept input from stdin can also accept input from a pipe. Any command or utility that sends output to stdout can also send output through a pipe.

To combine more than one program, we can string multiple pipes together to create longer pipelines, like so:

   command | command | command
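
To make the idea concrete, the sketch below does the same job twice: once with intermediate files and re-direction, and once as a single pipeline. The sample words and the file names step1.txt and step2.txt are made up for this illustration.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)

# Without pipes: each step writes a file that the next step reads.
printf 'pear\napple\nfig\n' > "$workdir/step1.txt"
sort < "$workdir/step1.txt" > "$workdir/step2.txt"
tail -n 1 < "$workdir/step2.txt"                  # prints: pear

# With pipes: stdout of each command feeds stdin of the next,
# and no intermediate files are needed.
printf 'pear\napple\nfig\n' | sort | tail -n 1    # prints: pear
```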

Additional Linux Utilities

Many Linux utilities that accept input from stdin and send output to stdout are known as "filters" because, in one way or another, they filter their input to produce output. These utilities are especially apt for creating useful pipelines. In the first laboratory on Basic Linux Commands, you have worked with several filters, including less, tail, and cat.

The following table lists some additional filters. All of these filters can accept their input from regular files, stdin, or a pipe. None of them modify their original input; rather, they generate new output that reflects a modified version of the input.

Utility  Description                                                  Example usage
wc       "word count" - counts characters, words, and lines in input  wc -l ~/.bashrc
sort     sorts lines in input                                         sort -k2 ~coahranm/share/csc201/sciencefac.txt
uniq     "unique" - removes (or reports) duplicate lines in input     uniq ~coahranm/share/csc201/duplicates.txt
grep     searches for a target string in input                        grep li ~coahranm/share/csc201/duplicates.txt
cut      removes parts of lines from input                            cut -d' ' -f2 ~coahranm/share/csc201/sciencefac.txt
diff     reports the differences between two files                    diff ~walker/public_html/courses/153.sp09/labs/lab-linux-c.shtml ~walker/public_html/courses/153.sp09/labs/lab-linux-c.shtml~

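
If you would like to see several of these filters at work before trying the examples in the table, here is a minimal sketch on made-up data. The file fruit.txt and its "fruit color" layout are hypothetical and are not related to the faculty files used in this lab. One detail worth noticing is that uniq only removes duplicate lines that are adjacent, which is why it is usually paired with sort.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)
# Hypothetical data: one "fruit color" pair per line, with one repeated line.
printf 'apple red\nbanana yellow\napple red\ncherry red\n' > "$workdir/fruit.txt"

total=$(wc -l < "$workdir/fruit.txt")                        # number of lines
distinct=$(sort "$workdir/fruit.txt" | uniq | wc -l)         # sort first so duplicates are adjacent
reds=$(grep red "$workdir/fruit.txt" | wc -l)                # lines containing "red"
colors=$(cut -d' ' -f2 "$workdir/fruit.txt" | sort | uniq)   # second field, without repeats

echo "lines: $total, distinct lines: $distinct, lines containing red: $reds"
echo "$colors"
```
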
  1. Try each of the examples given in the table, looking up the commands as needed, to be sure you understand them.

  2. Consider the following piped command:

       count-chars < count-chars.c | grep ":" | sort -n -r -k2
    

    You already used the first step, count-chars < count-chars.c, in step 3 of the section "I/O Redirection and C Programs" above.

    1. What is the purpose of the command grep ":" in this piped command?
    2. Use the man page for sort to explain each of the options used above.
  3. Use the filters given above (and other utilities if needed) to perform the following tasks. Note that for some of the tasks you may need to combine filters with pipes.

    1. Count the lines of source code in a program you wrote earlier this semester for this course.

    2. Determine the number of user accounts on the MathLAN. Recall that each account has a directory in /home.

    3. Print a list (in the terminal window, not on a printer) of faculty in Grinnell College's Science Division, sorted by last name. The file ~coahranm/share/csc201/sciencefac.txt contains the data you need.

    4. Print a list of faculty in the Biology Department. Your list should not include faculty in any other department.

    5. Consider one of the programs you wrote earlier this semester. Take a quick look at it, using less to remind yourself of a variable name that is used in several places in the file. Now use grep to print a listing of the lines that include that variable. Get grep to print the line number (in the source file) for each line of output as well.

      Note that it can be very useful to use grep in this way when you return to a project after taking a long break from it. For example, you might want to find every instance of a given class -- in any source file in the project -- as part of re-acquainting yourself with your code.

    6. Print a list of all Grinnell faculty named David.

      Hint: To do this, it would be helpful to create a single list that combines all the entries in the three faculty lists I have provided. But instead of generating a separate combined file, you can do this on the fly using cat as shown below. This is where cat gets its name -- from its ability to concatenate multiple files.

         
      cat  ~coahranm/share/csc201/socialfac.txt ~coahranm/share/csc201/humanfac.txt  ~coahranm/share/csc201/sciencefac.txt
      
    7. Print a unique list of departments in the Humanities Division.

    8. The names provided in ~coahranm/share/csc201/ give the faculty for the various departments, and the Chair of each department is denoted by an asterisk. Use grep to output a list of Department Chairs in one (or all) of the divisions. It might also be nice to sort your list by last name.

A Few More Useful Linux Capabilities

In this final section of the lab, we briefly describe a few useful capabilities. In each case, these options can be worthwhile as you work within the Linux environment, but in the interests of time we provide few experiments here.

Wildcards

You are probably already familiar with the idea of "wildcards" in filenames. For example, you can use the command "ls *.ss" to get a listing of all the files with the extension .ss.

What allows this to work? The shell parses your input, discovers the asterisk in it, and "expands" the command to include all files that match the given pattern. This ability to expand commands based on special characters in the input is also called globbing.

Further, the asterisk is not the only special character used for globbing. Here are some more.

Special Character  Is replaced by...                                 Example(s)
*                  matches any string (including zero characters)    cat ~coahranm/share/csc201/*fac.txt | less
?                  matches any single character                      ls -ld /usr/bin/gc?
                                                                     ls /usr/lib/lib?.a
[...]              matches any single character inside the brackets  ls /usr/lib/lib[xX]*.a
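
The patterns in the table can be tried safely in a scratch directory, as in the sketch below. The file names are invented for this illustration. Using echo instead of ls shows exactly what the shell substitutes for each pattern before any command runs.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)
cd "$workdir" || exit 1
touch liba.a libb.a libX11.a libxml.a notes.txt    # hypothetical file names

echo *.txt         # * matches any string: notes.txt
echo lib?.a        # ? matches exactly one character: liba.a libb.a
echo lib[xX]*.a    # one x or X, then anything: libX11.a libxml.a
```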

Quoting

On occasion, we want to keep the shell from treating special characters specially.

There is a file with the following goofy name in my share directory:

   ~coahranm/share/csc201/goofy file name
  1. What do you expect will happen if we try to list its contents using the following command? Give it a try to be sure.

       cat ~coahranm/share/csc201/goofy file name
    

You may have an idea how to work around this problem. If so, try it to make sure it works.

In fact, there are two ways to deal with it.

  1. You can quote the file name, as follows. The single quotation marks keep the shell from treating characters inside them specially. In this case, they keep the shell from parsing the line into four separate tokens. (The directory part stays outside the quotation marks because the shell does not expand ~ inside quotes.)

       cat ~coahranm/share/csc201/'goofy file name'
    
  2. You can escape individual characters with a backslash. Again, this keeps the shell from interpreting these characters in their usual (specialized) way. Be sure to try this one as well.

       cat ~coahranm/share/csc201/goofy\ file\ name
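
If you cannot reach that share directory, the same behavior can be reproduced with a file of your own, as in this sketch. The scratch directory and the file's contents are assumptions made for the illustration. The 2>/dev/null on the first cat re-directs stderr so that the expected error messages are discarded.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)
cd "$workdir" || exit 1
echo "found me" > 'goofy file name'    # create a file whose name contains spaces

# Unquoted: the shell hands cat three separate names, none of which exist.
cat goofy file name 2>/dev/null || echo "unquoted attempt failed"

cat 'goofy file name'    # quoting keeps the name in one piece
cat goofy\ file\ name    # so does escaping each space
```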
    

Command Substitution

Command substitution allows you to embed one command inside another, using "backquotes" to delimit the nested command. (You should be able to find the backquote character in the upper-left of the keyboard, with the tilde.)

When you do this, the shell first executes the backquoted command, then substitutes the result that the command output to stdout in place of the command itself. Finally, the shell interprets and runs the resulting command string.

  1. Try these, looking up any of the commands you are not familiar with already.

      echo There are `ls | wc -l` files in my current working directory.
      echo Today is `date +%A`. It is now `date +%r`.
    

When we begin writing shell scripts later in the semester, you will find this ability useful for creating informative output messages.
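
As a preview of that use, the sketch below stores the result of a command substitution in a variable before printing it. The scratch directory and the three empty files are assumptions made for this illustration. It also shows the $(...) form, which the shell treats the same way as backquotes and which is easier to nest.

```shell
workdir=$(mktemp -d)    # scratch directory (an assumption of this sketch)
cd "$workdir" || exit 1
touch a.txt b.txt c.txt

old_style=`ls | wc -l`     # backquotes, as used in this lab
new_style=$(ls | wc -l)    # $(...) is an equivalent spelling

echo There are $old_style files in $workdir.
echo The two forms agree: $old_style and $new_style.
```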


This document is available on the World Wide Web as

     http://www.cs.grinnell.edu/~walker/courses/153.sp09/labs/lab-linux-c.html

created January 2007 by Marge Coahran
revised January 2008 by Marge Coahran
last revised 16 February 2009 by Henry M. Walker
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.