CSC 213, Fall 2006 : Schedule : Lab 10

Lab 10: A Simple Web Client and Server

Goals: 

Background:

Collaboration: You will complete this lab in teams of 1-3 of your choice.  You may consult me or your classmates with proper attribution.


Part A: Experiments with IP addresses and hostnames

  1. Download the files netserv.c and netclient.c to your account.  Review the programs to be sure you understand them.
  2. Compile both programs.  In one terminal window, run netserv.  In another, run netclient.  Briefly explain what you see.
  3. Type the command nslookup <hostname>, where hostname is the name of the computer you are working on, to learn the IP address of that computer.
  4. In netclient.c, change the definition of IP_ADDRESS so that it is the address of the computer you are working on. Recompile netclient.c.
  5. Now, run netserv and netclient again.  Explain what you see.
  6. Suppose you run netclient when netserv is not running? Try this and explain what you see.
  7. Wouldn't it be great if we could run netserv and netclient on any computer, without having to change the IP address and recompile the client?  We will do this using the getaddrinfo library function.  This function lets us supply a hostname as a string and resolves it to a linked list of struct addrinfo results, each of which holds a socket address of type struct sockaddr.  The program my-nslookup.c illustrates the use of getaddrinfo.  Download this program, compile it, run it a few times supplying different hostnames as arguments, and review the code to understand how it works.
  8. Now, modify netclient.c so that it takes the hostname as a command-line argument.  Use code from my-nslookup.c to resolve this hostname to a result of type struct sockaddr, and then use the result you obtain to connect to this address rather than the hard-coded IP_ADDRESS.
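
One way the resolve-then-connect step might look is sketched below. This is only an illustration, not the code in my-nslookup.c or netclient.c: the helper name connect_to_host is hypothetical, the port is passed as a string because getaddrinfo accepts it that way, and IPv4 over TCP is assumed.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the lab files): resolve host and port,
 * then return a connected TCP socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port) {
    struct addrinfo hints, *results, *p;
    int sock = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* assumption: IPv4 only */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int err = getaddrinfo(host, port, &hints, &results);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }

    /* getaddrinfo returns a linked list; try each address in turn. */
    for (p = results; p != NULL; p = p->ai_next) {
        sock = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sock < 0)
            continue;
        if (connect(sock, p->ai_addr, p->ai_addrlen) == 0)
            break;                   /* connected */
        close(sock);
        sock = -1;
    }

    freeaddrinfo(results);           /* the list is heap-allocated */
    return sock;
}
```

In a client, a call such as connect_to_host(argv[1], "8000") could stand in for the hard-coded IP_ADDRESS. The port string here is a placeholder and should match whatever port netserv actually listens on.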

Part B: A simple web client

  9. The wget program allows you to fetch the contents of a URL and save it to a new file.  If you've never used wget, try using it to fetch the file named by http://www.cs.grinnell.edu/~davisjan/csc/213/2006F/index.html. In this part of the lab, we will build a simple analog to wget.  Our program will take a URL as a command-line argument and write the response from the web server to STDOUT.
  10. One problem we will face in building our simple web client is that of parsing URLs.  Luckily, we can build a very simple web client while only parsing a limited class of URLs: those of the form <protocol>://<hostname>/<path> or <protocol>://<hostname>:<port>/<path>. The program parseurl.c illustrates a simple parser for URLs of this form.  Download the program, compile it, run it on a few different URLs, and review the code to understand what it does.
  11. To put together your simple web client, make an appropriately named copy of your program from step 8 and then make the following changes.
    1. The command-line argument should be a URL rather than a hostname.
    2. Parse the URL and connect to the hostname and port specified.
    3. The HTTP protocol, in its simplest form, is very simple. After connecting the socket, write a request of the following form (where ↵ indicates a newline character; the final blank line marks the end of the request):
        GET url HTTP/1.0↵
        HOST: hostname↵
        ↵
    4. At this point, data from the web server should start arriving.  Repeatedly read chunks of data from the socket into a character array and then write the data to STDOUT.  You will know all of the data has been read when the return value from the read(...) syscall is zero (indicating 0 bytes read). When all of the data has been read, close the socket. (Note that the data read from the socket will not end with a null (\0) character.  You'll need to account for this somehow when you are writing the contents of the buffer to STDOUT.)
  12. Examine the result of running your program from step 11 for the URL http://www.cs.grinnell.edu/.  The HTTP header is separated from the contents of the file by a blank line.  What information do you see in the header?
  13. Examine the result of running your program from step 11 for the URL http://www.cs.grinnell.edu/does-not-exist. How is the HTTP header different from what you saw in step 12?
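
The request and the read loop described above could be sketched roughly as follows. The helper names build_request and copy_all are hypothetical and do not come from the lab files. The sketch ends each line with \r\n, which is what the HTTP specification calls for; many servers also accept a bare newline. Because write is given an explicit byte count, the missing null terminator never matters.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format a minimal HTTP/1.0 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t size, const char *url, const char *host) {
    int n = snprintf(buf, size, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", url, host);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: copy bytes from in_fd to out_fd until read()
 * returns 0 (end of data). Returns 0 on success, -1 on error. */
int copy_all(int in_fd, int out_fd) {
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {   /* write() may accept fewer bytes than asked */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n == 0 ? 0 : -1; /* n < 0 means read() failed */
}
```

With a connected socket sock, a client might write the formatted request to sock, call copy_all(sock, STDOUT_FILENO), and then close the socket.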

Part C: A simple web server

  14. Getting started.  Download the files wwwserv.c and url_lib.c.  Compile and run wwwserv.c.  Point your web browser (e.g., Firefox) to the URL http://localhost:8000/. Review wwwserv.c to understand what you observed.
  15. Security. Note that we use the ROOT constant to specify the directory in which the web server should look for files.  Does this guarantee that a malicious web client cannot access files outside of the web server's root directory?  If so, explain why.  If not, give an example of a request a client could make to obtain an "unauthorized" file.
  16. A note on sockets.  Run wwwserv again and use the client you wrote in Part B to request the URL http://localhost:8000/.  Then, run the command "netstat -t".  You will see a list of all open TCP sockets on the computer, such as the following:

        Active Internet connections (w/o servers)
        Proto Recv-Q Send-Q Local Address           Foreign Address         State
        tcp        0      0 localhost.localdom:8000 localhost.localdo:42526 TIME_WAIT

    The TIME_WAIT state is used to ensure that "stray" packets from previous connections do not interfere with new connections, when a new socket is opened on the same port number as an old connection.  In practice, it means you cannot open another socket using the same port number for about two minutes after the previous socket is closed.  (You may have already discovered this in Part A of the lab.)

    For this reason, wwwserv accepts a port number as an optional command-line argument.  This will allow you to easily cycle through a few different server port numbers (e.g., 8000, 8001, 8002, 8003) as you are testing your server.
  17. Serving content. Modify wwwserv.c to write the actual contents of the file to the connection socket, rather than "Content of file goes here."  To test the server, you may wish to use the WWW client program you wrote in Part B.

    A good strategy is to iteratively read chunks of data from the file into a character array and then write them to the socket.
  18. Servicing concurrent requests. Modify the program so that it can accept and service multiple connections, using the fork strategy we discussed in class.  Note that the parent process should not wait for a child to complete its execution before accepting the next connection.  If it waits for each connection to complete before starting the next, then it's not concurrent!
  19. Now we have a problem.  Start your server and make several requests; leave the server running for now.  In another terminal window, give the command "ps -u <username> --forest".  Look for your web server process in the results.  What is the problem?
  20. To solve this problem, Stevens (pp. 122-123) recommends adding the following signal handler for the SIGCHLD signal, which is sent to a parent process when one of its children dies.  (How tragic!)  Copy the signal handler into your program and register it using the signal(...) syscall. Verify that it solves the problem observed in step 19.  Review the manpage for waitpid and explain why this code solves our problem.
/* Needs <stdio.h>, <sys/types.h>, and <sys/wait.h>. */
void sig_chld(int signo) {
    pid_t pid;
    int stat;

    while ( (pid = waitpid(-1, &stat, WNOHANG)) > 0 ) {
        printf("child %d terminated\n", pid);
    }
}

Work to be turned in

Parts A & B: Due Friday, 17 November, 2006

Part C: Due Monday, 20 November, 2006


References


Janet Davis (davisjan@cs.grinnell.edu)

Created November 6, 2006
Last revised November 16, 2006
With thanks to Henry Walker