Tracking Student Use of Course Webs

Raphen Becker and Kevin McLaughlin
Department of Mathematics and Computer Science
Grinnell College
Grinnell, Iowa 50112
{beckerr,mclaughl}@cs.grin.edu

1. Introduction

In this paper, we present a logging system that helps instructors and instructional designers gather information about how students use course webs, and therefore about how to improve those webs. In particular, the system allows one to determine the path(s) individual users follow through a site and the time they spend on each page.

Students at most universities now have access to the World-Wide Web, putting the Web in a position to be a common medium of hypermedia for class materials. Hypermedia has many advantages over a paper-based presentation of information. For example, it allows the easy integration of media other than text. It also supports a structure quite different from the linear presentation of textbooks. Links allow related material to be easily accessible from various nodes in the course web. At the same time, this new technology requires analysis of its use and usefulness. Instructors need to know how the Web affects students' understanding of information, and their performance in class. Such analysis requires careful and regular gathering of information on how individual students use course webs.

The Web was not originally designed to provide much information on how readers use individual sites and pages. The log files generated by most web servers (e.g., NCSA httpd) are inadequate for careful analyses. In particular, log files list only the pages accessed and the accessing machine, but not the individual who accessed the pages or the previous and next pages on that individual's exploration of the Web. Successful acquisition of more detailed data requires working around this original design decision. Recently, technologies such as cookies, Java, and Javascript have made it possible to keep state across page requests and to run code in the reader's browser. Using these technologies, it is now possible to track a student as they move throughout the course web. In particular, we can log the path of a particular student and record the time spent on each page.

2. Improving Understanding of Site Usage

The primary goal of course web analysis is to help teachers, site designers, and instructional designers as they evaluate the use and usefulness of course webs. There are many potential benefits from more thorough analysis of course web usage. These include a better understanding of student learning patterns, justification for the time and effort spent developing a site, and support for improving the site.

Information presented with hypermedia does not need to appear in a linear format, as it does in a lecture or a textbook. This change can have a dramatic effect on how well students learn and retain course material. An analysis system would allow one to correlate viewing information on the Web with related test questions [Trochim 1996], student interviews [Jones et al. 1996], or student perceptions of their own web use [Rebelsky 1998]. With more detailed usage logs, one can also ask finer-grained questions about how individual students move through a course web.

To permit these sorts of analyses, a logging system must identify students and keep track of the paths they take through the course web and the amount of time they spend on individual pages. This means the system needs to log when a reader loads a web page, how they found the page, how long the page is open, and which link (if any) is followed next. Because the web permits readers to open multiple "windows" on the same site, we also need to be able to determine when they have several pages open concurrently.

3. The Identification System

The identification system is loosely modeled after the Unix model of users and groups. Each user has a single-line entry in a file, and that line contains all of the information for the user. Each entry contains the account name of the user, an encrypted password, the groups that the user belongs to, the real name of the user (or anonymous, for users who do not wish their name to be known), and an email address for contacting the user.
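
As an illustration, one such entry could be read as follows. This is a minimal sketch only: the colon separator, the field order, the comma-separated group list, and the names UserEntry and parse_user_entry are assumptions made for the example, not the format used by the actual system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserEntry:
    account: str
    password_hash: str  # stored already encrypted, never as clear text
    groups: List[str]
    real_name: str      # "anonymous" for users who withhold their name
    email: str


def parse_user_entry(line: str) -> UserEntry:
    """Parse one line of a (hypothetical) colon-separated user file."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    account, password_hash, groups, real_name, email = fields
    return UserEntry(
        account=account,
        password_hash=password_hash,
        groups=[g for g in groups.split(",") if g],
        real_name=real_name or "anonymous",
        email=email,
    )


# Made-up entry for illustration only.
entry = parse_user_entry("jdoe:x9Qz1:cs151,summer98:anonymous:jdoe@example.edu")
print(entry.account, entry.groups)
```

Keeping the groups as a list makes it simple to restrict logging or comparison to the members of one group.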

We use a group model so that different types of tracking are possible. For example, one might want to compare different classes using the same web site or only log members of a particular group. The group model is also appropriate for related uses of the system, such as an annotation system which takes advantage of this system [Luebke and Mason 1998].

When users first access pages that are logged, they are prompted by a form that asks for an account name and password. Once the user has logged in, a session and session identification number are created for that user. As the user continues through the logged pages, the session identification number is carried with them via query strings and cookies. Once the user exits the logged pages or quits the browser, the session number expires and they must log in again if they choose to return to the logged pages. In this way, only registered users of the system may view the logged pages. This restriction can easily be relaxed by providing a guest account. In the future it should also be possible to restrict certain pages to certain groups or to restrict logging to certain Internet domains (so that, for example, internal users might be logged but external users would not be).
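
The session mechanism described above might be sketched as follows. The in-memory table, the idle timeout used to approximate "the user has left", the query parameter name session, and all function names are assumptions for illustration; the actual system is built from CGI scripts written in Perl.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

SESSION_LIFETIME = 30 * 60  # seconds; an assumed idle timeout

# session id -> (account name, time of last activity)
_sessions: Dict[str, Tuple[str, float]] = {}


def start_session(account: str, now: Optional[float] = None) -> str:
    """Create a session for a user who has just logged in."""
    session_id = secrets.token_hex(8)
    _sessions[session_id] = (account, time.time() if now is None else now)
    return session_id


def lookup_session(session_id: str, now: Optional[float] = None) -> Optional[str]:
    """Return the account for a live session, or None if a new login is needed."""
    now = time.time() if now is None else now
    record = _sessions.get(session_id)
    if record is None:
        return None
    account, last_seen = record
    if now - last_seen > SESSION_LIFETIME:
        del _sessions[session_id]  # expired: the user must log in again
        return None
    _sessions[session_id] = (account, now)
    return account


def tag_link(url: str, session_id: str) -> str:
    """Carry the session id along in the query string of a link."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}session={session_id}"
```

Carrying the identifier in the query string as well as in a cookie keeps the session intact for readers whose browsers reject cookies.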

The identification system is not only essential in providing access control to pages. It is also essential in keeping track of users as they move around the logged pages with the tracking system. With the identification system, each time a user enters a page, both the user name and session number can be immediately identified. These pieces of data are used in constructing the log files for each user.

4. The Tracking System

The tracking/logging system is used to record events for each user as they move around the logged webpages. To improve the accuracy of the tracking, tracking events are gathered in a variety of ways. The tracking system includes a server, a Java applet, Javascript scripts, and several CGI scripts written in Perl. These various components report events to the server for each session. The events include:

Event #   Sent by       Event type           Data sent
0         applet        left page            page left
1         CGI script    loaded page          page loaded
2         CGI script    from page            last page
3         Javascript    Javascript enabled   current page
4         applet        Java enabled         current page
5         applet        stay on page         current page
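
The event numbers in the table can be given symbolic names, and a log record can be written and read back, roughly as follows. The tab-separated record layout (timestamp, session, event number, page) and the function names are assumptions for the sake of the example; only the event numbers and their meanings come from the table.

```python
from enum import IntEnum
from typing import Tuple


class Event(IntEnum):
    LEFT_PAGE = 0           # sent by the applet; data: page left
    LOADED_PAGE = 1         # sent by a CGI script; data: page loaded
    FROM_PAGE = 2           # sent by a CGI script; data: last page
    JAVASCRIPT_ENABLED = 3  # sent by Javascript; data: current page
    JAVA_ENABLED = 4        # sent by the applet; data: current page
    STAY_ON_PAGE = 5        # sent by the applet; data: current page


def format_record(timestamp: int, session: str, event: Event, page: str) -> str:
    """Build one line of a hypothetical tab-separated log file."""
    return f"{timestamp}\t{session}\t{int(event)}\t{page}"


def parse_record(line: str) -> Tuple[int, str, Event, str]:
    """Invert format_record."""
    timestamp, session, code, page = line.rstrip("\n").split("\t")
    return int(timestamp), session, Event(int(code)), page


line = format_record(900000000, "s42", Event.LOADED_PAGE, "/cs151/index.html")
print(parse_record(line))
```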

As the user browses through the logged pages, these events are generated and sent to the server, which writes them to a log file. Because it is traditionally difficult to determine when someone leaves a page (e.g., if the user switches to a page on another server, the current server gets no notice), the "stay on page" event helps us determine the amount of time spent on the page.
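
One way to turn these events into time-on-page estimates is sketched below. It assumes that each event carries a timestamp in seconds, that the events of one session arrive in time order, and that "stay on page" events are sent repeatedly while a page is open; none of these details is specified above, and the function name is hypothetical. When no "left page" event arrives, the last "stay on page" event is taken as the end of the visit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LEFT_PAGE, LOADED_PAGE, STAY_ON_PAGE = 0, 1, 5  # event numbers from the table


def time_on_pages(events: Iterable[Tuple[int, int, str]]) -> Dict[str, int]:
    """Total seconds per page for one session.

    events: (timestamp, event number, page) tuples in time order.
    """
    opened: Dict[str, int] = {}     # page -> time it was loaded
    last_seen: Dict[str, int] = {}  # page -> latest evidence it was still open
    totals: Dict[str, int] = defaultdict(int)

    def close(page: str) -> None:
        totals[page] += last_seen.pop(page) - opened.pop(page)

    for timestamp, code, page in events:
        if code == LOADED_PAGE:
            if page in opened:  # reloaded without a "left page" event
                close(page)
            opened[page] = last_seen[page] = timestamp
        elif code in (STAY_ON_PAGE, LEFT_PAGE) and page in opened:
            last_seen[page] = timestamp
            if code == LEFT_PAGE:
                close(page)
    for page in list(opened):  # no "left page" seen: use the last sighting
        close(page)
    return dict(totals)


session_events = [
    (0, LOADED_PAGE, "a.html"), (30, STAY_ON_PAGE, "a.html"),
    (60, STAY_ON_PAGE, "a.html"), (70, LEFT_PAGE, "a.html"),
    (70, LOADED_PAGE, "b.html"), (100, STAY_ON_PAGE, "b.html"),
]
print(time_on_pages(session_events))
```

Because open pages are tracked per page rather than as a single "current page", two different pages open in separate windows are timed independently.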

Since we rely on Java and Javascript, problems may occur when the user disables them in the browser. This is the reason events 3 and 4 exist. When the log file is converted for analysis, the conversion looks for these events. If they are not found, we know that Java or Javascript was not functioning properly. The analysis component accounts for this and tries to infer the missing events from the other events.
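
The check for events 3 and 4 could look roughly like this. The input shape (pairs of session identifier and event number) and the function name are assumptions for illustration, and the sketch covers only the detection step, not the inference of missing events.

```python
from typing import Dict, Iterable, Set, Tuple

JAVASCRIPT_ENABLED, JAVA_ENABLED = 3, 4  # event numbers from the table


def session_capabilities(
    events: Iterable[Tuple[str, int]]
) -> Dict[str, Dict[str, bool]]:
    """Report, per session, whether the Javascript and Java events were seen."""
    seen: Dict[str, Set[int]] = {}
    for session, code in events:
        seen.setdefault(session, set()).add(code)
    return {
        session: {
            "javascript": JAVASCRIPT_ENABLED in codes,
            "java": JAVA_ENABLED in codes,
        }
        for session, codes in seen.items()
    }


log = [("s1", 1), ("s1", 3), ("s1", 4), ("s2", 1), ("s2", 3)]
print(session_capabilities(log))
```

A session reported without Java, for example, has no applet events, so its page-leaving times would have to be estimated from the CGI events alone.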

5. Anonymous Browsing

The question of anonymity arises often with regard to the World-Wide Web. Is it acceptable for sites to track users' paths through the web? We considered this issue in great detail while designing the logging system. For example, would students be less likely to use web pages if they knew their use was being logged, or would they "simulate browsing" to make it appear that they were using the system more than they were? It is clear that careful studies and interviews need to be done to determine all of the potential effects. We decided that it was appropriate to permit browsers to use anonymous accounts.

There are a number of options for anonymous browsing: one might use a single "anonymous" account, one might use multiple anonymous accounts (one for each user), or one might allow users to turn off logging. In the case of a single account, the administrator might distribute the password to all users with privileges to browse the logged pages. This method, however, has a distinct disadvantage. One purpose of the logging system is to determine distinct paths for each user across sessions and not just in a single session. With a group guest account, it is unknown which user is using the web at a particular time. In addition, some students might use the guest account some of the time and their own account at other times, leading to unreliable data collection. Turning off logging is similarly unacceptable.

A better system for allowing anonymous browsing is to create accounts for all desired users and let the account names, real names, and email addresses be blank or pseudonyms. Such accounts may be distributed anonymously (e.g., with slips of paper in a hat). In this manner we can create distinct paths for each particular pseudonym and thus for each user. Regular analysis can be performed without any loss of accuracy. This is the preferred way to handle anonymous logging.
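
A batch of pseudonymous accounts for distribution could be generated along these lines. The account naming scheme, the password length, and the function name are assumptions for the example; passwords are shown in clear text only because they are meant to be printed on the slips, and would be stored encrypted.

```python
import secrets
from typing import Dict, List


def make_pseudonymous_accounts(count: int, prefix: str = "reader") -> List[Dict[str, str]]:
    """Create account records that carry no identifying information."""
    accounts = []
    for number in range(1, count + 1):
        accounts.append({
            "account": f"{prefix}{number:03d}",      # pseudonym, e.g. reader001
            "password": secrets.token_urlsafe(6),    # random, printed on the slip
            "real_name": "anonymous",
            "email": "",
        })
    return accounts


slips = make_pseudonymous_accounts(5)
for slip in slips:
    print(slip["account"], slip["password"])
```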

For many purposes, such as marketing and other corporate uses, the name of the user is the desired information. For our logging system, however, the information of interest is frequently not the name but the path taken by a particular user. Therefore the actual name of the user is only useful for record keeping and is not needed to perform many analyses. Names are needed only when usage and success in learning are correlated, which is only one of many possible analyses.

6. Future Work

We have developed working identification and tracking systems that have been used for simple experiments during Summer 1998 and are scheduled for course testing in Fall 1998. Development of a sophisticated analysis tool for using the logs is underway. When completed, the tracking and analysis tools will provide the means to track users and answer questions about student usage, such as "How often and how much do students use particular pages?" and "How do students typically reach the most useful pages?". With the answers to these questions, one can design more usable and more useful course webs. We expect to use this prototype for analyses of the usage in the Fall 1998 course.
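
To make the two example questions concrete, the following sketch computes page visit counts and the pages from which readers arrive at a given page. It is not the analysis tool under development; it assumes the logs have already been reduced to one ordered list of pages per user, and the function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def page_visits(paths: Dict[str, List[str]]) -> Counter:
    """How often is each page loaded, over all users?"""
    visits: Counter = Counter()
    for pages in paths.values():
        visits.update(pages)
    return visits


def arrivals_at(paths: Dict[str, List[str]], target: str) -> Counter:
    """From which pages do users reach the target page?"""
    sources: Counter = Counter()
    for pages in paths.values():
        for previous, current in zip(pages, pages[1:]):
            if current == target:
                sources[previous] += 1
    return sources


# Made-up paths for two pseudonymous users.
paths = {
    "reader001": ["index.html", "syllabus.html", "lab1.html"],
    "reader002": ["index.html", "lab1.html", "syllabus.html", "lab1.html"],
}
print(page_visits(paths).most_common())
print(arrivals_at(paths, "lab1.html").most_common())
```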

The identification and tracking systems are also being used to support additional facilities for course webs. The identification system has been used to support an annotation system [Luebke and Mason 1998] and the tracking system may be used to support custom pages based on history of use.

Acknowledgements

We thank Grinnell College, the Robert N. Noyce Faculty Study Program, the Robert N. Noyce Prize Program Summer Student Fellowship Program, and the Grinnell College Noyce Science Summer Research Fund for their support of this research. We also acknowledge Professor Samuel Rebelsky, Sarah Luebke, and Hilary Mason for their critiques and suggestions throughout the entire project.

References

  1. Jones, T., Berger, C. F., and Magnusson, S. J. (1996). "The Pursuit of Knowledge: Interviews and Log Files in Hypermedia Research", Proceedings of Ed-Media 1996 World Conference on Multimedia and Hypermedia in Education. Charlottesville, VA: Association for the Advancement of Computing in Education.
  2. Jones, T. and Jones, M. (1997). "MacSQUEAL: A Tool for Exploration of Hypermedia Log File Sequences", Proceedings of Ed-Media 1997 World Conference on Multimedia and Hypermedia in Education. Charlottesville, VA: Association for the Advancement of Computing in Education.
  3. Guzdial, M., Walton, C., Konemann, M., & Soloway, E. (1993). "Characterizing process change using log file data" (GVU Center Technical Report No. 93-44). Georgia Institute of Technology. Online document at file://ftp.cc.gatech.edu/pub/gvu/tech-reports/93-44.ps.Z (accessed July 22, 1998).
  4. Guzdial, M. J. (1993). "Deriving software usage patterns from log files" (GVU Center Technical Report No. 93-41). Georgia Institute of Technology. Online document at file://ftp.cc.gatech.edu/pub/gvu/tech-reports/93-41.ps.Z (accessed July 22, 1998).
  5. Luebke, S. and Mason, H. (1998). "An Annotation System for the World-Wide Web". Submitted to the Consortium for Computing in Small Colleges 1998 Midwest Conference.
  6. Rebelsky, S. (1998). "In Class Use of Course Webs: A Case Study", Proceedings of the 10th EdMedia World Conference on Educational Multimedia and Hypermedia, T. Ottmann and I. Tomek (eds), pp. 1115-1120. Charlottesville, VA: Association for the Advancement of Computing in Education.
  7. Trochim, W. M. K. (1996). "Evaluating Websites". Online document at http://trochim.human.cornell.edu/webeval/webintro/webintro.htm (accessed July 22, 1998).
  8. Trochim, W. M. K. and Cirillo, D. (1996). "Automatic Data Collection with Log Files". Online document at http://trochim.human.cornell.edu/webeval/perform/performd.htm (accessed July 22, 1998).