Summer 2009 MAP Projects

Jerod Weinman

Abstract

This document provides background on summer MAP projects for Grinnell students and describes the high expectations I have for summer students.

Contents

1  Introduction
2  GPU Learning
    2.1  Background
    2.2  Project Work
3  Visual Categories
    3.1  Background
    3.2  Project Work
4  Gamma Ray Astronomy
    4.1  Background
    4.2  Project Work
5  Approximate Schedule
6  Expectations
    6.1  Spring
    6.2  Summer
    6.3  Fall and Beyond

1  Introduction

The general focus of my research is probabilistic machine learning for computer vision. Because reconstructing a 3-D scene from its 2-D projection is a difficult inference problem, some computational machinery is necessary. Furthermore, understanding and extracting meaning from images is a problem that humans have solved but that remains elusive for machines. Since it is nearly impossible to specify and hand-code models for these tasks, machines must be endowed with some capacity for learning.
There are three particular projects I hope to work on in summer 2009, described in Sections 2 through 4.
Slides from the presentation given at the departmental Thursday Extras Series can be found here:

2  GPU Learning

The context of this project is character recognition. While this is one of the oldest problems in pattern recognition, its most general form (automated recognition that matches human performance in any situation) is still very far from solved. My overall aim is to make more information available through automated character recognition. The specific focus of this project will be to improve recognition performance by adding training data, made tractable through massive parallelization.

2.1  Background

The application driving the project, already underway, is an aid for the blind. By recognizing text in arbitrary images, we hope to improve the ability of the visually impaired to navigate. In order to correctly recognize characters, computers must, in essence, learn to read. That means they are shown many examples of characters in many fonts. Given a model with many free parameters, the machine then learns how to set these parameters so that it is likely to recognize the examples it has seen, and others like them.
It is known in the speech recognition community that many millions of training examples are required for a machine to perform satisfactorily. This realization, and a corresponding data-gathering effort, have made speech recognition more successful when deployed in real-world applications, such as automated telephone operators.
Commercial optical character recognition (OCR) software for document text recognition has also seen modest success. However, it depends on many assumptions that are not warranted in other text recognition settings. One example, mentioned above, is scene text recognition (STR): the recognition of text anywhere in the environment, such as on storefronts, traffic signs, movie marquees, or parade banners. While superficially similar to OCR, STR is significantly more challenging because of innumerable unique fonts, uncontrolled viewing conditions (difficult viewing angles, shadows, occlusions), and minimal language context.
Our current models perform at the state of the art on modestly constrained STR problems: 95% accuracy at normal resolution and 85% accuracy at low resolution. But there is still clear room for improvement; a seven-year-old child makes fewer recognition errors. One important step toward human-level performance will be to drastically increase the amount of training data available to the system. However, training already takes on the order of one to two weeks. With the current methodology, increasing the training set size by multiple orders of magnitude would make timely error analysis impossible.
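To make the scaling concern concrete, here is a small back-of-the-envelope helper. It assumes, hypothetically, that training time grows linearly with the number of training examples and that a parallel implementation divides wall-clock time by some speedup factor; the document does not state the actual cost of the training algorithm, and the function name is illustrative only.

```python
def projected_training_weeks(current_weeks: float, data_multiplier: float,
                             speedup: float = 1.0) -> float:
    """Project training time under a hypothetical linear-cost model.

    Assumes cost grows linearly with training-set size and that a parallel
    implementation divides wall-clock time by `speedup`.
    """
    if current_weeks < 0 or data_multiplier <= 0 or speedup <= 0:
        raise ValueError("weeks must be non-negative; multiplier and speedup positive")
    return current_weeks * data_multiplier / speedup


if __name__ == "__main__":
    for weeks in (1.0, 2.0):
        # Two orders of magnitude more data on unchanged hardware.
        print(f"{weeks:.0f} week(s) now -> "
              f"{projected_training_weeks(weeks, 100):.0f} weeks with 100x data")
```

Under that linear assumption, 100 times more data turns one to two weeks of training into roughly 100 to 200 weeks, whereas a (hypothetical) 100-fold parallel speedup would bring it back to one to two weeks.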
For a high-level view of what I have been doing in this area, please see the following:
For a more detailed perspective, you might look at the following papers. In particular, the introduction, experiments, and conclusions are likely to be interesting and easier to follow than the "model" sections, which you certainly can try reading:
For the truly curious (or deranged), a plethora of details on the current state of the project can be found in my Ph.D. thesis:

2.2  Project Work

The accuracy of our current text recognition model on training data is very high (near 100%). Therefore, we believe our model has ample capacity to recognize complex characters in many fonts. However, one of the primary bottlenecks to improving the system is the amount of training data we are able to use. I want to improve system performance and move toward more challenging problems by radically parallelizing the training process with graphics processing units (GPUs). A GPU features many processing units on a single board, making it a hardware platform well suited to performing the same instructions on different data, i.e., the same learning algorithm on different examples. Since we can generate a virtually unlimited amount of training data artificially, we should be able to see an immediate gain in performance.
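The phrase "the same learning algorithm on different examples" describes data parallelism. The sketch below illustrates the idea with a deliberately simple stand-in model (logistic regression, not the project's actual recognition model): each worker computes the log-likelihood gradient over its own shard of examples, and the partial gradients are summed. Python threads are used only to show the decomposition; in a CUDA implementation, GPU threads would play the role of the workers. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# A training example: (feature vector, label in {0, 1}).
Example = Tuple[Sequence[float], int]


def example_gradient(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of one example's log-likelihood under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))  # predicted P(y = 1 | x)
    return [(y - p) * xi for xi in x]


def shard_gradient(w: Sequence[float], shard: Sequence[Example]) -> List[float]:
    """Run the same instructions over one shard of data and sum the results."""
    total = [0.0] * len(w)
    for x, y in shard:
        for j, g in enumerate(example_gradient(w, x, y)):
            total[j] += g
    return total


def parallel_gradient(w: Sequence[float], data: Sequence[Example],
                      n_workers: int = 4) -> List[float]:
    """Split data into shards, compute partial gradients independently,
    then reduce (sum) them into the full-data gradient."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda s: shard_gradient(w, s), shards))
    return [sum(column) for column in zip(*partials)]
```

Because the full-data gradient is just a sum over examples, the sharded result matches the serial one up to floating-point rounding; that additivity is what makes training a good fit for a GPU. (Python's global interpreter lock means these threads give no real speedup here.)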
NVIDIA's CUDA, an extension of the C programming language for GPUs, allows developers to port existing algorithms to parallel versions more easily. Even so, there will still be open questions about the optimal approach for moving massive amounts of data between the computer's main memory and the more limited GPU memory. Variations on the training algorithm are also possible, so approximations that may increase throughput should be investigated.
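One simple strategy for the host-to-GPU transfer question is to stream the training set through device memory in batches sized to a memory budget. The sketch assumes a fixed size per example and uses a hypothetical budget parameter; it does not call any GPU API, and this batch-size policy is only one of the open design choices mentioned above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches_for_budget(examples: Iterable[T], bytes_per_example: int,
                       device_budget_bytes: int) -> Iterator[List[T]]:
    """Yield consecutive batches that fit a (hypothetical) device-memory budget.

    A training set far larger than GPU memory can then be streamed through
    the device one batch at a time.
    """
    batch_size = device_budget_bytes // bytes_per_example
    if batch_size < 1:
        raise ValueError("a single example does not fit in the budget")
    it = iter(examples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

A natural refinement, common in CUDA code, is double buffering: copying the next batch to the device asynchronously while the current batch is being processed, so that transfer time overlaps with computation.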
Once the parallel training is achieved, it will then be necessary to investigate what types of variations in the training data provide the greatest improvement on an evaluation benchmark. These might include number and variety of fonts, scale/size variations, noise, contrast, rotations, and the magnitude or degree of all of these.
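The variations listed above can be produced synthetically. Below is a minimal sketch of three such transformations (contrast change, additive Gaussian noise, and rotation) on a grayscale image stored as a list of rows with values in [0, 1]. The representation, the mid-gray contrast pivot, and the white background fill are assumptions for illustration, not the project's actual rendering pipeline.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale rows, values in [0, 1]


def adjust_contrast(img: Image, factor: float) -> Image:
    """Scale intensities about mid-gray 0.5, clipping to [0, 1]."""
    return [[min(1.0, max(0.0, 0.5 + factor * (v - 0.5))) for v in row] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian pixel noise, clipping to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def rotate(img: Image, degrees: float, fill: float = 1.0) -> Image:
    """Rotate about the image center with nearest-neighbor sampling;
    output pixels that map outside the source get the value `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse-map each output pixel back into the source image.
            y, x = r - cy, c - cx
            sr = round(cy + cos_t * y - sin_t * x)
            sc = round(cx + sin_t * y + cos_t * x)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = img[sr][sc]
    return out
```

Sweeping the contrast factor, noise level, and rotation angle over ranges (and combining them) is one way to control the magnitude or degree of each variation when measuring its effect on a benchmark.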
In addition (or as an alternative), we may investigate the type of representation used by the model. Currently, a "flat" representation is used: a character is decomposed only into the orientations of its edges, and these are recognized as a whole. Alternatively, the character may be broken into representative parts (T-junctions, corners, straight lines, round bowls); these parts can be detected and recognition based on them. This parts-based approach is the one used for visual categories, yet it has not been shown that a flat model (of the sort currently in use) trained with sufficient data cannot rival or outperform a parts-based hierarchical model.
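To make the "flat" representation more concrete, here is a sketch of a simple edge-orientation feature: a magnitude-weighted histogram of image-gradient orientations. This is a generic stand-in chosen for illustration; the document does not specify the exact filters or binning the current model uses, and the function name is hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale rows


def edge_orientation_histogram(img: Image, n_bins: int = 8) -> List[float]:
    """Magnitude-weighted histogram of gradient orientations over 0-180
    degrees (edge polarity is ignored), normalized to sum to 1."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # central differences
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the gradient direction into [0, pi); the edge itself runs
            # perpendicular to the gradient, which only relabels the bins.
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist
```

This global histogram discards where each edge occurs; a practical flat representation would more likely keep such orientation measurements for each region of the character. A parts-based model would instead first detect local structures such as corners or T-junctions and then reason about their arrangement.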
I expect applicants to have a strong understanding of C programming and systems architecture, since system throughput and bandwidth will be important considerations for maximizing training efficiency. Students with strong backgrounds in linear algebra and probability/statistics will also be at an advantage. In addition, students on this project should expect to become very familiar with NVIDIA's CUDA programming language as well as MATLAB, and be prepared to handle the challenges of communicating between the two. Some design work on how best to create a data pipeline from disk to GPU will be needed, and we may decide that it is best to forgo MATLAB altogether.
It is my desire to release any software produced in the most general form possible so that our efforts may be used to reproduce experiments or apply the technology to other problems. Therefore, a code repository (SVN) will be used to manage code, and high standards of code quality and documentation are expected.

3  Visual Categories

A closely related project is generalizing to visual categories. Many computational models have been proposed for this problem, but few take advantage of the relationships between categories. We will investigate whether performance improvements can be realized by learning not only to distinguish between, for instance, cars, motorbikes, cheetahs, and zebras, but also to learn relations between their hierarchical categories, such as "wheeled transport" and "quadrupeds."

3.1  Background

Researchers have proposed a standard training and test set for evaluating visual category recognition models. While humans are thought to recognize 30,000 object categories, this data set incorporates just 256. One reason for the limitation is the overhead involved in annotating data. In order for a computer to learn adequately general models and representations, there must be a vast amount of training data or a great deal of prior knowledge engineered into the system. Unfortunately, both "solutions" are inadequate and very hard to come by. Since training data for each category is limited, generalization is challenging, and models tend either to overfit the limited training data or to make weak, uncertain predictions. State-of-the-art performance is just 35% accuracy.
You can browse the data set here: Caltech256. This is a press release discussing a model used for this problem:
For those who want details, a recent journal article (complete with a very informative Figure 1), is here:
This project will build upon the work described at a high level here:
For a more detailed perspective, you might look at the following paper. In particular, the introduction, experiments, and conclusions are likely to be interesting and easier to follow than the "model" sections, which you certainly can try reading:

3.2  Project Work

I have already performed a preliminary study of the joint learning of multiple categories. However, the images used were artificially generated and formed only a few super-categories. To demonstrate the method as a viable strategy for more complex recognition problems, we plan to expand the experiments to the 256-category benchmark, using a subset of these categories that form at least a dozen super-categories (e.g., wheeled transport, quadruped).
While models and software exist for such category recognition, it is not clear they are optimized for the problem at hand. Some experimentation will be needed to ensure that the "features" used to represent the images are in fact appropriate for the task. Once this is done, the experiments will test whether awareness of super-categories leads to better generalization and/or stronger predictions.
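One way to check whether predictions respect the hierarchy is to marginalize per-category probabilities up to the super-category level. The sketch below does this with a hypothetical four-category hierarchy built from the examples in the text; it is an evaluation aid, not the joint learning method itself, and the names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical fragment of a category hierarchy, using examples from the text.
SUPER_CATEGORY: Dict[str, str] = {
    "car": "wheeled transport",
    "motorbike": "wheeled transport",
    "cheetah": "quadruped",
    "zebra": "quadruped",
}


def super_category_probs(category_probs: Mapping[str, float],
                         parent: Mapping[str, str] = SUPER_CATEGORY) -> Dict[str, float]:
    """Marginalize fine-grained category probabilities up one level of the
    hierarchy by summing over each super-category's members."""
    totals: Dict[str, float] = defaultdict(float)
    for category, p in category_probs.items():
        totals[parent[category]] += p
    return dict(totals)


if __name__ == "__main__":
    probs = {"car": 0.30, "motorbike": 0.25, "cheetah": 0.40, "zebra": 0.05}
    print(max(probs, key=probs.get))    # most likely single category
    coarse = super_category_probs(probs)
    print(max(coarse, key=coarse.get))  # most likely super-category
```

With these made-up probabilities, the most likely single category is "cheetah", yet the most likely super-category is "wheeled transport" (about 0.55 versus 0.45). Comparing accuracy and confidence at both levels is one way to ask whether super-category awareness yields stronger predictions.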
This process can be computationally time-consuming, but there is also room for parallelization. If adequate progress is made on establishing the preliminaries above, we will investigate tools for parallelizing the learning process in this model as well. In this case, the MATLAB® Distributed Computing Toolbox would be used (as opposed to CUDA for the GPU), since the development environment for the project is MATLAB®.
Students should have sufficient maturity in the discipline to teach themselves a new programming language (MATLAB®). Students with strong backgrounds in linear algebra and probability/statistics will also be at an advantage. Students who have taken Neuroscience 250, Psychology 260/360, or Philosophy 256/257 may also be at an advantage.
It is my desire to release any software produced in the most general form possible so that our efforts may be used to reproduce experiments or apply the technology to other problems. Therefore, a code repository (SVN) will be used to manage code, and high standards of code quality and documentation are expected.

4  Gamma Ray Astronomy

This project is being conducted in conjunction with Professor Charles Duke, from the Grinnell College Department of Physics. He is a member of the VERITAS gamma-ray astronomy research collaboration centered at the Smithsonian Observatory in Tucson.

4.1  Background

Astronomers have a multitelescope configuration that allows for detection of upper-atmosphere gamma-ray showers. The three main problems are:
All of this must be done with the four images generated by the telescopes (or, in this case, simulations of such images).
More information on the VERITAS project can be found here:

4.2  Project Work

Image processing will be used to separate proton-produced atmospheric showers from gamma-ray-produced atmospheric showers. Current techniques are straightforward and make decisions based on simple image shape. Machine learning could be used to improve automatic discrimination between gamma-ray and proton-induced showers. A large library of machine learning algorithms is available to the research community in an open-source Java package called WEKA. Identifying useful features of the images and testing the performance of certain algorithms could be a key component of this research. This will require gaining some understanding of image processing and features, as well as becoming versed in several machine learning algorithms (or at least how to use them in WEKA).
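The document does not say which shape measures the current technique uses; a widely used choice for Cherenkov telescope images is the intensity-weighted second moments of the image (the "Hillas" length and width). The sketch below computes them from pixel coordinates and intensities; values like these could then be exported as features (for example, in WEKA's ARFF file format) and fed to different classifiers. The function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def moment_length_width(xs: Sequence[float], ys: Sequence[float],
                        amps: Sequence[float]) -> Tuple[float, float]:
    """Intensity-weighted second moments of a shower image.

    Returns (length, width): the RMS spread along the image's major and
    minor axes, i.e. square roots of the covariance-matrix eigenvalues.
    """
    total = sum(amps)
    mx = sum(a * x for a, x in zip(amps, xs)) / total
    my = sum(a * y for a, y in zip(amps, ys)) / total
    sxx = sum(a * (x - mx) ** 2 for a, x in zip(amps, xs)) / total
    syy = sum(a * (y - my) ** 2 for a, y in zip(amps, ys)) / total
    sxy = sum(a * (x - mx) * (y - my) for a, x, y in zip(amps, xs, ys)) / total
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + disc
    lam_minor = max(half_trace - disc, 0.0)  # guard against rounding below 0
    return math.sqrt(lam_major), math.sqrt(lam_minor)
```

Gamma-ray images tend to be compact, narrow ellipses, while proton-induced images are typically broader and more irregular, which is why width and length are useful candidate features for a classifier.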
Another possibility is to develop new methods for reconstructing the gamma-ray source location in the sky and the shower location on the ground. This work would make extensive use of vector algebra and coordinate transformations, using the C++ language and the ROOT framework, a data-analysis library developed at CERN.
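In the standard stereo technique for Cherenkov telescope arrays, the major axes of the images from different telescopes, superimposed in a common coordinate plane, point back toward the source, so the source position can be estimated by intersecting them. The project would use C++ and ROOT; the sketch below uses Python only to show the vector algebra, with hypothetical function names and each axis given as a point and a direction vector.

```python
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


def cross(a: Vec2, b: Vec2) -> float:
    """z-component of the 3-D cross product of two planar vectors."""
    return a[0] * b[1] - a[1] * b[0]


def intersect_axes(p1: Vec2, d1: Vec2, p2: Vec2, d2: Vec2,
                   eps: float = 1e-12) -> Optional[Vec2]:
    """Intersect the lines p1 + t*d1 and p2 + s*d2.

    Crossing both sides of p1 + t*d1 = p2 + s*d2 with d2 eliminates s,
    giving t = cross(p2 - p1, d2) / cross(d1, d2). Returns None when the
    axes are (nearly) parallel and give no usable stereo constraint.
    """
    denom = cross(d1, d2)
    if abs(denom) < eps:
        return None
    diff = (p2[0] - p1[0], p2[1] - p1[1])
    t = cross(diff, d2) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

With more than two telescopes, one could, for example, average the pairwise intersections, weighting each pair by how far from parallel its axes are.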
Depending on the results, both of these projects could result in a journal publication, with work continuing beyond the summer.

5  Approximate Schedule

This schedule largely follows that officially approved by the division. However, since other (off-campus) options have different schedules, I will need to know if you are considering other opportunities, what the schedule is, and whether you are likely to choose an off-campus opportunity if accepted. The ten weeks of summer research are currently under discussion.
Friday, 20 February 2009:
Application forms due. You must submit the division-wide form in hardcopy to the division office and send your responses to my questions directly to me.
Friday, 6 March 2009:
Initial selections announced (provided that the college has approved funding).
Friday, 13 March 2009:
Preliminary acceptances/rejections due.
Week of 7 April 2009:
First meeting.
Unspecified other dates:
Additional meetings.
Monday, 18 May 2009:
Commencement.
Monday, 25 May 2009:
Summer research begins.
Friday, 31 July 2009:
Summer research concludes.

6  Expectations

I have very high expectations of my summer research students. Among other things, I expect my students to begin their summer research during spring semester and continue their summer research into fall semester (and sometimes beyond). By applying for summer research you are agreeing to meet these expectations if I take you on as a research student. You are unlikely to receive explicit credit or compensation for work in the spring and fall.
I also expect my students to be self-reliant. While I do my best to be around, I expect you to be able to do many things on your own or with a small group.

6.1  Spring

Topic Preparation  
You are expected to begin your background research during the spring. In particular, you must identify at least four papers on related projects. You are also encouraged to use the web to aid your search. Some useful resources are:
Some of the top conferences in which to find related work are
CVPR: Computer Vision and Pattern Recognition
ICCV: International Conference on Computer Vision
ECCV: European Conference on Computer Vision
ICDAR: International Conference on Document Analysis and Recognition
ICPR: International Conference on Pattern Recognition
ICIP: International Conference on Image Processing
ICASSP: International Conference on Acoustics, Speech, and Signal Processing
and some related journals include
PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
IJCV: International Journal of Computer Vision
TIP: IEEE Transactions on Image Processing
though there are, of course, many, many others. Once you have identified potentially useful resources, if you cannot find an author's preprint online (one is nearly always available), consult with the librarians about obtaining a copy of the article or conference paper.
Skill Preparation  
If your project will require a programming language (e.g., MATLAB), GPU programming platform (e.g., CUDA), or library (e.g., WEKA) that you do not yet know, you are expected to begin studying it. You need not master the language or library, but you should develop some basic familiarity.

6.2  Summer

During the summer, you are expected to work full-time on the project (40-50 hours per week for ten weeks). This work will include regularly scheduled group meetings.
Topic Preparation  
For the first week of summer research, you will continue your preparation from the spring, developing a survey of the state of the art in whatever project you've decided to undertake. You should prepare a short survey paper. This will serve as an introduction/literature review for a later paper. On the first day of the second week, you will give a public presentation of your work.
Core Research and Development  
For the next eight weeks of the summer, you will work on your project, using what you've learned during preparation for guidance. Some of this time may be spent developing skills. We will have a full-group meeting several times a week. Each group will present at least once per week at that meeting.
Writing  
For the last week of the summer (and, preferably, as you do your work), you will work on a five-to-ten page paper describing your work and placing it in the context of related work. Your paper should meet the highest standards of writing at Grinnell. If you work as part of a small group, the group need only prepare one paper. In most cases, you will be required to submit a version of this paper to a conference or journal. (I will provide significant assistance in developing the submitted version and will probably ask to be listed as a co-author.)

6.3  Fall and Beyond

Poster Presentation  
You will create a poster describing your work and present it at the Grinnell Science Poster Seminar (typically during parents' weekend).
Internal Public Presentation  
You will give a twenty-five- or fifty-minute presentation on your work as part of the Computer Science Thursday Extras series.
External Conference Presentation  
If your work is submitted and accepted to a conference, and there is funding available for you to attend the conference, you are expected to attend and present your work.
External Pew Presentation  
You must submit your work to the Pew Midstates Science and Mathematics Consortium Fall Symposium on Undergraduate Research in the Physical and Mathematical Sciences. You must attend the symposium (including non-CS talks) and present your work (in poster or talk form) if your work is accepted. You must give at least one practice talk before going to the conference.

Acknowledgement

With thanks to Professor Sam Rebelsky for many elements of Section 6 and Professor Charles Duke for information in Section 4.


File translated from TEX by TTH, version 3.80.
On 9 Feb 2009, 17:05.