CSE 252C: Selected Topics in Vision and Learning

Project Reports

December 4, 2007

Deborah Goshorn: Recognizing Hand Gesture Behavior Using Sequential Grammar-based Classifier

Abstract: Object behavior recognition is a fundamental task arising in a multitude of disciplines. This paper reviews a linguistically motivated approach to behavior recognition that uses cost-augmented context-free grammars to robustly classify arbitrary sequential (in time or space) objects [R. Goshorn (2001, 2005)]. In this approach, objects are represented as terminals, and predefined object behaviors are represented by grammars, i.e., sets of production rules. To allow for approximate object behavior matching, each grammar is augmented with error production rules with associated learned costs. In this paper, a finite set of hand gesture types constitutes the objects to be studied, and the hand gesture behaviors are defined as particular temporal sequences of hand gesture types. This behavior recognition problem arises in important applications such as American Sign Language recognition and Human-Computer Interaction. The experimental data are taken from the hand gesture annotations of several hand-pointing video recordings used to simulate the hand gesture behaviors of a particular Human-Computer Interaction scenario. Experimental results are provided.
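The flavor of cost-augmented approximate matching can be illustrated with a minimal sketch (not the author's system, which uses full context-free grammars): here each behavior is reduced to a single template sequence of gesture terminals, and the error productions (insertion, deletion, substitution) become the operations of a weighted edit distance. The behavior names and costs below are hypothetical.

```python
def edit_cost(observed, template, ins=1.0, dele=1.0, sub=1.0):
    """Weighted edit distance between two terminal sequences.
    The ins/dele/sub costs play the role of learned error-production costs."""
    m, n = len(observed), len(template)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0.0 if observed[i - 1] == template[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + match,  # match / substitute
                          d[i - 1][j] + dele,       # delete an observed symbol
                          d[i][j - 1] + ins)        # insert a missing symbol
    return d[m][n]

def classify(observed, behaviors):
    """Assign the behavior whose template matches at lowest total cost."""
    return min(behaviors, key=lambda b: edit_cost(observed, behaviors[b]))

behaviors = {"select": ["point", "hold", "retract"],
             "swipe":  ["point", "move", "move", "retract"]}
print(classify(["point", "hold", "hold", "retract"], behaviors))  # select
```

A noisy observation with a repeated "hold" still matches "select" at cost 1 (one deletion), beating "swipe" at cost 2.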

Joshua Lewis: Charting the Evolutionary Course of Ocean-Dwelling Creatures Using Shape Analysis (with Pincelli Hull)

Abstract: This project explores the evolutionary history of two ocean-dwelling species, the tumida and pulleniatina. Both species have changed in shape over time, but it is unclear whether these changes happened abruptly or whether a mixture of morphologies coexisted for some time before arriving at their present state. We use techniques such as Procrustes analysis to analyze a chronological series of two-dimensional outlines of both tumida and pulleniatina forms. We then use spectral eigengap methods to cluster the outlines and investigate their evolutionary dynamics.
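Procrustes analysis, mentioned above, aligns two point configurations by removing translation, scale, and rotation before comparing shapes. A minimal sketch on 2-D outlines (landmark matrices with corresponding rows) follows; it uses ordinary SVD-based superimposition and does not constrain against reflections.

```python
import numpy as np

def procrustes_align(X, Y):
    """Align outline Y (n x 2 landmarks) to X: remove translation,
    scale, and rotation. Returns aligned Y and the residual distance."""
    Xc = X - X.mean(axis=0)            # remove translation
    Yc = Y - Y.mean(axis=0)
    Xc /= np.linalg.norm(Xc)           # remove scale
    Yc /= np.linalg.norm(Yc)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    Y_aligned = Yc @ R
    dist = np.linalg.norm(Xc - Y_aligned)
    return Y_aligned, dist

# A square outline and a rotated, scaled, shifted copy align to distance ~0.
X = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = 3.0 * X @ R.T + np.array([5.0, -2.0])
_, d = procrustes_align(X, Y)
print(round(d, 6))  # ~0.0
```

The residual distances from such alignments are what a clustering method (e.g., the spectral eigengap approach above) would then operate on.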

Shiaokai Wang: Reading Image Text on Product Packaging

Abstract: In a grocery store environment, there is intuition that along with color and visual design, reading text on product packaging is an important step for humans to perform in identifying items they are intending to purchase. However, the problem of reading text from packaging also presents a difficult problem of performing Optical Character Recognition (OCR) in a setting where the text being read varies greatly in style, color, pose, and other factors. In this work we hope to demonstrate a system that can read and localize image text on product packaging that has been acquired from the web while leveraging textual metadata describing each image. Our broader goal is to incorporate our product OCR system into the larger GroZi framework to assist the visually impaired in object recognition and way-finding.

IkkJin Ahn: Object Recognition and Segmentation by Isoperimetric Graph Partitioning

Abstract: In this project, I will present a method for concurrent recognition and segmentation using the isoperimetric graph partitioning algorithm. As many previous works have pointed out, combining top-down object information with bottom-up pixel similarity is crucial for improving the semantic quality of segmentation; it also enables quantitative measurement of the segmentation result. Unlike recent related papers based on Bayesian models, I will use pairwise clustering based on graph theory. For object recognition, I will utilize two new pairwise relationships: pixel-model affinity, and pixel-pixel affinity between multiple images. In our example, pixel-model relationships are created by intuitive user input, and pixel-pixel affinities between different images are defined by feature matching. For segmentation I will use isoperimetric graph partitioning, which is faster than other partitioning algorithms; another benefit of this algorithm is that a user can manually pick points in the figure. As an application, we will demonstrate figure-ground segregation from multiple input images that requires only a handful of user inputs in a single reference image.
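The core of the isoperimetric partitioning algorithm (in the style of Grady and Schwartz) can be sketched in a few lines: ground one node, solve a single sparse linear system in the graph Laplacian, and sweep thresholds on the resulting potentials to find the cut with the best isoperimetric ratio. The toy affinity matrix below is illustrative, not data from the project.

```python
import numpy as np

def isoperimetric_partition(W, ground=0):
    """Bipartition a graph given its symmetric affinity matrix W.
    Ground one node, solve the reduced system L0 x = d0, then sweep
    thresholds on x for the lowest isoperimetric ratio (cut / min volume)."""
    n = len(W)
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # graph Laplacian
    keep = [i for i in range(n) if i != ground]
    x = np.zeros(n)
    x[keep] = np.linalg.solve(L[np.ix_(keep, keep)], d[keep])

    best_ratio, best_mask = np.inf, None
    for t in sorted(x)[:-1]:                 # candidate thresholds
        S = x <= t
        cut = W[S][:, ~S].sum()
        vol = min(d[S].sum(), d[~S].sum())
        if vol > 0 and cut / vol < best_ratio:
            best_ratio, best_mask = cut / vol, S
    return best_mask

# Two triangles joined by one weak edge: the best cut removes that edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.05
print(isoperimetric_partition(W).tolist())
# [True, True, True, False, False, False]
```

Solving one linear system (rather than an eigenproblem, as in normalized cuts) is what makes the method comparatively fast, and the grounded node is a natural place to inject a user-picked figure point.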

Fred Birchmore: Using Color-Based Features With Boosting Techniques to Detect and Recognize a Surrogate AK-47

Abstract: The goal of this project is to determine if the introduction of color-based features to an existing software recognition framework will allow it to train more effective classifiers for object recognition. The target object to detect is a surrogate AK-47 rifle. An existing boosting framework written in C++ will be used and modified to train strong classifiers using color-based features.

Nick True: Offline Word Spotting in Handwritten Documents

Abstract: The digitization of written human knowledge into string data has reached up to, but not beyond, the recognition of typeset text. This means that vast libraries of handwritten, cursive documents must be indexed and transcribed by a human -- a prohibitively laborious task. This paper explores an existing algorithm developed in [1] for the offline indexing of historical documents. Specifically, words are clustered using dynamic time warping (DTW), which compares sets of features extracted from two word images. By clustering the words in an unknown document, a human would only have to label each cluster, significantly reducing the human's workload.
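The DTW comparison at the heart of this approach can be sketched minimally. Here each word image is reduced to a single 1-D sequence of per-column feature values (the real system in [1] uses several profile features per column); DTW lets two renditions of the same word match even when one is stretched.

```python
import math

def dtw(a, b):
    """Return the DTW alignment cost between feature sequences a and b."""
    m, n = len(a), len(b)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])       # local feature distance
            D[i][j] = cost + min(D[i - 1][j],     # stretch a
                                 D[i][j - 1],     # stretch b
                                 D[i - 1][j - 1]) # advance both
    return D[m][n]

# A stretched rendition of the "same word" matches more closely than a
# different word.
same = dtw([0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 2, 2, 1, 0])
diff = dtw([0, 1, 2, 2, 1, 0], [2, 0, 2, 0, 2, 0])
print(same < diff)  # True
```

The pairwise DTW costs then feed the clustering step: words with low mutual cost land in the same cluster, and the human labels clusters rather than individual words.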

Iman Mostafavi: Online Supervised Edge Learning

Abstract: Edge detection is utilized in a variety of computer vision applications, yet it remains a challenging problem on its own. Boosting has shown impressive performance in training offline classifiers for detection tasks. In this paper we propose the use of an online supervised learning algorithm for edge detection. The algorithm trains incrementally as new data becomes available, which has several advantages over offline methods, and opens the possibility for new applications. Our method combines a large number of features across different scales in order to learn a discriminative model using an online boosting classification framework. The resulting edge detector is adaptive with no parameters to tune. We test our algorithm on images from two different domains and demonstrate promising results with a relatively small number of training examples.

December 6, 2007

Diane Hu: Cracking Audio CAPTCHAs

Abstract: [put abstract here]

Christopher Kanan: NIMBLER: A Proposed Model of Saccade-Based Visual Attention and Object Recognition

Abstract: NIMBLE is a cognitively plausible object recognition system that uses a saccadic visual memory to store and retrieve image fragments. These fragments are acquired by scanning an image in a human-like way: a bottom-up saliency model finds informative regions, a kernel density estimate over stored fragments determines each new fragment's familiarity, and the fragments' evidence is combined using naive Bayes. Although NIMBLE has performed well on the datasets it has been evaluated on, there are numerous extensions that could improve its biological plausibility and performance. The upgraded architecture, NIMBLER, will be evaluated on a variety of challenging datasets.
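The familiarity-and-combination step can be illustrated with a small sketch (a simplification, not the NIMBLE implementation): each class stores fragment feature vectors, a new fixation's familiarity under a class is a Gaussian kernel density estimate over that class's memory, and fixations are combined by summing log densities (naive Bayes). The class names, feature dimensionality, and bandwidth below are all illustrative.

```python
import numpy as np

def log_kde(x, fragments, h=1.0):
    """Log Gaussian kernel density of feature vector x under a class's
    stored fragment features (rows of `fragments`); constants dropped."""
    d2 = ((fragments - x) ** 2).sum(axis=1)
    logs = -d2 / (2 * h ** 2)
    m = logs.max()                     # log-mean-exp, numerically stable
    return m + np.log(np.exp(logs - m).mean())

def classify(fixations, memory):
    """Naive-Bayes combination: sum per-fixation log densities per class."""
    scores = {c: sum(log_kde(x, frags) for x in fixations)
              for c, frags in memory.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
memory = {"cat": rng.normal(0.0, 1.0, (50, 4)),   # stored fragment features
          "dog": rng.normal(3.0, 1.0, (50, 4))}
fixations = rng.normal(0.0, 1.0, (5, 4))          # fragments from a new image
print(classify(fixations, memory))  # cat
```

The naive-Bayes sum treats fixations as conditionally independent given the class, which is what makes sequential evidence accumulation across saccades tractable.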

Miro Enev: TBA

Abstract: [put abstract here]

Steven Branson: TBA

Abstract: The active appearance model (AAM) is a popular method for modeling the shape and texture of classes of objects. Although AAMs have demonstrated a good deal of success in areas such as face tracking and medical image analysis, they have significant limitations in scaling to classes of objects with larger intrinsic dimensionality. For this project, I will test and evaluate the performance and feasibility of applying different machine learning methods to building and fitting AAMs.

Cory Rieth: SURF-ing a Model of Spatiotemporal Saliency

Abstract: Zhai and Shah [1] proposed a model of spatiotemporal saliency using a combination of temporal and spatial attention models. The temporal model utilized SIFT [2] to compute feature points and the correspondences between them in successive frames. Another model similar to SIFT has emerged, called SURF [3]. The authors of SURF show that it is faster than and superior to SIFT. This investigation simply replicates the model in [1] and compares performance when SURF is used in place of SIFT.

Marissa Grigonis: ICA Model for the Cross Race Effect

Abstract: Principal component analysis (PCA) learns the second-order dependencies between image pixels, and performs information maximization when the input is Gaussian. Although PCA has been used for image analysis, images are not inherently Gaussian. Independent component analysis (ICA) learns higher order dependencies among image pixels and performs information maximization for many distributions. An ICA model [1] has already been successfully applied to face processing and has been shown to be superior to PCA for facial recognition using both a spatially local image basis and a spatially global image basis [2]. The cross race effect (also known as the other race effect or own race bias) [4] refers to the fact that people are better at recognizing members of their own race than members of other races. This effect has been demonstrated for a PCA model [3] trained on Caucasian and Asian faces from the FERET database. In this project I will extend the ICA model using the same image subset as the PCA model. I expect that the ICA model will demonstrate the cross race effect and that this effect will be stronger for the ICA model than the PCA model.
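The PCA baseline that ICA is contrasted with can be sketched concisely: project faces onto the top principal components of the training set and recognize by nearest neighbor in coefficient space. The random "faces" below stand in for real image data; ICA would replace the SVD basis with statistically independent components.

```python
import numpy as np

def pca_basis(X, k):
    """Top-k principal components of data matrix X (rows = face images).
    PCA captures only second-order (covariance) structure."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return Vt[:k], mean

def project(x, basis, mean):
    """Coefficients of image x in the learned face space."""
    return basis @ (x - mean)

# Nearest-neighbor recognition in PCA coefficient space.
rng = np.random.default_rng(1)
gallery = rng.normal(size=(20, 100))              # 20 "faces", 100 pixels each
basis, mean = pca_basis(gallery, k=5)
probe = gallery[7] + 0.01 * rng.normal(size=100)  # noisy copy of face 7
coeffs = np.array([project(g, basis, mean) for g in gallery])
match = np.argmin(np.linalg.norm(coeffs - project(probe, basis, mean), axis=1))
print(match)  # 7
```

A cross-race-effect experiment would train the basis predominantly on one group's faces and measure how recognition accuracy drops for the other group.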

Most recently updated on Oct. 29, 2007 by Serge Belongie.