CSE 291: Seminar on Vision and Learning

Project Reports

November 2001

Sameer Agarwal, On the non-optimality of four color coding of Image partitions.

Abstract: Object based image decomposition and compression is an area of active research. For image and video compression we are witnessing a move from purely transform based coding to methods which decompose the image into multiple layers/objects and compress each of them separately, exploiting layer specific properties to get better distortion rates as well as better overall compression ratios. For documents we now have an ITU standard for Multiple Raster Content (MRC). The DjVu system is an implementation of this standard. The DjVu system treats a document as made up of three layers corresponding to color, background, and text. For each of these layers it uses layer specific algorithms. Similarly, the MPEG-4 standard for video compression specifically focuses on identifying objects across frames and compressing them in different streams. Segmentation based image compression requires the following two components for successful compression. 1. Robust image segmentation methods. 2. Compression algorithms which can be chosen based on the content of a particular segment. A number of robust image segmentation methods have become available over the past few years, and in what follows we will assume that a ``good'' partitioning of the image into coherent connected segments is available. We consider the image to be composed of multiple layers, each layer made up of the texture corresponding to one of the segments. One way of compressing the image would be to compress each of the segments individually, which includes both the texture and the shape information for the segment. Another option is to encode only texture information in the segment layers, and use an additional layer for storing the image partition information. We call this the image partition layer. In this paper we will address the issue of effective compression of the partition layer.
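
The title refers to four color coding of the partition layer. As a rough illustration only, and not the method of the report, the sketch below builds the region adjacency graph of a segment label map and relabels the segments so that neighboring segments never share a label. The function names and the greedy ordering are assumptions made for this example. The four color theorem says four labels suffice for a planar map, but the simple greedy pass used here does not guarantee that bound.

```python
import numpy as np

def region_adjacency(labels):
    """Map each segment label to the set of labels it touches (4-connectivity)."""
    labels = np.asarray(labels)
    adj = {int(l): set() for l in np.unique(labels)}
    # Compare each pixel with its right neighbor and its lower neighbor.
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        mask = a != b
        for u, v in zip(a[mask].tolist(), b[mask].tolist()):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def greedy_coloring(adj):
    """Assign each segment the smallest color not used by a colored neighbor.

    Hypothetical helper: greedy coloring may need more than four colors.
    """
    colors = {}
    for node in sorted(adj, key=lambda n: -len(adj[n])):
        used = {colors[m] for m in adj[node] if m in colors}
        colors[node] = next(c for c in range(len(adj)) if c not in used)
    return colors

def color_partition(labels):
    """Return a partition layer in which every pixel holds its segment's color."""
    labels = np.asarray(labels)
    colors = greedy_coloring(region_adjacency(labels))
    out = np.zeros(labels.shape, dtype=int)
    for label, color in colors.items():
        out[labels == label] = color
    return out

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
partition_layer = color_partition(labels)
```

The recolored layer has far fewer distinct symbols than the original label map, which is the property a partition coder of this kind would try to exploit.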

Andrew Cosand, Adaptive Multispectral Difference Weighting.

Abstract: This project attempts to develop a robust and reliable method for fusing hue and intensity difference information for shadow-invariant difference detection. Hue information is commonly used because it is insensitive to shadows, but it may fail on pixels which are nearly gray, as their hues cannot be determined reliably. A log-scaled distance between each pixel and the gray line in RGB color space is used as a measure of how strongly colored the pixel is. Using information about how the algorithm performs on a training set, weighting functions are generated which fuse hue and intensity difference information as a function of this log-scaled distance. The weighting function currently generated is observably better in some regards than both the hue and saturation differences, but it has yet to combine the better performance characteristics of each without any of the poor performance characteristics. Although the original goal was to fit simple curves to these weighting functions and adaptively adjust the curve parameters, that has not yet been accomplished.
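
The fusion idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the weighting learned in the project: the use of `log1p` for the log scaling, the logistic shape of `hue_weight`, and its `midpoint` and `steepness` parameters are all hypothetical stand-ins for the functions generated from training data.

```python
import numpy as np

def gray_line_distance(rgb):
    """Euclidean distance from each RGB pixel to the gray line R = G = B."""
    rgb = np.asarray(rgb, dtype=float)
    # The closest point on the gray line has all channels equal to the mean.
    mean = rgb.mean(axis=-1, keepdims=True)
    return np.linalg.norm(rgb - mean, axis=-1)

def log_scaled_distance(rgb):
    """Log-scaled colorfulness; log1p is an assumed choice that maps gray to 0."""
    return np.log1p(gray_line_distance(rgb))

def hue_weight(d, midpoint=2.0, steepness=2.0):
    """Hypothetical logistic weight: near 0 for gray pixels, near 1 for vivid ones."""
    return 1.0 / (1.0 + np.exp(-steepness * (d - midpoint)))

def fused_difference(hue_diff, intensity_diff, rgb):
    """Blend hue and intensity differences according to how colored the pixel is."""
    w = hue_weight(log_scaled_distance(rgb))
    return w * hue_diff + (1.0 - w) * intensity_diff

gray_pixel = [100, 100, 100]
red_pixel = [255, 0, 0]
w_gray = hue_weight(log_scaled_distance(gray_pixel))
w_red = hue_weight(log_scaled_distance(red_pixel))
```

With these placeholder parameters a gray pixel relies almost entirely on the intensity difference, while a saturated pixel relies almost entirely on the hue difference.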

Gyozo Gidofalvi, Recognizing hand-drawn images using shape context.

Abstract: The objective of this paper is twofold: to gather real world samples for a subset of the standardized set of 260 line drawings introduced by Snodgrass and Vanderwart [4], and to test the performance of the representative shape context method for rapid retrieval of similar shapes introduced by Mori et al. [1]. To experiment with the expressive power of the shape context at different locations in the image, we introduce a modification to the representative shape context method, which draws representatives from a distribution based on the pixel density of the image in a given area. Furthermore, we test the performance of the representative shape context method for detection of these objects when embedded in an arbitrary environment. We find that the performance of the representative shape context method is slightly worse on hand-drawn images than on the synthetic data presented in [1]. We also find that the density based sampling methods perform worse than the original method. Finally, we find that the representative shape context in its original form is highly affected by the presence of clutter and is not appropriate for recognizing objects embedded in an environment. However, results suggest that a sampling method that incorporates both spatial considerations and density measures may improve query performance for embedded objects.
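
For readers unfamiliar with the descriptor, the sketch below computes a log-polar shape context histogram for one point and draws representative points with probability proportional to local point density. It is an illustration under assumptions, not the code used in the report: the bin counts, the radial range of 1/8 to 2 mean distances, the grid-based density estimate, and the function names are all choices made for this example.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12):
    """Log-polar histogram of the other points' positions relative to points[index]."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Normalize radii by the mean pairwise distance for scale invariance.
    pair = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    mean_dist = pair.sum() / (n * (n - 1))
    diffs = np.delete(points, index, axis=0) - points[index]
    radii = np.hypot(diffs[:, 0], diffs[:, 1]) / mean_dist
    angles = np.arctan2(diffs[:, 1], diffs[:, 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    t_edges = np.linspace(0.0, 2 * np.pi, n_theta + 1)
    hist, _, _ = np.histogram2d(radii, angles, bins=[r_edges, t_edges])
    return hist

def sample_by_density(points, n_samples, grid=4, seed=None):
    """Pick representative indices with probability proportional to local density.

    Density is estimated (an assumption of this sketch) by counting points
    in each cell of a grid x grid partition of the bounding box.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0
    cells = np.minimum(((points - mins) / spans * grid).astype(int), grid - 1)
    cell_ids = cells[:, 0] * grid + cells[:, 1]
    counts = np.bincount(cell_ids, minlength=grid * grid)
    weights = counts[cell_ids].astype(float)
    probs = weights / weights.sum()
    return rng.choice(len(points), size=n_samples, replace=False, p=probs)

theta = np.linspace(0.0, 2 * np.pi, 20, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
descriptor = shape_context(circle, 0)
representatives = sample_by_density(circle, 5, seed=0)
```

Replacing `sample_by_density` with uniform random sampling would correspond to drawing representatives without regard to density.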

David Kauchak, Audio Meets Image Retrieval Techniques.

Abstract: In this paper we examine the problem of audio retrieval. We make a number of key contributions to this field. First, we examine artist recognition/retrieval as a problem instead of the traditional genre classification. This problem has the motivating benefit that there is a known, uncontroversial ground truth. Second, and more importantly, we suggest borrowing research from the image retrieval community. We provide results from one image retrieval technique ported over to audio retrieval. This technique consists of taking the discrete wavelet transform of the audio, histogramming the results, and using statistical histogram comparison metrics to compare similarity. The results are not outstanding, but we do show that this sort of research can be done fairly easily and productively.
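
The pipeline described above (wavelet transform, histogramming, histogram comparison) can be sketched as below. The abstract does not name the wavelet, the binning, or the comparison metric, so the Haar wavelet, the 16 bins over [-1, 1], and the chi-squared distance are assumptions made for this example, as are the function names.

```python
import numpy as np

def haar_dwt(signal, levels=3):
    """Multi-level Haar DWT; returns [detail_1, ..., detail_L, approx_L]."""
    approx = np.asarray(signal, dtype=float)
    bands = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop a trailing sample so pairs line up
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)       # coarser approximation
    bands.append(approx)
    return bands

def wavelet_histograms(signal, levels=3, bins=16, value_range=(-1.0, 1.0)):
    """Concatenate one normalized coefficient histogram per wavelet band."""
    features = []
    for band in haar_dwt(signal, levels):
        clipped = np.clip(band, value_range[0], value_range[1])
        hist, _ = np.histogram(clipped, bins=bins, range=value_range)
        features.append(hist / max(hist.sum(), 1))
    return np.concatenate(features)

def chi_squared(p, q, eps=1e-12):
    """Chi-squared histogram distance (an assumed choice of metric)."""
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))

t = np.linspace(0.0, 1.0, 1024)
tone = np.sin(2 * np.pi * 5 * t)
noise = np.random.default_rng(0).uniform(-1.0, 1.0, 1024)
same = chi_squared(wavelet_histograms(tone), wavelet_histograms(tone))
different = chi_squared(wavelet_histograms(tone), wavelet_histograms(noise))
```

For retrieval, each stored clip would be reduced to its histogram feature once, and a query would be ranked against the collection by this distance.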

Most recently updated on Dec. 9, 2001 by Serge Belongie.