Texty

Short Description:

My name is Tom Duerig, and I'm a computer science graduate student researcher at UCSD. I am currently working on a research project to develop a handheld mobile device that assists blind people with grocery shopping. The device uses computer vision and pattern recognition techniques to recognize products by their visual appearance, and it alerts the user to a product's location via haptic and audio feedback. It is named GroZi after the hardware, which is called MoVi (Mobile Vision project). Texty is the text detection and reading portion of that device.

My involvement:

My goal was to write software that quickly and accurately extracts text from a video feed. Giving GroZi access to the textual cues in a scene should help the system quickly decide which objects to look for, and where. I focused on aisle signs because their text is relatively well suited to machine reading. Correctly detecting aisle signs in a cluttered environment requires many training photos of the signs in natural scenes, and Vons kindly agreed to let us take those photos. After getting the detector to work, I focused on preprocessing to remove perspective distortion and illumination variance, and on postprocessing to limit the OCR's search to a given word list. Texty can now accurately read most single images containing signs when given a list of possible words.
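
To illustrate the word-list postprocessing idea, here is a minimal sketch in Python. It is not Texty's actual code: the function name snap_to_wordlist, the cutoff value, and the sample words are assumptions made for illustration, and the matching uses the standard library's difflib similarity ratio as a stand-in for whatever scoring the real system uses. The idea is simply that a noisy OCR string is replaced by the closest entry in the list of possible words, or rejected if nothing is close enough.

```python
import difflib


def snap_to_wordlist(raw_text, wordlist, cutoff=0.6):
    """Return the wordlist entry closest to a noisy OCR string, or None.

    Hypothetical helper for illustration; cutoff is an assumed threshold
    on difflib's similarity ratio (0.0 to 1.0).
    """
    # Compare case-insensitively, but return the wordlist's own spelling.
    by_lowercase = {}
    for word in wordlist:
        by_lowercase.setdefault(word.lower(), word)

    candidates = difflib.get_close_matches(
        raw_text.strip().lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[candidates[0]] if candidates else None


if __name__ == "__main__":
    # Made-up aisle-sign vocabulary and made-up OCR misreadings.
    aisle_words = ["Coffee", "Cereal", "Bread"]
    for noisy in ["C0FFEE", "Cerea1", "xyzzy"]:
        print(noisy, "->", snap_to_wordlist(noisy, aisle_words))
```

Restricting the output to a known vocabulary trades generality for robustness: a misread character such as "0" for "O" no longer matters as long as the intended word is still the nearest one in the list.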

How much and when?

The GroZi project is slated to continue until 2010. My contribution is nearing completion, and I will no longer be involved with the project from October 2006 onward.

Final Thoughts:

Although I've experimented with various open source OCR programs (STIHRS, GOCR, and Tesseract), I wrote my own, which is used in the images below. Tesseract seems to get reliably better results, so I've included an easy option to use Tesseract instead of my homebrew OCR. I've also started a web forum dedicated to text reading in natural scenes, which I hope will help people do things like this more easily in the future.

Getting the code/executable:

To get a copy of the code or executable, contact me via email at tduerig AT gmail.com.

Links:

Belongie Group
Main Vision Site