Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure
Authors: Olivier Koch, Seth Teller
About
This thesis describes a method for real-time vision-based localization in human-made
environments. Given a coarse model of the structure (walls, floors, ceilings, doors and
windows) and a video sequence, the system computes the camera pose (translation
and rotation) in model coordinates with an accuracy of a few centimeters in translation and a few degrees in rotation. The system has several novel aspects: it performs
6-DOF localization; it handles visually cluttered and dynamic environments; it scales
well over regions extending through several buildings; and it runs over several hours
without losing lock.
Code
Download the latest version (March 2007): omni3d-2007-03-01.tar.gz (53 MB).
If you have a CSAIL account, the code is also available via Subversion (contact tig for access):
svn co svn+ssh://login.csail.mit.edu/afs/csail/group/rvsn/repositories/omni3d
The code is packaged as an MS Visual C++ 6.0 workspace. It uses standard C++ libraries but has not been tested under Linux.
Please follow the Installation procedure and read the Documentation.
3D Models
The 3D models are available as a set of ASCII files. Each model contains the walls, doors, and windows, with coordinates expressed in inches. See the README file for a quick tutorial.
We provide Linux-compatible code for reading these 3D models: model_reader-2007-08-21.tar.gz
Datasets
The datasets are available under
/afs/csail.mit.edu/group/rvsn/www/data/static-content/omni3d/data
and at the following URL:
http://rvsn.csail.mit.edu/static-content/omni3d/data/
Each directory corresponds to one dataset. Dataset directories are named yyyymmdd_bbb_name, where yyyy, mm, and dd are the year, month, and day on which the sequence was captured; bbb is the MIT building number; and name is a dataset-specific label (e.g. robot).
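The naming convention above can be decoded with a short helper. This is a minimal sketch, not part of the released code; the example directory name in the test is hypothetical, and it assumes directory names follow the yyyymmdd_bbb_name pattern exactly:

```python
import re

def parse_dataset_dir(name):
    """Split a dataset directory name of the form yyyymmdd_bbb_name.

    Returns a dict with the capture date, MIT building number, and
    dataset-specific label, or None if the name does not match.
    """
    m = re.match(r"^(\d{4})(\d{2})(\d{2})_([^_]+)_(.+)$", name)
    if m is None:
        return None
    year, month, day, building, label = m.groups()
    return {
        "year": int(year),
        "month": int(month),
        "day": int(day),
        "building": building,  # kept as a string, e.g. "032"
        "name": label,
    }
```

The building number is kept as a string so that any leading zeros are preserved.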
Each dataset contains the Ladybug images in JPG format, a configuration file (data.ini), the camera pose recovered by our method at every frame (poses.dat), and a bird's-eye view of the 3D model and reconstructed camera motion (birds-eye.jpg). A dataset can be used as is by pointing the software to its location when opening it.
Ladybug images are named xxxxxx_camy.jpg, where xxxxxx is the frame ID and y is the camera ID (0 through 5).
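A pair of small helpers can build and parse these image filenames. This is an illustrative sketch, not part of the released code; it assumes the frame ID is zero-padded to six digits, as the xxxxxx pattern suggests:

```python
import re

def ladybug_image_name(frame_id, camera_id):
    """Build an image filename of the form xxxxxx_camy.jpg."""
    if not 0 <= camera_id <= 5:
        raise ValueError("camera ID must be between 0 and 5")
    return "%06d_cam%d.jpg" % (frame_id, camera_id)

def parse_ladybug_image_name(filename):
    """Recover (frame_id, camera_id) from a filename, or None on mismatch."""
    m = re.match(r"^(\d+)_cam([0-5])\.jpg$", filename)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

For example, frame 42 from camera 3 would map to 000042_cam3.jpg under this assumption.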
References
Olivier Koch and Seth Teller, Wide-Area Egomotion Estimation from Known 3D Structure, CVPR 2007, Minneapolis. [PDF]
Olivier Koch, Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure, MSc thesis, Feb 2007. [PDF]
Demo video (530 MB, requires the Huffyuv codec)
Contact
If you have any questions, suggestions, or bug reports about this implementation, please contact Olivier Koch ( koch at csail dot mit dot edu ).