
Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure

Authors: Olivier Koch, Seth Teller



About

This thesis describes a method for real-time vision-based localization in human-made environments. Given a coarse model of the structure (walls, floors, ceilings, doors and windows) and a video sequence, the system computes the camera pose (translation and rotation) in model coordinates with an accuracy of a few centimeters in translation and a few degrees in rotation. The system has several novel aspects: it performs 6-DOF localization; it handles visually cluttered and dynamic environments; it scales well over regions extending through several buildings; and it runs over several hours without losing lock.


Code

Download the latest version (Feb 2007): omni3d-2007-03-01.tar.gz (53 MB).

If you have a CSAIL account, the code is also available under svn (contact tig for permission issues):

svn co svn+ssh://login.csail.mit.edu/afs/csail/group/rvsn/repositories/omni3d

The code comes as an MS Visual C++ 6.0 workspace. It uses only standard C++ libraries but has not been tested under Linux.

Please follow the Installation procedure and read the Documentation.


3D Models

The 3D models are available as a set of ASCII files. Each model contains the walls, doors and windows expressed in inches. Read the README file for a quick tutorial.
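Because the models are expressed in inches, coordinates usually need a unit conversion before they are compared with metric measurements. The snippet below is a minimal sketch of that conversion. The function name and the (x, y, z) tuple layout are assumptions made for illustration; they are not taken from the model file format, which is described in the README.

```python
# Hypothetical helper: convert a model-space point from inches to meters.
# The (x, y, z) tuple layout is an assumption, not the model file format.

INCH_TO_METER = 0.0254  # exact by definition of the international inch


def inches_to_meters(point):
    """Return the point with every coordinate scaled from inches to meters."""
    return tuple(coord * INCH_TO_METER for coord in point)


if __name__ == "__main__":
    # A corner 100 inches along x and 50 inches along y, at floor level.
    print(inches_to_meters((100.0, 50.0, 0.0)))
```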


We provide Linux-compatible code for reading these 3D models: model_reader-2007-08-21.tar.gz





Datasets

The datasets are available under

/afs/csail.mit.edu/group/rvsn/www/data/static-content/omni3d/data

and at the following URL:

http://rvsn.csail.mit.edu/static-content/omni3d/data/


Each directory corresponds to one dataset. Dataset directories are named yyyymmdd_bbb_name, where yyyy, mm and dd are the year, month and day on which the sequence was captured, bbb is the MIT building number, and name is a label specific to the dataset (e.g. robot). Each dataset contains the Ladybug images in JPG format, a configuration file (data.ini), the camera pose recovered by our method at every frame (poses.dat), and a bird's-eye view of the 3D model and the reconstructed camera motion (birds-eye.jpg). Ladybug images are named xxxxxx_camy.jpg, where xxxxxx is the frame ID and y is the camera ID (between 0 and 5). The datasets are usable as is: point to a dataset's location when opening it.
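The naming conventions above can be checked programmatically when indexing a local copy of the datasets. The following Python sketch is not part of the omni3d distribution. The function and type names are hypothetical, the directory name 20070115_032_robot is a made-up example, and the patterns assume that the building field contains no underscore and that the frame ID consists of digits only.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# yyyymmdd_bbb_name; assumes the building field contains no underscore.
DATASET_RE = re.compile(r"^(\d{4})(\d{2})(\d{2})_([^_]+)_(.+)$")
# xxxxxx_camy.jpg; assumes a digits-only frame ID and a camera ID in 0..5.
IMAGE_RE = re.compile(r"^(\d+)_cam([0-5])\.jpg$")


class DatasetName(NamedTuple):
    captured: date   # capture date of the sequence
    building: str    # MIT building number, kept as text
    name: str        # dataset-specific label, e.g. "robot"


class ImageName(NamedTuple):
    frame_id: int
    camera_id: int


def parse_dataset_dir(dirname: str) -> Optional[DatasetName]:
    """Split a dataset directory name, or return None if it does not match."""
    m = DATASET_RE.match(dirname)
    if m is None:
        return None
    year, month, day, building, name = m.groups()
    try:
        captured = date(int(year), int(month), int(day))
    except ValueError:  # e.g. month 13 or day 40
        return None
    return DatasetName(captured, building, name)


def parse_image_name(filename: str) -> Optional[ImageName]:
    """Split a Ladybug image file name, or return None if it does not match."""
    m = IMAGE_RE.match(filename)
    if m is None:
        return None
    return ImageName(int(m.group(1)), int(m.group(2)))


if __name__ == "__main__":
    print(parse_dataset_dir("20070115_032_robot"))  # hypothetical directory
    print(parse_image_name("000042_cam3.jpg"))
```

Returning None for names that do not match lets a caller skip unrelated files such as data.ini, poses.dat and birds-eye.jpg when scanning a dataset directory.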


References

Olivier Koch and Seth Teller, Wide-Area Egomotion Estimation from Known 3D Structure, CVPR 2007, Minneapolis. [PDF]

Olivier Koch, Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure, MSc thesis, Feb 2007. [PDF]

Demo video (530 MB, requires huffyuv codec)


Contact

If you have any questions, suggestions, or bug reports about this implementation, please contact Olivier Koch ( koch at csail dot mit dot edu ).