Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure
Authors: Olivier Koch, Seth Teller
About
This thesis describes a method for real-time vision-based localization in human-made
environments. Given a coarse model of the structure (walls, floors, ceilings, doors and
windows) and a video sequence, the system computes the camera pose (translation
and rotation) in model coordinates with an accuracy of a few centimeters in translation and a few degrees in rotation. The system has several novel aspects: it performs
6-DOF localization; it handles visually cluttered and dynamic environments; it scales
well over regions extending through several buildings; and it runs over several hours
without losing lock.
Code
Download the latest version (March 2007): omni3d-2007-03-01.tar.gz (53 MB).
If you have a CSAIL account, the code is also available via Subversion (contact tig for access):
svn co svn+ssh://login.csail.mit.edu/afs/csail/group/rvsn/repositories/omni3d
The code is packaged as an MS Visual C++ 6.0 workspace. It uses standard C++ libraries but has not been tested under Linux.
Please follow the Installation procedure and read the Documentation.
3D Models
The 3D models are available as a set of ASCII files. Each model contains the walls, doors, and windows, with coordinates expressed in inches. See the README file for a quick tutorial.
We provide Linux-compatible code for reading these 3D models: model_reader-2007-08-21.tar.gz
Datasets
The datasets are available under
/afs/csail.mit.edu/group/rvsn/www/data/static-content/omni3d/data
and at the following URL:
http://rvsn.csail.mit.edu/static-content/omni3d/data/
Each directory corresponds to one dataset. Dataset directories are named yyyymmdd_bbb_name, where yyyy, mm, and dd are the year, month, and day on which the sequence was captured; bbb is the MIT building number; and name is a dataset-specific label (e.g. robot).
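The naming convention above can be decoded with a short helper. This is a minimal sketch, not part of the released code; the example directory name in the test is hypothetical, and it assumes directory names follow the yyyymmdd_bbb_name pattern exactly:

```python
import re

def parse_dataset_dir(name):
    """Split a dataset directory name of the form yyyymmdd_bbb_name.

    Returns a dict with the capture date, MIT building number, and
    dataset-specific label, or None if the name does not match.
    """
    m = re.match(r"^(\d{4})(\d{2})(\d{2})_([^_]+)_(.+)$", name)
    if m is None:
        return None
    year, month, day, building, label = m.groups()
    return {
        "year": int(year),
        "month": int(month),
        "day": int(day),
        "building": building,  # kept as a string, e.g. "032"
        "name": label,
    }
```

The building number is kept as a string so that any leading zeros are preserved.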
Each dataset contains the Ladybug images in JPG format, a configuration file (data.ini), the camera pose recovered by our method at every frame (poses.dat), and a bird's-eye view of the 3D model and reconstructed camera motion (birds-eye.jpg). A dataset can be used as is by pointing the software to its location when opening it.
Ladybug images are named xxxxxx_camy.jpg, where xxxxxx is the frame ID and y is the camera ID (0 through 5).
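A pair of small helpers can build and parse these image filenames. This is an illustrative sketch, not part of the released code; it assumes the frame ID is zero-padded to six digits, as the xxxxxx pattern suggests:

```python
import re

def ladybug_image_name(frame_id, camera_id):
    """Build an image filename of the form xxxxxx_camy.jpg."""
    if not 0 <= camera_id <= 5:
        raise ValueError("camera ID must be between 0 and 5")
    return "%06d_cam%d.jpg" % (frame_id, camera_id)

def parse_ladybug_image_name(filename):
    """Recover (frame_id, camera_id) from a filename, or None on mismatch."""
    m = re.match(r"^(\d+)_cam([0-5])\.jpg$", filename)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

For example, frame 42 from camera 3 would map to 000042_cam3.jpg under this assumption.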
References
Olivier Koch and Seth Teller, Wide-Area Egomotion Estimation from Known 3D Structure, CVPR 2007, Minneapolis. [PDF]
Olivier Koch, Wide-Area Egomotion From Omnidirectional Video and Coarse 3D Structure, MSc thesis, Feb 2007. [PDF]
Demo video (530 MB, requires the Huffyuv codec)
Contact
If you have any questions, suggestions, or bug reports about this implementation, please contact Olivier Koch ( koch at csail dot mit dot edu ).