Video-Based Change Detection


For more information, email Seth Teller at "teller" at "csail.mit.edu".


System Overview Video (5:13)
wmv (36.9 MB) Playable on Windows XP with Windows Media 9 or above
avi (51.5 MB) Requires DivX 6 codec
mpg (50.7 MB) Playable on most operating systems


Live Demonstration Video (10:37)
wmv (62.3 MB)
avi (82.8 MB)
mpg (84.9 MB)


With DARPA support, we have developed an interactive system for efficiently inspecting pairs of videos recorded along spatially similar paths at different times. The system enables a human sequence analyst to classify and flag automatically detected differences between the two videos.

The system is based on the techniques described in the paper "Video Matching," by Peter Sand and Seth Teller, which appeared in the proceedings of SIGGRAPH 2004. The idea is to determine, for each frame of the primary video, the frame of the secondary video that is visually closest to it. This secondary frame is then warped to match the primary frame, and the difference between the two frames is rendered as a false-color image, so the sequence analyst can see at a glance which parts of the scene have changed.
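To make the match/warp/difference loop concrete, here is a minimal Python/OpenCV sketch. The thumbnail scoring, the ORB-feature homography, and the jet colormap are illustrative stand-ins of our own choosing, not the denser matching and warping described in the Sand and Teller paper:

    import cv2
    import numpy as np

    def best_match_index(primary_frame, secondary_frames):
        # Score candidates by mean absolute difference of small grayscale
        # thumbnails -- a crude stand-in for the paper's similarity measure.
        thumb = cv2.resize(cv2.cvtColor(primary_frame, cv2.COLOR_BGR2GRAY), (64, 48))
        scores = []
        for f in secondary_frames:
            cand = cv2.resize(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), (64, 48))
            scores.append(np.mean(np.abs(thumb.astype(np.int16) - cand.astype(np.int16))))
        return int(np.argmin(scores))

    def warp_and_diff(primary, secondary):
        # Align the secondary frame to the primary with a feature-based
        # homography (exact only for planar scenes or pure rotation; the
        # paper uses a denser warp), then render the difference in false color.
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(primary, None)
        kp2, des2 = orb.detectAndCompute(secondary, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:100]
        src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = primary.shape[:2]
        warped = cv2.warpPerspective(secondary, H, (w, h))
        diff = cv2.absdiff(cv2.cvtColor(primary, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY))
        return cv2.applyColorMap(diff, cv2.COLORMAP_JET)  # false-color image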

With the addition of geo-reference "tags" to each video, the system can search for matching frames far more efficiently, and can "flag" regions where differences exist in a map-based interface. This interface quickly shows the sequence analyst which regions of the map contain areas where the system detected a substantial difference in the scene between the times the primary and secondary videos were acquired.
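As a sketch of how geo-reference tags prune the search, assuming each frame carries a projected (easting, northing) position in meters (the function name and the 30 m radius are hypothetical, not the system's actual settings):

    import numpy as np

    def nearby_candidates(primary_pos, secondary_positions, radius_m=30.0):
        # Keep only secondary frames whose geo-tag lies within radius_m of
        # the primary frame's tag; frame matching then searches this short
        # candidate list instead of the entire secondary video.
        d = np.linalg.norm(np.asarray(secondary_positions, dtype=float)
                           - np.asarray(primary_pos, dtype=float), axis=1)
        return np.nonzero(d <= radius_m)[0]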

The user begins a new project by selecting the videos to be matched, then selects the GPS data file for each video. By default the system analyzes the videos in their entirety, but the user can choose to process only a portion. Processing then begins; it runs in the background, so the sequence analyst can start viewing results before the entire process has finished. When processing is complete, the results are displayed in the summary tab.

The route taken in the video is displayed on the map. Segments of the route in which the system detected differences are drawn in red and marked with red flags. The same differences are also listed below the map.

The sequence analyst can click either a flag on the map or an entry in the list to retrieve more information about that difference. This information is displayed on the right side of the screen.

At the top, video of the selected segment plays in a continuous loop; the user can search through it using the controls below. Beneath the video, information about the detected difference is shown, such as its GPS position and the date and time of recording.

In the lower right of the summary tab, as well as of the overview tab, are two information windows. The top window provides feedback on what the system is doing, along with a progress bar. The bottom window displays a description of any interface element the mouse pointer hovers over, letting the user learn the system without reading a manual.

The summary tab provides the matching results at a glance, but some users may want to investigate the videos in greater depth; for this, the overview tab is used. Displayed at the top left are the two matched source videos, primary and secondary. These videos are synchronized so that each shows the same view in space, and are navigated using the slider and playback controls below.

Next to the source videos is the difference video, which the system generates: pixels that changed between the recordings of the two videos are shown white, while pixels that remained the same are shown black, so the user can see at a glance what the system detected as differences. The user can increase or decrease the sensitivity of this view with the threshold slider above.
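The thresholded view could be produced along the following lines, with the slider driving the threshold argument. This is a sketch; the default value of 40 is ours, not the system's:

    import cv2

    def threshold_view(diff_gray, threshold=40):
        # Binarize a grayscale difference frame: pixels whose difference
        # exceeds the threshold become white, the rest black. Lowering the
        # threshold shows more (fainter) differences; raising it shows fewer.
        _, mask = cv2.threshold(diff_gray, threshold, 255, cv2.THRESH_BINARY)
        return mask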

To help guide the user, visual timelines represent the length of each source video. These timelines are color-coded to highlight segments in which differences were detected. The colors correspond to a three-level threat rating: red marks definite hazards, yellow marks activity that needs further investigation, and green marks little or no detected difference. By default, the system flags every difference it finds as red.
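The rating scheme reduces to a small enumeration plus a default assignment; a sketch, with names of our own invention:

    from enum import Enum

    class Rating(Enum):
        # Three-level threat rating used to color the timelines (names ours).
        RED = "definite hazard"
        YELLOW = "needs further investigation"
        GREEN = "little or no difference"

    def initial_rating(difference_detected: bool) -> Rating:
        # The system flags every detected difference red by default;
        # the analyst may later override the rating in either direction.
        return Rating.RED if difference_detected else Rating.GREEN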

However, the sequence analyst can override the system's rating, flagging an item as not a threat or pointing out differences the system missed. This manual flagging not only helps the user organize the information found by the system, but is also used by the system to refine future results.

Here is an example of a clear difference found by our system: trash cans that appear in the secondary video were not present when the primary video was recorded. The system flagged this segment as a difference, and the trash cans show up clearly in the difference video.

One of the strengths of our system is that it can detect differences that easily escape a human observer. In this video, the system detected two people walking along train tracks. They are very hard to perceive by eye, even when comparing the secondary video to the primary video, but the system detected them, and a look at the difference video shows the two people as bright white. The difference video also renders changes in light and shadow as differences; eliminating such false positives is a major area for future work.

The system can also match videos recorded in largely featureless environments, such as deserts. In this example, a simulated explosive device was buried near a fence. Even in this largely homogeneous footage, the system can still spot the difference, as shown in the difference video.

The current system is only a prototype. It serves as a proof of concept that computer-vision techniques can perform useful change detection across long video sequences acquired in a variety of environments.