Up to this point, I had been trying to make the code from the proposed method ("What went where?", referred to as WWW from now on) work with the Google Street View images. After some discussion with Dr. Belongie, it is apparent that I need to try a slightly different approach. The main reason is that SIFT seems unable to find good matches on the foreground object, which makes it impossible for the WWW code to detect multiple motion layers. The correspondences that have been detected (see previous post) are only enough to estimate one motion layer, which corresponds to the motion of the Street View car.
The new approach I will try consists of:
- Compute correspondences using SIFT.
- Compute a homography using RANSAC.
- Detect pixels which do not agree with the homography.
- Apply graph cuts to obtain piecewise contiguous and smooth regions.
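To make the second step concrete, here is a minimal NumPy sketch of fitting a homography with RANSAC. It is not the code I am actually running: the function names (`fit_homography`, `transfer_error`, `ransac_homography`) and the default threshold and iteration count are assumptions for illustration. It takes the matched keypoint coordinates as two `(N, 2)` arrays, and it skips the coordinate normalization that a careful DLT implementation would normally include.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: estimate H mapping src (N,2) to dst (N,2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def transfer_error(H, src, dst):
    """Distance in pixels between H applied to src and the matched dst points."""
    pts = np.column_stack([src, np.ones(len(src))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = pts[:, :2] / pts[:, 2:3]
        return np.linalg.norm(proj - dst, axis=1)

def ransac_homography(src, dst, thresh=3.0, iters=1000, seed=0):
    """Return (H, inlier_mask); thresh is the inlier distance in pixels."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # A homography has 8 degrees of freedom, so 4 matches form a minimal sample.
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        inliers = transfer_error(H, src, dst) < thresh  # NaN/inf count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC did not find enough inliers")
    # Refit on all inliers of the best model and scale so that H[2, 2] == 1.
    H = fit_homography(src[best], dst[best])
    return H / H[2, 2], best
```

Calling `ransac_homography(src, dst, thresh=2.0)` returns the homography together with a boolean mask marking which correspondences were kept as inliers. The `thresh` argument controls how strictly a match has to agree with the model to count as an inlier.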
For the image sequence I have been working with, here are the inliers used to compute the homography, and the warped images after applying it:
For step 3 above (detecting pixels that do not agree with the homography), my first guess was to simply compute the difference between the reference image and the warped image. However, the computed homography does not appear to be very precise, which results in a lot of noise:
As in the WWW paper, I tried a second round of RANSAC with a tighter threshold. The inliers and difference (between reference image and warped image) are shown below.
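For reference, the difference between the reference image and the warped image can be computed roughly as in the sketch below. The function name `disagreement_mask`, the optional `valid` mask (for pixels that fall outside the warped image) and the threshold of 30 gray levels are assumptions for illustration, not values I have tuned.

```python
import numpy as np

def disagreement_mask(reference, warped, valid=None, thresh=30.0):
    """Return (diff, mask): absolute difference image and pixels above thresh."""
    # Cast to float first: subtracting uint8 images directly would wrap around.
    ref = np.asarray(reference, dtype=np.float64)
    wrp = np.asarray(warped, dtype=np.float64)
    if ref.ndim == 3:
        # Colour input: average the channels to get a single intensity image.
        ref = ref.mean(axis=2)
        wrp = wrp.mean(axis=2)
    diff = np.abs(ref - wrp)
    if valid is not None:
        # Ignore pixels that have no counterpart in the warped image.
        diff = np.where(valid, diff, 0.0)
    return diff, diff > thresh
```

Because the comparison is done pixel by pixel, even a misalignment of one or two pixels shows up as strong responses along every intensity edge, which is consistent with the noise I am seeing.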