Wednesday, January 27, 2010

Odd results and some runtime errors

At this point, I am trying to debug Josh Wills' motion segmentation code.

After altering the code to use SIFT correspondences, I encountered a few bugs:


??? Attempted to access IndexSam(4997,:); index out of bounds because size(IndexSam)=[4996,4].

Error in ==> get_warps at 56
choice=IndexSam(count,:);


From reading the code, it appears this function uses RANSAC to estimate a homography, and that IndexSam holds one precomputed four-point sample per row. The hard-coded iters variable was larger than the number of rows, so the loop indexed past the end of the table. I changed iters to a smaller number to resolve the error, though I'm not sure whether running fewer RANSAC iterations will have unwanted side effects, such as a worse homography estimate.
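A more robust fix might be to clamp the iteration count to the number of available samples instead of hand-tuning it. A minimal sketch, assuming IndexSam and iters play the roles described above (the loop body is my guess, not the original code):

numSamples = size(IndexSam, 1);   % 4996 in the error above
iters = min(iters, numSamples);   % never index past the last sample
for count = 1:iters
    choice = IndexSam(count, :);  % four correspondence indices for one RANSAC trial
    % ... estimate a candidate warp from these four matches ...
end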

After resolving this error, I also got a segmentation fault that crashed MATLAB. The error message was:


------------------------------------------------------------------------
Segmentation violation detected at Tue Jan 26 22:42:38 2010
------------------------------------------------------------------------
...
Stack Trace:
[0] creategraph.mexglx:0x0443688e(0x0bcb5660 "test1.graph", 0xad7bc010, 0xafcc1010 ", 0xa1cfa010)
...


However, on a different machine, I get a different error:

./smooth: /usr/local/matlabr2008b/sys/os/glnx86/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./smooth)
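This looks like the usual clash between the libstdc++ that ships with MATLAB R2008b and the newer one the smooth binary was linked against, so the loader refuses to start the binary from inside MATLAB. A possible workaround, which I have not verified, is to preload the system library before spawning the process (the library path below is an assumption and varies by distribution):

% Untested workaround sketch: let child processes see the system
% libstdc++ instead of MATLAB's older bundled copy. The path is an
% assumption; find the real one with e.g. 'locate libstdc++.so.6'.
setenv('LD_PRELOAD', '/usr/lib/libstdc++.so.6');
system('./smooth');  % plus whatever arguments the pipeline normally passes

Another common fix is to move MATLAB's bundled libstdc++.so.6 out of the way so that the loader falls back to the system copy.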
Despite the error, some figures are still produced, though they appear to be just the warped images rather than a segmentation. See the output images below.





Monday, January 25, 2010

A little improvement

To continue debugging the correspondence issue, I tested with images that had been cropped to contain a small portion of the original image. See the correspondence results below. The correspondences here are good, so the SIFT matching clearly works for this type of image. For images containing more of the original scene, I tried varying the threshold parameter of the matching function; a higher threshold seems to work better (see the results below and the sketch after this paragraph).
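Concretely, the knob I am turning is the third argument of VLFeat's vl_ubcmatch. A minimal sketch of the test, with placeholder filenames:

% Match SIFT descriptors between two crops (VLFeat MATLAB toolbox).
Ia = single(rgb2gray(imread('crop1.png')));   % placeholder filenames
Ib = single(rgb2gray(imread('crop2.png')));
[fa, da] = vl_sift(Ia);
[fb, db] = vl_sift(Ib);
% vl_ubcmatch accepts a match only when the nearest descriptor beats the
% second nearest by this factor, so a higher threshold keeps fewer but
% more distinctive matches (the VLFeat default is 1.5).
[matches, scores] = vl_ubcmatch(da, db, 2.5);  % 2.5 is an example value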

This did improve the quality of the correspondences. However, there are hardly any correspondences on the foreground object (the person and the bike), which might cause the motion segmentation method to fail.

Friday, January 22, 2010

Debugging correspondences

To make sure it wasn't a bug in my code, I tested with some images from Andrew Zisserman (found here). See the results below:
This is what you would expect to see, so it appears the poor results on the Google Street View data are not caused by a bug in my code.

Friday, January 15, 2010

Feature correspondences

One thing I've been trying to do is replace the descriptors used in the proposed method with something more modern. The method currently uses the Harris corner detector for interest points and a filter bank as the descriptor. A more modern choice is the SIFT (Scale-Invariant Feature Transform) descriptor. I've been using the implementation from vlfeat.org. A comparison of correspondences can be found here: http://picasaweb.google.com/arflobow/Correspondences?feat=directlink . These images were manually cropped from random locations in Google Maps.
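For reference, extracting SIFT features with the VLFeat MATLAB toolbox looks roughly like this (the filename is a placeholder):

% Detect SIFT interest points and compute their descriptors.
I = single(rgb2gray(imread('streetview_crop.png')));  % placeholder name
[f, d] = vl_sift(I);
% f is 4 x K: each column is [x; y; scale; orientation] for one keypoint.
% d is 128 x K: one 128-dimensional descriptor per keypoint, which would
% stand in for the filter-bank responses in the original method.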

One problem is that the proposed method uses "perturbed interest points":

According to the principle of perturbation, a stable system will remain at or near equilibrium even as it is slightly modified. The same holds true for stable matches. To take advantage of this principle, we dilate the interest points to be disks with a radius of r_p, where each pixel in the disk is added to the list of interest points. This allows the correct matches to get support from the points surrounding a given feature while incorrect matches will tend to have almost random matches estimated for their immediate neighbors, which will not likely contribute to a widely supported warp.
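As I read that passage, the dilation step amounts to something like the following (my own sketch of the idea, not the paper's code; the value of r_p is assumed):

% Dilate each interest point into a disk of radius rp, adding every
% pixel in the disk to the interest point list.
rp = 3;                                            % disk radius, value assumed
points = round(f(1:2, :))';                        % N x 2 [x y] locations from vl_sift above
[dx, dy] = meshgrid(-rp:rp, -rp:rp);
offsets = [dx(:), dy(:)];
offsets = offsets(sum(offsets.^2, 2) <= rp^2, :);  % keep offsets inside the disk
perturbed = [];
for i = 1:size(points, 1)
    perturbed = [perturbed; bsxfun(@plus, points(i, :), offsets)]; %#ok<AGROW>
end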

This brings up a question: if SIFT is to be used instead of the filter-bank descriptor, should the "perturbed" points receive a copy of the original point's descriptor? The implementation from vlfeat.org provides a function for a dense SIFT descriptor (see http://vlfeat.org/mdoc/VL_DSIFT.html ), though the wording of its documentation is a little cryptic, so I'm not sure if it is the right thing to use.
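If vl_dsift does what I think it does, it computes a descriptor at every point of a regular grid, which could let each perturbed pixel look up its own descriptor instead of copying the original point's. A sketch of the call (the parameter values are guesses):

% Dense SIFT: one descriptor per grid point over the whole image.
I = single(rgb2gray(imread('streetview_crop.png')));   % placeholder name
[frames, descrs] = vl_dsift(I, 'Step', 1, 'Size', 4);  % parameter values are guesses
% frames is 2 x N (grid centers); descrs is 128 x N, one column per
% location, so descriptors for the dilated disk pixels could be read
% off directly rather than duplicated.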

I ran the code for the proposed method on some of the Street View images I manually cropped; the results can be seen here: http://picasaweb.google.com/arflobow/DropBox#

I've been looking into other papers that deal with Google Street View images. In vision.ucla.edu/papers/lee09.pdf they use a structure-from-motion (SfM) filter, described here. The UCLA paper is about 3D reconstruction, though the discussion of the SfM filter was interesting. It has been hard to find truly relevant papers, though.

Tuesday, January 5, 2010

Introduction


Street view from one point of view (POV).

Street view from a different POV.

Desired output (mocked up in MS Paint).


The method I will use is motion segmentation. However, there is a potential setback.