Thursday, March 4, 2010

Interesting paper

"Seam carving for content-aware image resizing", uses the same technique James Davis uses, but for a different application. See the "Object removal" section (section 4.6) for another interesting application. I may play around with this to reduce the artifacts after removing people.

Sunday, February 28, 2010

Preliminary pedestrian removal results

Currently, this is my pipeline for removing pedestrians (a rough sketch follows the list):
  1. Compute a homography between the two views (I1 and I2) using SIFT and RANSAC.
  2. Detect pedestrians in both views (with bounding boxes).
  3. Warp the pedestrian bounding boxes using the homography from step 1 and determine their overlap (if any).
  4. Use the method proposed by James Davis to obtain a dividing boundary in the overlap region.
  5. Replace the pixels where a pedestrian is detected with pixels from the other (warped) view, using the boundary from step 4.
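Here is a rough sketch of steps 1 and 5 (simplified: it copies the whole detected box from the warped view instead of using the Davis boundary from step 4). This is Python with OpenCV, not my actual code, and cv2.SIFT_create may live under cv2.xfeatures2d in other OpenCV builds:

import cv2
import numpy as np

def sift_homography(img1, img2):
    """Step 1: estimate the homography mapping img2 into img1 with SIFT + RANSAC."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(d2, d1, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # ratio test
    src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

def replace_pedestrian(img1, img2, H, box):
    """Step 5 (simplified): fill the detected box in img1 with pixels from the warped img2."""
    h, w = img1.shape[:2]
    warped = cv2.warpPerspective(img2, H, (w, h))  # img2 brought into img1's frame
    x, y, bw, bh = box  # pedestrian bounding box in img1, from the detector in step 2
    out = img1.copy()
    out[y:y + bh, x:x + bw] = warped[y:y + bh, x:x + bw]
    return out

Using the Davis boundary instead of the hard box edge is what should hide the seams where the two views do not blend perfectly.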
Some results.

Thursday, February 25, 2010

How Google does surface normals

Apparently, Google does use 3D laser point clouds to estimate surface normals (or "building facades", as they call them); see this quote from http://google-latlong.blogspot.com/2009/06/introducing-smart-navigation-in-street.html :

We have been able to accomplish this by making a compact representation of the building facade and road geometry for all the Street View panoramas using laser point clouds and differences between consecutive pictures.

I've checked the Google API to see if there is some way to extract the surface normal at each pixel, but so far I have not been able to find that information.

Monday, February 22, 2010

Pedestrian detection

Due to the lack of progress with the proposed method, I decided to simplify and restrict the problem to removing pedestrians only. From a privacy standpoint, this would be a step beyond the face blurring that Google already does. I have been testing pedestrian detection code by Bastian Leibe. Some results on Google Street View data can be seen here. The results are inconsistent, but I think I can do better by tuning the detector parameters to Street View imagery.
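For quick sanity checks while I tune Leibe's detector, a stand-in detector is handy. This is OpenCV's stock HOG people detector in Python, not the code I am actually evaluating, and the filenames are just placeholders:

import cv2

# Stand-in pedestrian detector for quick experiments (not Bastian Leibe's code).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("streetview_crop.jpg")  # placeholder filename
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)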

I also tried pedestrian detection software by Liming Wang, but this proved to be too slow for my purposes (took about an hour on one image).

Thursday, February 18, 2010

A relevant paper

Found a paper that may be useful: "Piecewise Planar City 3D Modeling from Street View Panoramic Sequences". The focus of the paper is 3D modeling, but it mentions a multi-view stereo technique for dense depth estimation. This may make it possible to remove foreground objects based on a histogram analysis of pixel depths (similar to the way foreground objects were removed in Dr. Zakhor's work, which was the inspiration for this project).
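To make the depth-histogram idea concrete, here is my own toy sketch (it assumes a dense per-pixel depth map is already available, which is exactly the hard part): flag the nearest well-populated depth mode as "foreground" and mask it out.

import numpy as np

def foreground_mask_from_depth(depth, n_bins=64):
    """Toy sketch: flag pixels whose depth falls in the nearest populated histogram bin."""
    valid = depth > 0                              # assume 0 marks missing depth
    hist, edges = np.histogram(depth[valid], bins=n_bins)
    populated = np.where(hist > 0.01 * valid.sum())[0]
    if populated.size == 0:
        return np.zeros_like(valid)
    cutoff = edges[populated[0] + 1]               # anything nearer than this bin is "foreground"
    return valid & (depth <= cutoff)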

Wednesday, February 10, 2010

Hole filling (and reading)


As an initial experiment, I manually selected a rectangle (containing the person in one view) and filled it in with the corresponding pixels from the other view. The results are shown above. The window and wall-ground borders do not line up perfectly, so it seems that further refinement of the homography estimate is needed. Another problem in this particular set of images is that there are multiple "foreground" objects (the bike, the parking meter, the bike rack). Also, for some of the pixels in the manually selected rectangle, the corresponding pixels in the other view contained the foreground object (the person) I wanted to remove, so I need to find a way to detect and handle this possibility; a rough sketch of one option is below.
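One way to detect that last problem (my own guess, assuming for now that I also mark the person's region in the second view by hand) is to warp the second view's person mask into the first view and see where it lands inside the selected rectangle:

import cv2
import numpy as np

def bad_source_pixels(rect_mask1, person_mask2, H21, shape1):
    """Pixels of the rectangle in view 1 whose replacement pixels in view 2 land on the person.

    rect_mask1:   boolean mask of the manually selected rectangle in view 1
    person_mask2: boolean mask of the person in view 2 (marked by hand for now)
    H21:          homography mapping view-2 coordinates into view-1 coordinates
    shape1:       (height, width) of view 1
    """
    h, w = shape1
    warped_person = cv2.warpPerspective(person_mask2.astype(np.uint8), H21, (w, h),
                                        flags=cv2.INTER_NEAREST)
    # Where the warped person overlaps the rectangle, the "replacement" pixels
    # would still contain the person and need to come from somewhere else.
    return rect_mask1 & (warped_person > 0)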

Since it is likely I will need to incorporate one more view of the scene, I have been reading about the trifocal tensor in Hartley and Zisserman and MaSKS. I've also been reading some material on how to further refine the estimated homography here.

Monday, February 1, 2010

A new direction

Up to this point, I had been trying to make the code from the proposed method ("What went where?", referred to as WWW from now on) work with the Google Street View images. After some discussion with Dr. Belongie, it is apparent that I need to try a slightly different approach. The main reason is that SIFT seems unable to find good matches on the foreground object, which makes it impossible for the WWW code to detect multiple motion layers. The correspondences that have been detected (see the previous post) only support a single motion layer, which corresponds to the motion of the Street View car.

The new approach I will try consists of the following steps (a rough sketch of step 3 follows the list):
  1. Compute correspondences using SIFT.
  2. Compute a homography using RANSAC.
  3. Detect pixels which do not agree with the homography.
  4. Apply graph cuts to obtain piecewise contiguous and smooth regions.
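For step 3, my current plan is just a thresholded absolute difference between the reference image and the warped image. Here is a sketch of that (Python with OpenCV, my own simplification; the graph-cut step 4 is not included):

import cv2
import numpy as np

def homography_disagreement(img_ref, img_other, H, thresh=30):
    """Step 3 sketch: mark pixels where the warped view disagrees with the reference.

    H maps img_other coordinates into img_ref coordinates.
    """
    h, w = img_ref.shape[:2]
    warped = cv2.warpPerspective(img_other, H, (w, h))
    # Only compare pixels that actually receive data from the other view.
    valid = cv2.warpPerspective(np.ones(img_other.shape[:2], np.uint8), H, (w, h)) > 0
    diff = cv2.absdiff(cv2.cvtColor(img_ref, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY))
    disagree = ((diff > thresh) & valid).astype(np.uint8) * 255
    # A median filter suppresses the single-pixel noise caused by an imprecise homography.
    return cv2.medianBlur(disagree, 5) > 0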
For the image sequence I have been working with, here are the inliers used to compute the homography, and the warped images after applying it:



For step 3 above (detecting pixels that don't agree with the homography), my first guess was to simply compute the difference between the reference image and the warped image, but it appears the computed homography is not very precise, which results in a lot of noise:
As in the WWW paper, I tried a second round of RANSAC with a tighter threshold. The inliers and difference (between reference image and warped image) are shown below.
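For reference, if I end up scripting this second pass myself, with OpenCV it would just mean refitting on the first-round inliers with a smaller reprojection threshold (a sketch, not the WWW code; the matched point arrays are assumed to be in the (N, 1, 2) float32 format OpenCV expects):

import cv2

def two_pass_homography(src_pts, dst_pts):
    """Estimate a homography, then refit it on the inliers with a tighter RANSAC threshold."""
    # First pass: loose threshold to get a rough inlier set.
    H1, mask1 = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    inliers = mask1.ravel() > 0
    # Second pass: tighter threshold on the surviving correspondences only.
    H2, mask2 = cv2.findHomography(src_pts[inliers], dst_pts[inliers], cv2.RANSAC, 1.0)
    return H2, mask2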


Wednesday, January 27, 2010

Odd results and some runtime errors

At this point, I am trying to debug Josh Wills' motion segmentation code.

After altering the code to use SIFT correspondences, I encountered a few bugs:


??? Attempted to access IndexSam(4997,:); index out of bounds because size(IndexSam)=[4996,4].

Error in ==> get_warps at 56
choice=IndexSam(count,:);


From reading the code, it appears this function is using RANSAC to estimate the homography. Since size(IndexSam) = [4996, 4], the loop is apparently running for more iterations than there are precomputed sample sets, so I changed a hard-coded iters variable to a smaller number to resolve the error, though I'm not sure whether this has any unwanted side effects (capping iters at size(IndexSam, 1) might be the cleaner fix).

After resolving this error, I also received a segmentation fault that crashed MATLAB. The error message was:


------------------------------------------------------------------------
Segmentation violation detected at Tue Jan 26 22:42:38 2010
------------------------------------------------------------------------
...
Stack Trace:
[0] creategraph.mexglx:0x0443688e(0x0bcb5660 "test1.graph", 0xad7bc010, 0xafcc1010 ", 0xa1cfa010)
...


However, on a different machine, I get a different error:

./smooth: /usr/local/matlabr2008b/sys/os/glnx86/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./smooth)
Despite the error, some figures are produced, though it appears the code only warped the images without segmenting them. See the output images below.





Monday, January 25, 2010

A little improvement

To continue debugging the correspondence issue, I tested with images that had been cropped to contain only a small portion of the original image. See the correspondence results below. The correspondences here are good, so SIFT clearly works fine for this type of image. For images containing more of the original scene, I tried varying the threshold parameter of the matching function; a higher threshold seems to work better (see results below).
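If I'm reading the vlfeat documentation right, vl_ubcmatch's threshold is the factor by which the second-best descriptor distance must exceed the best one, so a higher value is stricter. A hand-rolled version of the same test (my own Python sketch, not the vlfeat code, which may use squared distances internally) looks like:

import numpy as np

def ratio_match(d1, d2, thresh=1.5):
    """Match descriptors d1 (N1 x 128) to d2 (N2 x 128), vl_ubcmatch-style.

    A match is kept only if the second-best distance is at least `thresh`
    times the best distance; raising `thresh` keeps fewer, better matches.
    """
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if second >= thresh * best:
            matches.append((i, int(order[0])))
    return matches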

This did improve the quality of the correspondences. However, there are hardly any correspondences on the foreground object (the person and the bike), which might make the motion segmentation method fail.

Friday, January 22, 2010

Debugging correspondences

To make sure it wasn't a bug in my code, I tested with some images from Andrew Zisserman (found here). See the results below:
This is what you would expect to see, so it appears the poor results on the Google Street View data are not due to a bug in my code.

Friday, January 15, 2010

Feature correspondences

One thing I've been trying to do is replace the descriptors used in the proposed method with something more modern. The proposed method currently uses the Harris corner detector for interest points and a filter bank as the descriptor. A more modern choice is the SIFT descriptor (Scale-Invariant Feature Transform); I've been using the implementation from vlfeat.org . A comparison of correspondences can be found here: http://picasaweb.google.com/arflobow/Correspondences?feat=directlink . These images were manually cropped from random locations in Google Maps.

One problem is that the proposed method uses "perturbed interest points":

According to the principle of perturbation, a stable system will remain at or near equilibrium even as it is slightly modified. The same holds true for stable matches. To take advantage of this principle, we dilate the interest points to be disks with a radius of rp , where each pixel in the disk is added to the list of interest points. This allows the correct matches to get support from the points surrounding a given feature while incorrect matches will tend to have almost random matches estimated for their immediate neighbors, which will not likely contribute to a widely supported warp.

This brings up a question: if SIFT is to be used instead of the filter-bank descriptor, should the "perturbed" points receive a copy of the original descriptor? The implementation from vlfeat.org provides a function for dense SIFT descriptors (see http://vlfeat.org/mdoc/VL_DSIFT.html ), though its documentation is a little cryptic, so I'm not sure if it is the right thing to use.
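If I go with the simpler option (each perturbed point just inherits the descriptor of the interest point it came from, rather than getting its own dense-SIFT descriptor), the bookkeeping is straightforward. A sketch of my own, not the WWW code:

import numpy as np

def perturb_interest_points(points, descriptors, radius):
    """Dilate each interest point into a disk of radius r_p, copying the original descriptor.

    points:      (N, 2) array of (x, y) interest point locations
    descriptors: (N, 128) array of SIFT descriptors
    radius:      perturbation radius r_p in pixels
    """
    offsets = [(dx, dy)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               if dx * dx + dy * dy <= radius * radius]
    new_pts, new_desc = [], []
    for (x, y), d in zip(points, descriptors):
        for dx, dy in offsets:
            new_pts.append((x + dx, y + dy))
            new_desc.append(d)            # perturbed points inherit the original descriptor
    return np.array(new_pts), np.array(new_desc)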

I ran the code for the proposed method on some of the Street View images I manually cropped; the results can be seen here: http://picasaweb.google.com/arflobow/DropBox#

I've also been looking into other papers that deal with Google Street View images. In vision.ucla.edu/papers/lee09.pdf they use a structure-from-motion (SfM) filter, described here. The UCLA paper is about 3D reconstruction, though the discussion of the SfM filter was interesting. It has been hard to find truly relevant papers, though.

Tuesday, January 5, 2010

Introduction


Street view from one point of view (POV).

Street view from a different POV

Desired output (MSPaint)


The method I will use is motion segmentation. However, there is a potential setback.