Monday, August 22, 2011

Another Review of Dalal & Triggs

A writeup of the HOG descriptor, copied from the blog of Yet Another Blogger.


Begin copied blog post.


The paper gives a worked example of choosing the various modules of a recognition pipeline for human figures (pedestrians).

Much simplified summary

It uses histograms of gradient orientations as a descriptor in a 'dense' setting, meaning it does not detect keypoints the way SIFT detectors do (sparse). Each feature vector is computed from a window (64x128) placed across an input image. Each vector element is a histogram of gradient orientations (9 bins covering 0-180 degrees; opposite directions count as the same). The histogram is collected within a cell of 8x8 pixels. The contrasts are locally normalized over a block of 2x2 cells (16x16 pixels); normalization is an important enhancement. The block moves in 8-pixel steps, half the block size, so each cell contributes to 4 different normalization blocks. A linear SVM is trained to classify whether a window contains a human figure or not. The output of a trained linear SVM is a set of coefficients, one for each element of the feature vector.
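
As a concrete check of those numbers, here is a minimal sketch using the OpenCV C++ API (2.x era); the image path is a placeholder. With an 8-pixel block stride inside a 64x128 window there are 7x15 block positions, each holding 2x2 cells of 9 bins, so each feature vector has 7 * 15 * 4 * 9 = 3780 elements.

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    // The default HOGDescriptor already matches the summary above:
    // 64x128 window, 16x16 block, 8x8 block stride, 8x8 cell, 9 bins.
    cv::HOGDescriptor hog;

    // Blocks per window: ((64-16)/8 + 1) * ((128-16)/8 + 1) = 7 * 15
    // Values per block:  2 * 2 cells * 9 bins = 36
    // Feature vector:    7 * 15 * 36 = 3780
    std::printf("descriptor size = %d\n", (int)hog.getDescriptorSize());

    // Compute the descriptor for a single 64x128 window (placeholder file name).
    cv::Mat window = cv::imread("pedestrian_64x128.png");
    if (!window.empty() && window.size() == hog.winSize)
    {
        std::vector<float> descriptor;
        hog.compute(window, descriptor);
        std::printf("computed %d values\n", (int)descriptor.size());
    }
    return 0;
}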

I presume 'linear SVM' means the kernel is linear, with no projection to a higher dimension. The paper by Hsu et al. suggests that the linear method is enough when the feature dimension is already high.

OpenCV implementation (hog.cpp, objdetect.hpp)

The HOGDescriptor class is not found in the API documentation. Here are notable points, judging by the source code and the sample program (people_detect.cpp):
  • Comes with a default human detector. The file comment says it is "compatible with the INRIA Object Detection and Localization toolkit". I presume this is a trained linear SVM classifier represented as a vector of coefficients.
  • No need to call SVM code. The HOGDescriptor::detect() function simply applies the coefficients to the input feature vector to compute a weighted sum. If the sum is greater than the user-specified 'hitThreshold' (defaults to 0), the window is classified as a human figure.
  • The 'hitThreshold' argument can be negative.
  • The 'winStride' argument (default 8x8) controls how the window slides across the input image.
  • detectMultiScale() arguments
    • 'groupThreshold' is passed through to the cv::groupRectangles() API - non-max suppression?
    • 'scale0' controls how much down-sampling is performed on the input image before each call to detect(). This is repeated 'nlevels' times (default 64). All levels could be done in parallel. (See the usage sketch after this list.)
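
A minimal usage sketch, assuming the OpenCV 2.x C++ API; the image path and the parameter values are placeholders chosen to show where each argument above goes, not settings from the paper.

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("street.jpg");          // placeholder input image
    if (img.empty()) return 1;

    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    std::vector<cv::Rect> found;
    hog.detectMultiScale(img, found,
                         0.0,                        // hitThreshold: weighted sum must exceed this
                         cv::Size(8, 8),             // winStride: window step at each level
                         cv::Size(32, 32),           // padding added around the image
                         1.05,                       // scale0: down-sampling ratio between levels
                         2);                         // groupThreshold for cv::groupRectangles()

    for (size_t i = 0; i < found.size(); ++i)
        cv::rectangle(img, found[i], cv::Scalar(0, 255, 0), 2);

    cv::imwrite("street_detections.jpg", img);
    return 0;
}
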
Sample (people_detect.cpp)

  • Uses the built-in trained coefficients.
  • Actually needs to eliminate duplicate rectangles from the results of detectMultiScale(). Is that because it matches at multiple scales? (A sketch of that filtering step follows this list.)
  • detect() returns a list of detected points; each detection has the size of the detector window.
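
The filtering step mentioned above looks roughly like the following sketch: keep only detections that are not fully contained inside another detection. keepOuterRects is a made-up helper name, not an OpenCV function.

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical helper: drop rectangles that lie entirely inside another one.
std::vector<cv::Rect> keepOuterRects(const std::vector<cv::Rect>& found)
{
    std::vector<cv::Rect> filtered;
    for (size_t i = 0; i < found.size(); ++i)
    {
        cv::Rect r = found[i];
        size_t j;
        for (j = 0; j < found.size(); ++j)
            if (j != i && (r & found[j]) == r)   // intersection equals r => r is nested in found[j]
                break;
        if (j == found.size())
            filtered.push_back(r);               // r is not inside any other detection
    }
    return filtered;
}
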
Observations

  • With the GrabCut BSDS300 test images - only able to detect one human figure (89072.jpg). The rest could be too small, too big, or obscured. Interestingly, it detected a few long, narrow upright trees as human figures. It takes about 2 seconds to process each picture.
  • With the GrabCut Data_GT test images - able to detect a human figure in 3 images: tennis.jpg, bool.jpg (left), and person5.jpg (right), but _not_ person7.jpg. An interesting false positive is in grave.jpg: the cut-off tombstone on the right edge is detected. Most pictures took about 4.5 seconds to process.
  • MIT Pedestrian Database (64x128 pedestrian shots):
    • The default HOG detector window (feature-vector) is the same size as the test images.
    • Recognized 72 out of 925 images with detectMultiScale() using default parameters. Takes about 15 ms for each image.
    • Recognized 595 out of 925 images with detect() using default parameters. Takes about 3 ms for each image.
    • Turning off gamma-correction reduces the hits from 595 to 549.
  • INRIA Person images (Test Batch)
    • (First half) The negative samples are smaller, about 1/4 the size of the positives, and take 800-1000 ms each; the others take about 5 seconds.
    • Are the 'bike_and_person' samples there for testing occlusion?
    • Recognized 232/288 positive images; 65/453 negative images produced (false) detections. Takes 10-20 seconds for each image.
    • Again, cut-off boxes resulting in long vertical shapes become false positives.
    • Lamp poles, trees, rounded-top entrances, the top part of a tower, and long windows are typical false positives. Should an upright statue be considered a 'negative' sample?
    • Picked a few false negatives to re-run with changed parameters. I picked images with a large, mostly upright human figure (crop_00001.jpg, crop001688.jpg, crop001706.jpg, person_107.jpg).
      • Increased nLevels from the default (64) to 256.
      • Decreased 'hitThreshold' to -2: a lot more small-size hits.
      • Halved the input image size from the original.
      • Decreased the scale factor from 1.05 to 1.01.
      • Tried all of the above individually - still unable to recognize the tall figures. I suppose this has something to do with their pose, such as how they placed their arms. (A combined sketch of these parameter changes follows.)
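
For reference, a sketch of those parameter changes in one place, plus the gamma-correction toggle mentioned in the MIT observations, assuming the OpenCV 2.x C++ API. In the experiment each change was tried individually; here they are simply shown together, and the file name is one of the images listed above.

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("crop001688.jpg");      // one of the stubborn false negatives
    if (img.empty()) return 1;

    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
    hog.nlevels = 256;                               // up from the default 64
    hog.gammaCorrection = false;                     // gamma-correction toggle (public member)

    cv::Mat half;
    cv::resize(img, half, cv::Size(), 0.5, 0.5);     // halve the input image

    std::vector<cv::Rect> found;
    hog.detectMultiScale(half, found,
                         -2.0,                       // lower hitThreshold => many more (small) hits
                         cv::Size(8, 8),
                         cv::Size(32, 32),
                         1.01);                      // finer scale step than the default 1.05

    std::printf("detections: %d\n", (int)found.size());
    return 0;
}
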
Resources


End copied blog post.
