Xiaodong's tech notes on computer vision and machine learning: 2011-11

More than a week after coming back from ICCV in Barcelona, I have finally finished my summary of the ICCV papers that I am interested in. There are 15 papers in total, across 5 topics. There are certainly many more ICCV papers worth reading; I will update my summary later.

Attributes:

At ICCV this year, attributes continued to attract interest from researchers in the community. In particular, the paper “Relative Attributes” won the Marr Prize.

Relative Attributes (Marr prize paper)
Devi Parikh, Kristen Grauman

Idea: Relative attributes can provide a more informative and intuitive description of images, which overcomes many restrictions of binary attributes. For example, it is more useful to say “Bill Clinton is younger than George H.W. Bush” than to say “Bill Clinton is young”. The latter is a binary attribute, which is often hard to judge as true or false because it is subjective in many cases; the former is a relative attribute, which is more objective and easier to judge. This paper describes an approach that models each relative attribute as a ranking function, applies it to zero-shot learning and to textual description of images, and shows a clear advantage over traditional binary attributes.
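To make "attribute as a ranking function" concrete, here is a minimal sketch, assuming a linear scorer w·x trained on ordered image pairs with a pairwise hinge loss (a RankSVM-style simplification, not the paper's exact formulation). The feature vectors, the pair list, and the names train_ranker and score are made up for illustration.

```python
import random

def train_ranker(features, ordered_pairs, epochs=500, lr=0.1, seed=0):
    """Learn w so that w.x_i > w.x_j for every (i, j) in ordered_pairs.

    Takes subgradient steps on the pairwise hinge loss
    max(0, 1 - w.(x_i - x_j)); this is an assumed simplification.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    pairs = list(ordered_pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1.0:  # pair is mis-ordered or inside the margin
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w

def score(w, x):
    """Attribute strength of one image under the learned ranking function."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Toy data (hypothetical): 2-D features for four images.
faces = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5], [0.9, 0.1]]
# (i, j) means "image i shows MORE of the attribute than image j".
more_smiling_than = [(1, 0), (2, 1), (3, 2)]

w = train_ranker(faces, more_smiling_than)
# Image indices sorted from weakest to strongest attribute.
ranking = sorted(range(len(faces)), key=lambda k: score(w, faces[k]))
```

Note that the supervision is only pairwise comparisons ("i more than j"), never absolute labels; the learned score can then place an unseen image relative to known ones.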

Describing People: A Poselet-Based Approach to Attribute Classification (oral)
Lubomir Bourdev, Subhransu Maji, Jitendra Malik

Idea: This paper applies the poselet representation to recognize human attributes, such as gender, hair style and types of clothes. Recognizing these attributes is generally difficult due to large variations in pose, viewpoint, etc. The poselet representation implicitly decomposes the aspect, i.e., the pose and viewpoint, and thus facilitates the detection of human attributes.

A Joint Learning Framework for Attribute Models and Object Descriptions (oral)
Dhruv Mahajan, Sundararajan Sellamanickam, Vinod Nair

Idea: This paper proposes to jointly learn the attribute classifiers and the attribute labels, which eliminates the need to label the attributes in images. Given a list of attribute names, some positive and negative training examples for each binary attribute classifier, and some training images of various objects with known class labels but unknown attribute labels, the proposed method can automatically learn an attribute vector for each object class. An interesting finding is that the method detects many erroneous attribute labels in the existing dataset, and that classification performance improves after these erroneous attributes are corrected.

Depth Image and Kinect:

A number of ICCV papers this year study various problems related to depth images, possibly due to the wide availability of the Kinect sensor.

Kinecting the dots: Particle Based Scene Flow From Depth Sensors
Simon Hadfield; Richard Bowden

Summary: The theme of this paper is estimating scene flow, the 3D motion field of an observed scene, as opposed to optical flow, which is a 2D field. The 3D motion of points is modeled with a collection of particle filters, which supports multiple hypotheses and does not oversmooth the motion field.

Efficient Regression of General-Activity Human Poses from Depth Images
Ross Girshick; Jamie Shotton; Pushmeet Kohli; Antonio Criminisi; Andrew Fitzgibbon

Summary: This paper employs random forest regression to directly estimate the human pose without segmenting body parts. Several techniques are proposed to speed up the regression process, which enables super-realtime performance at test time.

Accurate 3D Body Pose Estimation From a Single Depth Image
Mao Ye; Xianwang Wang; Ruigang Yang; Liu Ren; Marc Pollefeys

Summary: This paper uses pre-captured motion exemplars to estimate the body pose in a depth image, and then refines the result by fitting the body configuration to the input depth image.

A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera
Andreas Baak; Meinard Mueller; Gaurav Bharaj; Hans-Peter Seidel; Christian Theobalt

Summary: Similar to the paper above, this paper also uses a pose database to help estimate pose from depth images.

There were also two interesting demos using the Kinect sensor:

KinectFusion: Real-time 3D tracking, reconstruction and Interaction with a depth camera
S. Izadi, R. Newcombe et al.

Seeing Your Weight – An application in targeted advertisement
T. Van Nguyen, S. Yan

Random Forest:

A Random Forest is an ensemble of decision trees in which each tree is slightly different from the others. The randomness among the trees is achieved by training each one on either a different subset of the training data or a different subset of the parameter space. A Random Forest classifier can achieve max-margin-like behavior similar to an SVM at lower computational cost. Random Forests have also been widely used for many other problems, such as regression, density estimation, manifold learning and semi-supervised learning. There was a one-day tutorial on Random Forests at ICCV this year. The slides and technical report can be downloaded at http://research.microsoft.com/en-us/groups/vision/decisionforests.aspx
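The two sources of randomness described above can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions: each "tree" is a depth-1 stump, the data is a made-up toy set, and the names fit_stump, fit_forest and predict are hypothetical, not taken from the tutorial.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # degenerate sample: fall back to a constant prediction
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # random data subset
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Toy two-class data (hypothetical).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
```

Individual stumps can be poor, since each sees only part of the data and one feature, but the vote across them is stable. Real forests use deeper trees and usually re-draw the feature subset at every node.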

The following ICCV papers are related to Random Forest:

Structured Class-Labels in Random Forests for Semantic Image Labelling (Oral)
Peter Kontschieder, Samuel Rota Bulò, Horst Bischof, Marcello Pelillo

Task: image labeling, i.e., classifying each pixel in an image into an object class label

Idea: Use a Random Forest classifier, but with each internal node being a classifier for an image patch rather than for a single pixel. The advantage of this approach is that the image patch brings spatial context into the classifier, so at test time the classifier can produce much smoother and more coherent image labels.

Decision Tree Fields (Oral)
Sebastian Nowozin, Carsten Rother, Shai Bagon, Bangpeng Yao, Toby Sharp, Pushmeet Kohli

Task: image labeling

Idea: A CRF that uses decision trees to model the mapping between the image and the parameters of the unary or pairwise interactions in the graphical model. The advantages of using decision trees to model this mapping are that (1) they are non-parametric and so can represent richer relationships; and (2) they scale better to large training sets.

Action and Activity:

Human Action Recognition by Learning Bases of Action Attributes and Parts (oral)
Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas J. Guibas, Li Fei-Fei

Summary: The idea is similar to the scene attribute paper that Li-Jia Li of the same group published at the last NIPS and at an ECCV workshop. A sparse coding method is used to learn sparse bases of action attributes in static images.

Learning Spatiotemporal Graphs of Human Activities (oral)
William Brendel, Sinisa Todorovic

Summary: A volumetric representation of human activities is presented. A video is decomposed into a collection of spatiotemporal tubes at multiple scales. These tubes are connected by a graph model that represents the temporal and spatial constraints among them. The paper addresses how to extract and learn these 2D+t tubes and their configuration, and how to match and recognize them for activity recognition in video.

A "String of Feature Graphs" Model for Recognition of Complex Activities in Natural Videos
Utkarsh Gaur, Yingying Zhu, Bi Song, Amit Roy-Chowdhury

Summary: STIP features are grouped into a feature graph that represents the spatial configuration in a single frame, and the feature graphs of consecutive frames are linked to form a “String of Feature Graphs” that represents the temporal dynamics. Activity recognition in videos is then cast as the problem of matching these feature graphs and strings of feature graphs.

Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos
Michael Ryoo

Summary: This paper discusses an interesting problem: how to recognize a human activity in a video stream without seeing the full-length video. The solution presented is a dynamic bag-of-words model, which is similar to dynamic time warping.
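For readers who have not met dynamic time warping, the technique the summary compares the model to, here is a minimal sketch of the classic DTW recurrence on 1-D sequences. It is not the paper's dynamic bag-of-words model; the function name dtw_distance and the toy sequences are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both, so one sequence can be stretched
    or compressed in time relative to the other.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

# The same "activity" performed at a different speed aligns perfectly.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A shifted sequence incurs a nonzero cost.
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The connection to early recognition is the flexible temporal alignment: an observed prefix can be matched against a model even when the activity runs faster or slower than in training.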

Machine learning:

Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models (oral)
George Papandreou, Alan Yuille

Task: how to avoid costly MCMC sampling in energy models (e.g., MRFs) while maintaining accuracy in parameter estimation

Idea: Run MAP inference after adding randomness (i.e., noise) to the parameters: MAP is faster, while the randomness simulates the power of MCMC.
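The smallest possible illustration of "perturb, then take the MAP" is a single discrete variable, where the idea reduces to the well-known Gumbel-max trick: adding independent Gumbel noise to every state's energy and taking the minimizer yields exact samples from the Gibbs distribution p(k) ∝ exp(-E_k). This is a sketch under that assumption; the energies and function names below are made up, and for a real MRF the perturbation would be applied to low-order potentials and a discrete optimizer would replace the plain min.

```python
import math
import random
from collections import Counter

def gumbel(rng):
    """Draw one standard Gumbel sample by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def perturb_and_map(energies, rng):
    """Perturb each state's energy with Gumbel noise, return the minimizer."""
    perturbed = [e - gumbel(rng) for e in energies]
    return min(range(len(energies)), key=perturbed.__getitem__)

energies = [0.0, 1.0, 2.0]  # hypothetical energies of a 3-state variable
rng = random.Random(0)
n = 20000
counts = Counter(perturb_and_map(energies, rng) for _ in range(n))
empirical = [counts[k] / n for k in range(len(energies))]

# Target Gibbs distribution for comparison.
z = sum(math.exp(-e) for e in energies)
gibbs = [math.exp(-e) / z for e in energies]
```

Each sample costs one optimization instead of a long Markov chain, which is the speed advantage the summary refers to.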

Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition (oral)
Tianshi Gao, Daphne Koller

Task: learning a hierarchical classifier for a large-scale image database with hundreds or thousands of visual classes

Idea: A hierarchical classifier is a natural solution for large-scale visual recognition. The problem is that the visual classes often cannot be divided cleanly, and there are often ambiguous classes in between. The idea of this paper is simple and smart: treat the ambiguous classes as if they do not exist (i.e., a relaxed hierarchy). This can be modeled with a modified SVM in which the ambiguous classes are labeled 0 (while positive classes are labeled +1 and negative classes -1), so that they have no effect on the loss but still constrain the parameter space.
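The effect of the 0 label on the loss can be shown with a minimal sketch, assuming a linear classifier and a plain averaged hinge loss at one node of the hierarchy. It covers only the "ignored in the loss" part, not the full optimization in the paper; the class names, samples and the function relaxed_hinge_loss are hypothetical.

```python
def relaxed_hinge_loss(w, b, samples, class_assignment):
    """Average hinge loss at one node; classes assigned 0 are skipped.

    samples is a list of (feature_vector, class_name) pairs and
    class_assignment maps class_name to +1, -1 or 0 (ambiguous here).
    """
    total, counted = 0.0, 0
    for x, cls in samples:
        mu = class_assignment[cls]
        if mu == 0:  # ambiguous class: contributes nothing to the loss
            continue
        score = sum(wk * xk for wk, xk in zip(w, x)) + b
        total += max(0.0, 1.0 - mu * score)
        counted += 1
    return total / counted if counted else 0.0

w, b = [1.0, 0.0], 0.0
samples = [([2.0, 0.0], "cat"), ([-2.0, 0.0], "car"), ([0.1, 0.0], "statue")]

# Relaxed split: "statue" sits near the boundary, so it is labeled 0.
relaxed = relaxed_hinge_loss(w, b, samples, {"cat": +1, "car": -1, "statue": 0})
# Strict split: forcing "statue" to one side is penalized.
strict = relaxed_hinge_loss(w, b, samples, {"cat": +1, "car": -1, "statue": +1})
```

With the relaxed assignment the separable classes are split cleanly at this node, and the ambiguous class is left to be resolved further down the hierarchy.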

Wednesday, November 30, 2011

RE: DIY low cost 3D laser scanner

DIY low-cost 3D laser scanning rangefinder (3D LIDAR)

http://www.csksoft.net/blog/post/lowcost_3d_laser_ranger_1.html

http://www.csksoft.net/blog/post/lowcost_3d_laser_ranger_2.html

Wednesday, November 23, 2011

15 ICCV'11 papers I am interested in
