Xiaodong's tech notes on computer vision and machine learning: Twenty Questions Game and Object Recognition

Wednesday, December 8, 2010

Twenty Questions Game and Object Recognition

Objects can be described by many features, parts, and attributes, each of which can be viewed as a test. Object recognition/detection is then solved by combining the outputs of these tests. Instead of performing all possible tests, a smarter way is to select a small set of tests without sacrificing recognition accuracy. The process of selecting the right tests can be formulated as a 20-question game (see vision.ucsd.edu/sites/default/files/Visipedia20q.pdf): the object is recognized by sequentially asking questions to an Oracle and analyzing the answers it returns. The criterion for selecting the next question is the information gain expected from its answer. This approach is also called "Active Testing" in "An Active Testing Model for Tracking Roads in Satellite Images", PAMI 1996.
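To make the selection criterion concrete, here is a minimal sketch of greedy question selection by expected information gain. It is only an illustration under strong assumptions, not the method of any of the papers mentioned here: the classes, the binary attribute table ANSWERS, the uniform PRIOR, and the function names are all made up for the example, and the Oracle is assumed to be noise-free and consistent with one of the listed classes.

```python
import math

# Hypothetical toy data: each class answers every yes/no test deterministically.
ANSWERS = {
    "cat":      {"has_wheels": False, "has_fur": True,  "can_fly": False},
    "bird":     {"has_wheels": False, "has_fur": False, "can_fly": True},
    "car":      {"has_wheels": True,  "has_fur": False, "can_fly": False},
    "airplane": {"has_wheels": True,  "has_fur": False, "can_fly": True},
}
PRIOR = {c: 1.0 / len(ANSWERS) for c in ANSWERS}  # assumed uniform prior


def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior(belief, answers, test, outcome):
    """Condition the belief over classes on one noise-free test outcome."""
    kept = {c: p for c, p in belief.items() if answers[c][test] == outcome}
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()}


def information_gain(belief, answers, test):
    """Expected reduction in entropy of the belief if `test` is asked."""
    h_after = 0.0
    for outcome in {answers[c][test] for c in belief}:
        p_outcome = sum(p for c, p in belief.items() if answers[c][test] == outcome)
        h_after += p_outcome * entropy(posterior(belief, answers, test, outcome).values())
    return entropy(belief.values()) - h_after


def identify(prior, answers, oracle, max_questions=20):
    """Ask the most informative remaining test until one class is left."""
    belief = dict(prior)
    remaining = set(next(iter(answers.values())))
    asked = []
    while len(belief) > 1 and remaining and len(asked) < max_questions:
        # sorted() makes tie-breaking between equally informative tests deterministic
        best = max(sorted(remaining), key=lambda t: information_gain(belief, answers, t))
        if information_gain(belief, answers, best) <= 0:
            break  # no remaining test can separate the surviving classes
        remaining.remove(best)
        asked.append(best)
        belief = posterior(belief, answers, best, oracle(best))
    return belief, asked


# Usage: the Oracle secretly holds "car" and answers each question truthfully.
belief, asked = identify(PRIOR, ANSWERS, lambda test: ANSWERS["car"][test])
print(belief, asked)
```

With four equally likely classes, a test that splits them 2/2 yields one bit, while a 1/3 split such as has_fur yields less, so the greedy rule prefers the balanced tests and needs only two questions here. A human or detector Oracle would be noisy, which this sketch does not model.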

As far as I know, the earliest work using this idea for object recognition is due to Donald Geman of JHU, described in his 1993 technical report "Shape Recognition and Twenty Questions". Each test is a local functional of the image, loosely corresponding either to configurations (vertex labels) resembling "endings", "junctions", and "turns", or to invariant relations (relational labels) between two vertex labels, e.g., "same class" or "same orientation".

The most recent works are "Active Testing for Face Detection and Localization", PAMI 2010, "Visual Recognition with Humans in the Loop", ECCV 2010a, and "Indoor Scene Recognition Through Object Detection Using Adaptive Objects Search", ECCV 2010b. In the PAMI 2010 paper, the tests are a specific type of image functional (i.e., the proportion of edges of a particular orientation and scale) within a local region. In the ECCV 2010b paper, the tests are object detectors. In the ECCV 2010a paper, the tests are object attributes, while the Oracle is a human.

This idea can be extended in many directions. In terms of application domains, it can be used for scene and activity recognition; in terms of the questions asked, we can ask much richer questions besides "What", e.g., "Where", "How Many", "How Big", etc. We are currently investigating these problems.
