Wednesday, December 8, 2010

Ph.D. proposal exam passed

I just passed my Ph.D. proposal exam on Monday, Dec. 6th, 2010. The title of the proposal is "Bridging the Semantic Gap: Image and Video Understanding by Integrating Vision and Language". The hypothesis of this thesis is that many vision problems cannot be solved based solely on visual data. Knowledge and reasoning processes, which are provided by language, need to be integrated into the loop. This idea is embodied by several recent projects, as described on my research web page.


Now I need to work hard to finish the remaining work toward my Ph.D. Hopefully, it can be done next year.

Twenty Questions Game and Object Recognition

Objects can be defined by many features/parts/attributes, each of which can be viewed as a test. Object recognition/detection is then solved by combining the outputs of these tests. Instead of performing all possible tests, a smarter approach is to select a small set of tests without sacrificing recognition quality/accuracy. The process of selecting the right tests can be formulated as a 20-question game (vision.ucsd.edu/sites/default/files/Visipedia20q.pdf): the object is recognized by sequentially asking questions of an Oracle and analyzing the answers it returns. The criterion for selecting the next question is the information gain expected from its answer. This approach is also called "Active Testing" in "An Active Testing Model for Tracking Roads in Satellite Images", PAMI 1996.
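The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code of any of the papers cited here: every test is a yes/no question, P(yes | class) is given as a hand-made table, the Oracle answers without noise, and questioning stops once one class reaches a posterior of 0.95. The class names, tests, and probabilities are hypothetical.

```python
import math

# Hypothetical sketch: greedy "twenty questions" test selection by expected
# information gain. Tests are assumed binary (yes/no), and likelihood[c][t]
# is an assumed, hand-specified P(answer to test t is "yes" | class c).

def entropy(dist):
    """Shannon entropy in bits of a {class: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def p_answer(c, test, answer, likelihood):
    """P(answer | class c) for a binary test."""
    p_yes = likelihood[c][test]
    return p_yes if answer else 1.0 - p_yes

def update(belief, test, answer, likelihood):
    """Bayes rule: posterior over classes after observing one answer."""
    unnorm = {c: p * p_answer(c, test, answer, likelihood) for c, p in belief.items()}
    z = sum(unnorm.values())
    if z == 0:  # answer impossible under the model; keep the old belief
        return dict(belief)
    return {c: v / z for c, v in unnorm.items()}

def expected_info_gain(belief, test, likelihood):
    """H(C) minus the expected entropy of C after hearing the test's answer."""
    gain = entropy(belief)
    for answer in (True, False):
        p_ans = sum(p * p_answer(c, test, answer, likelihood) for c, p in belief.items())
        if p_ans > 0:
            gain -= p_ans * entropy(update(belief, test, answer, likelihood))
    return gain

def twenty_questions(prior, likelihood, oracle, max_questions=20, stop_at=0.95):
    """Ask the most informative unused test until confident or out of tests.

    stop_at is an assumed confidence threshold on the top class posterior.
    """
    belief = dict(prior)
    remaining = sorted(next(iter(likelihood.values())))  # test names
    asked = []
    while remaining and len(asked) < max_questions and max(belief.values()) < stop_at:
        test = max(remaining, key=lambda t: expected_info_gain(belief, t, likelihood))
        answer = oracle(test)  # e.g., a human or a detector answers yes/no
        belief = update(belief, test, answer, likelihood)
        remaining.remove(test)
        asked.append((test, answer))
    best = max(belief, key=belief.get)
    return best, belief, asked

# Toy example with made-up classes, tests, and probabilities.
classes = ("cardinal", "blue_jay", "crow")
prior = {c: 1 / len(classes) for c in classes}
likelihood = {
    "cardinal": {"is_red": 0.95, "is_blue": 0.05, "has_crest": 0.9},
    "blue_jay": {"is_red": 0.05, "is_blue": 0.95, "has_crest": 0.9},
    "crow": {"is_red": 0.05, "is_blue": 0.05, "has_crest": 0.1},
}
# Simulated noise-free Oracle looking at a blue jay.
truth = {"is_red": False, "is_blue": True, "has_crest": True}
label, belief, asked = twenty_questions(prior, likelihood, oracle=truth.__getitem__)
print(label, asked)
```

Picking the test with the largest expected information gain is a greedy, one-step lookahead: it does not guarantee the shortest possible sequence of questions, but it is cheap to compute at each step.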

The earliest work I know of that uses this idea for object recognition is by Donald Geman of JHU, described in his 1993 technical report "Shape Recognition and Twenty Questions". Each test is a local functional of the image, loosely corresponding to configurations (vertex labels) resembling "endings", "junctions", and "turns", or to invariant relations (relational labels) between two vertex labels, e.g., "same class" or "same orientation".

The most recent works include "Active Testing for Face Detection and Localization", PAMI 2010, "Visual Recognition with Humans in the Loop", ECCV 2010a, and "Indoor Scene Recognition Through Object Detection Using Adaptive Objects Search", ECCV 2010b. In the PAMI 2010 paper, the tests are a specific type of image functional (namely, the proportion of edges at a particular orientation and scale) within a local region. In the ECCV 2010b paper, the tests are object detectors. In the ECCV 2010a paper, the tests are object attributes and the Oracle is a human.

This idea can be extended in many directions. In terms of applications, it can be used for scene and activity recognition; in terms of the questions asked, we can ask many richer questions besides What, e.g., Where, How Many, How Big, etc. We are currently investigating these problems.

Wednesday, December 1, 2010

Microsoft Kinect: the next generation of HCI device?

With the release of Kinect, Microsoft has become a star in the eyes of computer vision and HCI researchers. It is a really cool idea to control your computer with your hands and body, without attaching or holding any other device. It opens up numerous possibilities. I think there will be a boom in games and VR/AR applications using Kinect in the next few years.

Kinect for Xbox 360 review

Open source Kinect driver

Kinect's open-source ambitions

Wednesday, November 17, 2010

Two extreme views on object attributes in the community

I have a paper on attribute-based transfer learning for object categorization at ECCV this year, so I am very curious about the community's views on this topic. During this ECCV, there was a one-day workshop on attributes, which ended with a panel discussion among five leading researchers in the computer vision community. It turns out that their views occupy the two extremes of the spectrum. The following are summaries of their personal views on object attributes:
  • Malik does not favor attributes. He said “vision should not be hijacked by language”
  • Mata does not favor attributes. He said “my dog can recognize as good as the state-of-the-art computer vision algorithms or even better without language”
  • Hoiem considers attributes a way to go beyond recognition toward image understanding, i.e., describing objects and scenes
  • Fei-Fei considers attributes a form of knowledge
  • Lampert considers attributes a way to transfer knowledge to the vision system
Overall, there is neither a clear definition of attributes nor a consensus in the community. It is still a controversial topic, but it may become a hot research topic in the next few years. At this ECCV, there are three papers about attributes.
  • Automatic Attribute Discovery and Characterization from Noisy Web Data
    • Idea: mining text and image data sampled from the Internet
    • Motivation: product images online are often accompanied by texts describing their attributes, such as color, parts, functions, etc.
  • A Discriminative Latent Model of Object Classes and Attributes
  • Attribute-based Transfer Learning for Object Categorization with Zero or One Training Example (my paper)

Monday, February 22, 2010

Recent papers on large scale image search

Here are some recent papers on large scale image search:

By INRIA, Schmid's group:
Improving Bag-of-Features for Large Scale Image Search, IJCV 2010
Recent Advances in Large Scale Image Search, ETVC 2008

By Google:
VisualRank: Applying PageRank to Large-Scale Image Search, PAMI 2008

By Microsoft:
Bundling Features for Large Scale Partial-Duplicate Web Image Search, CVPR 2009
A Multi-Sample, Multi-Tree Approach to Bag-of-Words Image Representation, ICCV 2009

Saturday, January 23, 2010

Wu Jun's "The Beauty of Mathematics" Series



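  1. The Beauty of Mathematics, Part 1: Statistical Language Models
  2. The Beauty of Mathematics, Part 2: On Chinese Word Segmentation
  3. The Beauty of Mathematics, Part 3: Applications of Hidden Markov Models in Language Processing
  4. The Beauty of Mathematics, Part 4: How Do We Measure Information?
  5. The Beauty of Mathematics, Part 5: The Beauty of Simplicity: Boolean Algebra and Search Engine Indexing
  6. The Beauty of Mathematics, Part 6: Graph Theory and Web Crawlers
  7. The Beauty of Mathematics, Part 7: Applications of Information Theory in Information Processing
  8. The Beauty of Mathematics, Part 8: The Story of Jelinek and Modern Language Processing
  9. The Beauty of Mathematics, Part 9: How to Determine the Relevance of Web Pages to a Query
  10. The Beauty of Mathematics, Part 10: Finite State Machines and Address Recognition
  11. The Beauty of Mathematics, Part 11: Dr. Amit Singhal, the Maker of Google's AK-47
  12. The Beauty of Mathematics, Part 12: The Law of Cosines and News Classification
  13. The Beauty of Mathematics, Part 13: Information Fingerprints and Their Applications
  14. The Beauty of Mathematics, Part 14: On the Importance of Mathematical Models
  15. The Beauty of Mathematics, Part 15: Complexity and Simplicity: Some Leading Figures in Natural Language Processing
  16. The Beauty of Mathematics, Part 16: Don't Put All Your Eggs in One Basket: The Maximum Entropy Model
  17. The Beauty of Mathematics, Part 17: All That Glitters Is Not Gold: On Search Engine Cheating (Search Engine Anti-SPAM)
  18. The Beauty of Mathematics, Part 18: Matrix Operations and Classification Problems in Text Processing
  19. The Beauty of Mathematics, Part 19: Extending Markov Chains: Bayesian Networks
  20. The Beauty of Mathematics, Part 20: Marcus, the Godfather of Natural Language Processing
  21. The Beauty of Mathematics, Part 21: Bloom Filters
  22. The Beauty of Mathematics, Part 22: Thoughts Inspired by the TV Drama An Suan: On the Mathematical Principles of Cryptography
  23. The Beauty of Mathematics, Part 23: How Many Keystrokes Does It Take to Type a Chinese Character? On Shannon's First Theorem
  24. The Beauty of Mathematics, Part 24: From Global Navigation to Input Methods: On Dynamic Programming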

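Recommended: Programming Pearls Extras (编程珠玑番外篇)

On Xu You's blog I came across his article series "Programming Pearls Extras", which I think is well worth bookmarking. Here is the list of articles in the series: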

Wednesday, January 6, 2010

Object recognition on iPhone

Recently I have become interested in applications of object recognition on the iPhone and other smartphones. Having studied this research topic for years, I am eager to apply what I have learned to do some really cool things, and smartphones such as the iPhone provide us with an excellent platform. I Googled "iPhone object recognition" and found lots of cool stuff:

The Future of the iPhone: Intelligent Object Recognition
The “eye-Phone” Image-Recognition System
Amazon Releases Amazon Mobile, Includes Object Recognition