Xiaodong's tech notes on computer vision and machine learning: 2009-11

Monday, November 30, 2009

papers: prototype theory

In my previous post, "basic level classes and subordinate class", I mentioned Aharon Bar-Hillel's paper Subordinate class recognition using relational object models . Now I am going to build topic models for object hierarchies and need to have a better understanding of the prototype theory by Rosch. Here are some papers I found about this topic:

the seminal paper: Basic Objects in Natural Categories, cognitive psychology 1976. Another link

several blogs on this theory:

http://mixingmemory.blogspot.com/2005/03/basics-of-basic-level.html

http://poorbuthappy.com/ease/archives/2003/11/20/1944/basic-level-categories

http://everything2.com/title/basic+level+categories

Sunday, November 29, 2009

papers: Estimation of Dirichlet Distribution Parameters

Recently, I am interested in apply Pachinko Allocation topic models to the object recognition problems. Mixtures of Hierarchical Topics with Pachinko Allocation, ICML 2007 mentioned several methods in training the hPAM model, and here are the related papers:

Gibbs EM:

An introduction to MCMC for machine learning, Machine Learning 2003
implementations of the Monte Carlo EM Algorithm, 2001

fixed point iteration method for estimating the Dirichlet parameters:

Estimating a Dirichlet distribution, Minka 2000, and code

Maximum Likelihood Estimation of Dirichlet Distribution Parameters by Jonathan Huang

papers: syntax and topic model

Syntactic constraint is an important ingredient in NLP. At the beginning, topic models, such as LDA, assume bag-of-word model and thus ignore the syntax. Later on, this constraint is added to the topic model to improve the modeling power. Here are a few papers regarding this issue:

Integrating topics and syntax, NIPS 2005
Style and Topic Language Model Adaptation Using HMM-LDA, EMNLP 2006
Hidden Topic Markov Models, AISTATS 2007, presentation video
Topic Modeling: Beyond Bag-of-Words, ICML 2006, slides
Syntactic Topic Models, NIPS 2008, supplement materials
Bayesian Modeling of Dependency Trees Using Hierarchical Pitman-Yor Priors, In Proceedings of the Workshop on Prior Knowledge for Text and language (held in conjunction with ICML/UAI/COLT), 2008

Topical N-grams: Phrase and Topic Discovery, with an Application to Information Retrieval, ICML 2007, technical report

paper: Rethinking LDA: Why Priors Matter

Rethinking LDA: Why Priors Matter, Hanna M. Wallach David Mimno Andrew McCallum, NIPS 2009

Abstract:

Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.

Saturday, November 28, 2009

paper: On Smoothing and Inference for Topic Models

On Smoothing and Inference for Topic Models, UAI 2009
abstract:

Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.

paper: Multilevel Bayesian Models of Categorical Data Annotation

A paper I found from LingPipe's blog:

Multilevel Bayesian Models of Categorical Data Annotation

It seems to be close related to image annotation. More comments will follow after reading it.

fast and parallel Gibbs sampling for LDA

Gibbs sampling for LDA is very simple to understand and implement, especially the collapsed Gibbs sampling. But one drawbacks of GS is its complexity is linear to the number of word tokens. This problem is even more serious when we apply LDA-based approaches to computer vision problems where we use visual words in images to replace words in documents. To maximize our chance to detect the object in an image, we need large number of visual word tokens. It is more and more popular to extract features at dense regular grids over images, and to one extreme, someone extract features at every pixel with several scales. Also we often need to extract several types of features and hope them to be complementary to each other since we usually do not which type of feature is more useful for a particular object category. Combine these factors together, there are often more than 10k ~ 50k word tokens per image extracted. For Gibbs sampling, this is a nightmare!

So a fast Gibbs sampling or parallel Gibbs sampling are absolutely rescues. There are two such papers recently, with published codes (that is great!):

PLDA: Parallel Latent Dirichlet Allocation for Large-scale Applications by Wang Yi et al at Google, code

Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation at UCI, code

here is a comment from LingPipe's blog:

Porteous et al. (2008) Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Another paper related to topic model inference in large scale corpus is

Efficient Methods for Topic Model Inference on Streaming Document Collections

Friday, November 27, 2009

book: Data Analysis Using Regression and Multilevel/Hierarchical Models

LingPipe's blog
Finkel and Manning (2009) Hierarchical Bayesian Domain Adaptation
mentioned a book
Data Analysis Using Regression and Multilevel/Hierarchical Models.
After going through the table of contents, I found this book may be quite useful for me, especially read it together with Bishop's book PRML.

Thursday, November 26, 2009

paper: Boosted Bayesian Network Classifiers

Jing Yushi has a ICML 2005/ ML2008 paper:
Boosted Bayesian Network Classifiers
It seems very interesting to me. Now I am using topic models to implement my ideas. But generative models usually can not beat discriminative classifiers such as SVM in many cases. It is of interests of the generative guys to combine these two methods to benefit from both. Jing's paper show the boosted version of Naive Bayesian. Can we develop boosted topic models? It is good direction. I Googled and find no such work so far.

Google's image swirl

Jing Yushi @ Google just announced a new Google lab tool: Google Image Swirl, which is built on his previous work VisualRank .
I am really excited by this work, and also admire Yushi has such a good opportunity to realize the ideas in research domain to a real stuff workable on a real-world platform. Sure, there is still long to go but at least we see some light of the dawn.

Sunday, November 8, 2009

Google's new toy

Google's new toy: a portable search panel

If only one word is permitted to used to describe this Google's new toy, it will be "cool". It integrates the techniques of localization (probability with GPS or other instruments), image retrieval, OCR, object recognition, and combines with Google's cloud computing capability. OMG, it seems to me part of computer visioner's dream will soon come true, and many of us will lose our jobs on the other hand :(

Later on, I found this is a product concept design, not a Google product. This is the link of the author's blog.
Future of Internet Search: Mobile Vision

Saturday, November 7, 2009

A clever way to derive the collapsed Gibbs sampling for LDA

Inspired by Tom Griffiths technical report, Gibbs Sampling in the Generative Model of Latent Dirichlet Allocation, I derive the collapsed Gibbs sampling for LDA by myself using the tricks in Tom's report. These tricks are universally applicable to other topic models:

simplify the conditional property by employing Bayes theorem and
d-separation property
derive the results of conditional probability directly from the result of
the predictive likelihood of Dirichlet/multinomial distribution

There is no lengthy and complex computation like those in Wang Yi's note or Gregor Heinrich's note. It is easy to understand and has intuitive explanation for the formulas involved. I wrote up a report for my derivation, as a complementary to Tom's note:
Derivation of Collapsed Gibbs Sampling for LDA

Xiaodong's tech notes on computer vision and machine learning