Wednesday, August 26, 2009

Evolution of topic models for modeling documents and images

I summarized the evolution of topic models for modeling documents and images in the figure above. The notation is as follows:
  • NB-BoW: naive Bayes bag of words, i.e., the mixture of unigrams model
  • pLSA: probabilistic latent semantic analysis
  • LDA: latent Dirichlet allocation
  • FMM: finite mixture model
  • FHMM: finite hierarchical mixture model
  • DPMM: Dirichlet process mixture model
  • HDP: hierarchical Dirichlet process mixture model
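To make the difference between the simplest and the most popular document model concrete, here is a minimal generative sketch of NB-BoW and LDA using only the Python standard library. The function names, the toy topic-word table `phi`, and the Dirichlet parameter `alpha` are illustrative assumptions, not taken from the figure. The key contrast is that NB-BoW draws a single topic for the whole document, while LDA draws per-document topic proportions and then a topic for every word.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return index k with probability probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_nb_bow(pi, phi, n_words, rng):
    """Mixture of unigrams: one topic z is drawn for the entire document."""
    z = sample_categorical(pi, rng)
    words = [sample_categorical(phi[z], rng) for _ in range(n_words)]
    return words, [z] * n_words

def generate_lda(alpha, phi, n_words, rng):
    """LDA: per-document proportions theta, then a fresh topic for each word."""
    theta = sample_dirichlet(alpha, rng)
    topics = [sample_categorical(theta, rng) for _ in range(n_words)]
    words = [sample_categorical(phi[z], rng) for z in topics]
    return words, topics

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy topic-word table: K = 2 topics over a V = 3 word vocabulary (made-up numbers).
    phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    print("NB-BoW:", generate_nb_bow([0.5, 0.5], phi, 8, rng))
    print("LDA:   ", generate_lda([0.5, 0.5], phi, 8, rng))
```

In the NB-BoW output every word carries the same topic label; in the LDA output the labels can change from word to word within one document.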
The labels on the arrows indicate the change needed to evolve one model into the next:
  • w -> x: generalize the word w from a categorical variable to a general
    observation x, which can be either discrete or continuous
  • hierarchy: add a hierarchy to the original model
  • K topics: extend from one topic per document to multiple topics per document
  • K -> \infty: take the infinite limit of the original model as the number of components K goes to infinity
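The K -> \infty arrow (for instance, from FMM to DPMM) can be illustrated with one standard construction, assumed here rather than taken from the figure: give the mixture weights a symmetric Dirichlet(α/K, ..., α/K) prior and integrate them out. The predictive probability that observation n joins component k is then (n_k + α/K) / (n - 1 + α), where n_k counts earlier observations in k. As K grows, an occupied component's probability tends to n_k / (n - 1 + α) and the total mass of all empty components tends to α / (n - 1 + α), which is the Chinese restaurant process (CRP). The function names below are hypothetical.

```python
import random

def finite_mixture_predictive(counts, K, alpha):
    """Predictive P(z_n = k | z_1..z_{n-1}) for a K-component mixture with a
    symmetric Dirichlet(alpha/K) prior on the weights, weights integrated out.
    counts holds n_k for the occupied components only (requires K >= len(counts)).
    Returns (probabilities of occupied components, total probability of empty ones)."""
    denom = sum(counts) + alpha
    occupied = [(c + alpha / K) / denom for c in counts]
    n_empty = K - len(counts)
    return occupied, n_empty * (alpha / K) / denom

def crp_predictive(counts, alpha):
    """Chinese restaurant process: the K -> infinity limit of the function above."""
    denom = sum(counts) + alpha
    return [c / denom for c in counts], alpha / denom

def sample_crp(n, alpha, rng):
    """Sample cluster labels for n points from CRP(alpha); clusters are unbounded."""
    counts, labels = [], []
    for _ in range(n):
        occupied, p_new = crp_predictive(counts, alpha)
        k = rng.choices(range(len(counts) + 1), weights=occupied + [p_new], k=1)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

if __name__ == "__main__":
    counts = [3, 1]  # two occupied components after four observations
    for K in (2, 10, 1000):
        print(K, finite_mixture_predictive(counts, K, alpha=1.0))
    print("CRP", crp_predictive(counts, alpha=1.0))
```

Printing the finite predictive for increasing K shows it approaching the CRP values, and `sample_crp` draws assignments in which the number of clusters is not fixed in advance.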
This is not a complete summary. For example, HMM is not included.

One interesting observation: there are three paths from the NB-BoW model to the HDP mixture model (HDP-MM), and whichever path we choose, we always have to apply all four of the extensions above.
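The observation follows if we assume that each arrow adds exactly one of the four extensions and none is ever undone: NB-BoW has none of them and HDP-MM has all four, so every path has to apply each extension exactly once. The small Python check below encodes that assumption; the string labels and function names are illustrative, not part of the figure.

```python
from itertools import permutations

# The four arrow labels, as plain strings (assumed encoding).
EXTENSIONS = frozenset({"w->x", "hierarchy", "K topics", "K->inf"})

def apply_path(path):
    """Apply extensions in order, starting from NB-BoW (no extensions).
    Raises ValueError if an extension is unknown or applied twice."""
    state = frozenset()
    for ext in path:
        if ext not in EXTENSIONS:
            raise ValueError(f"unknown extension: {ext}")
        if ext in state:
            raise ValueError(f"extension applied twice: {ext}")
        state = state | {ext}
    return state

def reaches_hdp(path):
    """True if the path ends at the model carrying all four extensions (HDP-MM)."""
    return apply_path(path) == EXTENSIONS

# Every ordering of the four extensions reaches HDP-MM; dropping any one does not.
all_orders = list(permutations(sorted(EXTENSIONS)))
```

As pure bookkeeping there are 4! = 24 orderings of the four extensions; the figure draws three of them.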
