Wednesday, August 26, 2009

Gibbs sampling for LDA inference

I read Blei's paper on Latent Dirichlet Allocation a few months ago, and now I have decided to implement it on my own. Though there are a lot of free LDA implementations online, I feel I can only grasp it, especially the technical details, by coding it myself. My goal is to understand and apply non-parametric Bayesian hierarchical models, such as the HDP (hierarchical Dirichlet process) and the DPMM (Dirichlet process mixture model), to my research. LDA, a finite-dimensional topic model, is a good starting point.

In Blei's paper, inference for LDA is done with variational methods. I decided to go with Gibbs sampling instead, because a Gibbs sampler is easier to extend to the HDP and the DPMM.
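For concreteness, the collapsed Gibbs sampler for LDA repeats one simple step for every token: resample that token's topic assignment from its full conditional, given all the other assignments. Writing it here from the standard references (the notation may differ slightly from the notes linked below), with symmetric priors \alpha and \beta, vocabulary size V, and counts n taken with the current token i excluded, the conditional is

p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n^{(w_i)}_{k,-i} + \beta}{n^{(\cdot)}_{k,-i} + V\beta} \left( n^{(k)}_{d,-i} + \alpha \right)

where n^{(w_i)}_{k,-i} is how often word w_i is assigned to topic k, n^{(\cdot)}_{k,-i} is the total number of tokens assigned to topic k, and n^{(k)}_{d,-i} is how often topic k appears in the current document d.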

I found a good technical note on Gibbs sampling for LDA by Wang Yi:

Distributed Gibbs Sampling of Latent Topic Models: The Gritty Details


Wang provides a detailed derivation of Gibbs sampling for LDA in his note. Many thanks!
There is also a note on the same page about Gibbs sampling for mixture models, which is a good way to study Gibbs sampling in a simpler setting.
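To make the bookkeeping concrete, here is a minimal sketch of a collapsed Gibbs sampler in Python/NumPy, following the conditional above. It is an illustration, not production code; the function name gibbs_lda, the representation of documents as lists of word ids, and the default hyperparameters are my own choices, not taken from the notes.

import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_kw = np.zeros((K, V))   # topic-word counts
    n_dk = np.zeros((D, K))   # document-topic counts
    n_k = np.zeros(K)         # total tokens assigned to each topic
    z = []                    # per-token topic assignments
    for d, doc in enumerate(docs):            # random initialization
        zd = rng.integers(0, K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
    for _ in range(n_iter):                   # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                   # remove the current assignment
                n_kw[k, w] -= 1; n_dk[d, k] -= 1; n_k[k] -= 1
                # full conditional: (n_kw + beta)/(n_k + V*beta) * (n_dk + alpha)
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                   # record and restore counts
                n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
    return z, n_kw, n_dk

After sampling, point estimates of the topic-word distribution phi and the document-topic distribution theta can be read off the counts, e.g. phi[k, w] is approximately (n_kw[k, w] + beta) / (n_k[k] + V * beta).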

Another good resource is Gregor Heinrich's note:
Parameter estimation for text analysis

There are lots of LDA implementations online; to name a few:

* Blei's LDA in C
* Yee Whye Teh's Gibbs LDA Matlab code (Gibbs LDA is one of his course assignments)
* Mark Steyvers and Tom Griffiths's topic modeling Matlab toolbox
* GibbsLDA++: gibbslda.sourceforge.net/
* Gregor Heinrich's LDA-J

BTW: nowadays, papers published in conferences and journals rarely contain the full details of the techniques. In my opinion, this is a very bad habit, or tradition. If there are page limits on published papers, the authors could at least put the technical details in a technical report and post it online, together with their code. But only a few authors are willing to do so, which makes it very hard for a newbie to become an expert relying on these papers alone. Fortunately, in the case of LDA, Gregor and Yi generously share their notes, even though they are not the original authors of these techniques.

1 comment:

  1. I totally agree with your comment. Thanks for your generosity also.
