Friday, September 11, 2009

Gibbs sampling for DP Mixtures: technical details

My previous post, Gibbs sampling procedure for GMM, describes the Gibbs sampling procedure adopted in Rasmussen's paper. In this approach, all of the cluster parameters and the indicator variables are sampled.
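As a rough illustration of what that means, here is a minimal sketch of one such uncollapsed Gibbs sweep. It assumes a finite Gaussian mixture with unit observation variance, fixed mixing weights, and a standard normal prior on each cluster mean; these assumptions and all names are mine for illustration, not Rasmussen's infinite model.

```python
import numpy as np

def gibbs_sweep_uncollapsed(x, z, mu, weights, rng):
    """One uncollapsed Gibbs sweep for a finite GMM with unit variance.

    x: (n,) observations; z: (n,) integer cluster indicators; mu: (K,) cluster means;
    weights: (K,) mixing proportions (held fixed here for simplicity).
    """
    K = len(mu)
    # 1. Sample each indicator z_i given the current cluster means.
    for i in range(len(x)):
        logp = np.log(weights) - 0.5 * (x[i] - mu) ** 2   # unit-variance Gaussian likelihood
        p = np.exp(logp - logp.max())
        p /= p.sum()
        z[i] = rng.choice(K, p=p)
    # 2. Sample each cluster mean given its assigned points (N(0, 1) prior on mu_k).
    for k in range(K):
        xk = x[z == k]
        post_var = 1.0 / (1.0 + len(xk))        # prior precision 1 plus len(xk) data precisions
        post_mean = post_var * xk.sum()
        mu[k] = rng.normal(post_mean, np.sqrt(post_var))
    return z, mu
```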

In the course "Dirichlet Processes: Tutorial and Practical Course" by Yee Whye Teh at the Machine Learning Summer School 2007, Teh mentioned a collapsed Gibbs sampling scheme for the DP mixture model (DPMM). Collapsed Gibbs sampling is much simpler in the sense that we only need to sample the indicator variables, integrating out the cluster parameters. It seems that my intuition is consistent with others'. For example, in Hal Daume III's blog post "collapsed Gibbs", he says "if you can collapse out a variable, you should", though he raises some concerns about why we sometimes need to add auxiliary variables. In the case of mixture models or topic models, however, adding auxiliary variables such as the indicator variables makes a lot of sense, because it turns out to be much easier to sample these indicator variables than the cluster parameter variables. A more important reason to use indicator variables is revealed in the collapsed Gibbs sampling with the Chinese restaurant process: we can integrate out the cluster parameters and only sample the indicator variables in each iteration, as sketched below.
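To make that concrete, here is a minimal sketch of one collapsed Gibbs sweep with the Chinese restaurant process (my own illustration, not Teh's code). Each point is removed from its cluster and reassigned with probability proportional to the CRP prior weight times the posterior predictive of that cluster, with the cluster parameters integrated out. The `predictive` callback and all names are assumptions; a concrete conjugate case is filled in after the next paragraph.

```python
import numpy as np

def collapsed_gibbs_sweep(x, z, alpha, predictive, rng):
    """One collapsed Gibbs sweep for a DP mixture with a conjugate prior.

    x: (n,) observations; z: list of integer cluster labels; alpha: DP concentration;
    predictive(x_i, members): marginal likelihood of x_i given the data already in a
    cluster (an empty list for a brand-new cluster), parameters integrated out.
    """
    n = len(x)
    for i in range(n):
        z[i] = -1                                   # remove point i from its cluster
        labels = sorted(set(z) - {-1})
        weights, candidates = [], []
        for k in labels:                            # existing clusters: n_k * predictive
            members = [x[j] for j in range(n) if z[j] == k]
            weights.append(len(members) * predictive(x[i], members))
            candidates.append(k)
        weights.append(alpha * predictive(x[i], []))  # new cluster: alpha * prior predictive
        candidates.append(max(labels, default=-1) + 1)
        p = np.array(weights) / np.sum(weights)
        z[i] = candidates[rng.choice(len(candidates), p=p)]
    return z
```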

Rarely can I find a real worked example of these algorithms in the literature. One exception is the appendix of Ranganathan's thesis, "The Dirichlet Process Mixture (DPM) Model", which gives an example of a simple case where the observations follow a univariate normal distribution with unknown mean and known, constant variance equal to unity.
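For that simple case, the only ingredient the collapsed sampler needs is the cluster's posterior predictive. Assuming a N(0, 1) prior on the unknown mean (the prior is my assumption for illustration; Ranganathan's appendix should be consulted for his exact choice) and unit observation variance, the predictive of a new point given the points already in a cluster is Gaussian in closed form:

```python
import numpy as np

def normal_unit_var_predictive(x_new, members, prior_mean=0.0, prior_var=1.0):
    """Posterior predictive density of x_new for a cluster of unit-variance normal
    observations whose unknown mean has a N(prior_mean, prior_var) prior."""
    members = np.asarray(members, dtype=float)
    post_var = 1.0 / (1.0 / prior_var + len(members))            # posterior variance of the mean
    post_mean = post_var * (prior_mean / prior_var + members.sum())
    pred_var = post_var + 1.0                                     # add the unit observation variance
    return np.exp(-0.5 * (x_new - post_mean) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
```

This function can be passed directly as the `predictive` argument of the collapsed sweep sketched above.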

Nevertheless, none of the three materials provides an overview of the Gibbs sampling techniques for the DPMM. I searched the literature and found Neal's review paper, Markov Chain Sampling Methods for Dirichlet Process Mixture Models, very helpful. In Neal's paper, three sampling schemes are described for the case when conjugate priors are used, namely:

  1. Gibbs sampling with the Blackwell-MacQueen urn scheme
  2. Gibbs sampling with the Chinese restaurant process
  3. Collapsed Gibbs sampling with the Chinese restaurant process (also called Rao-Blackwellized Gibbs sampling)
The method mentioned in Teh's course is the third type. The method used in Rasmussen's paper is the second one. Ranganathan adopts Gibbs sampling with the Blackwell-MacQueen urn scheme.

I also found that Sudderth gives a good review of the two Gibbs sampling methods with the Chinese restaurant process on pages 87-94 in Chapter 2 of his 2006 thesis.

Combining the materials mentioned above, I now have a good idea of the Gibbs sampling techniques for the DPMM, at least for the case when conjugate priors are used, which is my current interest. Eventually, I wrote up a technical note about the Gibbs sampling techniques when a conjugate prior is used. It also includes a brief introduction to the Dirichlet process and the Dirichlet process mixture model.

The report is here.


Later on, I found several papers that are perhaps the original papers about the algorithms described here.

On Xinyi Xu's course webpage, I found that the following paper may be the original work on these algorithms:
    MacEachern, S. (1998). Computational Methods for Mixture of Dirichlet Process Models. In Dey, Dipak D., Muller, Peter, and Sinha, Debajyoti (Eds.), Practical Nonparametric and Semiparametric Bayesian Statistics, 23-44. New York: Springer.
This paper also gives a very helpful example with a beta-binomial distribution.
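The beta-binomial case works the same way as the Gaussian one: with a Beta(a, b) prior on a cluster's success probability, the predictive probability of a new observation given the cluster's counts has a closed form and can be plugged into the collapsed sweep. The sketch below uses Bernoulli observations for simplicity; the parameter names are illustrative, not MacEachern's notation.

```python
def beta_bernoulli_predictive(x_new, members, a=1.0, b=1.0):
    """Posterior predictive P(x_new | members) for Bernoulli data with a Beta(a, b)
    prior on the success probability, parameters integrated out analytically."""
    heads = sum(members)
    p_one = (a + heads) / (a + b + len(members))   # predictive probability of observing a 1
    return p_one if x_new == 1 else 1.0 - p_one
```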

"Bayesian Density Estimation and Inference Using Mixtures" (1995) by Michael D. Escobar and Mike West and "Computational Methods for Mixture of Dirichlet Process Models" by Steven N. MacEarchern are also original papers describing Gibbs sampling methods.

But I have not had time to read these papers in detail recently.
