- To fully describe an LDA model, we need to solve the following three problems:
- latent variables $z_i$ for each word $w_i$, where $i$ indexes a word token from the whole training set, i.e., $i \in \{1, \dots, N\}$
- parameters $\theta_m$, where $\theta_m$ specifies the topic distribution for document $d = m$
- parameters $\varphi_k$, where $\varphi_k$ specifies the word distribution for topic $z = k$
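For reference, this notation corresponds to the usual generative process of LDA (a standard statement; the symbols $M$, $K$, $N$ for the number of documents, topics, and word tokens are introduced here for convenience):

$$
\begin{aligned}
\theta_m &\sim \mathrm{Dirichlet}(\alpha), && m = 1, \dots, M \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta), && k = 1, \dots, K \\
z_i &\sim \mathrm{Multinomial}(\theta_{d_i}), && i = 1, \dots, N \\
w_i &\sim \mathrm{Multinomial}(\varphi_{z_i}), && i = 1, \dots, N
\end{aligned}
$$

where $d_i$ denotes the document that contains token $i$, and $\alpha$, $\beta$ are the Dirichlet hyperparameters.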
- Given the training data set, we only need to infer the latent variables $\mathbf{z}$, because the two parameters $\Theta = \{\theta_m\}$ and $\Phi = \{\varphi_k\}$ can be considered as statistics of the association between the observed $\mathbf{w}$ and the corresponding $\mathbf{z}$, as the short sketch below illustrates.
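To make "parameters as statistics" concrete, here is a minimal sketch (the variable names are my own toy choices): once $\mathbf{z}$ is known, $\Theta$ and $\Phi$ are just smoothed, normalized co-occurrence counts.

```python
import numpy as np

def estimate_params(docs, z, K, V, alpha, beta):
    """Recover theta and phi as statistics of the (w, z) association.

    docs : list of lists of word ids, one inner list per document
    z    : list of lists of topic assignments, same shape as docs
    """
    M = len(docs)
    n_mk = np.zeros((M, K))  # document-topic co-occurrence counts
    n_kt = np.zeros((K, V))  # topic-word co-occurrence counts
    for m, (doc, zs) in enumerate(zip(docs, z)):
        for w, k in zip(doc, zs):
            n_mk[m, k] += 1
            n_kt[k, w] += 1
    theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)
    return theta, phi
```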
- The first problem cannot be solved deterministically due to the noise in the data, so a practical solution is to estimate the posterior $p(\mathbf{z} \mid \mathbf{w})$.
- Directly estimating $p(\mathbf{z} \mid \mathbf{w})$ is difficult due to the complex form of this distribution in LDA. Gibbs sampling solves the problem by approximating $p(\mathbf{z} \mid \mathbf{w})$ with samples drawn from it after the burn-in period, i.e., when the Markov chain has become stationary.
- To draw samples from $p(\mathbf{z} \mid \mathbf{w})$, we do not need to know the exact form of this distribution. All we need is the full conditional $p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{w})$ of each latent variable given all the others. The remaining task is to derive such a function.
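To make the sampling scheme and the burn-in period concrete, here is a generic Gibbs-sampling skeleton. The callable `sample_full_conditional` is a hypothetical placeholder for the function we are about to derive; it should draw $z_i$ from $p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{w})$.

```python
def gibbs_sample(z, w, sample_full_conditional, n_iter=2000, burn_in=500):
    """Generic Gibbs sampler: sweep over the latent variables, resampling each
    z[i] from its full conditional; keep only the post-burn-in samples."""
    samples = []
    for it in range(n_iter):
        for i in range(len(z)):
            z[i] = sample_full_conditional(i, z, w)  # draw z_i ~ p(z_i | z_-i, w)
        if it >= burn_in:  # chain is assumed (approximately) stationary by now
            samples.append(list(z))
    return samples
```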
- The conditional distribution $p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{w})$ can be derived from the joint distribution $p(\mathbf{z}, \mathbf{w})$:
$$p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{w}) = \frac{p(\mathbf{z}, \mathbf{w})}{p(\mathbf{z}_{\neg i}, \mathbf{w})} = \frac{p(\mathbf{z}, \mathbf{w})}{p(\mathbf{z}_{\neg i}, \mathbf{w}_{\neg i})\, p(w_i)}.$$
The second equality comes from the fact that $w_i$ only depends on $z_i$, so once $z_i$ is removed, $w_i$ is treated as independent of $\mathbf{z}_{\neg i}$ and $\mathbf{w}_{\neg i}$.
- The denominator has a similar form to the numerator, so we only need to derive the form of the joint distribution $p(\mathbf{z}, \mathbf{w})$.
- The joint distribution $p(\mathbf{z}, \mathbf{w})$ can be derived by using the conditional independence property of LDA:
$$p(\mathbf{w}, \mathbf{z} \mid \alpha, \beta) = p(\mathbf{w} \mid \mathbf{z}, \beta)\, p(\mathbf{z} \mid \alpha).$$
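Each factor is then obtained by integrating out ("collapsing") the corresponding parameter against its Dirichlet prior; this is the step that the side-by-side comparison below carries out:

$$p(\mathbf{w} \mid \mathbf{z}, \beta) = \int p(\mathbf{w} \mid \mathbf{z}, \Phi)\, p(\Phi \mid \beta)\, d\Phi, \qquad p(\mathbf{z} \mid \alpha) = \int p(\mathbf{z} \mid \Theta)\, p(\Theta \mid \alpha)\, d\Theta.$$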
- The two factors above are both multinomial distributions with Dirichlet conjugate priors, so their derivations are also similar. I summarize the key steps of their derivations and compare them side by side to emphasize these similarities.
| Derivation step | Word-topic factor $p(\mathbf{w} \mid \mathbf{z}, \beta)$ | Topic-document factor $p(\mathbf{z} \mid \alpha)$ |
| --- | --- | --- |
| Multinomial likelihood | $p(\mathbf{w} \mid \mathbf{z}, \Phi) = \prod_{k=1}^{K} \prod_{t=1}^{V} \varphi_{k,t}^{\,n_k^{(t)}}$ | $p(\mathbf{z} \mid \Theta) = \prod_{m=1}^{M} \prod_{k=1}^{K} \theta_{m,k}^{\,n_m^{(k)}}$ |
| Dirichlet prior | $p(\Phi \mid \beta) = \prod_{k=1}^{K} \mathrm{Dir}(\varphi_k \mid \beta)$ | $p(\Theta \mid \alpha) = \prod_{m=1}^{M} \mathrm{Dir}(\theta_m \mid \alpha)$ |
| Integrate out the parameter | $p(\mathbf{w} \mid \mathbf{z}, \beta) = \prod_{k=1}^{K} \frac{\Delta(\mathbf{n}_k + \beta)}{\Delta(\beta)}$ | $p(\mathbf{z} \mid \alpha) = \prod_{m=1}^{M} \frac{\Delta(\mathbf{n}_m + \alpha)}{\Delta(\alpha)}$ |

Here $n_k^{(t)}$ is the number of times word $t$ is assigned to topic $k$, $n_m^{(k)}$ is the number of times topic $k$ occurs in document $m$, and $\mathbf{n}_k = (n_k^{(t)})_{t=1}^{V}$, $\mathbf{n}_m = (n_m^{(k)})_{k=1}^{K}$.
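The symbol $\Delta(\cdot)$ above is the Dirichlet normalization constant, a common shorthand in collapsed-LDA derivations:

$$\Delta(\boldsymbol{\alpha}) = \frac{\prod_{k=1}^{\dim \boldsymbol{\alpha}} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{\dim \boldsymbol{\alpha}} \alpha_k\right)}.$$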
- Substituting these results back into the joint distribution, and the joint distribution into the conditional distribution derived above, we can express the conditional distribution as a function of the word-topic co-occurrence counts $n_k^{(t)}$, the topic-document co-occurrence counts $n_m^{(k)}$, and the hyperparameters $\alpha$ and $\beta$, and thus we can draw samples from it.
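Carrying out this substitution (I omit the intermediate algebra) yields the standard collapsed update, where $t = w_i$, $m$ is the document containing token $i$, and $\neg i$ means the counts are computed with token $i$'s current assignment removed:

$$p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) \;\propto\; \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} \bigl(n_{k,\neg i}^{(t')} + \beta_{t'}\bigr)} \,\bigl(n_{m,\neg i}^{(k)} + \alpha_k\bigr).$$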
- The procedure of Gibbs sampling for LDA learning can then be summarized in a figure from Wang Yi's tech report:
This sampling scheme integrates out the model parameters $\Theta$ and $\Phi$, and this strategy is called "collapsed" Gibbs sampling.
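As a concrete (if simplified) illustration of the procedure, here is a minimal collapsed Gibbs sampler sketch; the variable names and the symmetric scalar hyperparameters `alpha` and `beta` are my own choices, not taken from the report, and burn-in handling is omitted for brevity.

```python
import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha=0.1, beta=0.01, n_iter=1000):
    """Collapsed Gibbs sampling for LDA: Theta and Phi are integrated out,
    so we only track the topic assignments z and the count tables.

    docs : list of lists of word ids (one inner list per document)
    K, V : number of topics and vocabulary size
    """
    M = len(docs)
    n_mk = np.zeros((M, K))  # topic counts per document
    n_kt = np.zeros((K, V))  # word counts per topic
    n_k = np.zeros(K)        # total number of tokens assigned to each topic
    z = []

    # Randomly initialize the latent topic assignments and the counts
    for m, doc in enumerate(docs):
        z_m = []
        for t in doc:
            k = np.random.randint(K)
            z_m.append(k)
            n_mk[m, k] += 1
            n_kt[k, t] += 1
            n_k[k] += 1
        z.append(z_m)

    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                # Remove token i's current assignment (the "not i" counts)
                k = z[m][i]
                n_mk[m, k] -= 1
                n_kt[k, t] -= 1
                n_k[k] -= 1

                # Full conditional: word-topic term times topic-document term
                p = (n_kt[:, t] + beta) / (n_k + V * beta) * (n_mk[m] + alpha)
                p /= p.sum()

                # Resample z_i and restore the counts
                k = np.random.choice(K, p=p)
                z[m][i] = k
                n_mk[m, k] += 1
                n_kt[k, t] += 1
                n_k[k] += 1

    # Read off the parameters as statistics of (w, z), as discussed above
    theta = (n_mk + alpha) / (n_mk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kt + beta) / (n_kt + beta).sum(axis=1, keepdims=True)
    return z, theta, phi
```

For example, `lda_collapsed_gibbs([[0, 1, 2, 1], [2, 3, 3]], K=2, V=4)` runs the chain on a toy two-document corpus.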