Now, after implementing Gibbs sampling for LDA learning, I can understand the usefulness of such a generative process. The generative process of the LDA model defines a joint distribution over all the observed words $\mathbf{w}$ and all the unobserved topic indicators $\mathbf{z}$, $p(\mathbf{w}, \mathbf{z}) = p(\mathbf{w} \mid \mathbf{z})\, p(\mathbf{z})$.
Generally, for a model with observed variables $\mathbf{x}$ and hidden variables $\mathbf{z}$, the idea of Gibbs sampling is to sample one hidden variable conditioned on all the other hidden variables and the observed variables, i.e., to draw samples from $p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{x})$, where $\mathbf{z}_{\neg i}$ denotes all the hidden variables except $z_i$. This sampling step should be carried out for all $i$'s, either in a fixed order or in a random order. To derive this conditional distribution, we usually need to start from the full joint distribution:

$$p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{x}) = \frac{p(\mathbf{z}, \mathbf{x})}{p(\mathbf{z}_{\neg i}, \mathbf{x})} \propto p(\mathbf{z}, \mathbf{x}).$$
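Plugging the LDA joint distribution into this recipe gives the usual collapsed update for a single topic indicator. As a sketch, assume symmetric Dirichlet priors with hyperparameters $\alpha$ (document-topic) and $\beta$ (topic-word), a vocabulary of size $V$, and the per-document and per-topic mixtures integrated out; these symbols are not defined above, they are just the standard notation for this derivation:

$$p(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}) \;\propto\; \frac{n^{(w_i)}_{k,\neg i} + \beta}{n^{(\cdot)}_{k,\neg i} + V\beta} \left( n^{(k)}_{d_i,\neg i} + \alpha \right),$$

where $n^{(w_i)}_{k,\neg i}$ counts how often word $w_i$ is assigned to topic $k$, $n^{(\cdot)}_{k,\neg i}$ is the total number of tokens assigned to topic $k$, and $n^{(k)}_{d_i,\neg i}$ counts tokens in document $d_i$ assigned to topic $k$, all computed with the current token $i$ excluded.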
The generative process of a topic model thus proves its usefulness in the Gibbs sampling procedure: it supplies the full joint distribution in closed form, from which every conditional update is derived.
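To make this concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in Python. The function name `lda_gibbs`, the default hyperparameter values, and the input format (a list of documents, each a list of integer word ids) are illustrative assumptions of mine, not the exact implementation described above:

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).

    docs:  list of documents, each a list of word ids in [0, V)
    K, V:  number of topics and vocabulary size
    alpha, beta: symmetric Dirichlet hyperparameters (assumed values)
    Returns the final topic indicators and the count matrices.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K), dtype=np.int64)  # doc-topic counts
    n_kw = np.zeros((K, V), dtype=np.int64)  # topic-word counts
    n_k = np.zeros(K, dtype=np.int64)        # tokens per topic

    # Randomly initialize the topic indicators z and fill the counts.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove token i's current assignment from the counts,
                # so the counts below are the "not i" counts n_{., \neg i}.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # p(z_i = k | z_{\neg i}, w) up to a normalizing constant.
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw
```

Each sweep removes a token's current assignment from the count tables, evaluates the collapsed conditional above for all $K$ topics, and resamples the indicator, which is exactly the draw from $p(z_i \mid \mathbf{z}_{\neg i}, \mathbf{x})$ described earlier.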