I was confused by conjugate priors:
- The Beta distribution is the conjugate prior of the Bernoulli distribution, and it is also the conjugate prior of the binomial distribution;
- The Dirichlet distribution is the conjugate prior of the multivariate Bernoulli (categorical) distribution, and it is also the conjugate prior of the multinomial distribution;
Why can one distribution (Beta, Dirichlet) be the conjugate prior for two different distributions?
After reading Chapter 2 and Appendix B of Bishop's PRML book again and again, I finally realized one thing today: conjugacy is a property of the parameter (that is, of the likelihood viewed as a function of that parameter) rather than of the distribution over the data, and this resolves the puzzle above.
See the definition of conjugate prior on Wikipedia:
In Bayesian probability theory, a class of prior probability distributions p(θ) is said to be conjugate to a class of likelihood functions p(x|θ) if the resulting posterior distributions p(θ|x) are in the same family as p(θ); the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood.
Thus,
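- Beta(μ|a,b) is the conjugate prior of both Bernoulli(x|μ) and binomial(m|N,μ), because both likelihoods, viewed as functions of μ, are proportional to a power of μ times a power of (1-μ), which is exactly the form of the Beta density
- The Dirichlet distribution is the conjugate prior of both the multivariate Bernoulli Discrete(x_{1:K}|μ) and multinomial(m_{1:K}|N,μ), because both likelihoods, viewed as functions of the vector μ, are proportional to a product of powers of the components μ_k, which is exactly the form of the Dirichlet density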
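To make the Beta case concrete, here is a short sketch of the two likelihoods written as functions of μ. The symbols n_1 and n_0 (numbers of ones and zeros among n Bernoulli observations) and s, f (generic success and failure counts) are my own notation for this illustration.

```latex
\begin{align*}
\text{Bernoulli, } n \text{ i.i.d. observations:}\quad
  p(x_{1:n} \mid \mu) &= \prod_{i=1}^{n} \mu^{x_i} (1-\mu)^{1-x_i}
                       = \mu^{n_1} (1-\mu)^{n_0} \\
\text{binomial, one observation:}\quad
  p(m \mid N, \mu) &= \binom{N}{m} \mu^{m} (1-\mu)^{N-m}
                    \propto \mu^{m} (1-\mu)^{N-m} \\
\text{Beta prior:}\quad
  p(\mu \mid a, b) &\propto \mu^{a-1} (1-\mu)^{b-1} \\
\text{posterior (either case):}\quad
  p(\mu \mid \text{data}) &\propto \mu^{a+s-1} (1-\mu)^{b+f-1}
\end{align*}
```

Here (s, f) = (n_1, n_0) for the Bernoulli case and (s, f) = (m, N-m) for the binomial case. The binomial coefficient does not depend on μ, so it disappears into the normalizing constant, and both posteriors are Beta(μ|a+s, b+f).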
It is worth noting that the posterior hyperparameters listed in Wikipedia's table of conjugate pairs are computed for the case of n observations, while in some references they are computed for the case of one observation.
For example, the conjugate prior of Bernoulli(x|p) is Beta(p|a,b), so the posterior distribution of the parameter p given n observations is
p(p|x_{1:n}, a, b) = Beta(p|a+n_1, b+n_0), where n_1 and n_0 are the counts of x_i = 1 and x_i = 0 respectively, so that n_1 + n_0 = n.
But for the example in this website, the posterior distribution is for the case of one observation, which is the n = 1 special case of the formula above.
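The two conventions agree, which a few lines of Python can check. This is only a sketch under my own assumptions: the function names `beta_bernoulli_update` and `beta_binomial_update`, the prior Beta(2, 2), and the sample data are made up for illustration and do not come from Bishop or Wikipedia.

```python
def beta_bernoulli_update(a, b, observations):
    """Posterior Beta hyperparameters after Bernoulli observations (each 0 or 1)."""
    n1 = sum(observations)          # count of x_i = 1
    n0 = len(observations) - n1     # count of x_i = 0
    return a + n1, b + n0


def beta_binomial_update(a, b, m, N):
    """Posterior Beta hyperparameters after one binomial observation: m successes in N trials."""
    return a + m, b + (N - m)


# Hypothetical prior and data, chosen only for illustration.
a0, b0 = 2, 2
data = [1, 0, 1, 1, 0]

# n-observation form: update once with all the data.
batch = beta_bernoulli_update(a0, b0, data)

# One-observation form: apply the update n times, feeding each posterior back in as the prior.
sequential = (a0, b0)
for x in data:
    sequential = beta_bernoulli_update(*sequential, [x])

# The same data summarised as a single binomial observation.
binomial = beta_binomial_update(a0, b0, m=sum(data), N=len(data))

print(batch, sequential, binomial)  # (5, 4) (5, 4) (5, 4)
```

Because the posterior stays in the Beta family, the one-observation rule can simply be iterated, and all three routes end at the same Beta(5, 4) posterior.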