Showing posts with label Gibbs sampling. Show all posts

Saturday, November 28, 2009

fast and parallel Gibbs sampling for LDA

Gibbs sampling for LDA is very simple to understand and implement, especially the collapsed Gibbs sampling. But one drawbacks of GS is its complexity is linear to the number of word tokens. This problem is even more serious when we apply LDA-based approaches to computer vision problems where we use visual words in images to replace words in documents. To maximize our chance to detect the object in an image, we need large number of visual word tokens. It is more and more popular to extract features at dense regular grids over images, and to one extreme, someone extract features at every pixel with several scales. Also we often need to extract several types of features and hope them to be complementary to each other since we usually do not which type of feature is more useful for a particular object category. Combine these factors together, there are often more than 10k ~ 50k word tokens per image extracted. For Gibbs sampling, this is a nightmare!

So a fast Gibbs sampling or parallel Gibbs sampling are absolutely rescues. There are two such papers recently, with published codes (that is great!):

PLDA: Parallel Latent Dirichlet Allocation for Large-scale Applications by Wang Yi et al at Google, code

Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation at UCI, code

here is a comment from LingPipe's blog:

Porteous et al. (2008) Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation

Another paper related to topic model inference in large scale corpus is

Efficient Methods for Topic Model Inference on Streaming Document Collections

Saturday, November 7, 2009

A clever way to derive the collapsed Gibbs sampling for LDA

Inspired by Tom Griffiths technical report, Gibbs Sampling in the Generative Model of Latent Dirichlet Allocation, I derive the collapsed Gibbs sampling for LDA by myself using the tricks in Tom's report. These tricks are universally applicable to other topic models:

simplify the conditional property by employing Bayes theorem and
d-separation property
derive the results of conditional probability directly from the result of
the predictive likelihood of Dirichlet/multinomial distribution

There is no lengthy and complex computation like those in Wang Yi's note or Gregor Heinrich's note. It is easy to understand and has intuitive explanation for the formulas involved. I wrote up a report for my derivation, as a complementary to Tom's note:
Derivation of Collapsed Gibbs Sampling for LDA

Friday, September 11, 2009

Chinese restaurant process and Chinese restaurant franchise

An illustration of Chinese restaurant process (CRP) and Chinese restaurant franchise (CRF). Materials are from Yee Whye Teh's 2004 technical report on "Hierarchical Dirichlet Process"

Chinese restaurant process (CRP):

For a set of random variables

distributed according to

, we have the following conditional distribution:

This can be described in a Chinese restaurant process metaphor:

Consider a Chinese restaurant with an unbounded number of tables,
The first customer sits at table 1
Suppose there are K tables occupied before the i-th customer comes, he can sit at

The relationship between

and

can be best illustrated as in the following figure.

random variables	meaning	metophor
	random variables	customer i
	distinct values within all	table k
	the number of associated to	the number of customers sitting around table k

Chinese restaurant franchise (CRF):

An essentially two-level Chinese restaurant process:

Within a restaurant, customers choose tables
Within all restaurants, tables choose dishes

In both levels, the choosing follows the Chinese restaurant process as illustrated above.

At the restaurant level, customers choose tables according to the following distribution:

At this level, the metaphor of CRF is the same as the one of CRP as described above, except some changes on symbols

Consider restaurant j with an unbounded number of tables,
The first customer sits at table 1
Suppose there are tables occupied before the i-th customer comes, he can sit at

The relationship between

and

can be best illustrated as in the following figure:

At the franchise level, table choose dishes according to the following distribution:

This can be described in a Chinese restaurant franchise metaphor:

Consider a Chinese restaurant franchise, whose J restaurants share a menu with unbounded number of dishes,
At each table of each restaurant, one dish is ordered from the public menu by the first customer who sits there, and it is shared among all customers who sit at that table. Multiple tables at multiple restaurants can serve the same dish
Suppose there are tables occupied before the i-th customer comes restaurant j and there are total K dishes has been ordered among all restaurants in the franchise. He can sit at an occupied table or a new table with certain probability, as described above. If he sits at an occupied table, he shares the dish that has been ordered at that table. If he sits at a new table, he order a dish for that table according to its popularity among the whole franchise, while a new dish can also be tried.

The relationship between

and

can be best illustrated as in the following figure.

random variables	meaning	metophor
	random variables	customer i in restaurant j
	distinct values of in group j	table t in restaurant j
	index of associated to ,	the table taken by customer i in restaurantj,.i.e., Table()=, Customer() =
	the number of associated to in group j	the number of customers sitting around table t in restaurantj
	distinct values in within all groups,	dish k, which is shared within all restaurants
	index of associated to ,	the dish ordered by table t in restaurantj, i.e., Dish() = , Table() =
	the number of associated to in group j	the number of tables ordered dish k in restaurantj
	, i.e., the number of associated to over all j	the total number of tables ordered dish k within all restaurants