Sunday, October 4, 2009

Author-topic model and transformed LDA

Latent Dirichlet Allocation (LDA) is essentially a generative model for document analysis rather than classification, and it is an unsupervised rather than supervised learning algorithm. Given a new document, LDA outputs topic proportions instead of a document category, so LDA cannot be used directly for classification.

The author-topic model (ATM), on the other hand, can be used for classification, as long as we view the author as the category label.

A comparison between the two models can be summarized as follows, where the figures are from the UAI 2004 paper by M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth:

LDA generative process:
  • choose a topic mixture θ_d ~ Dirichlet(α)
  • for each of the N_d words in document d:
    • choose a topic z ~ Multinomial(θ_d)
    • choose a word w ~ Multinomial(φ_z)

ATM generative process:
  • for each of the N_d words in document d:
    • choose an author x uniformly from a_d, the author set of document d
    • choose a topic z ~ Multinomial(θ_x)
    • choose a word w ~ Multinomial(φ_z)
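To make the contrast concrete, here is a minimal sketch of both generative processes in Python/NumPy. All sizes, hyperparameters, and the two author names are hypothetical choices for illustration, not values from the paper; the per-topic word distributions φ are drawn once and shared by both models, as in the graphical models above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration.
K, V, N_d = 4, 50, 20            # topics, vocabulary size, words in document d
alpha, beta = 0.5, 0.1           # symmetric Dirichlet hyperparameters
phi = rng.dirichlet([beta] * V, size=K)  # per-topic word distributions phi_z

def generate_lda_doc():
    """LDA: draw a fresh topic mixture theta_d for each document."""
    theta_d = rng.dirichlet([alpha] * K)       # choose theta_d ~ Dirichlet(alpha)
    words = []
    for _ in range(N_d):
        z = rng.choice(K, p=theta_d)           # choose topic z ~ Mult(theta_d)
        words.append(rng.choice(V, p=phi[z]))  # choose word w ~ Mult(phi_z)
    return words

def generate_atm_doc(theta_by_author, author_set):
    """ATM: no per-document mixture; each word first picks an author uniformly."""
    words = []
    for _ in range(N_d):
        x = rng.choice(author_set)                # author x ~ Uniform(a_d)
        z = rng.choice(K, p=theta_by_author[x])   # topic z ~ Mult(theta_x)
        words.append(rng.choice(V, p=phi[z]))     # word w ~ Mult(phi_z)
    return words

# Example: two (hypothetical) authors, each with a fixed topic mixture.
theta_by_author = {a: rng.dirichlet([alpha] * K) for a in ["A", "B"]}
doc_lda = generate_lda_doc()
doc_atm = generate_atm_doc(theta_by_author, ["A", "B"])
```

The only structural difference sits in the first line of each loop body: LDA samples θ per document, while ATM looks θ up from a fixed per-author table.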

Notice that the most significant difference between ATM and LDA is that the topic mixture weights are not generated per document; rather, there is a finite number of possible topic mixture weights, indexed by the author information of each document.

For document classification, if we view the author as the class label and restrict each document's author set a_d to a single author, the ATM model can be applied directly.
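Under that single-author assumption, classification amounts to picking the author whose topic mixture best explains the new document. Below is a hedged sketch: the parameters θ (per-author topic mixtures) and φ (topic-word distributions) are assumed to have already been estimated (e.g., by Gibbs sampling as in the paper); the values and author names here are toy placeholders.

```python
import numpy as np

# Assumed already estimated by ATM training; toy placeholder values here.
K, V = 4, 50
rng = np.random.default_rng(1)
phi = rng.dirichlet([0.1] * V, size=K)          # topic-word distributions, (K, V)
theta = {"author_A": rng.dirichlet([0.5] * K),  # per-author topic mixtures
         "author_B": rng.dirichlet([0.5] * K)}

def log_likelihood(words, theta_x):
    # p(w | x) = sum_z theta_x[z] * phi[z, w], marginalizing the topic per word
    word_probs = theta_x @ phi                  # shape (V,): mixture over topics
    return np.log(word_probs[words]).sum()

def classify(words):
    # Single-author assumption: the class label is the author maximizing
    # the document likelihood.
    return max(theta, key=lambda x: log_likelihood(words, theta[x]))

doc = rng.choice(V, size=30)                    # a toy "document" of word indices
label = classify(doc)
```

This is the usual Bayes-classifier reading of ATM: each author plays the role of a class-conditional distribution over words.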

My interest in the ATM model is due to Sudderth's transformed LDA model, which reduces to an ATM when the spatial transformation is ignored (see the part inside the big red square).




