Sunday, October 25, 2009

paper: DeltaLDA

DeltaLDA is a modification of the Latent Dirichlet Allocation (LDA) model that uses two different priors on the topic mixing weights to jointly model two corpora with a shared set of topics: one topic mixing weight prior models the normal pattern and the other models the abnormal pattern.
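To make the mechanism concrete, here is a minimal sketch (my own, in Python) of the generative story as I read it; the vocabulary size, topic counts, and hyperparameter values below are placeholders rather than the paper's settings, and restricting the Dirichlet draw to the topics with nonzero prior weight is just one way to realize the two priors.

```python
import numpy as np

rng = np.random.RandomState(0)
V = 50                       # vocabulary size (assumed for illustration)
K_shared, K_extra = 4, 2     # topics usable by both corpora / only by the abnormal corpus
K = K_shared + K_extra
phi = rng.dirichlet(0.1 * np.ones(V), size=K)    # shared topic-word distributions

# Two topic-mixing-weight priors over the same K topics: the "normal" prior puts
# zero mass on the extra topics, so normal documents can only use the shared ones.
alpha_normal   = np.array([1.0] * K_shared + [0.0] * K_extra)
alpha_abnormal = np.array([1.0] * K_shared + [1.0] * K_extra)

def generate_doc(alpha, n_words=100):
    usable = alpha > 0
    theta = np.zeros(K)
    theta[usable] = rng.dirichlet(alpha[usable])  # mixing weights restricted by the prior
    z = rng.choice(K, size=n_words, p=theta)      # per-word topic assignments
    return np.array([rng.choice(V, p=phi[k]) for k in z])

normal_doc = generate_doc(alpha_normal)       # never uses the extra topics
abnormal_doc = generate_doc(alpha_abnormal)   # may use any of the K topics
```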
The graphical model:
[figure: DeltaLDA graphical model]

An illustration of topic mixture weights in two scenarios:
[figure: topic mixture weights in the two scenarios]

This looks quite similar to Adapted Vocabularies for Generic Visual Categorization (ECCV 2006) in the way both split the topics/vocabulary into two sets, though there are fundamental differences in their underlying mechanisms.

Friday, October 23, 2009

paper: Sketch2Photo: Internet Image Montage

Amazingly realistic montages!

A few students from Tsinghua Univ. present this montage system, built from images downloaded from the internet.

The links are as follows:
project web page
The demo on youtube
The paper at ACM SIGGRAPH Asia 2009 (ACM Transactions on Graphics)

papers: visual attribute and object class recognition

Several recent papers discuss methods to extract visual attributes from images and/or use these attributes for object class recognition. We can view visual attributes as another type of annotation: while image annotations are applied to individual images, visual attributes are specified for an object class; and while image annotations are usually words (i.e., discrete values), visual attributes can take either discrete values (e.g., color = {red, blue, ...}) or continuous values (e.g., average size). Here are a few papers I am reading:
Here is a blog post about visual attributes from Tombone.

Sunday, October 18, 2009

segmentation vs. recognition

In What is segmentation-driven object recognition?, Tomasz remarked at the end that learning-driven segmentation may be a hot topic in the next few years. I totally agree with him. The problem for us is how to design such algorithms to be robust to intra-class variation, scale, and pose. This remains a challenging problem in the recognition community.

In the comments of that post, someone suggested checking the most up-to-date segmentation results in PASCAL VOC 2009.

basic level classes and subordinate class


Comments: the following paper provides good insight into the roles of generative and discriminative models in learning a large number of object categories, i.e., we can use generative models to distinguish categories at the basic level, and discriminative models to differentiate lower-level, similar categories.

In Subordinate class recognition using relational object models by Aharon Bar-Hillel and Daphna Weinshall, NIPS 2006, the authors make some interesting points:

"Human categorization is fundamentally hierarchical, where categories are organized in tree-like hierarchies. 
  • higher nodes close to the root describe inclusive classes (like vehicles), 
  • intermediate nodes describe more specific categories (like motorcycles), 
  • lower nodes close to the leaves capture fine distinctions between objects (e.g., cross vs. sport motorcycles).
Intuitively one could expect such hierarchy to be learnt either bottom-up or top-down (or both), but surprisingly, this is not the case. In fact, there is a well defined intermediate level in the hierarchy, called basic level, which is learnt first [11]...."
"The primary role of basic level categories seems related to the structure of objects in the world. In [13], Tversky & Hemenway promote the hypothesis that the explanation lies in the notion of parts.Their experiments show that  
  • basic level categories (like cars and flowers) are often described as a combination of distinctive parts (e.g., stem and petals), which are mostly unique. 
  • higher levels (superordinate and more inclusive) are more often described by their function (e.g., ’used for transportation’), 
  • lower levels (sub-ordinate and more specific) are often described by part properties (e.g., red petals) and other fine details."
Based on these assumptions, Bar-Hillel and Weinshall proposed a two-stage approach for subordinate class recognition:
  1. First we should learn a generative model for the basic category. Using such a model, the object parts should be identified in each image, and their descriptions can be concatenated into an ordered vector. This stage is used to solve the correspondence problem: features in the same entry in two different image vectors correspond since they implement the same part.
  2. In a second stage, the distinction between subordinate classes can be done by applying standard machine learning tools, like SVM, to the resulting ordered vectors, since the correspondence problem has been solved in the first stage.
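As a concrete (if oversimplified) illustration of stage 2, the sketch below assumes stage 1 has already produced an ordered vector of per-part descriptors for each image, so that entries correspond across images, and then applies a standard SVM; the feature dimensions, random data, and scikit-learn usage are my own placeholders, not anything from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n_images, n_parts, d = 200, 5, 16
# Assumed output of stage 1: one d-dimensional descriptor per identified part, per image.
part_descriptors = rng.randn(n_images, n_parts, d)
X = part_descriptors.reshape(n_images, n_parts * d)  # concatenate parts in a fixed order
y = rng.randint(0, 2, size=n_images)                 # subordinate labels (e.g., cross vs. sport)

# Stage 2: ordinary discriminative learning on the aligned, ordered vectors.
clf = SVC(kernel="linear").fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```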
Another paper reinforces this idea from the psychology side: Comparison Processes in Category Learning: From Theory to Behavior, Rubi Hammer, Aharon Bar-Hillel, Tomer Hertz, Daphna Weinshall and Shaul Hochstein, Brain Research, Special issue on 'Brain and Vision', 2008.

Wednesday, October 14, 2009

a good summary on generative vs. discriminative models

The GenDisc2009 NIPS workshop has issued a call for papers. Though I have no time to catch the deadline, I found its brief discussion of generative vs. discriminative models quite useful. In case I lose the link or the link breaks in the future, I copy some of the content below:

In generative approaches for prediction tasks, one models a joint distribution on inputs and outputs and parameters are typically estimated using a likelihood-based criterion. In discriminative approaches, one directly models the mapping from inputs to outputs (either as a conditional distribution or simply as a prediction function); parameters are estimated by optimizing objectives related to various loss functions. Discriminative approaches have shown better performance given enough data, as they are better tailored to the prediction task and appear more robust to model misspecification. Despite the strong empirical success of discriminative methods in a wide range of applications, when the structures to be learned become more complex than the amount of training data (e.g., in machine translation, scene understanding, biological process discovery), some other source of information must be used to constrain the space of candidate models (e.g., unlabeled examples, related data sources or human prior knowledge). Generative modeling is a principled way of encoding this additional information, e.g., through probabilistic graphical models or stochastic grammar rules. Moreover, they provide a natural way to use unlabeled data and are sometimes more computationally efficient.
Theoretical analysis of generative versus discriminative learning has a long history in statistics, where the focus was on asymptotic analyses (e.g. [Efron 75]). Ng and Jordan provided an initial comparison of generative versus discriminative learning in the non-asymptotic regime in the most cited paper on the topic in machine learning [Ng 02]. For a few years, this paper was one of the only machine learning papers providing a theoretical comparison, and was responsible for the conventional wisdom: "use generative learning for small amounts of data and discriminative learning for large amounts". Recently, there have been new advances in our theoretical understanding [Liang 08, Xue 08] and in their combination [Bouchard 07, Xue 09].
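As a toy illustration of that conventional wisdom (my own, not part of the workshop text), the sketch below compares a generative classifier (Gaussian naive Bayes) with a discriminative one (logistic regression) as the training set grows; the synthetic data-generating process and sample sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def make_data(n):
    # Two Gaussian classes with correlated features, so naive Bayes is mildly misspecified.
    y = np.arange(n) % 2
    means = np.array([[0.0, 0.0], [1.5, 1.0]])
    cov = np.array([[1.0, 0.6], [0.6, 1.0]])
    X = np.array([rng.multivariate_normal(means[c], cov) for c in y])
    return X, y

X_test, y_test = make_data(5000)
for n_train in [10, 30, 100, 300, 1000]:
    X_tr, y_tr = make_data(n_train)
    nb_acc = GaussianNB().fit(X_tr, y_tr).score(X_test, y_test)
    lr_acc = LogisticRegression().fit(X_tr, y_tr).score(X_test, y_test)
    print(f"n={n_train:5d}  naive Bayes: {nb_acc:.3f}  logistic regression: {lr_acc:.3f}")
```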
On the empirical side, combinations of discriminative and generative methodologies have been explored by several authors [Raina 04, Bouchard 04, McCallum 06, Bishop 07, Schmah 09] in many fields such as natural language processing, speech recognition, and computer vision. In particular, the recent "deep learning" revolution of neural networks relies heavily on a hybrid generative-discriminative approach: an unsupervised generative learning phase ("pre-training") is followed by discriminative fine-tuning. Given these recent trends, a workshop on the interplay of generative and discriminative learning seems especially relevant.
Hybrid generative-discriminative techniques face computational challenges. For some models, training these hybrids is akin to the discriminative training of generative models, which is a notoriously hard problem ([Bottou 91] for discriminatively trained HMMs, [Jebara 04, Salojarvi 05] for EM-like algorithms), though for other models, learning can in fact be simple [Raina 04, Wettig 03]. Alternatively, the use of generative models in predictive settings has been explored, e.g., through the use of Fisher kernels [Jaakkola 98] or other probabilistic kernels. One of the goals of the workshop will be to highlight the connections between these approaches.
The aim of this workshop is .... (ignored)

References

[Bishop 07] C. M. Bishop and J. Lasserre, Generative or Discriminative? getting the best of both worlds. In Bayesian Statistics 8, Bernardo, J. M. et al. (Eds), Oxford University Press. 3–23, 2007.
[Bottou 91] L. Bottou, Une approche théorique de l'apprentissage connexionniste: Applications à la reconnaissance de la parole. Doctoral dissertation, Université de Paris XI, 1991.
[Bouchard 04] G. Bouchard and B. Triggs, The tradeoff between generative and discriminative classifiers. In J. Antoch, editor, Proc. of COMPSTAT'04, 16th Symposium of IASC, volume 16. Physica-Verlag, 2004.
[Bouchard 07] G. Bouchard, Bias-variance tradeoff in hybrid generative-discriminative models. In proc. of the Sixth International conference on Machine Learning and Applications (ICMLA 07), Cincinnati, Ohio, USA, 13-15 December 2007.
[Efron 75] B. Efron, The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis. Journal of the American Statistical Association, 70(352), 892—898, 1975.
[Greiner 02] R. Greiner and W. Zhou. Structural extension to logistic regression: Discriminant parameter learning of belief net classifiers. In Proceedings of the Eighteenth Annual National Conference on Artificial Intelligence (AAAI-02), 167–173, 2002.
[Jaakkola 98] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems 11, 1998.
[Jaakkola 99] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In Advances in Neural Information Processing Systems 12. MIT Press, 1999.
[Jebara 04] T. Jebara, Machine Learning - Discriminative and Generative. International Series in Engineering and Computer Science, Springer, Vol. 755, 2004.
[Liang 08] P. Liang and M. I. Jordan, An asymptotic analysis of generative, discriminative, and pseudo-likelihood estimators. In Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.
[McCallum 06] A. McCallum, C. Pal, G. Druck and X. Wang, Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification. AAAI, 2006.
[Ng 02] A. Y. Ng and M. I. Jordan, On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. In Advances in Neural Information Processing Systems 14, 2002.
[Salojarvi 05] J. Salojärvi, K. Puolamäki and S. Kaski, Expectation maximization algorithms for conditional likelihoods. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.
[Schmah 09] T. Schmah, G. E Hinton, R. Zemel, S. L. Small and S. Strother, Generative versus discriminative training of RBMs for classification of fMRI images. In Advances in Neural Information Processing Systems 21, 2009.
[Wettig 03] H. Wettig, P. Grünwald, T. Roos, P. Myllymäki and H. Tirri, When discriminative learning of Bayesian network parameters is easy. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI 2003), 491-496, August 2003.
[Xue 08] J.-H Xue and D.M. Titterington, Comment on "discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes". Neural Processing Letters, 28(3), 169-187, 2008.
[Xue 09] J.-H Xue and D.M. Titterington, Interpretation of hybrid generative/discriminative algorithms. Neurocomputing, 72(7-9), 1648-1655, 2009.

Thursday, October 8, 2009

test

I just found that the editor in Google Docs is a pretty handy tool for writing blog posts with pictures, tables and LaTeX equations. The following is an example. It seems the only feature lacking in Google is the capability to download the file as LaTeX source, but this is not so crucial. When I need to write a big document in LaTeX, I will write it in WinEdt or another LaTeX editor and upload the PDF file to the blog, just as I have done in the previous few technical notes.



$\phi_{ji} | \phi_{j1}, .., \phi_{j, i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{T_j} \frac{n_{jt}}{\alpha_0 + i - 1} \delta_{\psi_{jt}} + \frac{\alpha_0}{\alpha_0 + i - 1} G_0$


Wednesday, October 7, 2009

an easy way to write Latex codes in Blogger

I just found an easy way to write LaTeX code directly in Blogger.
See How To Install Latex On Blogger/Blogspot
Following the steps described there, I can write LaTeX code between double dollar signs, as I do in TexCenter or WinEdt, and get the desired equation. It is really cool!

A test:
The LaTeX code is:
${\phi_{ji} | \phi_{j1}, .., \phi_{j, i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{T_j} \frac{n_{jt}}{\alpha_0 + i -1} \delta_{\psi_{jt}} + \frac{\alpha_0}{\alpha_0 + i-1} G_0}$
and the result is:
$\phi_{ji} | \phi_{j1}, .., \phi_{j, i-1}, \alpha_0, G_0 \sim \sum_{t=1}^{T_j} \frac{n_{jt}}{\alpha_0 + i -1} \delta_{\psi_{jt}} + \frac{\alpha_0}{\alpha_0 + i-1} G_0$

It works very well.

I have tried to migrate from Blogger to WordPress for the LaTeX capability in WordPress, but I found that WordPress often renders LaTeX with errors, which makes this feature less attractive. I also did not figure out how to change the font and font size in WordPress, so I am staying with Blogger. I hope Google soon releases a much more powerful editor, with buttons on top of the editor for inserting LaTeX, symbols, and tables; these are the features I use most frequently. Until Google does so, I think Zoho will remain my favorite.

Sunday, October 4, 2009

Author-topic model and transformed LDA

Latent Dirichlet Allocation (LDA) is essentially a generative model for document analysis rather than classification, and it is an unsupervised rather than supervised learning algorithm. Given a new document, the output of LDA is the topic proportions instead of a document category, so LDA cannot be directly used for classification.

The author-topic model (ATM), on the other hand, can be used for classification, as long as we view the author as the category label.

A comparison between the two models can be summarized as follows, where the figures are from the UAI 2004 paper by M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth:

LDA generative process:
  • choose the topic mixture $\theta_d \sim \mathrm{Dir}(\alpha)$
  • for each of the $N_d$ words in document $d$:
    • choose a topic $z \sim \mathrm{Mult}(\theta_d)$
    • choose a word $w \sim \mathrm{Mult}(\phi_z)$

ATM generative process:
  • for each of the $N_d$ words in document $d$:
    • choose an author $x$ uniformly from $a_d$, the author set of document $d$
    • choose a topic $z \sim \mathrm{Mult}(\theta_x)$
    • choose a word $w \sim \mathrm{Mult}(\phi_z)$

Notice that the most significant difference in ATM compared to LDA is that the topic mixture weights are not generated for each document; rather, there is a finite number of possible topic mixture weights (one per author), and the author information in each document specifies which of them can be used.
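To make this difference concrete, here is a small sketch of my own (with assumed vocabulary, topic, and author counts): LDA draws fresh topic mixture weights for every document, while ATM keeps one set of topic mixture weights per author, and each word first picks an author uniformly from the document's author set.

```python
import numpy as np

rng = np.random.RandomState(0)
V, K, A = 30, 5, 3                              # vocabulary size, topics, authors (assumed)
phi = rng.dirichlet(0.1 * np.ones(V), size=K)   # topic-word distributions
alpha = 0.5 * np.ones(K)

def lda_doc(n_words=50):
    theta = rng.dirichlet(alpha)                # LDA: a new topic mixture per document
    z = rng.choice(K, size=n_words, p=theta)
    return np.array([rng.choice(V, p=phi[k]) for k in z])

theta_author = rng.dirichlet(alpha, size=A)     # ATM: one topic mixture per author

def atm_doc(author_set, n_words=50):
    x = rng.choice(author_set, size=n_words)    # uniform author choice for each word
    z = np.array([rng.choice(K, p=theta_author[a]) for a in x])
    return np.array([rng.choice(V, p=phi[k]) for k in z])

doc_lda = lda_doc()
doc_atm = atm_doc(author_set=[0, 2])            # a document written by authors 0 and 2
```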

For document classification, if we view the author as the class label and let the author set of each document be a single scalar label, the ATM model can be directly applied.

My interest in the ATM model is due to Sudderth's transformed LDA model, which reduces to an ATM when the spatial transformation is ignored (see the part inside the big red square).
[figure: Sudderth's transformed LDA graphical model, with the ATM-equivalent part marked by a red square]

a blog about faculty job hunting

I found a blog post about faculty job hunting, which is very useful:

http://nlpers.blogspot.com/2009/09/some-notes-on-job-search.html