The easiest way to install the package is to add my archive to your system. To do this, store the following line in /etc/apt/sources.list.d/mene.list (the file does not exist by default; you need to create it):
deb http://archive.mene.za.net/raspbian wheezy contrib
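The step above can be done from a terminal as follows. This is a sketch assuming a Debian-style system with sudo available; if the archive is signed, its key would still need to be added separately (not shown here).

```shell
# Create the new sources list entry (the file does not exist by default)
echo "deb http://archive.mene.za.net/raspbian wheezy contrib" | \
    sudo tee /etc/apt/sources.list.d/mene.list

# Refresh the package index so apt can see the new archive
sudo apt-get update
```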
I just bought a Raspberry Pi ("Pi" for short from here on) and plan to play with it on various projects.
The hardware I bought from Amazon includes the following items: 1) a Pi Model B, a transparent plastic case, and a wireless adapter; 2) an HDMI cable to connect the video output to my TV; 3) a 5V/2A micro-USB charger.
The input devices are a wireless keyboard/mouse combo. The display is my TV. My first project is XBMC, which turns the Pi into a home multimedia center.
The following is the log of what I did to my Pi until it worked as a Lubuntu machine.
Install Raspberry Pi NOOBS (New Out Of Box Software)
More than a week after coming back from ICCV in Barcelona, I have finally finished my summary of the ICCV papers I am interested in. There are 15 papers in total, grouped into 5 topics. Certainly there are many more ICCV papers worth reading; I will update my summary later.
At ICCV this year, attributes continued to attract interest from researchers in the community. In particular, the paper “Relative Attributes” won the Marr Prize.
Idea: relative attributes can provide a more informative and intuitive description of images, overcoming many restrictions of binary attributes. For example, it is more useful to say “Bill Clinton is younger than George H.W. Bush” than to say “Bill Clinton is young”. The latter is a binary attribute that is often difficult to judge as true or false, since in many cases it is a subjective call; the former is a relative attribute, which is more objective and easier to judge. This paper describes an approach that models relative attributes as a ranking function, applies it to zero-shot learning and textual description of images, and shows a clear advantage over traditional binary attributes.
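The ranking-function idea can be sketched in a few lines. This is a toy illustration, not the authors' code: it learns a weight vector w with a perceptron-style update on a pairwise hinge loss, so that w·x_i > w·x_j whenever image i is annotated as having more of the attribute than image j (the paper itself uses a RankSVM-style formulation). All features and pairs below are made up for illustration.

```python
import numpy as np

# Toy data: each row is an image feature vector; a pair (i, j) means
# image i has MORE of the attribute (e.g. "younger") than image j.
X = np.array([[1.0, 0.2],
              [0.5, 0.8],
              [0.1, 1.5],
              [0.9, 0.1]])
ordered_pairs = [(0, 1), (1, 2), (3, 2)]  # i ranked above j

w = np.zeros(2)
lr = 0.1
for _ in range(200):
    for i, j in ordered_pairs:
        # Pairwise hinge: push w until w.(x_i - x_j) >= 1 for each pair
        if w @ (X[i] - X[j]) < 1.0:
            w += lr * (X[i] - X[j])

# Relative-attribute strength per image; ordered pairs now rank correctly
scores = X @ w
```

Once trained, comparing two images' scores yields statements like "image i is younger than image j", which is exactly the kind of relative description the paper advocates.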
Idea: this paper applies the poselet representation to recognize attributes of humans, such as gender, hair style, and type of clothing. It is generally difficult to recognize these human attributes due to large variations in pose, viewpoint, etc. The poselet representation implicitly decomposes the aspect, i.e., the pose and viewpoint, and thus facilitates the detection of human attributes.
Idea: this paper proposes to jointly learn attribute classifiers and the attribute labels, which eliminates the need to label attributes in images. Given a list of attribute names, some positive and negative training examples for each binary attribute classifier, and some training images of various objects with known class labels but unknown attribute labels, the proposed method can automatically learn an attribute vector for each object class. An interesting finding of this research is that it detects many erroneous attribute labels in the existing dataset, and that classification performance is boosted after correcting these erroneous attributes.
Depth Image and Kinect:
A number of ICCV papers this year study various problems related to depth images, possibly due to the wide availability of the Kinect sensor.
Summary: the theme of this paper is estimating scene flow, the 3-D motion field of an observed scene, as opposed to optical flow, which is a 2-D field. Point motion in 3-D is modeled as a collection of particle filters that support multiple hypotheses and do not oversmooth the motion field.
Summary: this paper employs random forest regression to directly estimate the human pose without segmenting body parts; several techniques are proposed to speed up the regression process, enabling super-real-time test performance.
Summary: similar to the above paper, this paper also uses a pose database to facilitate pose estimation from depth images.
Here are two interesting demos using the Kinect sensor:
KinectFusion: Real-time 3D Tracking, Reconstruction and Interaction with a Depth Camera. S. Izadi, R. Newcombe, et al.
Seeing Your Weight – An Application in Targeted Advertisement. T. Van Nguyen, S. Yan
Random Forest is an ensemble of decision trees, where each tree differs slightly from the others. The randomness among the trees is achieved by training each one on either a different subset of the training data or a different subset of the parameter space. A Random Forest classifier can achieve max-margin-like behavior similar to an SVM at lower computational cost. Random Forests have also been widely used for many other problems, such as regression, density estimation, manifold learning, semi-supervised learning, etc. There was a one-day tutorial on Random Forests at ICCV this year; the slides and technical report can be downloaded at http://research.microsoft.com/en-us/groups/vision/decisionforests.aspx
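For intuition, here is a minimal, self-contained sketch of the two sources of randomness (bootstrap sampling of the data and random feature subsets). It is purely illustrative, not the tutorial's code: each "tree" is only a depth-1 decision stump, and the data is a made-up 2-D toy problem.

```python
import random

def train_stump(X, y, feature_ids):
    """Pick the (feature, threshold, flip) triple that best splits the sample."""
    best, best_err = None, float("inf")
    for f in feature_ids:
        for t in sorted({x[f] for x in X}):
            pred = [1 if x[f] >= t else 0 for x in X]
            err = sum(p != yi for p, yi in zip(pred, y))
            flipped = len(y) - err  # error of the polarity-flipped stump
            if min(err, flipped) < best_err:
                best_err = min(err, flipped)
                best = (f, t, flipped < err)
    return best

def stump_predict(stump, x):
    f, t, flip = stump
    p = 1 if x[f] >= t else 0
    return (1 - p) if flip else p

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    n, d = len(X), len(X[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        feats = rng.sample(range(d), max(1, d // 2))  # random feature subset
        forest.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, x):
    votes = sum(stump_predict(s, x) for s in forest)  # majority vote
    return 1 if 2 * votes >= len(forest) else 0

# Two well-separated classes in 2-D (toy data)
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.7], [0.7, 0.7]]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
preds = [forest_predict(forest, x) for x in X]
```

Each individual stump is weak and slightly different from the others, but the majority vote over the ensemble is accurate; real random forests use deeper trees and the same two randomization tricks.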
The following ICCV papers are related to Random Forest:
Task: image labeling, i.e., classify each pixel in an image into an object class label
Idea: use a Random Forest classifier, but make each internal node a classifier for an image patch rather than for a single pixel. The advantage of this approach is that spatial context is captured by the image patch, so at test time the classifier produces much smoother and more coherent image labels.
Decision Tree Fields (Oral) Sebastian Nowozin, Carsten Rother, Shai Bagon, Bangpeng Yao, Toby Sharp, Pushmeet Kohli
Task: image labeling
Idea: a CRF that uses decision trees to model the mapping from the image to the parameters of the unary and pairwise interactions in the graphical model. The advantages of using decision trees for this mapping are that (1) they are non-parametric and can therefore represent richer relationships, and (2) they scale better to large training data.
Summary: the idea is similar to the scene-attribute paper published at last year's NIPS and an ECCV workshop by Jia-Li Li of their group. Here the sparse coding method is used to learn sparse bases of action attributes in static images.
Summary: a volumetric representation of human activities is presented. A video is decomposed into a collection of spatiotemporal tubes at multiple scales. The tubes are connected by a graph model that represents the temporal and spatial constraints among them. The paper addresses how to extract/learn and match/recognize these 2D+t tubes and their configuration for activity recognition in video.
Summary: STIP features are grouped together to form a feature graph that represents the spatial configuration within a single frame; the feature graphs of consecutive frames are then linked to form a “String of Feature Graphs” representing the temporal dynamics. Activity recognition in videos is then cast as the problem of matching these feature graphs and strings of feature graphs.
Summary: this paper discusses an interesting problem: how to recognize a human activity in a video stream without seeing the full-length video. The solution presented is a dynamic bag-of-words model, similar in spirit to dynamic time warping.
Task: learning a hierarchical classifier for large-scale image databases with hundreds or thousands of visual classes
Idea: a hierarchical classifier is a natural solution for large-scale visual recognition. The problem is that it is often impossible to divide the visual classes cleanly, and there are often ambiguous classes in between. The idea of this paper is simple and smart: treat the ambiguous classes as if they do not exist (i.e., a relaxed hierarchy). This can be modeled in a modified SVM where the ambiguous classes are labeled 0 (while positive classes are labeled +1 and negative classes -1), so they do not contribute to the loss but still constrain the parameter space.
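A toy sketch of the relaxed labeling, under the assumption of a plain subgradient-descent linear SVM (the paper's actual optimization differs, and in the full formulation the zero-labeled examples still constrain the parameter space, which this simplified sketch omits): examples from ambiguous classes carry label 0 and simply contribute nothing to the hinge loss at this node of the hierarchy.

```python
import numpy as np

# Toy data: two clean classes plus one "ambiguous" example labeled 0.
X = np.array([[2.0, 1.0], [1.5, 2.0],     # positive side (+1)
              [-1.0, -2.0], [-2.0, -1.0], # negative side (-1)
              [0.1, -0.1]])               # ambiguous class   (0)
y = np.array([1, 1, -1, -1, 0])

w = np.zeros(2)
lr, lam = 0.1, 0.01
for _ in range(100):
    grad = lam * w                        # L2 regularizer
    for xi, yi in zip(X, y):
        # Hinge subgradient; zero-labeled examples are skipped entirely
        if yi != 0 and yi * (w @ xi) < 1.0:
            grad -= yi * xi
    w -= lr * grad

# Margins: positive for correctly separated +1/-1 examples, 0 for label 0
margins = y * (X @ w)
```

The point of the sketch: the ambiguous example never pushes the separating hyperplane in either direction, which is exactly the "treat it as if it does not exist" relaxation.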
My recent work at the company involves Hadoop and large-scale computing, especially for machine learning and computer vision problems. I think this direction will make a big impact on the community. The following are a few Chinese articles about Hadoop that I feel are very good for beginners.
I ordered a Kindle DX from Amazon on Wednesday and it arrived yesterday. I tried it last night and have mixed feelings.
It is a great tool for reading books from Amazon. I downloaded two of Jules Verne's sci-fi novels: Twenty Thousand Leagues Under the Sea and The Mysterious Island. They are free since they were published before 1929! They have been my favorites since my teenage years.
I also transferred a PDF book I have been reading recently, Lucene in Action, 2nd Edition. The PDF viewing is very good. The only inconveniences are that I cannot highlight or annotate PDFs, and that the table of contents and next-chapter navigation via the 5-way stick do not work. Still, this is acceptable.
The worst experience I have had with the Kindle DX is reading two-column scientific papers. The font is too small in portrait mode. In landscape mode the font is large enough, but I lose the big picture of the whole page, and turning pages is painful: it requires pressing the "down" button twice to finish reading the left column and then the "up" button twice to read the right one. The lack of annotation and highlighting also becomes a big minus when reading scientific papers. After some research online, I found the only solution is to cut each two-column page into 4 parts, and there are some free tools to do so, but the reading experience is still very poor.
My conclusion: the Kindle is designed for reading novels, and it does that best. But for academic use, the Kindle is not the solution. Though the iPad is about the same size, it is much easier to manipulate PDF files with its touch screen and numerous apps. So I think I'd better return the Kindle DX and buy a 6" Kindle for leisure reading and an iPad for academic reading.
CVPR 2011 is in full swing. Among the papers available online, I have finally found one that made my eyes light up. Admittedly, perhaps the main reason is that it is one of the few I can actually understand, but that does not change the fact that this paper will be one of this year's CVPR gems. It certainly will not win best paper, though, since it is only a poster; I suspect the committee feared that if it were an oral, these two big names hamming it up on stage would stun the audience and might cause widespread panic.
The paper is titled "Unbiased Look at Dataset Bias". First, look at the authors: one is Antonio Torralba of MIT, the other Alexei A. Efros of CMU. MIT and CMU are two absolute giants of computer vision, far ahead of everyone else. Both authors earned associate professorships in recent years, and both are the kind of researchers whose papers are as prolific as they are high quality. Note that this paper has no graduate-student authors: it is a collaboration between two heavyweights, and precisely for that reason it takes their playfulness to the extreme. The paper is full of phrases like "alas" and "let's play a game / a toy experiment" and other humorous language you would rarely see in an ordinary paper. The acknowledgements and disclaimer are especially good:
The authors would like to thank the Eyjafjallajokull volcano as well as the wonderful kirs at the Buvette in Jardin du Luxembourg for the motivation (former) and the inspiration (latter) to write this paper. (Eyjafjallajokull is the unlucky Icelandic volcano whose 2010 eruption paralyzed air routes across Europe; the Buvette in Jardin du Luxembourg is a bar in France serving excellent kir.) My guess is that they were attending a meeting in France at the time, got stranded there by the volcano, and, having nothing better to do, went drinking at the bar, where the alcohol inspired this topic. Sigh, the masters are masters: even in a situation like that they can turn out a paper like this. How are mere mortals like us supposed to get by? Perhaps I should just go drinking every day from now on.
Disclaimer: No graduate students were harmed in the production of this paper. Authors are listed in order of increasing procrastination ability.
A large part of computer vision is object detection, recognition, and classification. To compare different algorithms fairly, the community has designed many datasets to serve as benchmarks for measuring performance. So after you have read a great many papers, you notice that every experiments section says one of: on dataset X my algorithm is the best; on dataset Y my algorithm is about as good as the current state of the art, but faster / more accurate / with a lower false-alarm rate; although my algorithm is mediocre on one dataset, averaged over several datasets mine is the best... But can these datasets really judge the merits of an algorithm? The two authors challenge these datasets.
In their lab (where everyone works on these things), everyone matched images to datasets with over 75% accuracy. Indeed, a casual look reveals the patterns: some datasets are all cars, some are natural scenery, some are local scenes, some are objects, some have clean backgrounds, some were shot by professional photographers, some were gathered casually, and so on. So although most datasets claim "we diversified as much as possible and collected samples in the wild", in fact they are still biased. To verify this hypothesis, the authors trained a 12-way classifier on some very simple features. The classification worked surprisingly well: even the worst class was recognized 20% of the time, six classes exceeded 30%, and one exceeded 99%, whereas random guessing would only achieve 1/12 ≈ 8%. Moreover, as the training data grew to 1000 images per class, the accuracy showed no sign of converging. Generally speaking, better classification means the classes are more separable, which here means the datasets come from "different worlds", and with more samples the separation may grow even larger. Good grief: you all claim to describe the world "diversely", yet the experiment shows you each depict a different world! Caltech101 in particular: the classifier reaches 99% on you; just how simple a world do you describe?! The authors point out that, unlike machine learning, where a dataset is its own world, vision datasets are supposed to describe the real world, so they dub these "pseudo-worlds" the "Corel world", the "Caltech101 world", the "PASCAL VOC world", and so on, which is quite vivid.
De-mystifying Good Research and Good Papers
By Fei-Fei Li, 2009.03.01
Please remember this:
1000+ computer vision papers get published every year!
Only 5-10 are worth reading and remembering!
Since many of you are writing your papers now, I thought I'd share these thoughts with you. I have probably said all of this at various points during our group and individual meetings. But as I continue my AC reviews these days (that's 70 papers and 200+ reviews between me and my AC partner), the following points keep coming up. Not enough people conduct first-class research. And not enough people write good papers.
- Every research project and every paper should be conducted and written with one singular purpose: *to genuinely advance the field of computer vision*. So when you conceptualize and carry out your work, you need to be constantly asking yourself this question in the most critical way you could – “Would my work define or reshape xxx (problem, field, technique) in the future?” This means publishing papers is NOT about "this has not been published or written before, let me do it", nor is it about “let me find an arcane little problem that can get me an easy poster”. It's about "if I do this, I could offer a better solution to this important problem," or “if I do this, I could add a genuinely new and important piece of knowledge to the field.” You should always conduct research with the goal that it could be directly used by many people (or industry). In other words, your research topic should have many ‘customers’, and your solution would be the one they want to use.
- A good research project is not about the past (i.e. obtaining a higher performance than the previous N papers). It's about the future (i.e. inspiring N future papers to follow and cite you, N → ∞).
- A CVPR'09 submission with a Caltech101 performance of 95% received 444 (three weak rejects) this year, and will be rejected. This is by far the highest performance I've seen for Caltech101. So why is this paper rejected? Because it doesn't teach us anything, and no one will likely use it for anything. It uses a known technique (at least to many people already) with super-tweaked parameters custom-made for a dataset that is no longer a good reflection of real-world image data. It uses a BoW representation without object-level understanding. All the reviewers (from very different angles) asked the same question: "what do we learn from your method?" And the only sensible answer I could come up with is that Caltech101 is no longer a good dataset.
- Einstein used to say: everything should be made as simple as possible, but not simpler. Your method/algorithm should be the most simple, coherent and principled one you could think of for solving this problem. Computer vision research, like many other areas of engineering and science research, is about problems, not equations. No one appreciates a complicated graphical model with super fancy inference techniques that essentially achieves the same result as a simple SVM -- unless it offers deeper understanding of your data that no other simpler methods could offer. A method in which you have to manually tune many parameters is not considered principled or coherent.
- This might sound corny, but it is true. You're PhD students at one of the best universities in the world. This means you embody the highest level of intellectualism of humanity today. This means you are NOT a technician and you are NOT a coding monkey. When you write your paper, you communicate and educate. That's what a paper is about. This is how you should approach your writing. You need to feel proud of your paper not just for the day or week it is finished, but for many years to come.
- Set a high goal for yourself -- the truth is, you can achieve it as long as you put your heart into it! When you think of your paper, ask yourself this question: is this going to be among the 10 papers of 2009 that people will remember in computer vision? If not, why not? The truth is that only 10 +/- epsilon get remembered every year. Most papers are just meaningless publication games. A long string of mediocre papers on your resume can at best get you a Google software-engineer job (if at all -- 2009.03 update: no, Google doesn't hire PhDs for this anymore). A couple of seminal papers can get you a faculty job at a top university. This is the truth that most graduate students don't know, or don't have a chance to know.
- The review process is highly random. But there is one golden rule that withstands the test of time and randomness: badly written papers get bad reviews. Period. It doesn't matter if the idea is good, the results are good, the citations are good. Not at all. Writing is critical -- and this is ironic, because engineers are the worst-trained writers of all the disciplines in a university. You need to discipline yourself: leave time for writing, think deeply about writing, and revise over and over until the paper is as polished as you can make it.
- Last but not least, please remember this rule: important problem (inspiring idea) + solid and novel theory + convincing and analytical experiments + good writing = seminal research + excellent paper. If any of these ingredients is weak, your paper, and hence your reviewer scores, will suffer.