Wednesday, November 30, 2011

RE: DIY low cost 3D laser scanner

DIY low-cost 3D laser scanning rangefinder (3D LIDAR)

http://www.csksoft.net/blog/post/lowcost_3d_laser_ranger_1.html

http://www.csksoft.net/blog/post/lowcost_3d_laser_ranger_2.html

Wednesday, November 23, 2011

15 ICCV'11 papers I am interested in

More than a week after coming back from ICCV in Barcelona, I have finally finished my summary of the ICCV papers I am interested in. There are 15 papers in 5 topics in total. Certainly there are many more ICCV papers worth reading; I will update my summary later.

Attributes:
At ICCV this year, attributes continued to attract interest from researchers in the community. In particular, the paper "Relative Attributes" won the Marr Prize.

Relative Attributes  (Marr prize paper)
Devi Parikh, Kristen Grauman
Idea: Relative attributes provide a more informative and intuitive description of images, overcoming many restrictions of binary attributes. For example, it is more useful to say "Bill Clinton is younger than George H. W. Bush" than to say "Bill Clinton is young". The latter is a binary attribute that is often difficult to judge true or false, since in many cases it is a subjective judgment, while the former is a relative attribute that is more objective and easier to judge. This paper describes an approach that models each relative attribute as a ranking function, applies it to zero-shot learning and textual description of images, and shows clear advantages over traditional binary attributes.
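As a rough illustration (not the authors' exact large-margin formulation), a ranking function can be learned by converting ordered pairs into difference vectors and fitting a linear classifier on them; the data and names below are synthetic, made up for the sketch:

```python
# A minimal sketch of learning a ranking function r(x) = w.x from
# pairwise comparisons, in the spirit of the relative-attribute idea.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))             # image features (synthetic)
w_true = rng.normal(size=10)
strength = X @ w_true                      # hidden attribute strength

# Ordered pairs (i, j) meaning "image i shows more of the attribute".
pairs = [(i, j) for i in range(100) for j in range(100)
         if strength[i] > strength[j] + 1.0][:500]

diffs = np.array([X[i] - X[j] for i, j in pairs])
# Classify difference vectors as positive; mirror them for negatives.
Xr = np.vstack([diffs, -diffs])
yr = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
rank_svm = LinearSVC(fit_intercept=False).fit(Xr, yr)

w = rank_svm.coef_.ravel()                 # learned ranking direction
print(np.corrcoef(X @ w, strength)[0, 1])  # should be close to 1
```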

Describing People: A Poselet-Based Approach to Attribute Classification (oral)
Lubomir Bourdev, Subhransu Maji, Jitendra Malik
Idea: This paper applies the poselet representation to recognize attributes of people, such as gender, hair style, and type of clothing. Recognizing these human attributes is generally difficult due to large variations in pose, viewpoint, etc. The poselet representation implicitly factors out the aspect, i.e., the pose and viewpoint, and thus facilitates attribute recognition.

A Joint Learning Framework for Attribute Models and Object Descriptions (oral)
Dhruv Mahajan, Sundararajan Sellamanickam, Vinod Nair
Idea: This paper proposes to jointly learn attribute classifiers and attribute labels, which eliminates the need to label attributes in images. Given a list of attribute names, some positive and negative training examples for each binary attribute classifier, and training images of various objects with known class labels but unknown attribute labels, the proposed method automatically learns an attribute vector for each object class. An interesting finding of this research is that it detects many erroneous attribute labels in an existing dataset, and that classification performance is boosted after correcting them.

Depth Image and Kinect:
A number of ICCV papers this year study problems related to depth images, possibly due to the widely available Kinect sensor.

Kinecting the Dots: Particle Based Scene Flow From Depth Sensors
Simon Hadfield; Richard Bowden
Summary: The theme of this paper is estimating scene flow, the 3-D motion field of an observed scene, as opposed to 2-D optical flow. Point motion in 3-D is modeled by a collection of particle filters, which supports multiple hypotheses and does not oversmooth the motion field.

Efficient Regression of General-Activity Human Poses from Depth Images
Ross Girshick; Jamie Shotton; Pushmeet Kohli; Antonio Criminisi; Andrew Fitzgibbon
Summary: This paper employs random forest regression to directly estimate human pose without segmenting body parts; several techniques are proposed to speed up the regression process, enabling super-realtime test performance.

Accurate 3D Body Pose Estimation From a Single Depth Image
Mao Ye; Xianwang Wang; Ruigang Yang; Liu Ren; Marc Pollefeys
Summary: Uses pre-captured motion exemplars to estimate the body pose in a depth image, then refines the result by fitting the body configuration to the input depth image.

A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera
Andreas Baak; Meinard Mueller; Gaurav Bharaj; Hans-Peter Seidel; Christian Theobalt
Summary: Similar to the paper above, this paper also uses a pose database to facilitate pose estimation from depth images.

Here are two interesting demos using the Kinect sensor:
KinectFusion: Real-time 3D Tracking, Reconstruction and Interaction with a Depth Camera, S. Izadi, R. Newcombe et al.
Seeing Your Weight: An Application in Targeted Advertisement, T. Van Nguyen, S. Yan


Random Forest:
A Random Forest is an ensemble of decision trees, each slightly different from the others. The randomness among trees is achieved by training each tree on a different subset of the training data or a different subset of the parameter space. A Random Forest classifier can achieve max-margin-like behavior similar to an SVM at lower computational cost. Random Forests have also been widely used for many other problems, such as regression, density estimation, manifold learning, and semi-supervised learning. There was a one-day tutorial on Random Forests at ICCV this year; the slides and a technical report can be downloaded at http://research.microsoft.com/en-us/groups/vision/decisionforests.aspx
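For concreteness, here is a minimal scikit-learn sketch of the two sources of randomness described above (a bootstrap sample of the data per tree, and a random feature subset per split); the dataset is synthetic:

```python
# A minimal random-forest sketch: 100 slightly different trees, each
# trained on a bagged data subset and using random feature subsets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of slightly different trees
    max_features="sqrt",  # random feature subset at every split
    bootstrap=True,       # random data subset (bagging) per tree
    random_state=0,
).fit(X, y)

print(forest.score(X, y))  # prediction is an ensemble vote over trees
```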

The following ICCV papers are related to Random Forest:
Structured Class-Labels in Random Forests for Semantic Image Labelling (Oral)
Peter Kontschieder, Samuel Rota Bulò, Horst Bischof, Marcello Pelillo
Task: image labeling, i.e., classifying each pixel in an image into an object class
Idea: Use a Random Forest classifier, but make each internal node a classifier for an image patch rather than for a single pixel. The advantage of this approach is that the patch encodes spatial context in the classifier, so at test time it produces much smoother and more coherent image labelings.

Decision Tree Fields  (Oral)
Sebastian Nowozin, Carsten Rother, Shai Bagon, Bangpeng Yao, Toby Sharp, Pushmeet Kohli
Task: image labeling
Idea: A CRF that uses decision trees to model the mapping from the image to the parameters of the unary and pairwise interactions in the graphical model. The advantages of using decision trees for this mapping are that (1) they are non-parametric and so can represent richer relationships, and (2) they scale better to large training data.

Action and Activity:
Human Action Recognition by Learning Bases of Action Attributes and Parts (oral)
Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas J. Guibas, Li Fei-Fei
Summary: The idea is similar to the scene-attribute paper published at last year's NIPS and an ECCV workshop by Li-Jia Li of the same group. Sparse coding is used to learn sparse bases of action attributes and parts in static images.

Learning Spatiotemporal Graphs of Human Activities (oral)
William Brendel, Sinisa Todorovic
Summary: A volumetric representation of human activities is presented. A video is decomposed into a collection of spatiotemporal tubes at multiple scales, connected by a graph that encodes the temporal and spatial constraints among them. The paper addresses how to extract/learn and match/recognize these 2D+t tubes and their configuration for activity recognition in video.

A "String of Feature Graphs" Model for Recognition of Complex Activities in Natural Videos
Summary: STIP features are grouped into a feature graph that represents the spatial configuration within a single frame; the feature graphs of consecutive frames are then linked to form a "string of feature graphs" that represents the temporal dynamics. Activity recognition in videos is then cast as matching these feature graphs and their strings.

Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos
M. S. Ryoo
Summary: This paper discusses an interesting problem: how to recognize a human activity in a video stream without seeing the full-length video. The solution presented is a dynamic bag-of-words model, which is similar in spirit to dynamic time warping.

Machine learning:
Perturb-and-MAP Random Fields: Using Discrete Optimization to Learn and Sample from Energy Models
George Papandreou, Alan Yuille
Task: how to avoid costly MCMC sampling in energy models (e.g., MRFs) while maintaining accuracy in parameter estimation
Idea: Do MAP inference after injecting randomness (i.e., noise) into the parameters: MAP is fast, while the randomness simulates the power of MCMC.
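A toy sketch of this idea, reduced to the unstructured case where MAP is a plain argmax: adding Gumbel noise to the negative energies and maximizing draws exact samples from the Gibbs distribution. The structured version in the paper replaces the argmax with a discrete MAP solver over an MRF; the code below is only the intuition.

```python
# Perturb-and-MAP intuition: argmin over (energy - Gumbel noise)
# samples states with probability exp(-E)/Z (the Gumbel-max trick).
import numpy as np

rng = np.random.default_rng(0)
energy = np.array([1.0, 0.5, 2.0])   # energies of three candidate states

def perturb_and_map(energy):
    gumbel = -np.log(-np.log(rng.random(energy.shape)))  # Gumbel(0,1)
    return np.argmin(energy - gumbel)                    # perturbed MAP

# Empirical frequencies approach the Gibbs probabilities exp(-E)/Z.
samples = [perturb_and_map(energy) for _ in range(10000)]
print(np.bincount(samples) / 10000)
print(np.exp(-energy) / np.exp(-energy).sum())
```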

Discriminative Learning of Relaxed Hierarchy for Large-scale Visual Recognition
Tianshi Gao, Daphne Koller
Task: learning a hierarchical classifier for large-scale image databases with hundreds or thousands of visual classes
Idea: A hierarchical classifier is a natural solution for large-scale visual recognition. The problem is that the visual classes often cannot be divided cleanly, and there are often ambiguous classes in between. The idea of this paper is simple and smart: treat the ambiguous classes as if they did not exist (a relaxed hierarchy). This can be modeled in a modified SVM where the ambiguous classes are labeled 0 (while positive classes are labeled +1 and negative classes -1), so they do not contribute to the loss but still constrain the parameter space.
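A minimal sketch of the labeling trick just described: examples with label 0 simply drop out of the hinge loss (the paper's full formulation also keeps them as constraints; the function name and data here are mine, for illustration only):

```python
# Hinge loss over labels in {+1, 0, -1}: zero-labeled (ambiguous)
# examples contribute nothing, as in the relaxed hierarchy.
import numpy as np

def relaxed_hinge_loss(w, X, y, lam=0.1):
    margins = y * (X @ w)
    active = y != 0                        # ambiguous classes drop out
    hinge = np.maximum(0.0, 1.0 - margins[active])
    return hinge.mean() + lam * (w @ w)    # L2-regularized objective

# Tiny usage example with made-up data:
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([1, -1, 0, 1, 0, -1])         # 0 marks ambiguous classes
print(relaxed_hinge_loss(np.zeros(4), X, y))
```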

Saturday, August 27, 2011

several good articles about Hadoop and distributed storage and computing, in Chinese

My recent work at the company involves Hadoop and large-scale computing, especially for machine learning and computer vision problems. I think this direction will make a big impact on the community. The following are a few Chinese articles about Hadoop, which I feel are very good for beginners.

Hadoop and Distributed Computing
Distributed Computing with Hadoop on Linux
Hadoop and Hadoop Applications

Monday, August 1, 2011

a startup for wikipedia-on-the-go

In the recent article

The 20 Hot Silicon Valley Startups You Need To Watch,

one company uses state-of-the-art computer vision technology to pull up Wikipedia entries for whatever you see through your smartphone:
Read more: http://www.businessinsider.com/20-silicon-valley-startups-to-watch#ixzz1TmwXQxvy

I am very glad that recent advances in computer vision are being commercialized so quickly.

Friday, July 29, 2011

new Kindle DX experience

I ordered a Kindle DX from Amazon on Wednesday and it arrived yesterday. I tried it last night and came away with mixed feelings.
It is a great tool for reading books from Amazon. I downloaded two of Jules Verne's sci-fi novels: Twenty Thousand Leagues Under the Sea and The Mysterious Island. They are free since they were published before 1929! They have been my favorites since my teenage years.
I also transferred a PDF book I have been reading recently, Lucene in Action, 2nd Edition. PDF viewing is very good. The only inconveniences are that I cannot highlight or annotate PDFs, and that the table of contents and jumping to the next chapter with the 5-way stick do not work. Still, these are acceptable.
The worst thing I have experienced with the Kindle DX is reading 2-column scientific papers. The font size is too small in portrait mode. In landscape mode the font is large enough, but I lose the big picture of the whole page, and turning pages is painful: it requires pressing the "down" button twice to finish the left column and then the "up" button twice to read the right one. The inability to add annotations and highlights also becomes a big minus when reading scientific papers. After doing some research online, I found the only workaround is to cut each 2-column page into 4 parts, and there are some free tools to do so. But the reading experience is still very poor.
My conclusion: the Kindle is designed for reading novels, and it does that best. But for academic use, the Kindle is not the solution. Though the iPad is about the same size, it is much easier to manipulate PDF files with its touch screen and numerous apps. So I think I'd better return the Kindle DX and buy a 6-inch Kindle for leisure reading and an iPad for academic reading.

Saturday, June 25, 2011

(Repost) Notes on a gem of a CVPR 2011 paper

Big shots acting cute, more than you can bear: notes on a gem of a CVPR 2011 paper. (Source: Pang Yu's journal)

CVPR 2011 is in full swing, and among the papers available online I have finally found one that made my eyes light up. Although, well, perhaps, the main reason is that it is one of the few I can actually understand. That does not stop it from being destined to become one of this year's CVPR curiosities. It certainly won't win best paper, though, since it is only a poster; I suspect the committee feared that if the big shots presented it as an oral, their live antics would stun the audience and might cause widespread panic.
Since I haven't found a better platform to share this discovery, I am posting it on Renren for now. The topic matters not only to computer vision people; anyone who works with data should get something out of it.
The paper is titled "Unbiased Look at Dataset Bias". First, the authors: Antonio Torralba of MIT and Alexei A. Efros of CMU. In computer vision, MIT and CMU are two absolute giants that nobody can match. Both authors recently received associate professorships, and both are the kind of researcher whose papers are as prolific as they are high-quality. Note that this paper has no graduate students on it; in other words, it is a pure big-shot-with-big-shot collaboration. Precisely because of that, it takes scholarly playfulness to the extreme: it is full of "alas", "let's play a game / a toy experiment", and other amusing language you rarely see in an ordinary paper. Especially the acknowledgments and the disclaimer:
The authors would like to thank the Eyjafjallajokull volcano as well as the wonderful kirs at the Buvette in Jardin du Luxembourg for the motivation (former) and the inspiration (latter) to write this paper. (Eyjafjallajokull is the unlucky Icelandic volcano that paralyzed European air routes in 2010; the Buvette in Jardin du Luxembourg is a Paris bar with excellent kir.) My guess is that they were at a conference in France, got stranded by the volcano, and with nothing better to do went drinking, and the alcohol sparked this topic. Sigh, big shots are big shots: even in that situation they can produce a paper like this. How are the rest of us supposed to live? Maybe I should go drinking every day from now on.
Disclaimer: No graduate students were harmed in the production of this paper. Authors are listed in order of increasing procrastination ability.
(Wait, are graduate students a protected species now? Is there a society for the protection of graduate students? Is there?) And the authors are listed in order of increasing procrastination ability; apparently even big shots have that problem.
A paper this funny is simply irresistible. The key point is that, while hamming it up, the two authors raise a genuinely controversial question for the computer vision community and analyze it well; it may well set a trend for years to come. One has to admit the two big shots have dug a magnificently deep pit. Come on, let's all jump in together.
A large part of computer vision is object detection, recognition, and classification. To compare different algorithms fairly, people have designed many datasets to serve as benchmarks. So once you have read a great many papers, you notice the experimental sections all read the same way: on dataset X my algorithm is the best; on dataset Y my algorithm is about on par with the state of the art, but faster / more accurate / with a lower false-alarm rate; although my algorithm is mediocre on dataset Z, averaged over several datasets mine is the best... But can these datasets really judge the merit of an algorithm? The two authors challenge exactly that.
First, let's play a game of match-the-pairs: each group of images below was drawn from a single dataset; match each group with the correct dataset name:
 
In their lab (where everyone works on this stuff), everyone matched the pairs with over 75% accuracy. Actually, a casual look reveals the telltale signs: one dataset is all cars, one all natural scenery, one all local scenes, one all isolated objects, one all clean backgrounds, one shot by professional photographers, one collected haphazardly, and so on. So although most datasets claim to gather samples "as diversely as possible, in the wild", they are in fact still biased. To verify this hypothesis, the authors trained a 12-way classifier on some very simple features, and classification worked rather well: the lowest per-dataset accuracy is 20%, six datasets exceed 30%, and one exceeds 99%, while random guessing would get only 1/12 ≈ 8%. Moreover, as the training set grows toward 1000 images per class, accuracy shows no sign of leveling off. Generally, the better the classification, the more separable the classes; in other words, the datasets come from "different worlds", and with more samples the separation may grow even sharper. Good grief: you all claim to describe this one world "diversely", yet the experiment shows you portray different worlds. And Caltech101, the classifier gets you right 99% of the time; just how simple is the world you describe? Unbearable! The authors point out that, unlike machine learning, where a dataset is its own world, vision aims at the real one, so they dub these pseudo-worlds the "Corel world", the "Caltech101 world", the "PASCAL VOC world", and so on, which is quite vivid.
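A minimal sketch of this "name that dataset" experiment, assuming images from each dataset sit under data/<name>/; the 32x32 tiny-image feature is my stand-in for the simple features the paper uses, not their exact pipeline:

```python
# Train a classifier to guess which dataset an image came from;
# accuracy far above chance is the signature of dataset bias.
import glob
import numpy as np
from PIL import Image
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

DATASETS = ["caltech101", "msrc", "pascal_voc"]  # ...up to 12 in the paper

def tiny_image(path, size=32):
    """Downsample to a tiny grayscale image and flatten to a vector."""
    img = Image.open(path).convert("L").resize((size, size))
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

X, y = [], []
for label, name in enumerate(DATASETS):
    for path in glob.glob(f"data/{name}/*.jpg"):
        X.append(tiny_image(path))
        y.append(label)

# Chance level is 1/len(DATASETS); the paper reports far higher accuracy.
scores = cross_val_score(LinearSVC(), np.array(X), np.array(y), cv=5)
print("dataset identification accuracy: %.2f" % scores.mean())
```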
Now look from a different angle: would the same object (say, cars or people) at least look similar across datasets? If so, the datasets would still depict the same world, merely from different viewpoints.

Unfortunately, even for the same object class the bias is still glaring, visible to the naked eye. To be careful, the authors ran the same experiment on the five examples above: the classifier reached 61% accuracy, far above the 20% of random guessing. So enough about your "diversity"; it is all hot air. The evidence says an unbiased dataset has yet to be born.
The paper's section headings run from "prologue" to "epilogue"; the two of them really are staging a theatrical act. The paper first reviews the history of vision datasets, then denounces the harm datasets have done to the field: the field is obsessed with numerical evaluation, wasting time on precision-recall curves rather than on the pixels; today's research builds incrementally on prior work and rarely starts from scratch, because a brand-new approach cannot initially compete with carefully tuned algorithms; people increasingly care about winning on one particular dataset without ever asking whether the margin is statistically significant. Big shots, you speak our hearts! They also observe that, as the history shows, although we keep vowing to avoid "bias", every new dataset inevitably slides into some other bias; unless we figure out exactly where things go wrong, we are doomed to keep repeating the mistake.
To discuss "bias" we need a reference point, namely an observer and a task (the world a human sees is surely not the one a bird sees). Fine: define it as "the typical visual environment experienced by humans, with the task of detecting the common objects in it". "Bias" then means comparing a dataset against the real visual world. But the real world, er, has to be represented by another dataset, and that dataset, er, is biased too. What to do? The big shots are cleverer than we are: they propose cross-dataset validation.
Train a classifier on one dataset and test it on another; since we are assuming all these datasets depict the same world, this lets us rank them. There are two concrete tasks: classification (given an image, say whether it contains a particular object) and detection (given an image, find every instance of the object and localize it). The experiments show that, for both tasks, a classifier trained on any one dataset degrades on all the others, by about 40% on average, which is very significant. And were it not for the two easy datasets, Caltech101 and MSRC, propping up the numbers, the drop would be even larger.
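A sketch of the cross-dataset validation matrix; the loader below fabricates shifted synthetic features purely to imitate dataset bias, standing in for real image features:

```python
# Train on each "dataset", test on every other; diagonal dominance
# in the resulting matrix is the paper's signature of bias.
import numpy as np
from sklearn.svm import LinearSVC

# Per-"dataset" feature shift, imitating capture/selection bias.
DATASETS = {"pascal_voc": 0.0, "imagenet": 1.0, "sun09": 2.0, "labelme": 3.0}
rng = np.random.default_rng(0)

def load_split(shift, n=400, d=20):
    """Stand-in loader: a synthetic binary task whose features are offset
    by a dataset-specific shift (real code would load image features)."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, d)) + y[:, None] * 0.8 + shift
    return X, y

names = list(DATASETS)
perf = np.zeros((len(names), len(names)))
for i, tr in enumerate(names):
    clf = LinearSVC().fit(*load_split(DATASETS[tr]))
    for j, te in enumerate(names):
        X_te, y_te = load_split(DATASETS[te])
        perf[i, j] = clf.score(X_te, y_te)

# Diagonal (train == test) should beat off-diagonal entries; the paper
# reports roughly a 40% average drop off the diagonal.
print(np.round(perf, 2))
```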
The authors suspect the chief culprits of bias are these: 1. Selection bias: collectors favor particular kinds of data, e.g., landscapes, street scenes, or web images retrieved by keyword. 2. Capture bias: photographers like to shoot the same kind of object from similar angles. 3. Label bias: especially for semantic categories, the same thing may carry different names, e.g., "grass" vs. "lawn", or "painting" vs. "picture". 4. Negative set bias: for a classifier, what you want to pick out is the positive set and everything else is negative. In principle the negative set should be unbounded, but in practice we can only use finitely many negatives. Are those finite negatives representative? Are they enough?
Here is another experiment: for each dataset, train a classifier on its own positives and negatives, but at test time draw the negatives from all the datasets combined. If the misclassification rate rises, negatives from other datasets are being taken for positives, which in turn means different datasets' negative sets really differ. The results confirm the suspicion: three datasets got caught, with error rates rising by 20%, while ImageNet, Caltech101, and MSRC escaped. The analysis: ImageNet's negatives genuinely are rich and diverse, so the experiment did not faze it. The other two, though, are just too easy, too easy to get anything wrong. Easy datasets, unbearable!
As for whether the negatives are sufficient, the authors raise an interesting question. When classifying "boat", if all your boat samples sit on water, how do you know a "lazy" classifier isn't just latching onto "water" or "shore" features? This really matters, but since answering it requires a huge amount of human labeling, it hasn't been done yet; with Mechanical Turk we can chip away at it over time.
Having dissected dataset bias, we can talk about dataset value: heavily biased datasets should be worth little and unbiased ones worth a lot. Imagine you want to raise a classifier's accuracy. The strong players improve the features, the object representation, and the learning algorithm; the easier route is simply more training data. Annoyingly, sample size and accuracy are linked by a miserable logarithmic relationship: a small gain in accuracy demands an exponential increase in samples. Worse, as noted above, adding samples that carry a different "bias" may even hurt.
So here is the question: can samples from one dataset raise accuracy on another at all? Put differently, how do we quantify the relationship between two datasets? The wondrous big shots offer the following method:
Suppose a classifier trained on 1000 samples from dataset A, tested on A, reaches an average precision of 30 (a performance score), while a classifier trained on dataset B needs 5000 samples to reach 30 on A. The relation between the two datasets is then 1000/5000 = 0.2; that is, one sample from B is worth 0.2 samples from A.
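A toy calculation of this "exchange rate", reproducing the arithmetic from the example above and from the PASCAL/LabelMe figures quoted below; nothing here is beyond the numbers in the text:

```python
# Sample "market value": how many native samples one foreign sample buys.
def exchange_rate(native_n, foreign_n):
    """Worth of one foreign sample in native samples, at equal accuracy."""
    return native_n / foreign_n

print(exchange_rate(1000, 5000))    # 0.2: one B sample = 0.2 A samples

# PASCAL/LabelMe example: at a rate of 0.26, matching ten times the gain
# of 1250 native samples costs 1/0.26 * 1250 * 10 LabelMe samples.
print(round(1 / 0.26 * 1250 * 10))  # 48077, on the order of 50000
```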
 
This creates an exchange market among datasets. To raise by 10% the accuracy of a classifier trained on 1250 samples in the PASCAL market, you would need 1/0.26 × 1250 × 10 ≈ 50000 LabelMe samples; LabelMe samples are dirt cheap there! You also find that every dataset rules its own market: other datasets' samples are always nearly worthless in it. So if you ask how much today's datasets are worth for training a real-world classifier, I am afraid the answer can only be "slightly better than nothing".
Some will say this is not the datasets' fault: blame the object representation and the learning algorithm, whose "overfitting" merely makes it look like the dataset's fault. After all, we humans also learn from a small number of examples, yet our visual system overcomes this and learns the right thing. Fine, the authors reply, let's take a step back: for now, at least, don't pin all the blame on the algorithms. If your dataset only ever shows "race cars seen from behind" as cars, you can hardly expect my algorithm to figure out that a "family sedan seen from the side" is also a car.
So, as things stand, which datasets are decent and which are junk? The authors say: no question, Caltech101 and MSRC, you are due for retirement; go home and call it a day. PASCAL VOC, ImageNet, and SUN09 look passable; perhaps we are headed in the right direction.
Should we care about dataset quality, then? The authors say: if all you care about is turning your dataset into a pile of feature vectors to feed to a machine learning algorithm, don't bother. But if you want an algorithm that understands the real world, dataset quality is crucial.
So what should we little folks do? The authors explain: start by trying cross-dataset validation; we are happy to release our code and data (the little folks cheer). They go on to offer suggestions for building datasets. Against selection bias, collect from multiple sources, e.g., different search engines in different countries, or gather a pile of unlabeled images and label them yourselves. Against capture bias (here the big shots allow themselves a smile): have you noticed that a Google image search for "mug" returns mostly mugs with the handle on the right? For problems like this, they suggest image transformations: flips, warps, and so on. Against negative set bias, you can add negatives from other datasets, or use standard algorithms to mine hard-to-distinguish negatives from unlabeled data. But that too introduces a bias, the "make my algorithm's life harder" bias.
Finally, with a touch of modesty, the big shots admit that although the title says "unbiased", their own biases have probably seeped into the writing. But the purpose of the paper is to get everyone discussing this important yet long-neglected problem.
And there the paper stops abruptly, leaving us boundless room for imagination. It has blasted open an enormous pit, and who knows how long it will take to fill. It has surely also offended quite a few people: "We still make our living off results on different datasets!" "Our method only works on this one dataset; are you saying we can't publish anymore?" "My result beats yours by 1.7%, so there!" Of course, a herd of lesser bulls will now charge in: rush out cross-dataset validations, rush out theories, and race to fill the pit. As for us little weaklings, well, we will stay out of such high-end games: do whatever the boss says, and be content to watch these adorably hammy big shots now and then. I so envy everyone who attended the conference; hurry up and post your travel notes, photos, and summaries!!!!

[Forward] De-mystifying Good Research and Good Papers By Fei-Fei Li, 2009.03.01

Repost: a letter that Fei-Fei Li, a Chinese professor of computer vision at Stanford, wrote to her students on how to do good research and write good papers; I benefited a lot from it. (Source: Tan Feng's journal)

Fei-Fei Li is a leading figure in computer vision at Stanford University.
 De-mystifying Good Research and Good Papers
By Fei-Fei Li, 2009.03.01

Please remember this: 
1000+ computer vision papers get published every year!
Only 5-10 are worth reading and remembering!

Since many of you are writing your papers now, I thought that I'd share these thoughts with you. I probably have said all these at various points during our group and individual meetings. But as I continue my AC reviews these days (that's 70 papers and 200+ reviews -- between me and my AC partner), these following points just keep coming up. Not enough people conduct first class research. And not enough people write good papers. 
- Every research project and every paper should be conducted and written with one singular purpose: *to genuinely advance the field of computer vision*. So when you conceptualize and carry out your work, you need to be constantly asking yourself this question in the most critical way you could – “Would my work define or reshape xxx (problem, field, technique) in the future?” This means publishing papers is NOT about "this has not been published or written before, let me do it", nor is it about “let me find an arcane little problem that can get me an easy poster”. It's about "if I do this, I could offer a better solution to this important problem," or “if I do this, I could add a genuinely new and important piece of knowledge to the field.” You should always conduct research with the goal that it could be directly used by many people (or industry). In other words, your research topic should have many ‘customers’, and your solution would be the one they want to use.
- A good research project is not about the past (i.e. obtaining a higher performance than the previous N papers). It's about the future (i.e. inspiring N future papers to follow and cite you, N->\inf). 
- A CVPR'09 submission with a Caltech101 performance of 95% received scores of 4,4,4 (three weak rejects) this year, and will be rejected. This is by far the highest performance I've seen for Caltech101. So why is this paper rejected? Because it doesn't teach us anything, and no one will likely be using it for anything. It uses a known technique (at least for many people already) with super tweaked parameters custom-made for the dataset that is no longer a good reflection of real-world image data. It uses a BoW representation without object level understanding. All reviewers (from very different angles) asked the same question "what do we learn from your method?" And the only sensible answer I could come up with is that Caltech101 is no longer a good dataset.
- Einstein used to say: everything should be made as simple as possible, but not simpler. Your method/algorithm should be the most simple, coherent and principled one you could think of for solving this problem. Computer vision research, like many other areas of engineering and science research, is about problems, not equations. No one appreciates a complicated graphical model with super fancy inference techniques that essentially achieves the same result as a simple SVM -- unless it offers deeper understanding of your data that no other simpler methods could offer. A method in which you have to manually tune many parameters is not considered principled or coherent. 
- This might sound corny, but it is true. You're PhD students in one of the best universities in the world. This means you embody the highest level of intellectualism of humanity today. This means you are NOT a technician and you are NOT a coding monkey. When you write your paper, you communicate ideas and knowledge. That's what a paper is about. This is how you should approach your writing. You need to feel proud of your paper not just for the day or week it is finished, but for many years to come.
 - Set a high goal for yourself – the truth is, you can achieve it as long as you put your heart in it! When you think of your paper, ask yourself this question:  Is this going to be among the 10 papers of 2009 that people will remember in computer vision? If not, why not? The truth is only 10+/-epsilon gets remembered every year. Most of the papers are just meaningless publication games. A long string of mediocre papers on your resume can at best get you a Google software engineer job (if at all – 2009.03 update: no, Google doesn’t hire PhD for this anymore). A couple of seminal papers can get you a faculty job in a top university. This is the truth that most graduate students don't know, or don't have a chance to know. 
- Review process is highly random. But there is one golden rule that withstands the test of time and randomness -- badly written papers get bad reviews. Period. It doesn't matter if the idea is good, result is good, citations are good. Not at all. Writing is critical -- and this is ironic because engineers are the worst trained writers among all disciplines in a university. You need to discipline yourself: leave time for writing, think deeply about writing, and write it over and over again till it's as polished as you can think of. 
 - Last but not the least, please remember this rule: important problem (inspiring idea) + solid and novel theory + convincing and analytical experiments + good writing = seminal research + excellent paper. If any of these ingredients is weak, your paper, hence reviewer scores, would suffer.

Tuesday, March 1, 2011

a course about mobile computer vision

I just came across a course on mobile computer vision, taught by Silvio Savarese at the University of Michigan. If you are an active computer vision researcher, you know who he is.

Advanced Topics in Mobile Computer Vision

There are lots of useful resources on the course site, including a brief introduction to Android and development on it. There are also several project reports, which can serve as references for anyone interested in the topic.

Wednesday, February 23, 2011

DIY telepresence robot

Since I may need to work far away from home (from 200+ miles, e.g., Minneapolis, MN, to 2000 miles, e.g., Los Angeles, CA) in the near future, I am considering building a telepresence robot to keep in touch with my wife and kids at our Iowa home.

I first got the idea of a telepresence robot from an issue of IEEE Spectrum a few months ago. Here is the link, A DIY Telepresence Robot, and a website mentioned in that IEEE article, Sparky Jr., dedicated to DIY, open-source mobile telepresence. Today I found another post about this idea, titled "Google Engineer Builds an Affordable DIY Telepresence Robot To Keep In Touch With Remote Fiancee"; the Google engineer Johnny Lee's website that explains his approach is Procrastineering. The total cost of Lee's telepresence robot is about $500. If you don't want to get your hands dirty, iRobot is going to release an app platform, AVA, so you can skip the DIY hardware work and focus on coding for this new tech gadget.

I think the cost can be further reduced if I replace the $250 netbook with a Chinese shanzhai netbook or an APad (a tablet running Android). Nowadays there are lots of APads under 1000 RMB (~150 US dollars) with touch screens; see, for example, "7-inch Android MID APad tablets under 1000 RMB" and "ranking of tablets under 1000 RMB". I believe my kids will love the touch screen, and in turn love their daddy! In addition, it would be very cool to connect a Kinect on the remote site, so I could control the robot at home using my hands and body pose.

This idea is great, at least for me. I am planning to work on it right after submitting my ICCV paper.

The following is a list of useful websites for DIY robots that I discovered after first posting this entry:
http://www.robotshop.com/store
http://www.parallax.com/
http://www.servocity.com/
http://www.surveyor.com/SRV_info.html

Friday, February 4, 2011

a new book about computer vision

Computer Vision: Algorithms and Applications by Richard Szeliski at Microsoft Research. The author provides a free PDF version for download. I should read it when possible.

Tuesday, January 25, 2011

(ZT) How technology will change our mind and brain?

An article on my favorite Chinese online community, "cchere", discusses how technology changes our minds, the way we use our brains, and even our brains themselves. The article is written in Chinese.


Technology Changes the Brain (Part 1)
Technology Changes the Brain (Part 2)

My abstract is as follows:
  • Part 1: how our brains/minds can be changed by new technologies
    • Nietzsche reportedly changed his writing style after switching to a typewriter
    • Google gives us such easy access to vast amounts of information that we no longer need to memorize much; as a result, our memorization skills deteriorate
    • After more and more reading on the Web, we lose the patience to read a long article from top to bottom: we keep jumping from one point to another, and all we remember is a pile of small pieces of information rather than the whole; consequently, our capacity for reading and understanding degenerates
    • Our brains can change depending on how we use them (neuroplasticity)
  • Part 2: how to control our world directly with our brains, and how we can control our brains
    • We already have the technology to issue simple commands via brainwaves, used mostly by disabled people
    • DARPA's project "Silent Talk" on using brainwaves to communicate on the battlefield
    • DARPA's projects on repairing the brain, which could be used by soldiers on the battlefield
    • In 2008, the NSA published a report, "Emerging Cognitive Neuroscience and Related Technologies"
    • Will the scenarios in "The Matrix" come true? Very likely.

read your mind on iphone?

Recently, PLX Devices released a new product called Xwave, which outputs eight EEG band signals from your brainwaves: Delta, Theta, Low Alpha, High Alpha, Low Beta, High Beta, Low Gamma, and Mid Gamma, plus two easy-to-interpret values, attention and meditation, derived from the Beta and Alpha waves respectively. The main advantages of this device are its low price ($99) and its ease of use. In addition, Xwave has an SDK, so iPhone developers can implement their ideas quickly. There are already a few games using the device; one has you levitate a ball using the strength of your attention. More complex and practical games are proposed in its developer guide, including
  • MindWrestle – It’s the same as arm wrestling, however it’s done with the mind.
  • Wormhole Disk - The more you relax (or meditate), the more readily the disks float into wormholes
    that pop up from the bottom of the screen. The more tense you are, the longer the disks just float there before finding their way into a wormhole.
  • useTheForce - use your mind's force to throw your enemy or objects
  • Yoga
  • Archery/Shooting
  • MusicMatch - compare your brainwave with your friend's when you are listening to the same song
  • CoupleSync - compare your brainwave with your loved one when you are doing the same activity
  • BrainExercise

Given the nature of EEG signals, we cannot use this type of device for activities that require precise control or localization, such as driving a car. However, Xwave does open up huge potential for futuristic games and activities. To some extent, certain sci-fi scenarios will come true.
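For fun, here is a minimal sketch of an attention-driven game loop like the ball-levitation demo above; read_attention() is a hypothetical stand-in for the SDK's 0-100 attention value (the real Xwave SDK targets the iPhone, so this Python is illustration only):

```python
# Toy game loop: the ball rises while attention is high, sinks otherwise.
import random
import time

def read_attention():
    """Hypothetical stand-in for the SDK's attention value (0-100)."""
    return random.randint(0, 100)   # replace with a real device call

ball_height = 0.0
for _ in range(100):                # ~10 seconds at 10 Hz
    attention = read_attention()
    # Rise when attention is above 50, sink otherwise; clamp at the floor.
    ball_height = max(0.0, ball_height + 0.05 * (attention - 50))
    print(f"attention={attention:3d}  ball height={ball_height:6.1f}")
    time.sleep(0.1)
```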

Here are a few more interesting applications using Xwave that come to my mind:
  • youLie - use your brainwave to tell whether you are lying
    • your wife or your fiancee will love it!
  • findMyFavorite - find your favorite picture/food/game/product by reading your mind 
    • you can rely on your instinct now
  • Zen - do Zen meditation and let Xwave tell you how good you are (similar to Yoga)
    • test your brainwave while you meditate and listen to relaxing music, so that you can find the best music to bring you peace
To me, Xwave is not just an entertainment device. I think it will also be useful in my research on computer vision. Several researchers have started to investigate the mapping between images and brain activity. For example, Fei-Fei Li's group has a project on scene classification using fMRI. Kewei Tu's blog "Mind Reading" also mentions a few research works on mapping brain activity to words/images. Here are two papers referred to in his blog:
I would like to tap into this research topic when time permits.