Collections of photos, images, and videos are quickly coming to dominate the content available on the Web. Currently, internet search engines rely on the text with which images are labeled to return matches. But why is only text being used to search visual media? These labels can be unreliable, unhelpful, and sometimes unavailable altogether.
To solve this problem, scientists at Stanford and Princeton have been working to “create a new generation of visual search technologies.” Dr. Fei-Fei Li, a computer scientist at Stanford, has built the world’s largest visual database, containing more than 14 million labeled objects.
The system, called ImageNet, applies the data gathered from the database to recognize similar, unlabeled objects with much greater accuracy than past algorithms. According to a New York Times article:
Computer vision is one of the thorniest problems facing designers of artificial intelligence and robots. A huge portion of the human brain is devoted to vision, and scientists are still struggling to unlock the biological mechanisms by which humans learn to recognize objects. “My dream has long been to build a vision system that recognizes the world the way that humans do,” said Dr. Li, whose Princeton colleague is the computer scientist Kai Li (they are not related).
He [Dr. Bengio] added that ImageNet was not perfect. To organize the vast collection of images, Dr. Li uses WordNet, a database of English words designed by the Princeton psychologist George A. Miller, who died in July at 92. For Dr. Bengio, its categories are a little too elevated.
“I would have preferred if the categories chosen in ImageNet were more reflecting the distribution of interests of the population,” he said. “Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus.”
Still, the project goes on. Jia Deng, one of Dr. Li’s graduate students, has developed an image classifier he jokingly calls infallible. Because WordNet is organized as a hierarchy of categories, the software can simply choose a level of abstraction where it has a very high probability of being correct: if it is not sure a given picture shows a rabbit, for instance, it goes to the next level (mammals) or the one above that (animals).
At one of those levels, it will almost certainly not be wrong. And Dr. Li says she expects further advances that allow ever more accuracy.
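The idea of backing off to a safer level of the hierarchy can be illustrated with a short Python sketch. This is only a toy illustration under stated assumptions, not the actual software described in the article: the three-level hierarchy, the function names, the probabilities, and the 0.9 confidence threshold are all hypothetical. It assumes a classifier has already produced a probability for each leaf category, and it treats the probability of a broader category as the sum of the probabilities of the leaves beneath it.

```python
# Hypothetical toy hierarchy (child -> parent); None marks the root.
# A real system would take this structure from WordNet.
PARENT = {
    "rabbit": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
    "animal": None,
}


def ancestors(label):
    """Yield the label itself, then each ancestor up to the root."""
    while label is not None:
        yield label
        label = PARENT[label]


def node_probability(node, leaf_probs):
    """Sum the probabilities of every leaf at or below the given node."""
    return sum(p for leaf, p in leaf_probs.items() if node in ancestors(leaf))


def hedged_label(leaf_probs, threshold=0.9):
    """Return the most specific category whose probability meets the threshold.

    Starts at the most likely leaf and climbs toward the root, so the
    answer gets more general only when the classifier is unsure.
    """
    best_leaf = max(leaf_probs, key=leaf_probs.get)
    node = best_leaf
    for node in ancestors(best_leaf):
        if node_probability(node, leaf_probs) >= threshold:
            return node
    return node  # fall back to the root if nothing met the threshold


# Assumed classifier output for one picture (made-up numbers).
scores = {"rabbit": 0.6, "cat": 0.35, "sparrow": 0.05}
print(hedged_label(scores, threshold=0.9))
```

With these made-up scores the sketch is not confident enough to say "rabbit", so it reports the broader category instead. Lowering the threshold makes the answer more specific, and raising it pushes the answer toward the root, which is the trade-off between accuracy and specificity that the passage describes.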
The link to Dr. Fei-Fei Li is broken. It should be: http://vision.stanford.edu/~feifeili/
Thank you Patrick, we have fixed the link.