Google Tech Talks
August 13, 2007

ABSTRACT

Any task that requires automatic reasoning about the content of a photograph is inherently ambiguous and ill-posed. This is because a single image does not carry enough information on its own to disambiguate the world it depicts. Humans, of course, have no trouble understanding photographs, because they can bring a lifetime of prior visual experience to bear on the task. How can we help computers do the same? Our solution is to "brute force" the problem by using massive amounts of visual data, much as a search engine or an automatic language translator uses textual data. In this talk, I will briefly discuss our progress on a set of challenging problems: filling holes in images, finding and segmenting objects, recovering 3D scene geometry from a single image, and inserting objects into new scenes. In each case, access to a large image database proved crucial to tackling the problem. While some of these problems require labeled data, others need only a very large set of images, such as our recently collected dataset of 2.3 million Flickr photographs. Pretty pictures will be shown.