Teaching computers to make sense of photos and videos has countless applications, from helping doctors diagnose illness to helping robots navigate their surroundings. Most machine vision technologies are trained to recognize specific features using large collections of images or videos that have been labeled, or annotated, by humans. But this annotation process is time-consuming and expensive. Could it be skipped?

Stella Yu, former director of ICSI’s Vision Group, enjoys pushing the limits of what computers can learn from images and videos alone—no labels, just vast collections of visual content to explore—akin to the way babies learn to pick out faces and objects during their first months of life. In one project done in collaboration with Microsoft Research, Yu and colleagues demonstrated how computers can learn to distinguish objects from backgrounds in unlabeled videos and even still images. Her team has also explored how computers (and humans) recognize segments and constituent parts of images and how we group objects in a scene by discerning each object’s relationship to the context in which it appears.

Teaching machines to interpret unlabeled data is not only intellectually exciting; it also brings the power of machine vision to a much broader array of practical applications. For example, doctors and scientists can use computers to analyze medical images without relying on large quantities of images annotated by human specialists. With colleagues from UC Berkeley and UC Davis, Yu demonstrated how this approach can reduce bias in the use of ophthalmologic images to assess macular degeneration. Machine vision can also help scientists find patterns that a human researcher could not feasibly notice, uncovering promising research directions for assessing risk or developing new therapies.

The work has also helped solve some tricky problems in 3D spatial analysis. For example, in collaboration with Glidewell Dental, Yu’s team developed a deep learning model to design dental crowns that account for mouth shape and bite dynamics. Their model has been adopted for a rapid, cost-effective 3D design system for dental restoration products that exceeds the standards established for technicians in the field. Their methods have also found applications far beyond medicine; in one study, for example, the team developed a way to extract macro-scale information about buildings across a city to help planners identify the areas most vulnerable to hazards such as hurricanes and earthquakes. Across all of these efforts, Yu’s work has demonstrated how teaching machines to “see” can revolutionize our own view of the world.

What made ICSI a good place to pursue these projects?

“ICSI, with its close affiliations with UC Berkeley and long-established visiting programs from various countries such as Germany, is a great meeting place of diversity and creativity. With a strong group of graduate students at UC Berkeley and postdocs from several continents, I am able to conduct exciting theoretical and applied research in computer vision and machine learning. These fresh young minds take our work to all over the world, sometimes continuing our collaboration at a whole new level after they leave.”


Stella Yu

Former Director of ICSI’s Vision Group

This story was published in January 2026 as part of a retrospective series highlighting ICSI’s accomplishments and impacts over the years. To learn about our ongoing work, explore our Core Research Themes.