Teaching Humans Subtle Differences with DIFFusion
Mia Chiquier, Orr Avrech, Yossi Gandelsman, Berthy Feng, Katherine Bouman, Carl Vondrick
TL;DR
This work tackles the problem of teaching humans to distinguish extremely subtle visual differences in scientific domains where the discriminative features are often unknown. It introduces DIFFusion, a diffusion-based counterfactual framework that inverts real images, performs arithmetic in the conditioning space, and resamples, with optional domain-specific tuning, to produce minimal, identity-preserving edits that flip class labels. Across six diverse domains, DIFFusion achieves higher discriminator flip rates and better perceptual similarity than baselines, and user studies show that it substantially improves teaching effectiveness for fine-grained distinctions. The approach demonstrates the potential of combining diffusion inversion, image prompts, and retrieval-based embeddings to reveal latent discriminative cues. It also highlights dataset biases and points to future directions in disentangled, controllable visual teaching tools.
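The summary above does not spell out how the arithmetic in the conditioning space is done. As a rough illustration only, the sketch below assumes one common form: the edit direction is the difference between the mean embeddings of the target and source classes, and an image's conditioning vector is moved a fraction `alpha` along it. The function names, the 8-dimensional random embeddings, and the `alpha` parameter are all hypothetical stand-ins, not the authors' implementation; in a real pipeline the shifted vector would condition the diffusion sampler that starts from the inverted image.

```python
import numpy as np

def class_direction(source_embs, target_embs):
    """Assumed edit direction: target class mean minus source class mean."""
    return target_embs.mean(axis=0) - source_embs.mean(axis=0)

def shift_conditioning(image_emb, direction, alpha):
    """Move one image's conditioning vector a fraction alpha along the direction."""
    return image_emb + alpha * direction

# Toy stand-ins for image-prompt embeddings (hypothetical data, 8 dimensions).
rng = np.random.default_rng(0)
source_embs = rng.normal(0.0, 1.0, size=(32, 8))
target_embs = rng.normal(0.5, 1.0, size=(32, 8))

direction = class_direction(source_embs, target_embs)
image_emb = source_embs[0]

# alpha = 0 leaves the conditioning unchanged; larger alpha pushes it
# further toward the target class.
edited = shift_conditioning(image_emb, direction, alpha=0.5)
```

Keeping `alpha` small is what would make the edit minimal in this sketch: the conditioning moves only as far as needed toward the other class, while the inverted image's starting noise carries the instance identity.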
Abstract
Scientific expertise often requires recognizing subtle visual differences that remain challenging to articulate even for domain experts. We present a system that leverages generative models to automatically discover and visualize minimal discriminative features between categories while preserving instance identity. Our method generates counterfactual visualizations with subtle, targeted transformations between classes, performing well even in domains where data is sparse, examples are unpaired, and category boundaries resist verbal description. Experiments across six domains, including black hole simulations, butterfly taxonomy, and medical imaging, demonstrate accurate transitions with limited training data, highlighting both established discriminative features and novel subtle distinctions that measurably improved category differentiation. User studies confirm our generated counterfactuals significantly outperform traditional approaches in teaching humans to correctly differentiate between fine-grained classes, showing the potential of generative models to advance visual learning and scientific research.
