Teaching Humans Subtle Differences with DIFFusion

Mia Chiquier, Orr Avrech, Yossi Gandelsman, Berthy Feng, Katherine Bouman, Carl Vondrick

TL;DR

This work tackles the problem of teaching humans to distinguish extremely subtle visual differences in scientific domains where discriminative features are often unknown. It introduces DIFFusion, a diffusion-based counterfactual framework that inverts real images, performs conditioning-space arithmetic, and samples from the inverted noise, with optional domain tuning, to produce minimal, identity-preserving edits that flip class labels. Across six diverse domains, DIFFusion achieves higher discriminator flip rates and better perceptual similarity than baselines, and user studies show it substantially improves teaching effectiveness for fine-grained distinctions. The approach demonstrates the potential of combining diffusion inversion, image prompts, and retrieval-based embeddings to reveal latent discriminative cues, while also highlighting dataset biases and pointing to future directions in disentangled, controllable visual teaching tools.
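
As a rough illustration of the "discriminator flip rate" mentioned above, the sketch below counts a counterfactual as flipped when the average target-class probability across an ensemble of classifiers exceeds a threshold. The 0.5 threshold, the function names, and the toy probabilities are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import fmean


def ensemble_probability(per_classifier_probs):
    """Average the target-class probability over all classifiers in the ensemble."""
    return fmean(per_classifier_probs)


def flip_rate(target_probs_per_image, threshold=0.5):
    """Fraction of counterfactuals whose ensemble-averaged target-class
    probability exceeds `threshold` (an assumed cutoff, not from the paper)."""
    flipped = [
        ensemble_probability(probs) > threshold
        for probs in target_probs_per_image
    ]
    return sum(flipped) / len(flipped)


# Toy data: one row per counterfactual image, one column per classifier.
probs = [
    [0.9, 0.8, 0.7],  # ensemble mean well above 0.5: counted as flipped
    [0.4, 0.3, 0.2],  # ensemble mean below 0.5: not flipped
    [0.6, 0.7, 0.8],  # flipped
    [0.1, 0.2, 0.3],  # not flipped
]
rate = flip_rate(probs)
print(rate)
```

Averaging over several classifiers before thresholding makes the count less sensitive to the quirks of any single discriminator.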

Abstract

Scientific expertise often requires recognizing subtle visual differences that remain challenging to articulate even for domain experts. We present a system that leverages generative models to automatically discover and visualize minimal discriminative features between categories while preserving instance identity. Our method generates counterfactual visualizations with subtle, targeted transformations between classes, performing well even in domains where data is sparse, examples are unpaired, and category boundaries resist verbal description. Experiments across six domains, including black hole simulations, butterfly taxonomy, and medical imaging, demonstrate accurate transitions with limited training data, highlighting both established discriminative features and novel subtle distinctions that measurably improved category differentiation. User studies confirm our generated counterfactuals significantly outperform traditional approaches in teaching humans to correctly differentiate between fine-grained classes, showing the potential of generative models to advance visual learning and scientific research.

Paper Structure

This paper contains 33 sections, 9 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: DIFFusion Counterfactuals. We illustrate the counterfactual results from our method on the Butterfly dataset, the Black Hole dataset, and the Retina dataset. In the Butterfly dataset, the Viceroy has a cross-sectional line (yellow), a smaller head with fewer dots (magenta), and more "scaly" dots (blue), compared to the Monarch. In the Black Hole dataset, SANE has more uniform wisps (yellow) and a less prominent photon ring (blue) than MAD, with these distinguishing features discovered through our method rather than known a priori. In the Retina dataset, normal retinas lack the horizontal line bumps (yellow) present in retinas with drusen.
  • Figure 2: DIFFusion method. Our method consists of four parts. (i) Inverting the real image with DDPM-EF to obtain noise maps. (ii) Performing conditioning-space arithmetic using positive and negative embeddings obtained from the training set. (iii) Generation via diffusion sampling, starting from the inverted noise and conditioning on the manipulated conditioning vector $\hat{c}$. (iv) Optional domain tuning, in which we fine-tune the diffusion model for domain adaptation.
  • Figure 3: Qualitative Results. We present our qualitative results, where each row corresponds to one direction of our binary datasets. The first column contains the inputs, and each subsequent column contains the results from each baseline, with the last column containing the result from DIFFusion. The value in the top-left corner of each image is the average probability predicted by our ensemble classifiers. In particular, the magnified boxes in the magenta frame show that our method is able to pick up on small discriminative cues. When converting from MAD to SANE, the wisps become amplified and more uniform in brightness. When converting from Drusen to Normal, the small bumps along the cross-section are flattened out. When converting from Monarch to Viceroy, a cross-sectional line is added on the wing.
  • Figure 4: Original vs. Counterfactual Overlay. We visualize the difference between the input image and the counterfactual from DIFFusion. From SANE to MAD we notice a highlighting of the photon ring (green). From MAD to SANE we notice that the ring becomes less pronounced (magenta), and wisps appear (green).
  • Figure 5: User Study Results. We plot the results from user studies across users who studied our counterfactuals, users who studied the best baseline counterfactuals, and users who studied unpaired images. For both Butterfly and Black Hole datasets, we observe that the users who studied our counterfactuals significantly outperformed the other two groups. The violin plots illustrate the distribution of user percentages, where the width of each grey shape represents the density of data points at corresponding percentages.
  • ...and 8 more figures
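
The conditioning-space arithmetic in step (ii) of Figure 2 can be sketched as follows. The caption does not give the exact formula, so this example assumes one simple form: shift the image's conditioning vector along the difference between the mean positive (target-class) and mean negative (source-class) embeddings, scaled by a hypothetical strength parameter `alpha`. The function names and toy vectors are likewise illustrative, and plain Python lists stand in for real image embeddings.

```python
from statistics import fmean


def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embedding vectors."""
    return [fmean(column) for column in zip(*embeddings)]


def shift_conditioning(c, positive, negative, alpha=1.0):
    """Assumed edit rule: c_hat = c + alpha * (mean(positive) - mean(negative)).

    `c` is the conditioning vector of the input image; `positive` and
    `negative` are embeddings of training images from the target and
    source classes. `alpha` is a hypothetical edit-strength parameter.
    """
    pos_mean = mean_embedding(positive)
    neg_mean = mean_embedding(negative)
    return [ci + alpha * (p - n) for ci, p, n in zip(c, pos_mean, neg_mean)]


# Toy 3-dimensional embeddings.
c = [0.5, 0.0, 1.0]
positive = [[1.0, 0.0, 1.0], [3.0, 0.0, 1.0]]  # mean: [2.0, 0.0, 1.0]
negative = [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]  # mean: [0.0, 3.0, 1.0]

c_hat = shift_conditioning(c, positive, negative, alpha=0.5)
print(c_hat)  # [1.5, -1.5, 1.0]
```

Under this assumed rule, dimensions on which the two classes agree (the third coordinate here) are left untouched, which is consistent with the goal of minimal, identity-preserving edits. In the full pipeline, the resulting vector would play the role of $\hat{c}$ when sampling from the inverted noise in step (iii).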