Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI

Carina Newen, Luca Hinkamp, Maria Ntonti, Emmanuel Müller

TL;DR

This paper identifies a fundamental gap in explainable AI for visual data: pixel-level attributions often fail to resolve the perceptual ambiguity inherent in optical illusions. It proposes gaze direction and eye position as generalizable concepts to guide learning and explanations, and introduces Ambivision, an open-source dataset of two-animal optical illusions with bounding boxes and explicit gaze/eye annotations. Across multiple architectures, experiments show that incorporating these concept-level cues improves classification accuracy on ambiguous images; the experiments also reveal limitations of standard XAI methods such as Grad-CAM, Integrated Gradients, and PIP-Net. The work highlights bias-mitigation strategies in synthetic data generation and advocates a shift toward concept-based explanations, with potential impact on safety-critical vision tasks.
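The following minimal sketch makes the idea of concept-level cues concrete. It is not the authors' implementation. It shows one way a gaze vector and an eye coordinate could be appended to an image embedding before a linear classification head. Everything here is a hypothetical assumption for illustration: the feature layout (eye position normalized to the bounding box, plus a unit gaze vector), the function names, and the toy weights.

```python
import numpy as np

def concept_vector(eye_xy, gaze, bbox):
    """Eye position normalized to the bounding box plus a unit gaze vector (4 values).

    Hypothetical encoding: bbox = (x_min, y_min, x_max, y_max) in pixels.
    """
    x0, y0, x1, y1 = bbox
    eye = np.array([(eye_xy[0] - x0) / (x1 - x0),
                    (eye_xy[1] - y0) / (y1 - y0)])
    g = np.asarray(gaze, dtype=float)
    g = g / np.linalg.norm(g)  # keep only the direction, not the magnitude
    return np.concatenate([eye, g])

def augmented_input(image_emb, eye_xy, gaze, bbox):
    """Concatenate a (hypothetical) backbone embedding with the concept vector."""
    return np.concatenate([np.asarray(image_emb, dtype=float),
                           concept_vector(eye_xy, gaze, bbox)])

def predict(z, W, b):
    """Linear classification head; returns the index of the highest logit."""
    return int(np.argmax(W @ z + b))

# Toy example: both readings of one ambiguous image share the same pixel embedding.
emb = np.zeros(4)
bbox = (0, 0, 100, 100)
W = np.zeros((2, 8))   # 4 embedding dims + 2 eye dims + 2 gaze dims
W[0, 6] = 1.0          # class 0 responds to gaze pointing right (+x)
W[1, 6] = -1.0         # class 1 responds to gaze pointing left (-x)
b = np.zeros(2)

z_a = augmented_input(emb, eye_xy=(60, 40), gaze=(1, 0), bbox=bbox)
z_b = augmented_input(emb, eye_xy=(40, 40), gaze=(-1, 0), bbox=bbox)
print(predict(z_a, W, b), predict(z_b, W, b))
```

In this toy setup the two readings have identical pixel embeddings, so only the concept features can separate them. The script prints 0 for the first reading and 1 for the second.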

Abstract

From uncertainty quantification to real-world object detection, we recognize the importance of machine learning algorithms, particularly in safety-critical domains such as autonomous driving or medical diagnostics. Ambiguous data plays an important role across many of these machine learning domains. Optical illusions present a compelling area of study in this context, as they offer insight into the limitations of both human and machine perception. Despite this relevance, optical illusion datasets remain scarce. In this work, we introduce a novel dataset of optical illusions featuring intermingled animal pairs designed to evoke perceptual ambiguity. We identify generalizable visual concepts, particularly gaze direction and eye cues, as subtle yet impactful features that significantly influence model accuracy. By confronting models with perceptual ambiguity, our findings underscore the importance of concepts in visual learning and provide a foundation for studying bias and alignment between human and machine vision. To make the dataset useful for general purposes, we generate the optical illusions systematically, varying the concepts discussed in our bias-mitigation section. The dataset is available on Kaggle at https://kaggle.com/datasets/693bf7c6dd2cb45c8a863f9177350c8f9849a9508e9d50526e2ffcc5559a8333. Our source code is available at https://github.com/KDD-OpenSource/Ambivision.git.
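The claim that gaze direction and eye cues separate the two animals can be illustrated with a simple, hypothetical matching rule. Given annotations for both animals in one illusion, the rule picks the label whose gaze direction and eye position best agree with an observed cue. This is a sketch under assumed conventions, not the dataset's actual annotation schema or the paper's method. The assumptions are: coordinates normalized to [0, 1], a dictionary of per-label annotations, a hand-picked eye_weight, and made-up example positions.

```python
import numpy as np

def cue_score(query_eye, query_gaze, ann_eye, ann_gaze, eye_weight=1.0):
    """Higher is better: gaze agreement (cosine similarity) minus weighted eye distance."""
    q = np.asarray(query_gaze, dtype=float)
    a = np.asarray(ann_gaze, dtype=float)
    cos = q @ a / (np.linalg.norm(q) * np.linalg.norm(a))
    dist = np.linalg.norm(np.asarray(query_eye, dtype=float) - np.asarray(ann_eye, dtype=float))
    return cos - eye_weight * dist

def resolve_label(query_eye, query_gaze, annotations, eye_weight=1.0):
    """annotations: hypothetical dict mapping label -> (eye_xy, gaze_vector)."""
    return max(annotations,
               key=lambda lbl: cue_score(query_eye, query_gaze, *annotations[lbl], eye_weight))

# Hypothetical annotations: both animals look right, but their eyes sit in different places.
annotations = {"lion": ((0.3, 0.4), (1.0, 0.0)),
               "eagle": ((0.7, 0.2), (1.0, 0.0))}
print(resolve_label((0.68, 0.22), (1.0, 0.0), annotations))
```

When both animals look in the same direction, the cosine terms tie and the eye-distance term decides. This mirrors the lion-and-eagle case described for Figure 2. When the gaze directions differ, the cosine term alone usually settles the label.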

Paper Structure

This paper contains 12 sections, 1 equation, 18 figures, 1 table.

Figures (18)

  • Figure 1: In this image, you can see both a rabbit and a duck. Common XAI methods that highlight important pixels could produce exactly the same explanation for either class without helping a human understand which class was chosen and why. This is a critical research gap in explanations: pixel highlighting alone is not enough. One way to distinguish the two animals is the direction in which you consider the eyes to be looking. While this is not the only way to approach the problem, our evaluations highlight the usefulness of gaze for the classification task. We show that a single small addition can erase the ambiguity of an image for humans and improve classification for machine learning models. We argue that the future of XAI lies in uncovering such concepts rather than highlighting pixels, which is the critical research gap we address in this paper.
  • Figure 2: Several examples from our dataset. In the upper left, for example, a penguin can be seen hidden within a horse, depending on the direction in which we consider the animal to be looking. Each example contains two animals that are distinguishable by the eye coordinate and the gaze vector. The two animals might look in the same direction, but their right eye (if more than one eye is visible) is in a different position. This is why the two animals remain distinguishable even in the lion-and-eagle image (bottom right), where the right eyes of the two animals are in different positions. For most of these images, however, the looking direction also differs. The dataset was designed to test whether gaze and eye coordinates are useful general concepts in ambiguous settings. It can also serve more generally as a baseline for evaluating XAI methods and classification performance on optical illusions. More example images are included in the Appendix (\ref{sec:Appendix}), Figure \ref{fig:overviewbirds}.
  • Figure 3: Pixel-based attribution explanations such as Grad-CAM struggle to distinguish between the intermingled regions of the two animals. We later show that adding a single feature significantly improves performance on ambiguous data.
  • Figure 4: The same problem appears with Integrated Gradients. The attributions for the two classes are very similar and often not meaningful. The explanation for the bear class clearly marks the rabbit's head, and both the rabbit and the bear explanations highlight the bear's face. The model clearly struggles to distinguish the two animals, and the explanations offer limited meaning and clarity.
  • Figure 5: Prototypes extracted via PIP-Net (nauta2023pip) for the eagle class. PIP-Net also struggles to distinguish cheetah fur from eagle feathers. The darker the box, the more important it is for the classification. Again, we argue that this is because the explanations are area-based rather than concept-based, a clear limitation for ambiguous data. Next to the original image are example extracted concepts that should show similar features.
  • ...and 13 more figures
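Several captions above refer to Integrated Gradients. As a reminder of what that method computes, here is a minimal NumPy sketch. It approximates the path integral IG_i(x) = (x_i - x'_i) ∫_0^1 ∂F(x' + α(x - x'))/∂x_i dα with a midpoint Riemann sum. The toy model F(x) = tanh(w · x), its weights, and the all-zero baseline are assumptions, chosen so that the gradient is analytic. In practice the gradients would come from a deep-learning framework's automatic differentiation.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Midpoint Riemann approximation of Integrated Gradients along the straight-line path."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal sub-intervals of [0, 1]
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * total / steps

# Toy differentiable "model" (assumption): F(x) = tanh(w . x) and its analytic gradient.
w = np.array([0.5, -1.0, 2.0])

def F(x):
    return np.tanh(w @ x)

def grad_F(x):
    return (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 0.5, 0.25])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_F, x, baseline)
print(attr, attr.sum(), F(x) - F(baseline))
```

The printed values give a quick sanity check based on the completeness property of Integrated Gradients. Up to discretization error, the per-feature attributions sum to F(x) - F(baseline).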