Image class translation: visual inspection of class-specific hypotheticals and classification based on translation distance
Mikyla K. Bowen, Jesse W. Wilson
TL;DR
This work tackles the explainability gap and out-of-domain sensitivity of medical image classifiers by introducing I2I-CT, which translates each input image into $K$ class-specific hypotheticals, using CycleGAN for $K=2$ and StarGAN for $K>2$. The translation distances form a compact feature vector $d \in \mathbb{R}^K$ that supports visualization and simple classifiers, and on several medical datasets yields accuracy competitive with or superior to end-to-end CNNs. Beyond classification, the approach reveals dataset biases and aids interpretability through visual inspection of the generated hypotheticals. The results demonstrate that translation-distance classifiers can match or exceed CNN performance on multi-class tasks and offer a practical, interpretable complement to traditional black-box models in medical imaging.
Abstract
Purpose: A major barrier to the implementation of artificial intelligence for medical applications is the lack of explainability, together with the high confidence that classifiers assign to incorrect decisions, particularly for out-of-domain samples. We propose a generalization of image translation networks for image classification and demonstrate their potential as a more interpretable alternative to conventional black-box classifiers. Approach: We train an image2image network to translate an input image to class-specific hypotheticals, and then compare these with the input, both visually and quantitatively. Translation distances, i.e., the degree of alteration needed to conform to one class or another, are examined for clusters and trends, and used as simple low-dimensional feature vectors for classification. Results: On melanoma/benign dermoscopy images, a translation distance classifier achieved 80% accuracy using only a 2-dimensional feature space (versus 85% for a conventional CNN using a ~62,000-dimensional feature space). Visual inspection of rendered images revealed dataset biases, such as scalebars, vignetting, and pale background pigmentation in melanomas. Image distributions in translation distance space revealed a natural separation along the lines of the dermatologist's decision to biopsy, rather than between malignant and benign. On bone marrow cytology images, translation distance classifiers outperformed a conventional CNN in both 3-class (92% accuracy vs 89% for CNN) and 6-class (90% vs 86% for CNN) scenarios. Conclusions: This proof-of-concept shows the potential for image2image networks to go beyond artistic/stylistic changes and to expose dataset biases, perform dimension reduction and dataset visualization, and, in some cases, potentially outperform conventional end-to-end CNN classifiers.
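To make the idea of a translation-distance feature vector concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mean absolute pixel difference as the distance (the abstract does not specify the metric), uses hypothetical toy translators in place of trained CycleGAN/StarGAN generators, and applies a nearest-class rule as the simplest possible stand-in for the "simple classifiers" mentioned above. The names `translation_distances`, `nearest_class`, and `make_toy_translator` are illustrative only.

```python
import numpy as np

def translation_distances(image, translators):
    """Return d in R^K: one distance per class-specific hypothetical.

    Assumption: distance is the mean absolute pixel difference between
    the input and its translation; the source does not state the metric.
    """
    return np.array([np.mean(np.abs(t(image) - image)) for t in translators])

def nearest_class(d):
    """Assumed decision rule: pick the class needing the least alteration."""
    return int(np.argmin(d))

def make_toy_translator(class_mean):
    """Hypothetical stand-in for a trained generator: pulls every pixel
    halfway toward a class-typical intensity."""
    return lambda x: 0.5 * x + 0.5 * class_mean

# Three toy classes (K = 3) with class-typical intensities 0.25, 0.5, 0.9.
translators = [make_toy_translator(m) for m in (0.25, 0.5, 0.9)]

# A constant toy "image" whose intensity is closest to class 0.
image = np.full((4, 4), 0.2)

d = translation_distances(image, translators)
predicted = nearest_class(d)
print("distance vector:", d)
print("predicted class:", predicted)
```

In practice the vector `d` could instead be passed to any low-dimensional classifier, or plotted directly to look for clusters and trends, which is the use the abstract emphasizes.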
