Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier, Utkarsh Mall, Carl Vondrick
TL;DR
The paper tackles the interpretability gap in vision-language classifiers by learning discrete, human-interpretable attribute sets per class through an evolutionary framework in which an open LLM proposes mutations guided by past performance. A concept bottleneck model aggregates attribute scores using a CLIP-based scorer, enabling open-vocabulary classification without class-name priors. Empirical results on fine-grained iNaturalist and novel Kiki-Bouba concepts demonstrate substantial improvements over baselines and provide a mechanism to audit dataset bias via interpretable attributes. The approach offers practical benefits for trust, explainability, and bias analysis in specialized domains, with potential limitations tied to the biases of the underlying LLMs.
Abstract
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box: they offer limited interpretability, carry a risk of bias, and cannot discover new visual concepts that have not been written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses a large language model and its in-context learning abilities to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the latest baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two Kiki-Bouba datasets, despite the baselines having access to privileged information about class names.
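To make the search loop described above concrete, the following is a minimal toy sketch, not the authors' implementation. Several pieces are assumptions made purely for illustration: `toy_score` stands in for a CLIP image-text similarity, `random_mutation` stands in for the LLM that proposes mutations from past performance, the mean is assumed as the way attribute scores are aggregated, and "images" are just sets of words. Classes are referred to by anonymous ids to mirror the setting without class-name priors.

```python
import random
from typing import Dict, List

Bottleneck = Dict[str, List[str]]  # class id -> attribute phrases


def toy_score(image: frozenset, attribute: str) -> float:
    """Stand-in for a CLIP image-text similarity (assumption):
    1.0 if the toy image contains the attribute word, else 0.0."""
    return 1.0 if attribute in image else 0.0


def classify(image, bottleneck: Bottleneck, score=toy_score) -> str:
    """Predict the class whose attributes have the highest mean score."""
    def class_score(cls: str) -> float:
        attrs = bottleneck[cls]
        return sum(score(image, a) for a in attrs) / len(attrs)
    return max(bottleneck, key=class_score)


def accuracy(data, bottleneck: Bottleneck, score=toy_score) -> float:
    """Fraction of (image, label) pairs classified correctly."""
    hits = sum(classify(img, bottleneck, score) == label for img, label in data)
    return hits / len(data)


def random_mutation(history, vocab, rng) -> Bottleneck:
    """Stand-in for the LLM proposal step (assumption): copy the best
    bottleneck seen so far and swap one attribute for a random word."""
    best, _ = max(history, key=lambda pair: pair[1])
    child = {cls: list(attrs) for cls, attrs in best.items()}
    cls = rng.choice(sorted(child))
    child[cls][rng.randrange(len(child[cls]))] = rng.choice(vocab)
    return child


def evolve(data, init: Bottleneck, vocab, generations=50, seed=0):
    """Propose mutations, score each one, and return the best (bottleneck, fitness)."""
    rng = random.Random(seed)
    history = [(init, accuracy(data, init))]
    for _ in range(generations):
        candidate = random_mutation(history, vocab, rng)
        history.append((candidate, accuracy(data, candidate)))
    return max(history, key=lambda pair: pair[1])
```

In a realistic setting, the mutation step would prompt a language model with earlier attribute sets and their scores, and the scorer would compare image and text embeddings. The loop structure (propose, evaluate, keep a scored history) is the only part this sketch is meant to convey.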
