Fine-Grained ImageNet Classification in the Wild
Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou
TL;DR
This work studies the robustness of image classifiers under real-world distribution shift by performing fine-grained ImageNet classification on uncurated web images, with closely related classes selected through the WordNet hierarchy. The method has three stages: (1) build a balanced, WordNet-guided dataset of web images; (2) classify these images with pre-trained CNNs and Transformers, without fine-tuning; and (3) evaluate the results with knowledge-driven metrics that quantify the semantic similarity between the predicted and the true class using $path(c_1, c_2)$, $LCH$, and $WUPS$. The study finds that accuracy alone does not capture the quality of misclassifications: the knowledge-driven metrics reveal whether an error is semantically close to or distant from the true class, and show that Transformers often align more closely with semantic relations than CNNs. The paper provides an explainable evaluation framework and a reproducible pipeline for assessing fine-grained classification under real-world conditions, with implications for robust deployment and future research.
Abstract
Image classification has been one of the most popular tasks in Deep Learning, seeing an abundance of impressive implementations each year. However, there is a lot of criticism tied to promoting complex architectures that continuously push performance metrics higher and higher. Robustness tests can uncover several vulnerabilities and biases which go unnoticed during the typical model evaluation stage. So far, model robustness under distribution shifts has mainly been examined within carefully curated datasets. Nevertheless, such approaches do not test the real response of classifiers in the wild, e.g. when uncurated web-crawled image data of corresponding classes are provided. In our work, we perform fine-grained classification on closely related categories, which are identified with the help of hierarchical knowledge. Extensive experimentation on a variety of convolutional and transformer-based architectures reveals model robustness in this novel setting. Finally, hierarchical knowledge is again employed to evaluate and explain misclassifications, providing an information-rich evaluation scheme adaptable to any classifier.
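To make the knowledge-driven evaluation concrete, the following is a minimal sketch of the three hierarchy-based similarities named in the TL;DR, computed between a true class and a predicted class. It is not the authors' implementation. The toy is-a hierarchy, the class names, and all function names are hypothetical, and the formulas follow the common WordNet conventions, which are assumed here: path similarity as $1 / (d + 1)$ for a shortest path of $d$ edges, $LCH$ as $-\log((d + 1) / (2D))$ with $D$ the maximum taxonomy depth, and $WUPS$ taken to mean Wu-Palmer similarity, $2 \cdot depth(lcs) / (depth(c_1) + depth(c_2))$, with depths counted in nodes from the root.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent). It stands in for WordNet
# and is an assumption for illustration only.
PARENT = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "husky": "dog",
    "malamute": "dog",
    "tabby": "cat",
}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (assumed convention)."""
    return len(ancestors(node))


def lowest_common_subsumer(c1, c2):
    """Deepest node that is an ancestor of (or equal to) both classes."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):
        if node in seen:
            return node
    raise ValueError("classes do not share a root")


def path_length(c1, c2):
    """Number of edges on the shortest path between two classes in the tree."""
    lcs = lowest_common_subsumer(c1, c2)
    return (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))


# Maximum depth of the toy taxonomy, used by the LCH normalisation.
MAX_DEPTH = max(depth(n) for n in set(PARENT) | set(PARENT.values()))


def path_similarity(c1, c2):
    return 1.0 / (path_length(c1, c2) + 1)


def lch_similarity(c1, c2):
    return -math.log((path_length(c1, c2) + 1) / (2.0 * MAX_DEPTH))


def wup_similarity(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    return 2.0 * depth(lcs) / (depth(c1) + depth(c2))


# A "near miss" (sibling breed) versus a "far miss" (different animal family).
for true_cls, pred_cls in [("husky", "malamute"), ("husky", "tabby")]:
    print(true_cls, pred_cls,
          round(path_similarity(true_cls, pred_cls), 3),
          round(lch_similarity(true_cls, pred_cls), 3),
          round(wup_similarity(true_cls, pred_cls), 3))
```

Under these assumptions, both predictions count as equally wrong for accuracy, but all three scores are higher for the sibling-breed error than for the cross-family error, which is the distinction the knowledge-driven evaluation is meant to expose.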
