Fine-Grained Cat Breed Recognition with Global Context Vision Transformer

Mowmita Parvin Hera, Md. Shahriar Mahmud Kallol, Shohanur Rahman Nirob, Md. Badsha Bulbul, Jubayer Ahmed, M. Zhourul Islam, Hazrat Ali, Mohammad Farhad Bulbul

TL;DR

This work tackles fine-grained cat breed recognition, a challenging fine-grained visual classification (FGVC) problem with subtle inter-class differences and high intra-class variation. It introduces a GCViT-Tiny-based pipeline that uses patch embeddings and global-context attention, fine-tuned on the Oxford-IIIT Pet Dataset with extensive data augmentation. The model achieves 92.00% test accuracy and 94.54% validation accuracy, outperforming prior CNN and hybrid approaches on the same dataset. Per-breed performance is strong, although some confusion remains among visually similar breeds. The results demonstrate the effectiveness of global-context transformers for FGVC and suggest practical applicability in veterinary workflows and mobile breed-recognition systems; a public demo accompanies the paper.

Abstract

Accurate identification of cat breeds from images is a challenging task due to subtle differences in fur patterns, facial structure, and color. In this paper, we present a deep learning-based approach for classifying cat breeds using a subset of the Oxford-IIIT Pet Dataset, which contains high-resolution images of various domestic breeds. We employed the Tiny variant of the Global Context Vision Transformer (GCViT-Tiny) for cat breed recognition. To improve model generalization, we used extensive data augmentation, including rotation, horizontal flipping, and brightness adjustment. Experimental results show that the GCViT-Tiny model achieved a test accuracy of 92.00% and a validation accuracy of 94.54%. These findings highlight the effectiveness of transformer-based architectures for fine-grained image classification tasks. Potential applications include veterinary diagnostics, animal shelter management, and mobile-based breed recognition systems. We also provide a Hugging Face demo at https://huggingface.co/spaces/bfarhad/cat-breed-classifier.
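
The three augmentations named in the abstract (rotation, horizontal flipping, brightness adjustment) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline. The grayscale nested-list image format, the function names, and the parameter values (`max_degrees=15.0`, `flip_prob=0.5`, `brightness_range=(0.8, 1.2)`) are all assumptions chosen for illustration; the abstract does not state the ranges actually used, and a real training setup would normally rely on an image library's transforms instead.

```python
import math
import random

# Assumption: an image is a grayscale grid, i.e. a list of rows of floats in [0, 1].

def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def adjust_brightness(img, factor):
    """Scale every pixel by `factor` and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def rotate(img, degrees):
    """Rotate about the image centre using nearest-neighbour sampling.

    Output pixels whose source falls outside the image are set to 0.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

def augment(img, rng, max_degrees=15.0, flip_prob=0.5,
            brightness_range=(0.8, 1.2)):
    """Apply a random rotation, a random flip, and a random brightness change.

    All default values are hypothetical, not taken from the paper.
    """
    img = rotate(img, rng.uniform(-max_degrees, max_degrees))
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    return adjust_brightness(img, rng.uniform(*brightness_range))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sample = [[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]]
    for row in augment(sample, rng):
        print(row)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs. The transforms are applied only at training time; validation and test images would be left unaugmented so that the reported accuracies reflect the original data.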

Paper Structure (7 sections, 8 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Sample images of cat breeds
  • Figure 2: Cat breed classification pipeline
  • Figure 3: Accuracy per cat breed on the test set
  • Figure 4: Confusion matrix on the test set
  • Figure 5: Training and validation accuracy over epochs
  • ...and 2 more figures