Comparative Study of CNN Architectures for Binary Classification of Horses and Motorcycles in the VOC 2008 Dataset

Muhammad Annas Shaikh, Hamza Zaman, Arbaz Asif

TL;DR

This study addresses imbalanced binary classification on VOC 2008 by comparing nine architectures on horse and motorcycle detection, using minority-class augmentation to improve minority recall. It combines a consistent training regimen with a targeted augmentation pipeline and reports two Average Precision variants, $AP_{std}$ and $AP_{11pt}$, so that ranking quality is captured alongside threshold-based performance. The key finding is that ConvNeXt-Tiny delivers the best overall performance ($AP_{std}=0.955$ for horses and $AP_{std}=0.891$ for motorcycles), while augmentation notably improves minority-class detection with minimal impact on ranking; Swin Transformer and ViT underperform under the studied setup. These results inform architecture selection and practical deployment for imbalanced detection tasks, suggesting that strong performance can be achieved with modern CNNs and targeted augmentation, even with limited fine-tuning.

Abstract

This paper presents a comprehensive evaluation of nine convolutional neural network architectures for binary classification of horses and motorcycles in the VOC 2008 dataset. We address the significant class imbalance problem by implementing minority-class augmentation techniques. Our experiments compare modern architectures including ResNet-50, ConvNeXt-Tiny, DenseNet-121, and Vision Transformer across multiple performance metrics. Results demonstrate substantial performance variations, with ConvNeXt-Tiny achieving the highest Average Precision (AP) of 95.53% for horse detection and 89.12% for motorcycle detection. We observe that data augmentation significantly improves minority class detection, particularly benefiting deeper architectures. This study provides insights into architecture selection for imbalanced binary classification tasks and quantifies the impact of data augmentation strategies in mitigating class imbalance issues in object detection.
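
Since the results are reported as Average Precision, a small sketch may help make the two variants concrete. The summary does not define $AP_{std}$, so the code below assumes it means the non-interpolated AP (mean precision at the rank of each positive) and that $AP_{11pt}$ means the 11-point interpolated AP historically used in early PASCAL VOC evaluations. The function names `ap_std` and `ap_11pt` and the toy scores are illustrative only and are not taken from the paper.

```python
import numpy as np


def ranked_precision_recall(scores, labels):
    """Precision and recall at every rank, with samples sorted by descending score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]                      # 1 where the ranked sample is a positive
    tp = np.cumsum(hits)                      # true positives up to each rank
    ranks = np.arange(1, len(hits) + 1)
    precision = tp / ranks
    recall = tp / hits.sum()
    return precision, recall, hits


def ap_std(scores, labels):
    """Assumed meaning of AP_std: mean precision at the rank of each positive."""
    precision, _, hits = ranked_precision_recall(scores, labels)
    return float(precision[hits == 1].mean())


def ap_11pt(scores, labels):
    """11-point interpolated AP: average of max precision at recall >= 0.0, 0.1, ..., 1.0."""
    precision, recall, _ = ranked_precision_recall(scores, labels)
    total = 0.0
    for i in range(11):
        mask = recall >= i / 10
        total += precision[mask].max() if mask.any() else 0.0
    return total / 11.0


# Toy example: five images, two of which contain the target class.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
print(f"AP_std  = {ap_std(toy_scores, toy_labels):.4f}")
print(f"AP_11pt = {ap_11pt(toy_scores, toy_labels):.4f}")
```

Both functions depend only on the ordering of the scores, not on a decision threshold, which is why AP is treated as a ranking metric. Under these assumed definitions the two values can differ on the same predictions, because the 11-point version replaces each precision value with the best precision achieved at equal or higher recall.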

Paper Structure

This paper contains 26 sections, 1 equation, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Comparison of Average Precision metrics across models for Horse Classification