Online Anchor-based Training for Image Classification Tasks

Maria Tzelepi; Vasileios Mezaris

TL;DR

The paper addresses image classification by introducing Online Anchor-based Training (OAT), which replaces direct class-label learning with offset regression relative to dynamically defined anchors, computed as batch centers at the model output. Training minimizes a mean-squared error on percentage-change targets $t_{OAT}^i = \frac{\mathbf{c}_i}{\mathbf{a}^i} - 1$ using anchors $\mathbf{a}^i = \sigma\left(\frac{1}{|\mathcal{B}^i|} \sum_{\Phi(\mathbf{x}_j;\mathcal{W}) \in \mathcal{B}^i} \Phi(\mathbf{x}_j;\mathcal{W})\right)$, and test-time predictions are mapped back to the original class space via a test anchor. The approach is validated on four datasets (CIFAR-10, UCF-101, ERA, BAR) across diverse architectures, showing consistent accuracy improvements with computational costs comparable to standard training. Key contributions include the online batch-center anchor design, the offset-based objective, and the demonstrated applicability to both CNNs and transformers such as ViT-L-16, achieving near- or state-of-the-art results on challenging benchmarks. The method is a practical, architecture-agnostic enhancement to image classification training.
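The two formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that $\mathbf{c}_i$ is a one-hot label vector, that the division is elementwise, and that the mapping back to class space is the algebraic inverse of the target definition. All function names are hypothetical, and the round trip reuses the training anchor because the construction of the test anchor is not described in this summary.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function, the sigma in the anchor formula."""
    return 1.0 / (1.0 + np.exp(-z))

def batch_anchor(outputs):
    """Anchor a^i: sigmoid of the mean model output over the batch.

    outputs has shape (batch_size, num_classes); the result has shape
    (num_classes,) and lies strictly in (0, 1), so dividing by it is safe.
    """
    return sigmoid(outputs.mean(axis=0))

def oat_targets(one_hot_labels, anchor):
    """Percentage-change targets t = c / a - 1 (assumed elementwise)."""
    return one_hot_labels / anchor - 1.0

def oat_loss(predicted_offsets, targets):
    """Mean-squared error between predicted offsets and OAT targets."""
    return np.mean((predicted_offsets - targets) ** 2)

def to_class_space(predicted_offsets, anchor):
    """Invert the target definition: c_hat = a * (1 + t_hat)."""
    return anchor * (1.0 + predicted_offsets)

# Toy batch: 4 samples, 3 classes (random stand-ins for model outputs).
rng = np.random.default_rng(0)
outputs = rng.normal(size=(4, 3))
labels = np.eye(3)[[0, 2, 1, 0]]        # one-hot class labels (assumption)

anchor = batch_anchor(outputs)
targets = oat_targets(labels, anchor)
loss = oat_loss(outputs, targets)        # outputs treated as predicted offsets
recovered = to_class_space(targets, anchor)
predicted_classes = recovered.argmax(axis=1)
```

With perfect offset predictions, `to_class_space` recovers the one-hot labels exactly, which is why an argmax over the converted predictions yields the class decision in this sketch.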

Abstract

In this paper, we aim to improve the performance of deep learning models on image classification tasks by proposing a novel anchor-based training methodology, named Online Anchor-based Training (OAT). Guided by insights from anchor-based object detection methodologies, the OAT method, instead of directly learning the class labels, trains a model to learn percentage changes of the class labels with respect to defined anchors. We define as anchors the batch centers at the output of the model. Then, during the test phase, the predictions are converted back to the original class label space, and the performance is evaluated. The effectiveness of the OAT method is validated on four datasets.

Paper Structure (9 sections, 4 equations, 6 figures, 7 tables)

Introduction
Proposed Method
Experimental Evaluation
Datasets
Network Architectures
Evaluation Metrics
Implementation Details
Experimental Results
Conclusions

Figures (6)

Figure 1: Training and test phases of the proposed OAT method: In the training phase, we dynamically compute the anchors at the output space of the model, and train the network to learn percentage changes of the class labels w.r.t. the anchors. Then, during the test phase, the predictions are converted back to the original class label space, so that the accuracy of the model in predicting the correct class labels can be evaluated.
Figure 2: ERA dataset: Test accuracy throughout the training epochs using the ResNet-18 model.
Figure 3: ERA dataset: Test accuracy throughout the training epochs using the WRN-50-2 model.
Figure 4: ERA dataset: Test accuracy throughout the training epochs using the ViT-L-16 model.
Figure 5: Class: Baseball
...and 1 more figure
