Online Anchor-based Training for Image Classification Tasks
Maria Tzelepi, Vasileios Mezaris
TL;DR
The paper introduces Online Anchor-based Training (OAT) for image classification. Instead of learning class labels directly, the model regresses offsets relative to anchors that are defined dynamically as the batch centers at the model output. Training minimizes a mean-squared error on percentage-change targets $\mathbf{t}_{OAT}^i = \frac{\mathbf{c}_i}{\mathbf{a}^i} - 1$, with anchors $\mathbf{a}^i = \sigma\left(\frac{1}{|\mathcal{B}^i|} \sum_{\Phi(\mathbf{x}_j;\mathcal{W}) \in \mathcal{B}^i} \Phi(\mathbf{x}_j;\mathcal{W})\right)$. At test time, predictions are mapped back to the original class-label space via a test anchor. The approach is validated on four datasets (CIFAR-10, UCF-101, ERA, BAR) across diverse architectures, showing consistent accuracy improvements at a computational cost comparable to standard training. Key contributions are the online batch-center anchor design, the offset-based objective, and the demonstrated applicability to both CNNs and transformers such as ViT-L-16, with results at or near the state of the art on challenging benchmarks. The method is a practical, architecture-agnostic enhancement to image classification training, with potential for adoption in real-world pipelines.
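To make the training objective concrete, the following is a minimal NumPy sketch of the anchor, target, and loss computation for a single batch. It is an illustration under stated assumptions, not the authors' implementation: it reads $\mathcal{B}^i$ as the set of model outputs in one batch, takes $\mathbf{c}$ to be a one-hot label vector, applies the division elementwise, treats the anchor as a constant when forming the targets, and regresses the raw model outputs onto the targets. The function names (`oat_anchor`, `oat_targets`, `oat_loss`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def oat_anchor(outputs):
    """Anchor for one batch: sigmoid of the batch center (mean model output).

    outputs has shape (batch_size, num_classes).
    """
    return sigmoid(outputs.mean(axis=0))


def oat_targets(one_hot_labels, anchor):
    """Percentage change of each label vector w.r.t. the anchor: c / a - 1.

    The sigmoid keeps every anchor entry in (0, 1), so the division is safe.
    """
    return one_hot_labels / anchor - 1.0


def oat_loss(outputs, one_hot_labels):
    """Mean-squared error between model outputs and the OAT targets.

    Assumption: the anchor is recomputed from the current batch and the
    model outputs are regressed directly onto the targets.
    """
    anchor = oat_anchor(outputs)
    targets = oat_targets(one_hot_labels, anchor)
    return np.mean((outputs - targets) ** 2)


# Toy batch: 2 samples, 3 classes.
outputs = np.array([[2.0, -1.0, 0.0],
                    [0.0, 1.0, -1.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

anchor = oat_anchor(outputs)
targets = oat_targets(labels, anchor)
loss = oat_loss(outputs, labels)
```

Under this reading, every zero entry of a one-hot label maps to a target of $-1$, while the entry of the true class maps to $1/a - 1$, which is positive because the anchor entry $a$ lies in $(0, 1)$.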
Abstract
In this paper, we aim to improve the performance of deep learning models on image classification tasks by proposing a novel anchor-based training methodology, named Online Anchor-based Training (OAT). Guided by insights from anchor-based object detection methodologies, OAT trains a model to learn percentage changes of the class labels with respect to defined anchors, instead of learning the class labels directly. We define the anchors as the batch centers at the output of the model. During the test phase, the predictions are converted back to the original class-label space, and the performance is evaluated there. The effectiveness of the OAT method is validated on four datasets.
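The conversion back to the class-label space can be sketched by inverting the percentage-change definition. This is a worked step under assumptions rather than the paper's stated procedure: $\hat{\mathbf{t}}$ (the model's predicted offset), $\mathbf{a}_{test}$ (the test anchor), and $\odot$ (elementwise product) are notation introduced here, and how the test anchor is obtained is not specified in the abstract. If a target is defined elementwise as $\mathbf{t} = \frac{\mathbf{c}}{\mathbf{a}} - 1$, then solving for $\mathbf{c}$ gives

$$\mathbf{c} = \mathbf{a} \odot (1 + \mathbf{t}), \qquad \text{so at test time} \qquad \hat{\mathbf{c}} = \mathbf{a}_{test} \odot (1 + \hat{\mathbf{t}}).$$

One natural decision rule, again an assumption, is to predict the class with the largest entry of $\hat{\mathbf{c}}$.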
