Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan, Nasser M. Nasrabadi
TL;DR
This work introduces a non-parametric hyperspherical classifier with fixed, uniformly distributed prototypes and a dynamic, bijective label-to-prototype mapping. By decoupling prototype positions from class labels and solving a two-stage optimization via bipartite matching (Hungarian algorithm) and gradient-based backbone training, the method achieves improved metric-space utilization without relying on privileged information. The approach yields state-of-the-art performance on balanced and long-tail settings, particularly when the metric-space dimension is smaller than the number of classes, and demonstrates robustness to architectural changes and dataset scales. Overall, dynamic label-to-prototype assignment enhances inter-class relationships and intra-class compactness within a fixed prototype framework, enabling scalable, efficient, and effective hyperspherical classification.
Abstract
To improve on the metric-space utilization of the parametric softmax classifier, recent studies suggest replacing it with a non-parametric alternative. Although a non-parametric classifier may use the metric space more effectively, it introduces the challenge of capturing inter-class relationships. A shared characteristic of prior non-parametric classifiers is the static assignment of labels to prototypes, i.e., each prototype represents the same class throughout training. Orthogonal to previous works, we present a simple yet effective method that optimizes the category assigned to each prototype (label-to-prototype assignment) during training. To this end, we formalize the problem as a two-step optimization objective over the network parameters and the label-to-prototype assignment mapping. We solve this optimization using a sequential combination of gradient descent and bipartite matching. We demonstrate the benefits of the proposed approach through experiments on balanced and long-tail classification problems using different backbone network architectures. In particular, our method outperforms its competitors by 1.22% accuracy on CIFAR-100 and by 2.15% on ImageNet-200 while using a metric-space dimension half the size of that used by its competitors. Code: https://github.com/msed-Ebrahimi/DL2PA_CVPR24
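
The assignment step described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the matching cost is the negative cosine similarity between each class's mean normalized feature and each fixed prototype, and it uses equally spaced points on the unit circle as a stand-in for uniformly distributed prototypes. The names circle_prototypes, assign_labels_to_prototypes, and true_perm are hypothetical; the matching itself relies on SciPy's linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def circle_prototypes(num_classes):
    """Fixed prototypes: equally spaced unit vectors on the 2-D circle.

    Assumption for illustration only; higher-dimensional uniform
    prototypes would need a separate construction.
    """
    angles = 2.0 * np.pi * np.arange(num_classes) / num_classes
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def assign_labels_to_prototypes(features, labels, prototypes):
    """Return `assignment`, where assignment[c] is the prototype index for class c.

    Assumed cost: negative cosine similarity between the mean normalized
    feature of each class and each prototype. Every class must have at
    least one sample in `features`.
    """
    num_classes = prototypes.shape[0]
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    class_means = np.stack(
        [feats[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    cost = -class_means @ prototypes.T  # shape: (classes, prototypes)
    class_idx, proto_idx = linear_sum_assignment(cost)  # minimum-cost bijection
    assignment = np.empty(num_classes, dtype=int)
    assignment[class_idx] = proto_idx
    return assignment


# Toy data: each class clusters around a prototype chosen by a hidden permutation.
rng = np.random.default_rng(0)
prototypes = circle_prototypes(4)
true_perm = np.array([2, 0, 3, 1])
labels = np.repeat(np.arange(4), 20)
features = prototypes[true_perm[labels]] + 0.05 * rng.normal(size=(80, 2))

assignment = assign_labels_to_prototypes(features, labels, prototypes)
print(assignment)
```

In a full training loop, a step like this would alternate with gradient updates of the backbone, which pull each sample's feature toward the prototype currently assigned to its label. How often the matching is recomputed is a design choice this sketch leaves open.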
