Hyperspherical Classification with Dynamic Label-to-Prototype Assignment

Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan, Nasser M. Nasrabadi

TL;DR

This work introduces a non-parametric hyperspherical classifier with fixed, uniformly distributed prototypes and a dynamic, bijective label-to-prototype mapping. By decoupling prototype positions from class labels and solving a two-stage optimization via bipartite matching (Hungarian algorithm) and gradient-based backbone training, the method achieves improved metric-space utilization without relying on privileged information. The approach yields state-of-the-art performance on balanced and long-tail settings, particularly when the metric-space dimension is smaller than the number of classes, and demonstrates robustness to architectural changes and dataset scales. Overall, dynamic label-to-prototype assignment enhances inter-class relationships and intra-class compactness within a fixed prototype framework, enabling scalable, efficient, and effective hyperspherical classification.

Abstract

Aiming to enhance the utilization of metric space by the parametric softmax classifier, recent studies suggest replacing it with a non-parametric alternative. Although a non-parametric classifier may provide better metric space utilization, it introduces the challenge of capturing inter-class relationships. A shared characteristic among prior non-parametric classifiers is the static assignment of labels to prototypes during training, i.e., each prototype consistently represents the same class throughout the course of training. Orthogonal to previous works, we present a simple yet effective method to optimize the category assigned to each prototype (label-to-prototype assignment) during training. To this end, we formalize the problem as a two-step optimization objective over the network parameters and the label-to-prototype assignment mapping. We solve this optimization using a sequential combination of gradient descent and bipartite matching. We demonstrate the benefits of the proposed approach by conducting experiments on balanced and long-tail classification problems using different backbone network architectures. In particular, our method outperforms its competitors by 1.22% accuracy on CIFAR-100 and by 2.15% on ImageNet-200 while using a metric space dimension half the size of its competitors'. Code: https://github.com/msed-Ebrahimi/DL2PA_CVPR24
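
The assignment step described above can be illustrated with a small sketch. It is not the authors' implementation: the matching cost (cosine similarity between a per-class mean feature and each fixed prototype), the helper names `normalize`, `cosine`, and `best_assignment`, and the three-class toy data are all assumptions made for illustration. An exhaustive search over permutations stands in for the Hungarian algorithm here, which is only feasible for a handful of classes; for realistic class counts a dedicated solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.

```python
import math
from itertools import permutations


def normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit-norm vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def best_assignment(class_means, prototypes):
    """Return a tuple a where a[k] is the prototype index assigned to class k.

    The mapping is bijective and maximizes the total cosine similarity
    between each class mean and its prototype (assumed cost, see lead-in).
    Brute force over all permutations: exact, but only usable for tiny c.
    """
    c = len(class_means)
    sim = [[cosine(m, p) for p in prototypes] for m in class_means]
    return max(
        permutations(range(c)),
        key=lambda perm: sum(sim[k][perm[k]] for k in range(c)),
    )


# Three fixed prototypes spread evenly on the unit circle (0, 120, 240 degrees).
prototypes = [
    normalize([math.cos(a), math.sin(a)])
    for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)
]

# Hypothetical per-class mean features produced by the backbone (angles in radians).
class_means = [
    normalize([math.cos(a), math.sin(a)])
    for a in (2.0, 4.3, 0.1)
]

assignment = best_assignment(class_means, prototypes)
print(assignment)  # class k is mapped to prototype assignment[k]
```

In a full training loop, a step like this would alternate with gradient updates of the backbone, with the prototypes themselves left untouched.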

Paper Structure (25 sections, 12 equations, 4 figures, 9 tables, 1 algorithm)

Figures (4)

  • Figure 1: Comparison of the proposed method with the conventional parametric softmax classifier (PSC) and the previous fixed-classifier setup, using a toy example with three classes. Each color denotes a distinct class. a) The label-to-prototype assignment remains static during training. In PSC, optimization targets the whole network, consisting of the backbone and the prototypes $\mathbf{W}$. With a fixed classifier, only the backbone is optimized and the prototypes remain fixed. b) In the proposed method, the prototypes on the hypersphere are fixed, and optimization targets the backbone and the label that each prototype represents. c) Toy example showing changes in the label-to-prototype assignment during training.
  • Figure 2: Comparison of the Average Pairwise Angular Distance (APAD) of 100 prototypes drawn from multivariate Gaussian distributions with covariance matrix $\sigma I$ against the proposed optimized prototypes. a) Analysis of the APAD value. Notably, random prototypes drawn from arbitrary zero-mean distributions yield the optimal APAD value, underscoring that this objective function alone cannot guarantee uniformity across $S^{d-1}$. b) Optimized and degenerate solutions on $S^2$. Areas where multiple prototypes lie closer together than ideal are highlighted in yellow. c) Comparison of the minimum cosine distance, i.e., $1-\cos(\mathbf{w}_i,\mathbf{w}_j)$, between our optimized prototypes and the degenerate-solution prototypes. Greater distances indicate better utilization of the metric space.
  • Figure 3: a) Average inter-prototype cosine when $d=c=100$. b) Classification accuracy (%) on CIFAR-100 with ResNet-32 as $\tau'$ varies. c) Time consumed for updating $A$. Since we update the label-to-prototype assignment every epoch ($\tau'=1.0$), this computation time is negligible compared to the total training time. d) Effect of regularizing the PSC with $L_{uni}$ for different values of the scaling hyperparameter $\lambda$. The horizontal dotted and '-.-' lines represent the performance of the PSC and of the proposed method, respectively.
  • Figure 4: a) Classification accuracy (%) on CIFAR-100 using ResNet-32 with and without dynamic assignment, showing the significance of optimizing $A$ in a low-dimensional metric space. b) Normalized difference between consecutive assignments during training.
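
The point made in the Figure 2 caption, that an average pairwise angle can look healthy while individual prototypes sit far too close together, can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: APAD is taken to be the mean of $\arccos$ of the pairwise cosine over all prototype pairs (the exact definition is not given in this section), and the function names, the random seed, and the two-cluster "degenerate" set are hypothetical choices.

```python
import math
import random


def unit(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pairwise_cosines(protos):
    """Cosine similarity for every unordered pair of unit-norm prototypes."""
    return [
        sum(a * b for a, b in zip(protos[i], protos[j]))
        for i in range(len(protos))
        for j in range(i + 1, len(protos))
    ]


def apad(protos):
    """Assumed APAD: mean pairwise angle in radians."""
    cosines = pairwise_cosines(protos)
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    angles = [math.acos(max(-1.0, min(1.0, c))) for c in cosines]
    return sum(angles) / len(angles)


def min_cosine_distance(protos):
    """Smallest value of 1 - cos(w_i, w_j) over all prototype pairs."""
    return min(1.0 - c for c in pairwise_cosines(protos))


rng = random.Random(0)
d, c = 3, 100

# 100 prototypes sampled from a zero-mean isotropic Gaussian, then normalized.
random_protos = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(c)]

# A deliberately degenerate set: half the prototypes at one pole, half at the other.
degenerate = [[1.0, 0.0, 0.0]] * (c // 2) + [[-1.0, 0.0, 0.0]] * (c // 2)

for name, protos in (("random", random_protos), ("degenerate", degenerate)):
    print(name, round(apad(protos), 4), round(min_cosine_distance(protos), 6))
```

Both sets have a mean pairwise angle close to $\pi/2$, yet the degenerate set has a minimum cosine distance of exactly zero because many prototypes coincide. This is why a minimum-distance view, as in panel c, is more informative about uniformity than the average alone.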