Generating Multi-Center Classifier via Conditional Gaussian Distribution

Zhemin Zhang; Xun Gong

TL;DR

This work tackles the limitation of uni-center linear classifiers by formulating a multi-center classifier grounded in a Gaussian Mixture assumption for deep features. For each of the $C$ classes, a conditional Gaussian with mean $w_c$ and learnable variance $\sigma_c^2$ seeds $K$ sub-centers via $w_c^{(k)} = w_c + \sigma_c \odot \varepsilon$, where $\varepsilon$ is random noise. During training, the class means and their sub-centers are together treated as $C(K+1)$ classes, supervised with a Multi-Center Class Label and a variance regularization term, giving the total loss $\mathcal{L} = \mathcal{L}_m + \mathcal{L}_{\sigma^2}$. At test time, the sub-centers are discarded and only the class means $w_c$ are retained, so inference is identical to that of a vanilla linear classifier. Empirically, the method improves top-1 accuracy on ImageNet-1K for both CNNs and ViTs (e.g., +0.9 percentage points for ResNet-50 and +0.4 points for Swin-T) and remains compatible with data augmentations and softmax variants, indicating better modeling of intra-class structure at no additional test-time cost. The approach offers a practical route to richer feature distributions and reduced over-clustering in large-scale visual recognition tasks.

Abstract

The linear classifier is widely used in various image classification tasks. It works by optimizing the distance between a sample and its corresponding class center. However, in real-world data, one class can contain several local clusters, e.g., birds of different poses. To address this complexity, we propose a novel multi-center classifier. Different from the vanilla linear classifier, our proposal is established on the assumption that the deep features of the training set follow a Gaussian Mixture distribution. Specifically, we create a conditional Gaussian distribution for each class and then sample multiple sub-centers from that distribution to extend the linear classifier. This approach allows the model to capture intra-class local structures more efficiently. In addition, at test time we set the mean of the conditional Gaussian distribution as the class center of the linear classifier and follow the vanilla linear classifier outputs, thus requiring no additional parameters or computational overhead. Extensive experiments on image classification show that the proposed multi-center classifier is a powerful alternative to widely used linear classifiers. Code available at https://github.com/ZheminZhang1/MultiCenter-Classifier.

Paper Structure (17 sections, 11 equations, 4 figures, 7 tables)


Introduction
Related Work
Preliminaries
Proposed Method
Sampling Sub-Center
Multi-Center Class Label
Testing Phase
Experiments
Classification on the ImageNet-1K
Classification on Cifar-100 and Mini-ImageNet
Combining Data Augmentations and Softmax Variants
Data augmentations
Softmax variants
Ablation Study
Number of sub-centers
...and 2 more sections

Figures (4)

Figure 1: The comparison between uni-center and multi-center approaches when dealing with classes that have different sub-classes. (a) In the uni-center approach, samples belonging to the same class are assigned to a single center, which may not be suitable for real-world data. (b) Conversely, the multi-center approach allows for greater flexibility in modeling intra-class variance by setting multiple sub-centers within a class.
Figure 2: The overall training pipeline of vanilla linear classifier.
Figure 3: The overall training pipeline of the multi-center classifier. The weight $w_c$ of the linear classifier is used as the mean to create a conditional Gaussian distribution for each class. Multiple sub-centers are then sampled, and instead of using the original one-hot label, the cross-entropy loss is calculated based on the multi-center class label.
Figure 4: Effect of the number of sub-centers on model performance on ImageNet-1K.
