Using predefined vector systems as latent space configuration for neural network supervised training on data with arbitrarily large number of classes
Nikita Gabdullin
TL;DR
The paper tackles the growth of neural network parameters as the number of target classes increases. It proposes latent space configuration (LSC) with predefined center vectors derived from $A_n$ root systems, so that the same architecture can be trained regardless of $n_{\text{classes}}$. Embeddings are guided to match a fixed latent-space distribution using either Euclidean or cosine metrics, which enables training without class-dependent classifier parameters and with batch-efficient center handling. The approach is validated through low- and high-dimensional experiments on CINIC-10 and ImageNet-1K, including artificial 1.28-million-class scenarios, and is shown to support lifelong learning and potential applications in distillation. While LSC trades some training speed for scalability, the work outlines concrete avenues (e.g., SSL-LSC hybrids, improved center configurations) to accelerate convergence and broaden applicability to very large or evolving class sets.
Abstract
Supervised learning (SL) methods are indispensable for training neural networks (NNs) to perform classification tasks. Although SL yields very high accuracy, it often makes the number of NN parameters depend on the number of classes, which limits its applicability when the number of classes is extremely large or unknown in advance. In this paper, we propose a methodology that allows the same NN architecture to be trained regardless of the number of classes. This is achieved by using predefined vector systems as the target latent space configuration (LSC) during NN training. We discuss the desired properties of target configurations and choose randomly perturbed vectors of the $A_n$ root system for our experiments. These vectors are used to successfully train encoders and visual transformers (ViT) on CINIC-10 and ImageNet-1K in low- and high-dimensional cases by matching NN predictions with the predefined vectors. Finally, a ViT is trained on a dataset with 1.28 million classes, illustrating the applicability of the method to datasets with an extremely large number of classes. In addition, potential applications of LSC in lifelong learning and NN distillation are discussed, illustrating the versatility of the proposed methodology.
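To make the idea of matching embeddings to predefined vectors more concrete, the following is a minimal NumPy sketch, not the implementation used in the paper. It relies only on the standard definition of the $A_n$ roots as the vectors $e_i - e_j$ ($i \neq j$) in $\mathbb{R}^{n+1}$. The function names (`an_roots`, `perturbed_centers`, `cosine_center_loss`), the Gaussian perturbation, its `scale`, and the exact form of the cosine loss are all assumptions made for illustration.

```python
import numpy as np


def an_roots(n):
    """Return the n*(n+1) roots e_i - e_j (i != j) of A_n as rows in R^(n+1)."""
    eye = np.eye(n + 1)
    return np.array(
        [eye[i] - eye[j] for i in range(n + 1) for j in range(n + 1) if i != j]
    )


def perturbed_centers(n, n_classes, scale=0.05, seed=0):
    """Pick one root per class, add small Gaussian noise, and normalize.

    The noise model and `scale` are illustrative assumptions, not values
    taken from the paper.
    """
    roots = an_roots(n)
    if n_classes > len(roots):
        raise ValueError("A_n has only n*(n+1) roots; increase n")
    rng = np.random.default_rng(seed)
    centers = roots[:n_classes] + scale * rng.standard_normal((n_classes, n + 1))
    return centers / np.linalg.norm(centers, axis=1, keepdims=True)


def cosine_center_loss(embeddings, labels, centers):
    """Mean (1 - cosine similarity) between each embedding and its class center."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    targets = centers[labels]  # only the centers needed for this batch
    return float(np.mean(1.0 - np.sum(z * targets, axis=1)))
```

In this sketch the encoder's output dimension is fixed by `n` alone, and the loss looks up only the centers of the classes present in the batch, so no classifier layer has to grow with the number of classes.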
