Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces
Nikita Gabdullin
TL;DR
This paper investigates how predefined vector systems can be used to configure neural network latent spaces (LS), enabling training without classification layers for extremely large class counts. It generalizes beyond the A_n root-system approach by introducing Vn^D families (notably Vn^21 and Vn^22) and the nmin dimensionality concept to accelerate latent space configuration (LSC) training, validated on ImageNet-scale datasets and ViT-S architectures. Key findings are that Vn^21 converges faster than A_n-based configurations, and that minimizing LS dimensionality can reduce training time and potentially the size of embedding databases, though cross-entropy (CE) based training remains faster overall. These results show how latent-space design affects training dynamics and point toward universal latent-space concepts and more efficient large-scale embedding management.
Abstract
The overall performance of a neural network (NN) is closely related to the properties of its embedding distribution in latent space (LS). It has recently been shown that predefined vector systems, specifically A_n root system vectors, can be used as targets for latent space configuration (LSC) to ensure the desired LS structure. One of the main advantages of LSC is that classifier NNs can be trained without classification layers, which facilitates training on datasets with extremely large numbers of classes. This paper provides a more general overview of possible vector systems for NN training, along with their properties and methods for constructing them. These systems are used to configure the LS of encoders and visual transformers, significantly speeding up LSC training on ImageNet-1K and on datasets with 50k-600k classes. It is also shown that using the minimum number of LS dimensions for a given number of classes results in faster convergence. The latter could also reduce the size of vector databases used to store NN embeddings.
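To make the idea of a predefined vector system concrete, the following is a minimal sketch of the standard textbook construction of the A_n root system: the n(n+1) vectors e_i - e_j (i != j) in R^(n+1). The function name `an_roots` is hypothetical, and the sketch only illustrates the general definition; it does not reproduce how the paper assigns these vectors to classes or how it constructs the Vn^D families.

```python
import numpy as np


def an_roots(n: int) -> np.ndarray:
    """Return the roots of A_n as rows of an array of shape (n*(n+1), n+1).

    Uses the standard definition: all vectors e_i - e_j with i != j,
    where e_0, ..., e_n is the standard basis of R^(n+1).
    `an_roots` is an illustrative helper name, not taken from the paper.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    basis = np.eye(n + 1)
    roots = [
        basis[i] - basis[j]
        for i in range(n + 1)
        for j in range(n + 1)
        if i != j
    ]
    return np.stack(roots)


roots = an_roots(3)
print(roots.shape)  # (12, 4): n(n+1) = 12 candidate target vectors for n = 3
```

Every root has the same length, sqrt(2), and its coordinates sum to zero, so the system lies in an n-dimensional hyperplane of R^(n+1). Since the number of roots grows as n(n+1), a modest number of dimensions can already supply distinct target vectors for a large number of classes.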
