Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces

Nikita Gabdullin

TL;DR

This paper investigates how predefined vector systems can configure neural network latent spaces (LS) so that classifiers can be trained without classification layers on datasets with extremely large numbers of classes. It generalizes beyond the $A_n$ root-system approach by introducing $V_{n}^{D}$ families (notably $V_{n}^{21}$ and $V_{n}^{22}$) and the nmin minimum-dimensionality concept to accelerate latent space configuration (LSC) training, evaluated with encoder and ViT-S models on ImageNet-1K and on datasets with 50k-600k classes. Training converges faster with $V_{n}^{21}$ than with $A_n$-based configurations, and minimizing LS dimensionality can reduce training time and potentially the size of embedding databases, though training with cross-entropy (CE) loss remains faster overall. These results show how latent-space design affects training dynamics and point toward universal latent-space concepts and more efficient large-scale embedding storage.

Abstract

The overall performance of a neural network (NN) is closely related to the properties of its embedding distribution in latent space (LS). It has recently been shown that predefined vector systems, specifically $A_n$ root system vectors, can be used as targets for latent space configuration (LSC) to ensure the desired LS structure. One of the main advantages of LSC is that classifier NNs can be trained without classification layers, which facilitates training on datasets with extremely large numbers of classes. This paper provides a more general overview of possible vector systems for NN training, along with their properties and methods for constructing them. These systems are used to configure the LS of encoders and visual transformers, significantly speeding up LSC training on ImageNet-1K and on datasets with 50k-600k classes. It is also shown that using the minimum number of LS dimensions for a given number of classes results in faster convergence, which has potential advantages for reducing the size of vector databases used to store NN embeddings.
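The idea of using predefined vectors as training targets can be illustrated with a minimal sketch. It assumes the standard textbook construction of the $A_n$ roots as the vectors $e_i - e_j$ ($i \neq j$) in $\mathbb{R}^{n+1}$, which gives $n(n+1)$ vectors. All function names are hypothetical, and the paper's own construction, dimensionality, loss, and nmin definition may differ.

```python
import itertools

import numpy as np


def an_roots(n: int) -> np.ndarray:
    """Return the n(n+1) roots e_i - e_j (i != j) of A_n as rows in R^(n+1)."""
    eye = np.eye(n + 1)
    pairs = itertools.permutations(range(n + 1), 2)
    return np.array([eye[i] - eye[j] for i, j in pairs])


def min_n_for_classes(n_classes: int) -> int:
    """Smallest n with n(n+1) >= n_classes.

    Illustrative assumption only: one target vector per class, A_n roots.
    """
    n = 1
    while n * (n + 1) < n_classes:
        n += 1
    return n


def _normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_loss(embeddings: np.ndarray, targets: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between each embedding and its target."""
    e, t = _normalize(embeddings), _normalize(targets)
    return float(np.mean(1.0 - np.sum(e * t, axis=1)))


def predict(embeddings: np.ndarray, class_vectors: np.ndarray) -> np.ndarray:
    """Assign each embedding to the class whose target vector is most similar.

    No classification layer is involved: the label is the index of the
    nearest predefined vector under cosine similarity.
    """
    sims = _normalize(embeddings) @ _normalize(class_vectors).T
    return np.argmax(sims, axis=1)


if __name__ == "__main__":
    n = min_n_for_classes(1000)      # enough A_n roots for 1000 classes
    class_vectors = an_roots(n)[:1000]
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 1000, size=8)
    # Stand-in for encoder outputs: targets plus small noise.
    fake_embeddings = class_vectors[labels] + 0.01 * rng.standard_normal((8, n + 1))
    print(cosine_loss(fake_embeddings, class_vectors[labels]))
    print(predict(fake_embeddings, class_vectors) == labels)
```

In this sketch the number of target vectors grows quadratically with the dimension, so a small latent space already accommodates many classes, and the per-sample cost of the loss does not depend on the number of classes.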

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Training speed of encoder model trained on nclasses=5000 dataset with different LS configurations using (a) cosine and (b) Euclidean distances as loss functions.
  • Figure 2: Encoder training loss curves for various LS configurations for (a) nclasses=300k and (b) nclasses=600k training datasets.
  • Figure 3: Comparison of NN training for (a) encoder and ViT-S with $V_{n}^{21}$ on different datasets depending on ndim and (b) ViT-S with nmin and different configurations for nclasses=50k.
  • Figure 4: Loss curves of ViT-S classifier trained with CE loss and ViT-S without classification layer trained with LSC using different LS configurations.
  • Figure 5: Changes in ViT-S training speed when using nmin with (a) LSC and (b) a conventional classifier trained with CE loss.