Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture

Shijin Duan, Yejia Liu, Gaowen Liu, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

TL;DR

This work analyzes gradient-based training for Low-Dimensional Computing (LDC) in binary Vector Symbolic Architecture (VSA), showing that aggressive dimensionality reduction can preserve accuracy when training dynamics are stabilized. It identifies Batch Normalization (BN) and Knowledge Distillation (KD) as key enablers: BN stabilizes feature-vector updates and can be incorporated as a dimension-wise inference threshold, while KD reshapes the learning signal and improves prediction confidence via temperature-controlled distillation. The authors provide extensive analyses and ablations across multiple lightweight datasets, demonstrating that BN+KD substantially boosts inference accuracy and confidence, enabling LDC to reach or approach state-of-the-art accuracy with only a fraction of the memory and latency of high-dimensional VSA models. The results also suggest the approach generalizes to binary neural networks, offering practical strategies for training dynamics and confidence calibration in resource-constrained settings.

Abstract

Vector Symbolic Architecture (VSA) is emerging in machine learning due to its efficiency, but it is hindered by issues of hyperdimensionality and accuracy. As a promising mitigation, the Low-Dimensional Computing (LDC) method reduces the vector dimension by ~100 times while maintaining accuracy, by employing gradient-based optimization. Despite its potential, LDC optimization for VSA is still underexplored. Our investigation into vector updates underscores the importance of stable, adaptive dynamics in LDC training. We also reveal the overlooked yet critical roles of batch normalization (BN) and knowledge distillation (KD) in standard approaches. Besides boosting accuracy, BN adds no computational overhead during inference, and KD significantly enhances inference confidence. Through extensive experiments and ablation studies across multiple benchmarks, we provide a thorough evaluation of our approach and extend the interpretability of binary neural network (BNN) optimization similar to LDC, which was previously unaddressed in the BNN literature.
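
The claim that BN adds no inference overhead can be illustrated with the standard algebra for folding an inference-mode BN followed by sgn() into a per-dimension threshold. The sketch below is not the authors' implementation: the function names, the sgn(0) = +1 convention, and the per-dimension parameters `gamma`, `beta`, `mean`, `var` are assumptions made for illustration. For a nonzero scale `gamma`, sgn(gamma * (y - mean) / sqrt(var + eps) + beta) depends only on whether y lies above or below tau = mean - beta * sqrt(var + eps) / gamma, with the comparison reversed when `gamma` is negative.

```python
import math

def bn_sign(y, gamma, beta, mean, var, eps=1e-5):
    """Reference path: inference-mode BN per dimension, then binarize."""
    out = []
    for y_d, g, b, m, v in zip(y, gamma, beta, mean, var):
        z = g * (y_d - m) / math.sqrt(v + eps) + b
        out.append(1 if z >= 0 else -1)  # assumed convention: sgn(0) = +1
    return out

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Precompute one threshold and one comparison direction per dimension."""
    thresholds, flips = [], []
    for g, b, m, v in zip(gamma, beta, mean, var):
        if g == 0:
            raise ValueError("gamma must be nonzero to fold BN into a threshold")
        thresholds.append(m - b * math.sqrt(v + eps) / g)
        flips.append(g < 0)  # a negative scale reverses the inequality
    return thresholds, flips

def threshold_sign(y, thresholds, flips):
    """Folded path: a single comparison per dimension, no BN arithmetic."""
    out = []
    for y_d, tau, flip in zip(y, thresholds, flips):
        active = (y_d <= tau) if flip else (y_d >= tau)
        out.append(1 if active else -1)
    return out
```

With the thresholds computed once after training, inference needs only the comparison in `threshold_sign`, which is why a folded BN of this kind costs nothing beyond the binarization that is already present.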

Paper Structure

This paper contains 29 sections, 16 equations, 8 figures, 16 tables.

Figures (8)

  • Figure 1: Feature vector training analysis for vanilla LDC (a)(b)(c) and our BN-based method (d)(e)(f). All figures are histograms of distributions. We run 10 epochs for (a)(b)(d)(e) to demonstrate the efficacy of BN on LDC training: BN shapes the accumulation $\bm{y}$ to zero mean and unit variance, providing more active and moderate gradients for weight updating. We run 50 epochs for (c) and (f) to show the F$^r$ distribution during training. The case study is tested on the LDC model with dimension $D=64$ and on FashionMNIST.
  • Figure 1: Preliminary result on methods to mitigate the F training deficiencies. $\alpha(\textbf{F})\downarrow$ is to reduce the scaling factor. $\bm{\delta}\uparrow$ is to increase the active range of sgn$()$. BN is the batch normalization method that we adopt in our work. We show the variance of $\bm{y}$ distribution and the range of $\partial \mathcal{L}/\partial \textbf{F}^r$ as metrics since they directly dominate the F updating.
  • Figure 2: The C$^r$ gradient distribution of binary VSA after training for 10 epochs with and without KD.
  • Figure 3: Investigating the influence of label smoothing and teacher on LDC training. $f$ is the scaling factor for label smoothing, i.e., $f\bm{t}+(1-f)/K$. "HL-T" means hard-label from teacher.
  • Figure 3: The C$^r$ distribution along 50 epochs.
  • ...and 3 more figures
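
The label-smoothing expression in the Figure 3 caption, $f\bm{t}+(1-f)/K$, and the temperature-controlled softening used in distillation can be sketched as follows. This is a minimal illustration, not the paper's training code: the function names, the teacher logits, and the temperature argument `T` are assumptions, and the softmax with temperature is the standard KD formulation rather than something confirmed by the captions above.

```python
import math

def smooth_labels(t, f):
    """Label smoothing as in the Figure 3 caption: f * t + (1 - f) / K."""
    K = len(t)  # number of classes
    return [f * t_k + (1.0 - f) / K for t_k in t]

def softmax_with_temperature(logits, T=1.0):
    """Standard temperature softmax; larger T gives a flatter distribution."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-class example: a one-hot label and made-up teacher logits.
one_hot = [0.0, 0.0, 1.0, 0.0]
smoothed = smooth_labels(one_hot, f=0.9)
soft_teacher = softmax_with_temperature([1.0, 0.5, 3.0, -1.0], T=4.0)
```

Setting `f = 1` recovers the original hard label, while smaller `f` moves probability mass uniformly onto the other classes. A teacher's softened output differs in that the mass is distributed according to the teacher's logits rather than uniformly.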