
DecoHD: Decomposed Hyperdimensional Classification under Extreme Memory Budgets

Sanggeon Yun, Hyunwoo Oh, Ryozo Masukawa, Mohsen Imani

TL;DR

DecoHD tackles the memory bottleneck of hyperdimensional computing by decomposing class prototypes into a compact, shared channel bank learned end-to-end in decomposed space. By forming $M=\prod_i L_i$ binding paths and using a lightweight class head, it preserves the ambient dimensionality $D$ while reducing memory to $O(L_{\text{tot}}D)+O(CM)$, stays on average within about 0.1–0.15% of a non-reduced HDC baseline, and improves bit-flip robustness. The approach uses up to $97\%$ fewer trainable parameters and delivers substantial hardware gains (up to $277\times$ higher energy efficiency and $35\times$ speedup over a CPU), demonstrating strong potential for near-memory accelerators, TinyML, and edge federated AI. Across diverse datasets and budgets, DecoHD provides a principled trade-off between latent dimensionality, depth, and memory, favoring shallow decompositions that meet tight memory constraints with little loss in accuracy.
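To make the memory formula concrete, the sketch below counts stored entries for a conventional prototype table, $C \cdot D$, and for the decomposed form, $L_{\text{tot}} D + C M$ with $L_{\text{tot}} = \sum_i L_i$ and $M = \prod_i L_i$. It counts only those two terms and ignores constant factors; the function names and the configuration ($C=26$, $D=10{,}000$, $L=[2,3,4]$) are hypothetical and chosen purely for illustration.

```python
from math import prod


def conventional_memory(C: int, D: int) -> int:
    """Stored entries of a dense prototype table: one D-dim hypervector per class."""
    return C * D


def decohd_memory(C: int, D: int, L: list[int]) -> int:
    """Stored entries of a decomposed model: channel bank (L_tot * D) plus class head (C * M)."""
    L_tot = sum(L)  # total channels over all N layers
    M = prod(L)     # number of binding paths: one channel chosen per layer
    if not (L_tot <= M <= C):
        raise ValueError("expects sum(L) <= prod(L) <= C, as stated for Figure 1")
    return L_tot * D + C * M


# Hypothetical configuration: 26 classes, D = 10,000, three layers with 2, 3 and 4 channels.
C, D, L = 26, 10_000, [2, 3, 4]
dense = conventional_memory(C, D)  # 26 * 10000 = 260000
deco = decohd_memory(C, D, L)      # 9 * 10000 + 26 * 24 = 90624
print(f"dense={dense}, decomposed={deco}, ratio={deco / dense:.3f}")
```

In this count the class head scales with $M$, which grows multiplicatively with every added layer, whereas the channel bank grows only additively with $L_{\text{tot}}$.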

Abstract

Decomposition is a proven way to shrink deep networks without changing I/O. We bring this idea to hyperdimensional computing (HDC), where footprint cuts usually shrink the feature axis and erode concentration and robustness. Prior HDC decompositions decode via fixed atomic hypervectors, which are ill-suited for compressing learned class prototypes. We introduce DecoHD, which learns directly in a decomposed HDC parameterization: a small, shared set of per-layer channels with multiplicative binding across layers and bundling at the end, yielding a large representational space from compact factors. DecoHD compresses along the class axis via a lightweight bundling head while preserving native bind-bundle-score; training is end-to-end, and inference remains pure HDC, aligning with in/near-memory accelerators. In evaluation, DecoHD attains extreme memory savings with only minor accuracy degradation under tight deployment budgets. On average it stays within about 0.1-0.15% of a strong non-reduced HDC baseline (worst case 5.7%), is more robust to random bit-flip noise, reaches its accuracy plateau with up to ~97% fewer trainable parameters, and, in hardware, delivers roughly 277x/35x energy/speed gains over a CPU (AMD Ryzen 9 9950X), 13.5x/3.7x over a GPU (NVIDIA RTX 4090), and 2.0x/2.4x over a baseline HDC ASIC.
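The bind-bundle-score inference described above (and in the Figure 2 caption) can be sketched in plain Python. This is a minimal sketch under assumptions the text does not state: binding $\otimes$ is taken to be the element-wise product, each layer uses one frozen Gaussian projection to map $d$-dimensional latents to $D$-dimensional channels, and a random vector stands in for the fixed encoder's output. All function names and sizes are hypothetical.

```python
import itertools
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bind(u, v):
    # Assumption: binding is the element-wise (Hadamard) product.
    return [a * b for a, b in zip(u, v)]


def make_channel(latent, projection):
    # Channel A = P z: a frozen D x d projection P applied to a trainable d-dim latent z.
    return [dot(row, latent) for row in projection]


def path_hypervectors(channels):
    """channels[i] lists the L_i channel hypervectors of layer i.
    Returns all M = prod(L_i) paths, each binding one channel from every layer."""
    paths = []
    for combo in itertools.product(*channels):
        p = combo[0]
        for a in combo[1:]:
            p = bind(p, a)
        paths.append(p)
    return paths


def class_scores(h, paths, W):
    # Weighted bundling builds each class vector sum_m W[c][m] * path_m, then scores it against h.
    scores = []
    for row in W:
        proto = [sum(w * p[k] for w, p in zip(row, paths)) for k in range(len(h))]
        scores.append(dot(h, proto))
    return scores


def class_scores_factored(h, paths, W):
    # Same scores by linearity: score each path once, then mix the M path scores with W.
    path_scores = [dot(h, p) for p in paths]
    return [dot(row, path_scores) for row in W]


rng = random.Random(0)
D, d, L, C = 64, 4, [2, 3], 5  # hypothetical sizes; M = 2 * 3 = 6 paths
projections = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(D)] for _ in L]  # frozen
latents = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(Li)] for Li in L]    # trainable
channels = [[make_channel(z, P) for z in layer] for layer, P in zip(latents, projections)]
paths = path_hypervectors(channels)
W = [[rng.gauss(0, 1) for _ in paths] for _ in range(C)]  # trainable class head, C x M
h = [rng.gauss(0, 1) for _ in range(D)]                  # stand-in for an encoded input
scores = class_scores(h, paths, W)
prediction = max(range(C), key=scores.__getitem__)
```

Because bundling and the dot product are both linear, $s_c = \langle h, \sum_m W_{cm} p_m \rangle = \sum_m W_{cm} \langle h, p_m \rangle$, so the $C$ class vectors never need to be materialized: `class_scores_factored` scores the $M$ paths once and then mixes them with $W$.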

Paper Structure

This paper contains 19 sections, 8 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Our proposed prototype reduction strategy. Conventional models store one dense hypervector per class ($\mathcal{O}(CD)$). DecoHD composes prototypes from a small shared set of channels, yielding $M=\prod_i L_i$ bound paths with $\sum_i L_i \le M \le C$, reducing memory while preserving full-$D$ representations. As illustrated in the bottom-left figure, varying the number of layers $N$ and the channel count $\sum_i L_i$ enables a tunable trade-off among memory footprint, inference cost, accuracy, and robustness, whereas prior feature-axis methods collapse to a single point at fixed dimensionality.
  • Figure 2: DecoHD overview. A fixed encoder maps inputs to hypervectors $h\!\in\!\mathbb{R}^{D}$. $N$ layers each provide $L_i$ learnable channels $\{A^{(i)}_{\ell}\}$ generated from low-dimensional latents via frozen projections. For an input, all $M=\prod_i L_i$ path hypervectors are formed by successive binding ($\otimes$); class-wise vectors are then produced by weighted bundling ($\oplus$) with $W\!\in\!\mathbb{R}^{C\times M}$ and scored against $h$ via dot products. Training updates only the latents and $W$.
  • Figure 3: Accuracy versus a strong HDC baseline. Values in parentheses denote the target memory budget $m$ enforced relative to a conventional prototype table. We report test accuracy across numeric precisions and hypervector dimensionalities $D$, and compare to OnlineHD (no parameter reduction).
  • Figure 4: Accuracy compared to state-of-the-art feature-axis compression. Values in parentheses denote the target memory budget $m$. To isolate the impact of class-axis decomposition, we compare against SparseHD, a representative feature-axis reduction method, under matched budgets.
  • Figure 5: Robustness to random bit-flip noise. Values in parentheses denote the target memory budget $m$. We inject independent random bit flips into 32-bit floating-point representations and evaluate accuracy as the flip probability increases.
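The bit-flip protocol in the Figure 5 caption can be sketched with Python's standard `struct` module: each value is reinterpreted as its IEEE 754 single-precision bit pattern, each of its 32 bits is flipped independently with probability $p$, and the result is read back as a float. Which tensors receive the noise (assumed here to be stored model parameters such as channels and class-head weights) and the helper name `flip_bits_float32` are assumptions for illustration.

```python
import random
import struct


def flip_bits_float32(values, p, rng):
    """Return float32-rounded copies of values with each of the 32 bits
    flipped independently with probability p."""
    noisy = []
    for v in values:
        (bits,) = struct.unpack("<I", struct.pack("<f", v))  # float32 bit pattern as uint32
        for k in range(32):
            if rng.random() < p:
                bits ^= 1 << k
        (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
        noisy.append(flipped)
    return noisy


# Hypothetical usage: perturb a small parameter vector at a flip probability of 1e-3.
rng = random.Random(0)
weights = [0.5, -1.25, 3.0, 0.0078125]
noisy_weights = flip_bits_float32(weights, 1e-3, rng)
```

A single flip in the exponent field can turn a small value into a very large one, an infinity, or a NaN, which is why float32 bit-flip noise is a demanding test.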