DecoHD: Decomposed Hyperdimensional Classification under Extreme Memory Budgets
Sanggeon Yun, Hyunwoo Oh, Ryozo Masukawa, Mohsen Imani
TL;DR
DecoHD tackles the memory bottleneck of hyperdimensional computing by decomposing class prototypes into a compact, shared channel bank learned end-to-end in decomposed space. By forming $M=\prod_i L_i$ binding paths and using a lightweight class head, it preserves ambient dimensionality $D$ while reducing memory to $O(L_{\text{tot}}D)+O(CM)$ and maintaining strong accuracy ($\approx$0.1–0.15% gap on average) with improved bit-flip robustness. The approach yields up to $97\%$ fewer trainable parameters and substantial hardware gains (up to $277\times$ energy efficiency and $35\times$ speedup on CPU), demonstrating strong potential for near-memory accelerators, TinyML, and edge federated AI. Across diverse datasets and budgets, DecoHD provides a principled trade-off between latent dimensionality, depth, and memory, favoring shallow decompositions that meet tight memory constraints without sacrificing performance.
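To make the memory expression concrete, the following is a minimal sketch of the parameter count it implies. It assumes that $L_{\text{tot}}$ is the sum of the per-layer channel counts $L_i$ and that a conventional HDC classifier stores one $D$-dimensional prototype per class ($C \cdot D$ values); neither assumption is spelled out in the summary above. The function names and the example sizes are hypothetical and are not taken from the paper's experiments.

```python
from math import prod


def decohd_param_count(layer_channels, dim, num_classes):
    """Count stored values for a decomposed model: L_tot * D + C * M.

    Assumption: L_tot = sum(L_i) and M = prod(L_i).
    """
    l_tot = sum(layer_channels)   # total channels across all layers
    m = prod(layer_channels)      # number of binding paths
    return l_tot * dim + num_classes * m


def baseline_param_count(dim, num_classes):
    """Count stored values for one D-dimensional prototype per class."""
    return num_classes * dim


# Hypothetical sizes, chosen only for illustration.
channels = [4, 4]   # two layers with four channels each
D, C = 10_000, 26

deco = decohd_param_count(channels, D, C)   # 8 * 10000 + 26 * 16 = 80416
base = baseline_param_count(D, C)           # 26 * 10000 = 260000
print(deco, base, deco / base)
```

Under these assumptions the channel bank term grows with the sum of the $L_i$ while the number of paths grows with their product, which is why a few small layers can already supply many paths per class.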
Abstract
Decomposition is a proven way to shrink deep networks without changing I/O. We bring this idea to hyperdimensional computing (HDC), where footprint cuts usually shrink the feature axis and erode concentration and robustness. Prior HDC decompositions decode via fixed atomic hypervectors, which are ill-suited for compressing learned class prototypes. We introduce DecoHD, which learns directly in a decomposed HDC parameterization: a small, shared set of per-layer channels with multiplicative binding across layers and bundling at the end, yielding a large representational space from compact factors. DecoHD compresses along the class axis via a lightweight bundling head while preserving native bind-bundle-score; training is end-to-end, and inference remains pure HDC, aligning with in/near-memory accelerators. In evaluation, DecoHD attains extreme memory savings with only minor accuracy degradation under tight deployment budgets. On average it stays within about 0.1–0.15% of a strong non-reduced HDC baseline (worst case 5.7%), is more robust to random bit-flip noise, and reaches its accuracy plateau with up to ~97% fewer trainable parameters. In hardware, it delivers roughly 277x/35x energy/speed gains over a CPU (AMD Ryzen 9 9950X), 13.5x/3.7x over a GPU (NVIDIA RTX 4090), and 2.0x/2.4x over a baseline HDC ASIC.
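The bind-bundle-score pipeline described above can be illustrated with a small pure-Python sketch. This is not the authors' implementation. It assumes binding is element-wise multiplication, bundling is a weighted sum given by a class head of shape $C \times M$, and scoring is a dot product with an already-encoded query. The channels here are random bipolar vectors rather than learned ones, and every name and size is hypothetical.

```python
import random
from itertools import product
from math import prod


def random_bipolar(dim, rng):
    """Draw a random {-1, +1} hypervector (stand-in for a learned channel)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(vectors):
    """Bind hypervectors by element-wise multiplication."""
    return [prod(values) for values in zip(*vectors)]


def binding_paths(channels):
    """Pick one channel per layer in every possible way and bind each choice.

    channels: list of layers, each a list of L_i hypervectors.
    Returns M = prod(L_i) path hypervectors.
    """
    return [bind(combo) for combo in product(*channels)]


def class_scores(channels, head, query):
    """Score a query against each class without storing C full prototypes.

    head[c][m] is the bundling weight of path m for class c.
    """
    paths = binding_paths(channels)
    # Similarity of the query to each path, computed once and shared by all classes.
    path_sims = [sum(p * q for p, q in zip(path, query)) for path in paths]
    return [sum(w * s for w, s in zip(row, path_sims)) for row in head]


rng = random.Random(0)
D, C = 64, 3
layer_sizes = [2, 3]   # M = 2 * 3 = 6 binding paths
channels = [[random_bipolar(D, rng) for _ in range(n)] for n in layer_sizes]
M = prod(layer_sizes)
head = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(C)]
query = random_bipolar(D, rng)

scores = class_scores(channels, head, query)
predicted = max(range(C), key=scores.__getitem__)
print(predicted)
```

Because the dot product is linear, scoring the query against the $M$ paths and then applying the head gives the same result as first bundling the paths into $C$ explicit prototypes and scoring against those; the sketch uses the former so that only the channel bank and the head need to be stored.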
