Efficient Vector Symbolic Architectures from Histogram Recovery
Zirui Deng, Netanel Raviv
TL;DR
This work advances vector-symbolic architectures by introducing an explicit code construction that concatenates Reed-Solomon (RS) outer codes with Hadamard inner codes, yielding a noise-resilient VSA with guaranteed quasi-orthogonality and efficient recovery. Recovery proceeds in two stages: lattice decoding recovers Hadamard codewords to produce histograms, and histogram recovery decodes these histograms to RS codewords via a KS-based algorithm on a disjunctive channel. The authors provide tight bounds and algorithms for both the noiseless and noisy cases, including cases with non-distinct codewords and multiplicities, ensuring robust retrieval under structured noise. The approach offers training-free guarantees and improves over Hadamard-only schemes by enabling larger, structured codebooks while maintaining recovery efficiency. This makes it well suited to neurosymbolic AI applications that require reliable compositional reasoning.
Abstract
Vector symbolic architectures (VSAs) are a family of information representation techniques that enable composition, i.e., creating complex information structures from atomic vectors via binding and superposition, and have recently found wide-ranging applications in various neurosymbolic artificial intelligence (AI) systems. Recently, Raviv proposed the use of random linear codes in VSAs, suggesting that their subcode structure enables efficient binding while preserving the quasi-orthogonality that is necessary for neural processing. Yet, random linear codes are difficult to decode under noise, which severely limits the resulting VSA's ability to support recovery, i.e., the retrieval of information objects and their attributes from a noisy compositional representation. In this work we bridge this gap by utilizing coding-theoretic tools. First, we argue that the concatenation of Reed-Solomon and Hadamard codes is suitable for VSAs, due to the mutual quasi-orthogonality of the resulting codewords (a folklore result). Second, we show that recovery of the resulting compositional representations can be done by solving a problem we call histogram recovery. In histogram recovery, a collection of $N$ histograms over a finite field is given as input, and one must find a collection of Reed-Solomon codewords of length $N$ whose entry-wise symbol frequencies obey those histograms. We present an optimal solution to the histogram recovery problem by using algorithms related to list-decoding, and analyze the resulting noise resilience. Our results give rise to a noise-resilient VSA with formal guarantees regarding efficient encoding, quasi-orthogonality, and recovery, without relying on any heuristics or training, and while operating at improved parameters relative to similar solutions such as the Hadamard code.
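To make the histogram recovery problem statement concrete, the following is a minimal Python sketch of a toy instance. It is not the list-decoding-based algorithm of the paper: it builds histograms from a few Reed-Solomon codewords and then recovers them by exhaustive search, which is feasible only for tiny parameters. The prime field GF(5), the length `N = 4`, the dimension `K = 2`, the evaluation points, and all function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Illustrative toy parameters (assumptions, not taken from the paper).
P = 5                     # prime field size, so arithmetic is just mod P
N = 4                     # codeword length = number of histograms
K = 2                     # RS dimension: polynomials of degree < K
POINTS = list(range(N))   # distinct evaluation points in GF(P)


def rs_encode(message):
    """Evaluate the polynomial with coefficients `message` at every point."""
    return tuple(
        sum(c * pow(x, i, P) for i, c in enumerate(message)) % P
        for x in POINTS
    )


def histograms(codewords):
    """Entry-wise symbol frequencies: one Counter per coordinate."""
    return [Counter(cw[j] for cw in codewords) for j in range(N)]


def brute_force_recover(hists, num_codewords):
    """Return every multiset of RS codewords whose histograms match `hists`.

    Exhaustive search over all P**K codewords; exponential in general and
    meant only to illustrate the problem statement.
    """
    all_codewords = [rs_encode(m) for m in product(range(P), repeat=K)]
    return [
        list(candidate)
        for candidate in combinations_with_replacement(all_codewords, num_codewords)
        if histograms(candidate) == hists
    ]


# Forward direction: a collection of codewords induces N histograms.
messages = [(1, 2), (3, 0), (4, 4)]
codewords = [rs_encode(m) for m in messages]
hists = histograms(codewords)

# Inverse direction: find collections of codewords that obey the histograms.
solutions = brute_force_recover(hists, len(codewords))
```

In this sketch, `hists` plays the role of the problem input and each entry of `solutions` is a collection of codewords whose entry-wise symbol frequencies obey it. Using multisets (`combinations_with_replacement`) allows non-distinct codewords, at the cost of a search space that grows far too quickly for realistic parameters.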
