Norm-Q: Effective Compression Method for Hidden Markov Models in Neuro-Symbolic Applications

Hanyuan Gao, Xiaoxuan Yang

TL;DR

This work tackles memory and bandwidth bottlenecks in neuro-symbolic systems by compressing their probabilistic symbolic components, specifically hidden Markov models (HMMs), with Norm-Q, a row-wise normalized fixed-point quantization strategy. It introduces Norm-Q together with Norm-Q aware EM training and shows that an HMM with $N_h=4096$ hidden states can be quantized to $8$ bits with no loss in constraint satisfaction or generation quality, while $3$-bit quantization incurs modest generation-score losses. Norm-Q achieves over $99\%$ compression of the weights. The method outperforms traditional layer-wise quantization and clustering approaches and scales to larger HMMs ($N_h=8192$, $16384$) with minimal degradation. These results suggest practical applicability in neuro-symbolic pipelines and potential hardware-accelerated deployment. The study contributes a principled, system-aware compression pathway for probabilistic symbolic components, addressing a critical barrier to real-time, interpretable AI systems.

Abstract

Hidden Markov models (HMMs) are commonly used in generation tasks and have demonstrated strong capabilities in neuro-symbolic applications because of the Markov property. These applications leverage the strengths of neural networks and symbolic reasoning to create robust and interpretable AI systems. However, they may inherit and amplify the shortcomings of both approaches. Both components require dense computation and data transfer, and the communication between them further hinders performance. This paper proposes Norm-Q, a normalized linear quantization approach for compressing probabilistic symbolic models such as HMMs. We reduce the bit width of the data with minimal impact, thereby alleviating memory and bandwidth stress and enabling potential deployment on custom hardware. Our method introduces a normalized quantization-aware expectation maximization process for probabilistic model training. The experimental results show that Norm-Q achieves a higher compression rate with reasonable score loss compared to traditional quantization methods. In the constrained generation task for large language models, we successfully quantize an HMM with 4096 hidden states to 8 bits without loss and down to 3 bits with acceptable loss. Notably, the Norm-Q method can achieve a compression rate of 99% for the weights of the HMM. The code is open source at https://github.com/superstarghy/Norm-Q.
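
As a rough illustration of what normalized linear quantization of a probability row might look like, the sketch below rounds each probability to an unsigned fixed-point code and renormalizes the codes when reading them back, so every restored row is again a distribution. It is not taken from the authors' repository: the function names `quantize_row` and `dequantize_row`, the uniform fallback for rows whose codes are all zero, and the use of plain Python lists are assumptions made for illustration.

```python
def quantize_row(row, bits):
    """Map probabilities in [0, 1] to unsigned integer codes of `bits` bits."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = (1 << bits) - 1  # largest representable code
    # Uniform (linear) quantization: round each probability to a nearby level.
    return [round(p * levels) for p in row]


def dequantize_row(codes):
    """Recover a probability row by renormalizing the integer codes."""
    total = sum(codes)
    if total == 0:
        # Assumption of this sketch: an all-zero row falls back to uniform.
        return [1.0 / len(codes)] * len(codes)
    return [c / total for c in codes]


if __name__ == "__main__":
    # One row of a hypothetical transition matrix.
    row = [0.5, 0.25, 0.125, 0.125]
    for bits in (8, 3):
        codes = quantize_row(row, bits)
        restored = dequantize_row(codes)
        print(bits, codes, restored)
```

Applying the two functions to each row of the transition and emission matrices gives a whole-model version. Renormalizing at read time means the stored integers do not need to sum to a fixed value, which is one plausible reason to pair fixed-point codes with per-row normalization.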

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Latency profiling of an LLM integrated with an HMM.
  • Figure 2: Data distribution of the transition ($\alpha$) and emission ($\beta$) matrices of an HMM. The heat map is obtained by max-pooling and sampling to $64 \times 64$.
  • Figure 3: Results (x100%) of different quantization intervals.
  • Figure 4: Comparison of likelihoods of Norm-Q aware EM and Norm-Q.
  • Figure 5: Likelihood curve during EM. Quantized to 8 bits.