MonoSparse-CAM: Efficient Tree Model Processing via Monotonicity and Sparsity in CAMs

Tergel Molom-Ochir, Brady Taylor, Hai Li, Yiran Chen

TL;DR

The paper tackles the challenge of energy-efficient hardware acceleration for tree-based models on analog CAM arrays. It introduces MonoSparse-CAM, a software-hardware co-design that exploits both sparsity in TBML and the monotonicity of CAM matchlines to skip nonessential computations, achieving major energy and throughput gains. Across simulations and real datasets, MonoSparse-CAM demonstrates energy reductions up to $28.56\times$ over raw processing and $18.51\times$ over prior techniques, with peak efficiency reaching $418$ GOPS/W at high sparsity. The work substantiates the practicality of TBML hardware acceleration on CAM platforms and paves the way for scalable, energy-efficient AI on tabular-data tasks.

Abstract

While tree-based machine learning (TBML) models exhibit superior performance compared to neural networks on tabular data and hold promise for energy-efficient acceleration using analog CAM (aCAM) arrays, their ideal deployment on hardware with explicit exploitation of TBML structure and aCAM circuitry remains a challenging task. In this work, we present MonoSparse-CAM, a new CAM-based optimization technique that exploits TBML sparsity and monotonicity in CAM circuitry to further advance processing performance. Our results indicate that MonoSparse-CAM reduces energy consumption by up to 28.56x compared to raw processing and by 18.51x compared to state-of-the-art techniques, while improving the efficiency of computation by at least 1.68x.

Paper Structure (12 sections, 1 equation, 4 figures)


Figures (4)

  • Figure 1: (a) A 4-feature XGBoost decision tree trained on the Iris dataset. (b) Experiments indicate that as binary trees become more balanced, they also become sparser; see Section 3.2 for details.
  • Figure 2: (a) Illustration of the proposed technique: matched cells (green) continue processing, while mismatched cells (red) trigger early stops to save energy. Gray cells are skipped due to monotonicity. (b) The 6T2M analog CAM cell design, highlighting key components for matching operations, including transistors (T1-T6) and memristors (M1, M2).
  • Figure 3: (a) A 160$\times$160 decision tree grid with features and outcomes, simulated using a Gaussian distribution ($\lambda$ = 0.6, $\mu$ = 0.0). Yellow cells are inactive, blue cells are active. (b) The grid after Feature Reordering. (c) After applying MonoSparse-CAM, black cells show skipped computations, reducing energy use.
  • Figure 4: Comparison of (a) total energy consumption, (b) delay, and (c) computational efficiency between different optimization techniques when processing a 240$\times$320 array with a 24$\times$48 CAM. (d) Energy consumption comparison for processing a 240$\times$320 array with a 24$\times$48 CAM at $\lambda$ = 0.7 across different tech nodes: typical (tt), slow NMOS and PMOS (ss), and fast NMOS and PMOS (ff). (e) Energy consumption projection: dotted lines show the predicted curve based on four actual data points connected by solid lines, with all fitted curves having an $R^2$ value above 97%. (f) Cost of accelerating decision trees trained on datasets.
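
The skipping behaviour described in the Figure 2(a) caption can be sketched in software. This is a minimal illustration, not the paper's implementation: it assumes each CAM cell stores a half-open range `[lo, hi)`, that an inactive (sparse) cell is represented by `None`, and that cells in a row are evaluated left to right. The names `search`, `rows`, and `query` are hypothetical. An inactive cell is skipped because it always matches; once a cell mismatches, the rest of the row is skipped because a discharged matchline cannot recover.

```python
from typing import List, Optional, Tuple

# A cell holds a half-open range [lo, hi); None marks an inactive ("don't care") cell.
Cell = Optional[Tuple[float, float]]


def search(rows: List[List[Cell]], query: List[float]) -> Tuple[List[int], int]:
    """Return (indices of matching rows, number of cells actually evaluated)."""
    matched = []
    evaluated = 0
    for r, row in enumerate(rows):
        alive = True
        for cell, x in zip(row, query):
            if cell is None:
                continue  # sparsity: an inactive cell always matches, so skip it
            evaluated += 1
            lo, hi = cell
            if not (lo <= x < hi):
                alive = False
                break  # monotonicity: the row can no longer match, skip the rest
        if alive:
            matched.append(r)
    return matched, evaluated


# Toy 3x3 array (illustrative values only, not taken from the paper).
rows = [
    [(0.0, 2.5), None, (1.0, 3.0)],
    [(2.5, 9.0), (0.0, 1.0), None],
    [(0.0, 2.5), (0.0, 0.5), (3.0, 9.0)],
]
query = [1.0, 0.7, 2.0]

matched, evaluated = search(rows, query)
total_cells = sum(len(row) for row in rows)
print(matched, evaluated, total_cells)
```

In this toy case only 5 of the 9 cells are evaluated. The energy figures reported in the paper depend on circuit-level effects that a cell count like this does not capture.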