RETENTION: Resource-Efficient Tree-Based Ensemble Model Acceleration with Content-Addressable Memory
Yi-Chun Liao, Chieh-Lin Tsai, Yuan-Hao Chang, Camélia Slimani, Jalil Boukhobza, Tei-Wei Kuo
TL;DR
RETENTION tackles the memory inefficiency of CAM-based acceleration of tree ensembles by introducing purity-threshold pruning for bagging-based models and two data placement strategies, ODR and SPC, that substantially reduce the required CAM capacity. The framework is validated on Random Forest and XGBoost across multiple UCI datasets, and the full framework reduces the CAM capacity requirement by up to 207.12x with less than 3% accuracy loss. The results indicate that pruning combined with path clustering and ordering makes CAM-based inference for ensemble methods feasible under tight resource constraints, and that the approach generalizes to ACAM. Overall, RETENTION offers a practical pathway toward resource-efficient, in-memory acceleration of structured-data models.
Abstract
Although deep learning has demonstrated remarkable capability in learning from unstructured data, modern tree-based ensemble models remain superior in extracting relevant information and learning from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of the models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces the CAM capacity requirement for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored for bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don't-care states in CAM. Experimental results show that implementing the tree mapping scheme alone reduces the CAM capacity requirement by $1.46\times$ to $21.30\times$, while the full RETENTION framework achieves a $4.35\times$ to $207.12\times$ reduction with less than 3\% accuracy loss. These results demonstrate that RETENTION is highly effective in minimizing CAM resource demand, providing a resource-efficient direction for tree-based model acceleration.
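To make the don't-care redundancy mentioned above concrete, the following minimal Python sketch shows one common way a decision tree can be laid out in a CAM: each root-to-leaf path becomes one row, and every feature the path never tests is stored as a don't-care entry. This is an illustrative baseline mapping only, not the RETENTION tree mapping scheme or its ODR and SPC strategies. The names `Node`, `paths_to_rows`, `dont_care_fraction`, and `match` are hypothetical, and the rows store value ranges (in the spirit of a range-matching CAM) rather than encoded ternary bits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INF = float("inf")
DONT_CARE = (-INF, INF)  # a cell that matches any value


@dataclass
class Node:
    """Hypothetical binary decision-tree node (leaf when feature is None)."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # taken when x[feature] <= threshold
    right: Optional["Node"] = None   # taken when x[feature] > threshold
    label: Optional[int] = None


Row = Tuple[List[Tuple[float, float]], int]


def paths_to_rows(node: Node, n_features: int, bounds=None) -> List[Row]:
    """Return one (ranges, label) row per root-to-leaf path.

    ranges[i] = (lo, hi) means the row matches when lo < x[i] <= hi.
    Features never tested on the path keep the DONT_CARE range.
    """
    if bounds is None:
        bounds = [DONT_CARE] * n_features
    if node.feature is None:
        return [(list(bounds), node.label)]
    f, t = node.feature, node.threshold
    lo, hi = bounds[f]
    left_bounds = list(bounds)
    left_bounds[f] = (lo, min(hi, t))
    right_bounds = list(bounds)
    right_bounds[f] = (max(lo, t), hi)
    return (paths_to_rows(node.left, n_features, left_bounds)
            + paths_to_rows(node.right, n_features, right_bounds))


def dont_care_fraction(rows: List[Row]) -> float:
    """Fraction of CAM cells that hold a don't-care entry."""
    cells = [cell for ranges, _ in rows for cell in ranges]
    return sum(cell == DONT_CARE for cell in cells) / len(cells)


def match(rows: List[Row], x: List[float]) -> Optional[int]:
    """Emulate a CAM search: return the label of the first matching row."""
    for ranges, label in rows:
        if all(lo < v <= hi for (lo, hi), v in zip(ranges, x)):
            return label
    return None


# Toy tree over 4 features that only ever tests features 0 and 2.
tree = Node(feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(feature=2, threshold=1.0,
                       left=Node(label=1),
                       right=Node(label=2)))

rows = paths_to_rows(tree, n_features=4)
fraction = dont_care_fraction(rows)
prediction = match(rows, [0.7, 0.0, 0.5, 0.0])
```

In this toy case, three rows of four cells are allocated, yet 7 of the 12 cells are don't-care entries. That is the kind of underutilized capacity the abstract refers to, under the assumptions of this simplified mapping.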
