DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
Shurong Wang, Zhuoyang Shen, Xinbao Qiao, Tongning Zhang, Meng Zhang
TL;DR
DynFrs is an efficient machine unlearning framework for Random Forests that couples the cross-tree subsampling method Occ(q) with a lazy tag mechanism (Lzy) and uses Extremely Randomized Trees (ERT) as the base learner. The approach achieves exact unlearning with favorable theoretical time bounds and delivers substantial practical speedups, especially in online and batch scenarios, while often preserving or improving predictive accuracy. Empirical results across nine binary classification datasets and a Higgs-scale online stream show orders-of-magnitude improvements over naïve retraining and competitive performance versus prior RF unlearning methods. The framework enables real-time, continual learning and unlearning in privacy-sensitive settings, with open-source reproducibility resources provided.
Abstract
Random Forests are widely recognized for their established efficacy in classification and regression tasks, standing out in domains such as medical diagnosis, finance, and personalized recommendations. These domains, however, are inherently sensitive to privacy concerns because they involve personal and confidential data. With increasing demand for the right to be forgotten, particularly under regulations such as GDPR and CCPA, the ability to perform machine unlearning has become crucial for Random Forests. However, this topic has received insufficient attention, and existing approaches are difficult to apply in real-world scenarios. To address this gap, we propose DynFrs, a framework designed to enable efficient machine unlearning in Random Forests while preserving predictive accuracy. DynFrs leverages the subsampling method Occ(q) and the lazy tag strategy Lzy, while remaining adaptable to any Random Forest variant. In essence, Occ(q) ensures that each training sample occurs in only a proportion of the trees, which limits the impact of deleting a sample, and Lzy delays the reconstruction of a tree node until it is necessary, thereby avoiding unnecessary modifications to tree structures. In experiments, applying DynFrs to Extremely Randomized Trees yields substantial improvements, achieving orders-of-magnitude faster unlearning and better predictive accuracy than existing machine unlearning methods for Random Forests.
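The two ideas in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that Occ(q) places each sample in exactly ceil(q * T) of the T trees, chosen uniformly at random, and it simplifies Lzy to a single "dirty" tag per tree rather than per node. All names (occ_assign, LazyTree, build_forest, unlearn) are hypothetical.

```python
import math
import random


def occ_assign(n_samples, n_trees, q, seed=0):
    """Assumed Occ(q) rule: each sample goes to ceil(q * n_trees) distinct trees."""
    rng = random.Random(seed)
    k = math.ceil(q * n_trees)
    return [rng.sample(range(n_trees), k) for _ in range(n_samples)]


class LazyTree:
    """Toy stand-in for a tree: stores sample ids and a lazy rebuild tag."""

    def __init__(self):
        self.samples = set()
        self.dirty = False   # lazy tag: structure is stale, rebuild deferred
        self.rebuilds = 0    # counts how often the deferred work actually ran

    def unlearn(self, sample_id):
        # Drop the sample and only mark the tree; no rebuild happens here.
        self.samples.discard(sample_id)
        self.dirty = True

    def query(self):
        # The deferred rebuild runs only when the tree is actually used.
        if self.dirty:
            self.rebuilds += 1
            self.dirty = False
        return len(self.samples)  # placeholder for a real prediction


def build_forest(assignment, n_trees):
    """Distribute sample ids to trees according to the assignment."""
    forest = [LazyTree() for _ in range(n_trees)]
    for sample_id, tree_ids in enumerate(assignment):
        for t in tree_ids:
            forest[t].samples.add(sample_id)
    return forest


def unlearn(forest, assignment, sample_id):
    """Remove a sample from the trees that contain it; return how many were touched."""
    for t in assignment[sample_id]:
        forest[t].unlearn(sample_id)
    return len(assignment[sample_id])
```

Under these assumptions, deleting one sample from a forest of 10 trees with q = 0.2 touches only 2 trees, and those trees do no rebuilding work until they are next queried.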
