Non-asymptotic Properties of Generalized Mondrian Forests in Statistical Learning
Haoran Zhan, Jingli Wang, Yingcun Xia
TL;DR
This work presents a general Mondrian Forest framework that extends PRFs to a broad set of learning tasks, including regression, quantile regression, density estimation, and classification, with a focus on non-asymptotic guarantees. The estimator $\hat{h}_n$ is built by aggregating $B$ trees grown from Mondrian partitions with stopping time $\lambda$. Each leaf value minimizes a chosen convex loss over the associated cell, which yields a global approximation of $m(x)$ over $[0,1]^d$. A key contribution is a regret/risk bound consisting of a generalization term, an approximation term, and a tail term, together with a model-selection scheme based on the penalty $\mathrm{Pen}(\lambda)$ that yields adaptive rates through $\lambda_n$, $\beta_n$, and loss-specific constants. The results cover a variety of concrete tasks through detailed examples (least squares, generalized regression, Huber loss, quantile regression, binary classification, and density estimation) and establish consistency under mild assumptions, offering a practical, theoretically grounded tool for diverse statistical learning problems.
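The estimator described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `sample_mondrian`, `leaf_path`, and `MondrianForestSketch` are hypothetical. The sketch assumes inputs in $[0,1]^d$ and samples each Mondrian tree independently of the data up to the stopping time $\lambda$ (`lam`). For convenience, it also assumes that a leaf containing no training points falls back to the loss minimizer over all responses. The default leaf rule is the sample mean, which minimizes squared loss. Passing `statistics.median` gives an $\ell_1$ minimizer instead.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# A tree is either ("leaf",) or ("split", dim, loc, left_subtree, right_subtree).
Tree = tuple


def sample_mondrian(lower: List[float], upper: List[float], lam: float,
                    t: float, rng: random.Random) -> Tree:
    """Sample a Mondrian partition of the box [lower, upper] up to lifetime lam."""
    lengths = [u - l for l, u in zip(lower, upper)]
    rate = sum(lengths)  # linear dimension of the cell
    # The next split time is exponential with rate equal to the linear dimension.
    t_split = t + rng.expovariate(rate) if rate > 0 else float("inf")
    if t_split > lam:  # stopping time reached: this cell stays a leaf
        return ("leaf",)
    # Split dimension chosen with probability proportional to side length;
    # split location uniform along that side.
    dim = rng.choices(range(len(lengths)), weights=lengths)[0]
    loc = rng.uniform(lower[dim], upper[dim])
    left_upper, right_lower = list(upper), list(lower)  # copies, inputs untouched
    left_upper[dim] = loc
    right_lower[dim] = loc
    return ("split", dim, loc,
            sample_mondrian(lower, left_upper, lam, t_split, rng),
            sample_mondrian(right_lower, upper, lam, t_split, rng))


def leaf_path(tree: Tree, x: Sequence[float]) -> Tuple[int, ...]:
    """Route x down the tree; the sequence of left/right moves identifies its leaf."""
    path = []
    node = tree
    while node[0] == "split":
        _, dim, loc, left, right = node
        if x[dim] <= loc:
            path.append(0)
            node = left
        else:
            path.append(1)
            node = right
    return tuple(path)


class MondrianForestSketch:
    """Average of n_trees Mondrian trees whose leaf values come from leaf_fit."""

    def __init__(self, n_trees: int, lam: float,
                 leaf_fit: Callable[[List[float]], float] = statistics.fmean,
                 seed: int = 0):
        self.n_trees = n_trees
        self.lam = lam
        self.leaf_fit = leaf_fit  # cell-wise minimizer of the chosen loss
        self.rng = random.Random(seed)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]):
        d = len(X[0])
        self.fallback = self.leaf_fit(list(y))  # assumed rule for empty leaves
        self.trees = []
        for _ in range(self.n_trees):
            tree = sample_mondrian([0.0] * d, [1.0] * d, self.lam, 0.0, self.rng)
            cells: Dict[Tuple[int, ...], List[float]] = {}
            for xi, yi in zip(X, y):
                cells.setdefault(leaf_path(tree, xi), []).append(yi)
            values = {path: self.leaf_fit(ys) for path, ys in cells.items()}
            self.trees.append((tree, values))
        return self

    def predict(self, x: Sequence[float]) -> float:
        # Aggregate the trees by averaging their leaf values at x.
        preds = [values.get(leaf_path(tree, x), self.fallback)
                 for tree, values in self.trees]
        return statistics.fmean(preds)
```

For example, `MondrianForestSketch(n_trees=50, lam=10.0).fit(X, y).predict([0.5])` averages 50 tree predictions at $x = 0.5$. Roughly speaking, a larger `lam` produces finer cells, which lowers the approximation error but raises the variance of each leaf value. The choice of $\lambda_n$ is meant to balance this trade-off.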
Abstract
Random Forests have been extensively used in regression and classification, inspiring the development of various forest-based methods. Among these, Mondrian Forests, derived from the Mondrian process, mark a significant advancement. Expanding on Mondrian Forests, this paper presents a general framework for statistical learning, encompassing a range of common learning tasks such as least squares regression, $\ell_1$ regression, quantile regression, and classification. Under mild assumptions on the loss functions, we provide upper bounds on the regret/risk functions for the estimators and demonstrate their statistical consistency.
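The losses listed above enter such a framework through the leaf values. Each leaf value minimizes the empirical loss over the responses falling in its cell. The following Python sketch is only illustrative, and its function names are hypothetical. Its grid search is a crude stand-in for the closed-form or convex-optimization solvers one would use in practice, chosen here for transparency rather than efficiency. Some known minimizers give useful reference points: for squared loss the minimizer is the cell mean, for the $\ell_1$ loss a median, and for the quantile (pinball) loss at level $\tau$ an empirical $\tau$-quantile.

```python
from typing import Callable, List

Loss = Callable[[float, float], float]


def squared(y: float, c: float) -> float:
    return (y - c) ** 2


def absolute(y: float, c: float) -> float:  # l1 loss
    return abs(y - c)


def pinball(tau: float) -> Loss:
    """Quantile (check) loss at level tau in (0, 1)."""
    def loss(y: float, c: float) -> float:
        r = y - c
        return r * (tau - (1.0 if r < 0 else 0.0))
    return loss


def huber(delta: float) -> Loss:
    """Quadratic for residuals up to delta, linear beyond."""
    def loss(y: float, c: float) -> float:
        r = abs(y - c)
        return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return loss


def leaf_value(ys: List[float], loss: Loss, grid_size: int = 2001) -> float:
    """Approximate argmin_c sum_i loss(y_i, c) by grid search on [min(ys), max(ys)].

    For these convex losses a minimizer always lies inside the data range.
    """
    lo, hi = min(ys), max(ys)
    if lo == hi:
        return lo
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    return min(grid, key=lambda c: sum(loss(y, c) for y in ys))
```

For instance, with cell responses `[1.0, 2.0, 3.0, 10.0]`, the squared-loss leaf value is close to the mean 4.0. The Huber value with `delta=1.0` is close to 2.5, so it is less affected by the outlying response 10.0.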
