Non-asymptotic Properties of Generalized Mondrian Forests in Statistical Learning

Haoran Zhan; Jingli Wang; Yingcun Xia

TL;DR

This work presents a general Mondrian Forest framework that extends PRFs to a broad set of learning tasks, including regression, quantile regression, density estimation, and classification, with a focus on non-asymptotic guarantees. The estimator $\hat{h}_n$ is built by aggregating $B$ trees grown from Mondrian partitions with stopping time $\lambda$, where leaf values minimize a chosen convex loss on the associated cell, enabling a global approximation of $m(x)$ over $[0,1]^d$. A key contribution is the regret/risk bound comprising a generalization term, an approximation term, and a tail term, plus a model-selection scheme $\mathrm{Pen}(\lambda)$ that yields adaptive rates through $\lambda_n$, $\beta_n$, and loss-specific constants. The results cover a variety of concrete tasks via detailed examples (least squares, generalized regression, Huber loss, quantile regression, binary classification, and density estimation) and establish consistency under mild assumptions, offering a practical, theoretically grounded tool for diverse statistical learning problems.
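
In symbols, the construction described above can be sketched as follows. The notation $C_b(x)$ (the cell of the $b$-th Mondrian partition that contains $x$), the argument order of $\ell$, and the use of a plain average as the aggregation rule are assumptions made for illustration; the paper's own definition may add constraints, for example on the range of the leaf values.

$$
\hat{h}_{n,b}(x) \in \arg\min_{c \in \mathbb{R}} \sum_{i:\, X_i \in C_b(x)} \ell(c, Y_i),
\qquad
\hat{h}_n(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{h}_{n,b}(x),
\qquad x \in [0,1]^d.
$$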

Abstract

Random Forests have been extensively used in regression and classification, inspiring the development of various forest-based methods. Among these, Mondrian Forests, derived from the Mondrian process, mark a significant advancement. Expanding on Mondrian Forests, this paper presents a general framework for statistical learning, encompassing a range of common learning tasks such as least squares regression, $\ell_1$ regression, quantile regression, and classification. Under mild assumptions on the loss functions, we provide upper bounds on the regret/risk functions for the estimators and demonstrate their statistical consistency.
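
To make the role of the loss concrete, here is a minimal Python sketch of how the value on a single leaf could be computed for three of the losses named above. The function names `leaf_value` and `pinball_risk` are hypothetical, and the sketch ignores any bound the paper may place on leaf values; it only uses the standard facts that the sample mean minimizes the empirical squared loss, the sample median minimizes the empirical $\ell_1$ loss, and an empirical $\tau$-quantile minimizes the empirical pinball loss.

```python
import math
import statistics


def pinball_risk(c, ys, tau):
    """Empirical pinball (quantile) loss of the constant prediction c."""
    return sum(tau * (y - c) if y >= c else (1 - tau) * (c - y) for y in ys)


def leaf_value(ys, loss="least_squares", tau=0.5):
    """Constant that minimizes the empirical loss over the responses in one cell.

    Illustrative only: no truncation or other constraint is applied.
    """
    ys = sorted(ys)
    if not ys:
        raise ValueError("empty cell")
    if loss == "least_squares":
        return statistics.fmean(ys)      # the mean minimizes squared error
    if loss == "l1":
        return statistics.median(ys)     # a median minimizes absolute error
    if loss == "quantile":
        # The order statistic y_(k) with k = ceil(tau * n) is an empirical
        # tau-quantile and therefore a minimizer of the pinball loss.
        k = max(1, math.ceil(tau * len(ys)))
        return ys[k - 1]
    raise ValueError(f"unknown loss: {loss}")
```

For instance, `leaf_value([1, 2, 3, 10], "l1")` returns the median of the four responses, which is less sensitive to the outlying value 10 than the least squares leaf value.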

Paper Structure (16 sections, 14 theorems, 183 equations, 1 figure, 2 algorithms)

Introduction
Background and Preliminaries
Task in Statistical Learning
Mondrian partitions
Methodology
Main results
Model selection: the choice of $\lambda_n$
Examples
Least squares regression
Generalized regression
Huber's loss
Quantile regression
Binary classification
Nonparametric density estimation
Conclusion
...and 1 more section

Key Result

Theorem 2

Suppose that the loss function $\ell(\cdot,\cdot)$ satisfies Assumption assump1-assump3 and that the distribution of $Y$ satisfies Assumption assump_distribution. For any $h\in \mathcal{H}^{p,\beta}([0,1]^d,C)$ with $0<p\le 1$, we have where $c_1,c_2>0$ are some universal constants.

Figures (1)

Figure 1: An example of a Mondrian partition (left) with the corresponding tree structure (right). This shows how the tree grows over time. There are four partitioning times in this demo, $1,2,3,4$, which are marked by bullets ($\bullet$) and the stopping time is $\lambda=4$.
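
The growth process in the figure can be imitated with a short simulation. The following is a minimal Python sketch of the standard Mondrian process on an axis-aligned box, not the paper's own algorithm listing: each cell waits an exponential time with rate equal to the sum of its side lengths, is then cut at a uniform position along an axis chosen with probability proportional to side length, and stops splitting once the clock passes the stopping time. The name `sample_mondrian` and its return format are assumptions made for illustration.

```python
import random


def sample_mondrian(lower, upper, lifetime, start=0.0, rng=None):
    """Sample a Mondrian partition of the box [lower, upper].

    Returns (leaves, split_times): leaves is a list of (lower, upper) tuples,
    split_times lists the times at which cuts were made (all <= lifetime).
    """
    if rng is None:
        rng = random.Random()
    sides = [u - l for l, u in zip(lower, upper)]
    # Waiting time until this cell splits: exponential, rate = sum of sides.
    t = start + rng.expovariate(sum(sides))
    if t > lifetime:
        return [(tuple(lower), tuple(upper))], []   # cell survives as a leaf
    # Pick the split axis proportionally to side length, then a uniform cut.
    axis = rng.choices(range(len(sides)), weights=sides)[0]
    cut = rng.uniform(lower[axis], upper[axis])
    left_upper = list(upper)
    left_upper[axis] = cut
    right_lower = list(lower)
    right_lower[axis] = cut
    # Both children continue independently from the current time t.
    leaves_l, times_l = sample_mondrian(lower, left_upper, lifetime, t, rng)
    leaves_r, times_r = sample_mondrian(right_lower, upper, lifetime, t, rng)
    return leaves_l + leaves_r, [t] + times_l + times_r


if __name__ == "__main__":
    leaves, times = sample_mondrian([0.0, 0.0], [1.0, 1.0], lifetime=4.0,
                                    rng=random.Random(0))
    print(len(leaves), "cells; split times:", sorted(round(t, 3) for t in times))
```

Each cut turns one cell into two, so the number of leaves is always one more than the number of split times; a larger stopping time gives a finer partition on average.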

Theorems & Definitions (23)

Definition 1: $(p,C)$-smooth class
Theorem 2: Regret function bound of Mondrian forests
Remark 3
Remark 4
Remark 5
Corollary 6: Consistency rate of Mondrian forests
Corollary 7: Consistency of Mondrian forests
Theorem 8
Lemma 9
Proposition 10
...and 13 more
