Ranking Perspective for Tree-based Methods with Applications to Symbolic Feature Selection

Hengrui Luo; Meng Li

TL;DR

This work provides a finite-sample analysis of tree-based methods from a ranking perspective, and proposes concordant divergence statistics $\mathcal{T}_0$ to evaluate symbolic feature mappings and establish their properties.

Abstract

Tree-based methods are powerful nonparametric techniques in statistics and machine learning. However, their effectiveness, particularly in finite-sample settings, is not fully understood. Recent applications have revealed their surprising ability to distinguish transformations (which we call symbolic feature selection) that remain obscure under current theoretical understanding. This work provides a finite-sample analysis of tree-based methods from a ranking perspective. We link oracle partitions in tree methods to response rankings at local splits, offering new insights into their finite-sample behavior in regression and feature selection tasks. Building on this local ranking perspective, we extend our analysis in two ways: (i) We examine the global ranking performance of individual trees and ensembles, including Classification and Regression Trees (CART) and Bayesian Additive Regression Trees (BART), providing finite-sample oracle bounds, ranking consistency, and posterior contraction results. (ii) Inspired by the ranking perspective, we propose concordant divergence statistics $\mathcal{T}_0$ to evaluate symbolic feature mappings and establish their properties. Numerical experiments demonstrate the competitive performance of these statistics in symbolic feature selection tasks compared to existing methods.

Paper Structure (24 sections, 11 theorems, 56 equations, 6 figures, 3 tables)

Introduction
Tree regressions
Tree methods for variable selections
Tree-based rankings
Oracle Partitions
Notation
Recursive partition in tree-based models
Local splits and principal decision ratio
Local Ranking at Local Splits
Optimal 2-partition
Piece-wise monotonic transforms
Global Rankings with Regressions
Concordant Divergence
Experiments
AUC for feature selection
...and 9 more sections

Key Result

Lemma 1

(Oracle 2-partition with fixed sizes) Consider a 2-partition of $n$ elements $y_{(1)}<y_{(2)}<\cdots<y_{(n)}$ into components of sizes $i$ and $n-i$. Assume that $n>4$ and $\min(n-i,i)\geq 2$, so that the within-component variances are defined. Then the following partitions are the only 2-partitions with sizes $i$ and $n-i$ that minimize the loss in \eqref{eq:loss.y.ranking}.
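
The fixed-size setting of the lemma can be explored numerically with a small brute-force search. The sketch below is only an illustration under an assumption: the loss referenced as eq:loss.y.ranking is not reproduced in this summary, so the code uses the within-group sum of squared deviations (the usual regression-tree criterion) as a stand-in. The data `y` and the helper names are hypothetical. For this stand-in loss, the minimizing group of size $i$ consists of either the $i$ smallest or the $i$ largest order statistics, which is consistent with the rank-contiguous minimizers shown in the Figure 2 caption.

```python
from itertools import combinations

def within_group_sse(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_fixed_size_partitions(y, i, tol=1e-12):
    """Brute-force every 2-partition of y with group sizes i and len(y) - i.

    Returns (minimal loss, list of minimizing size-i groups), where each group
    is a tuple of 0-based positions in the sorted sample.
    ASSUMPTION: the loss is the within-group sum of squares, a stand-in for
    the loss the lemma refers to.
    """
    ys = sorted(y)
    n = len(ys)
    best_loss, best_groups = float("inf"), []
    for group in combinations(range(n), i):
        chosen = set(group)
        a = [ys[k] for k in group]
        b = [ys[k] for k in range(n) if k not in chosen]
        loss = within_group_sse(a) + within_group_sse(b)
        if loss < best_loss - tol:
            best_loss, best_groups = loss, [group]
        elif abs(loss - best_loss) <= tol:
            best_groups.append(group)
    return best_loss, best_groups

# Hypothetical responses with n = 6 > 4 and min(i, n - i) = 2.
y = [2.3, -0.7, 0.4, 1.1, 3.8, -1.9]
i = 2
n = len(y)
loss, groups = best_fixed_size_partitions(y, i)

lowest = tuple(range(i))          # the i smallest order statistics
highest = tuple(range(n - i, n))  # the i largest order statistics
is_rank_contiguous = all(g in (lowest, highest) for g in groups)
print(loss, groups, is_rank_contiguous)
```

The search is exponential in $n$ and is meant only as a sanity check on small samples, not as a way to fit trees.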

Figures (6)

Figure 1: We illustrate 2-layer symbolic regression with $\mathcal{O}_{u}=\{id,x^{3}\}$ and $\mathcal{O}_{b}=\{+,\times\}$. We follow the notation $\mathcal{O}_{A_{u}}^{(2)}$ and $\mathcal{O}_{A_{b}}^{(2)}$ for the architectures specified in \cite{ye2021operator}. We display all possible features of a 2-step symbolic composition using a tree structure, showing the rapidly increasing number $q$ of features, namely the transformed symbolic features $\bm{z}$.
Figure 2: A depth-2 tree with 5 observations showing two possible oracle partitions in Lemma \ref{lem:LemmaA}. In the first column, we present the raw $(x_{i},y_{i})$ pairs of the dataset. In the second column, we present the oracle partition using red and blue colors, together with the support of the indicator functions on the $x$-axis. The horizontal solid lines represent the group means of the $y$ values (which also serve as the predicted values); the vertical dashed lines represent the point-to-mean distances. In the third column, we illustrate the loss function \eqref{eq:loss.y.ranking}. The minimum in row (a) is attained by $\{y_{(4)},y_{(5)}\}=\{y_{1},y_{5}\}$ and $\{y_{(1)},y_{(2)},y_{(3)}\}=\{y_{2},y_{3},y_{4}\}$. The minimum in row (b) is attained by $\{y_{(3)},y_{(4)},y_{(5)}\}=\{y_{1},y_{2},y_{5}\}$ and $\{y_{(1)},y_{(2)}\}=\{y_{3},y_{4}\}$. We color the dots by the actual loss function values and annotate the order statistics near each dot.
Figure 3: Refined monotonic intervals $\mathcal{I}_{2}=\{[0,1/2],[1/2,1]\}$ for the functions $\theta_{1}(x)=x$ and $\theta_{2}(x)=-4x^{2}+4x$ shown. We use vertical black dashed lines to mark the refined monotonic intervals, and count the number of pre-images of $\theta_{1},\theta_{2}$ over each refined interval.
Figure 4: Correlation between $\bm{x}$ and $\bm{y} = \theta_i(\bm{x})$ for $i = 1, \ldots, 5$. The expression and plot for each $\theta_i$ are reported in the top two rows of the table. Left to right (in the 3rd and 4th rows): Chatterjee correlation \cite{chatterjee2021new}, absolute Pearson correlation, absolute Spearman correlation, absolute Kendall correlation, and $\log(\mathcal{T}_{0})$. $\mathcal{T}_{0}$ is shown on a log scale for better comparison. We generate an equally spaced $\bm{x}$ on $[-1,1]$ with sample size $N=50$ (3rd row) and $N=500$ (4th row). Gaussian noise with variance $\sigma^2$ is added to $\theta_i(\bm{x})$.
Figure 5: We illustrate the PR curves from 50 repeats ($n=100$) of a 2-layer symbolic regression with $\mathcal{O}_{u}=\{id,x^{3}\}$ and $\mathcal{O}_{b}=\{+,\times\}$. The true signal is \eqref{eq:3_var_true_signal} with no noise. The first row corresponds to the architecture $\mathcal{O}_{A_{u}}^{(2)}$ and the second row to the architecture $\mathcal{O}_{A_{b}}^{(2)}$. Boxplots show the AUC values across the 50 repeats.
...and 1 more figure
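
The baseline coefficients named in the Figure 4 caption (Chatterjee, Pearson, Spearman, Kendall) can be sketched in plain Python as follows. This is a rough illustration, not the paper's experimental code: the transform $\theta(x)=x^{2}$ and the noise level `sigma` are hypothetical choices, since the actual $\theta_i$ and $\sigma^2$ are not listed in this summary, and $\mathcal{T}_{0}$ is not implemented because its definition is not given here. The rank helper assumes no ties, which holds almost surely with continuous noise.

```python
import math
import random

def ranks(values):
    """Ranks 1..n of `values`; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

def pearson(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / number of pairs."""
    n, s = len(x), 0
    for a in range(n):
        for b in range(a + 1, n):
            prod = (x[a] - x[b]) * (y[a] - y[b])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def chatterjee_xi(x, y):
    """Chatterjee's xi without ties: sort by x, take ranks r of y, and
    return 1 - 3 * sum_k |r_{k+1} - r_k| / (n^2 - 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda k: x[k])
    r = ranks(y)
    total = sum(abs(r[order[k + 1]] - r[order[k]]) for k in range(n - 1))
    return 1 - 3 * total / (n * n - 1)

# Equally spaced x on [-1, 1], as in the caption; theta and sigma are
# HYPOTHETICAL stand-ins for the unspecified theta_i and noise level.
N = 500
x = [-1 + 2 * k / (N - 1) for k in range(N)]
rng = random.Random(0)
sigma = 0.05
y = [v * v + rng.gauss(0, sigma) for v in x]

print("xi      :", chatterjee_xi(x, y))
print("|pearson| :", abs(pearson(x, y)))
print("|spearman|:", abs(spearman(x, y)))
print("|kendall| :", abs(kendall(x, y)))
```

For a symmetric, non-monotone transform such as this one, the three classical coefficients are expected to stay near zero while Chatterjee's coefficient remains clearly positive, which is the kind of contrast the figure is designed to display.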

Theorems & Definitions (26)

Example 1
Example 2
Lemma 1
Remark 2
Example 3
Remark 3
Corollary 4
Example 4
Example 5
Definition 5
...and 16 more
