Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
Hongyi Li, Han Lin, Jun Xu
TL;DR
The paper tackles learning oblique splits in regression trees, a problem for which finding optimal splits is NP-hard. It introduces the Hinge Regression Tree (HRT), which reframes each node split as a nonlinear least-squares problem over two linear predictors joined by a hinge (their max or min envelope), giving the split ReLU-like expressivity. Each split is fitted by a damped Newton (Gauss-Newton) method that alternates between fixing the partition and refitting the two predictors. The authors establish monotone descent and convergence at the node level under backtracking line search, prove a universal $O(\delta^2)$ approximation rate for the resulting piecewise-linear model class, and show that HRT achieves competitive or superior accuracy with markedly shallower trees on synthetic and real-world regression tasks. Optional ridge regularization adds robustness, and the method scales in practice, making oblique single-tree models both effective and interpretable for nonlinear function approximation.
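To make the node-level procedure concrete, here is a minimal Python sketch of fitting a single max-hinge split. It assumes a squared-error objective with a ridge penalty on both predictors (intercepts included) and simple step-halving backtracking; it is an illustration, not the authors' implementation. With the partition frozen (each sample assigned to whichever predictor attains the max), the objective is quadratic in the parameters, so the Gauss-Newton target is simply a ridge least-squares fit on each side. The step toward that target is halved until the true hinge loss decreases, which yields monotone descent. The names `fit_hinge`, `ridge_fit`, `hinge_loss`, and `solve` are hypothetical, and a min-hinge would be handled symmetrically.

```python
from typing import List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(p * q for p, q in zip(u, v))


def solve(A: List[Vec], rhs: Vec) -> Vec:
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ridge_fit(rows: List[Vec], ys: Vec, lam: float) -> Vec:
    """Minimize sum (y - w.x)^2 + lam * |w|^2 via the normal equations."""
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(d)]
    return solve(A, rhs)


def hinge_loss(a: Vec, b: Vec, X: List[Vec], y: Vec, lam: float) -> float:
    """Squared error of the max-hinge model plus the ridge penalty."""
    sse = sum((yi - max(dot(a, xi), dot(b, xi))) ** 2 for xi, yi in zip(X, y))
    return sse + lam * (dot(a, a) + dot(b, b))


def fit_hinge(X, y, a, b, lam=1e-8, max_iter=50, tol=1e-12):
    """Damped Gauss-Newton for max(a.x, b.x); each row of X ends with 1.0."""
    history = [hinge_loss(a, b, X, y, lam)]
    for _ in range(max_iter):
        # Freeze the partition: each sample goes to the piece attaining the max.
        side_a = [dot(a, xi) >= dot(b, xi) for xi in X]
        Xa = [xi for xi, s in zip(X, side_a) if s]
        ya = [yi for yi, s in zip(y, side_a) if s]
        Xb = [xi for xi, s in zip(X, side_a) if not s]
        yb = [yi for yi, s in zip(y, side_a) if not s]
        # Within a fixed partition the loss is quadratic, so the Gauss-Newton
        # target is a ridge fit per side (an empty side keeps its piece).
        a_star = ridge_fit(Xa, ya, lam) if Xa else a
        b_star = ridge_fit(Xb, yb, lam) if Xb else b
        # Backtracking: halve the step until the true hinge loss decreases.
        t = 1.0
        while t > 1e-10:
            a_new = [p + t * (q - p) for p, q in zip(a, a_star)]
            b_new = [p + t * (q - p) for p, q in zip(b, b_star)]
            new_loss = hinge_loss(a_new, b_new, X, y, lam)
            if new_loss < history[-1]:
                break
            t *= 0.5
        else:
            break  # no decreasing step found: treat as converged
        a, b = a_new, b_new
        history.append(new_loss)
        if history[-2] - new_loss <= tol * max(1.0, history[-2]):
            break
    return a, b, history


# Toy 1-D data from the hinge max(2x - 1, 2 - x); features are [x, 1].
xs = [-2.0 + 0.2 * i for i in range(31)]
X = [[x, 1.0] for x in xs]
y = [max(2 * x - 1, 2 - x) for x in xs]
a, b, history = fit_hinge(X, y, a=[1.0, 0.0], b=[-1.0, 0.0])
print("a =", a, "b =", b, "final loss =", history[-1])
```

On this noiseless toy target the iterates should recover both pieces within a few iterations. Because the step-halving rule accepts only strictly smaller losses, the recorded loss history is strictly decreasing by construction.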
Abstract
Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss-Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
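The $O(\delta^2)$ rate is the same order given by the textbook bound for piecewise-linear interpolation: if $|f''| \le M$ and the pieces have width $\delta$, linear interpolation errs by at most $M\delta^2/8$. The sketch below checks this numerically for $\sin$ on $[0, \pi]$ (so $M = 1$); halving $\delta$ should shrink the maximum error by roughly a factor of four. It is a generic one-dimensional illustration under these assumptions, not the paper's construction or proof, and `pl_interp_error` is a hypothetical helper.

```python
import math


def pl_interp_error(f, lo, hi, n_pieces, n_eval=2000):
    """Return (delta, max sampled |f - PL interpolant|) on a uniform grid."""
    delta = (hi - lo) / n_pieces
    knots = [lo + k * delta for k in range(n_pieces + 1)]
    vals = [f(t) for t in knots]
    worst = 0.0
    for j in range(n_eval + 1):
        x = lo + (hi - lo) * j / n_eval
        k = min(int((x - lo) / delta), n_pieces - 1)  # piece containing x
        w = (x - knots[k]) / delta  # interpolation weight, approximately in [0, 1]
        approx = (1.0 - w) * vals[k] + w * vals[k + 1]
        worst = max(worst, abs(f(x) - approx))
    return delta, worst


results = [pl_interp_error(math.sin, 0.0, math.pi, n) for n in (4, 8, 16, 32)]
for delta, err in results:
    print(f"delta={delta:.4f}  max error={err:.2e}  bound delta^2/8={delta ** 2 / 8:.2e}")
```

Each printed row should show the sampled error staying below $\delta^2/8$, with successive errors shrinking by close to a factor of four.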
