Additive Model Boosting: New Insights and Path(ologie)s

Rickmer Schulte, David Rügamer

TL;DR

This work investigates boosted additive models (BAMs), addressing theoretical gaps in the understanding of their convergence and implicit regularization. It develops exact parameter-path results for $L_2$-Boosting variants, connects greedy and block-wise BAMs to generalized coordinate descent with GSQ updates, and proves linear convergence under the $\mu$-Polyak-Łojasiewicz (PL) condition and $L$-smoothness, with specific results for regression splines, CSS, and exponential-family losses. The analysis reveals pathologies such as convergence toward unpenalized fits for penalized base learners and potential non-convergence in certain exponential-family settings, which guide practical choices of step size and penalties. Empirical experiments validate the theory and illustrate implications for model selection, penalty design, and potential avenues for inference based on boosting paths.
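The greedy (component-wise) form of boosting that the TL;DR relates to coordinate descent can be sketched as follows. This is a minimal sketch under stated assumptions: squared-error loss, unpenalized least-squares base learners on hypothetical column blocks (one block per additive term), and selection of the block whose fit reduces the residual sum of squares the most. The function name and block layout are illustrative rather than the authors' implementation, and the code does not reproduce the paper's formal GSQ correspondence.

```python
import numpy as np


def componentwise_l2_boosting(X, y, blocks, nu=0.1, n_iter=1000):
    """Hypothetical helper: greedy block-wise L2-boosting with LS base learners.

    blocks: list of integer index arrays, one per base learner (assumed layout).
    Each iteration fits every block to the current residuals and updates only
    the block whose fit lowers the residual sum of squares the most.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    # Precompute the per-block least-squares maps (X_b^T X_b)^{-1} X_b^T.
    solvers = [np.linalg.solve(X[:, b].T @ X[:, b], X[:, b].T) for b in blocks]
    losses = []
    for _ in range(n_iter):
        r = y - X @ beta
        losses.append(0.5 * r @ r)  # loss before this iteration's update
        best_j, best_rss, best_coef = None, np.inf, None
        for j, (b, S) in enumerate(zip(blocks, solvers)):
            coef = S @ r  # block fit to the current residuals
            rss = np.sum((r - X[:, b] @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_coef = j, rss, coef
        # Damped update (step size nu) of the selected block only.
        beta[blocks[best_j]] += nu * best_coef
    return beta, np.array(losses)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5, 0.5]) + rng.normal(scale=0.3, size=200)
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

beta, losses = componentwise_l2_boosting(X, y, blocks, nu=0.3, n_iter=2000)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("max |beta - beta_OLS|:", np.max(np.abs(beta - beta_ols)))
```

Because each selected block is a least-squares projection, a damped update with $\nu \in (0,1]$ cannot increase the residual sum of squares, and on this full-rank toy design the iterates approach the least-squares fit.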

Abstract

Additive models (AMs) have sparked a lot of interest in machine learning recently, allowing the incorporation of interpretable structures into a wide range of model classes. Many commonly used approaches to fit a wide variety of potentially complex additive models build on the idea of boosting additive models. While boosted additive models (BAMs) work well in practice, certain theoretical aspects are still poorly understood, including general convergence behavior and what optimization problem is being solved when accounting for the implicit regularizing nature of boosting. In this work, we study the solution paths of BAMs and establish connections with other approaches for certain classes of problems. Along these lines, we derive novel convergence results for BAMs, which yield crucial insights into the inner workings of the method. While our results generally provide reassuring theoretical evidence for the practical use of BAMs, they also uncover some "pathologies" of boosting for certain additive model classes concerning their convergence behavior that require caution in practice. We empirically validate our theoretical findings through several numerical experiments.

Paper Structure

This paper contains 68 sections, 10 theorems, 100 equations, 16 figures, 1 algorithm.

Key Result

Proposition 1

The estimates of $L_2$-Boosting with a quadratic penalty and joint updates in iteration $k$ are given in closed form, with step size $\nu \in (0,1]$, $\lambda>0$, $P$ a symmetric penalty matrix, and the penalized least-squares solution $\beta^{PLS}:=(X^{\top}X + \lambda P)^{-1}X^{\top}y$. If $X$ has full column rank, the iterates converge to the unpenalized least-squares solution; otherwise, parameters converge to the min-norm solution $\beta^{[k]} \overset{k \to \infty}{\lo
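To see where the penalty goes, here is a short derivation under assumptions that the summary does not spell out: $\beta^{[0]} = 0$, every iteration fits the penalized least-squares base learner jointly to the current residuals, and $P$ is positive semi-definite. The matrix $S_\lambda := (X^{\top}X + \lambda P)^{-1}X^{\top}X$ and $\beta^{OLS} := (X^{\top}X)^{-1}X^{\top}y$ are notation introduced here for illustration, and this is a sketch rather than the paper's own proof.

```latex
\begin{aligned}
\beta^{[k+1]} &= \beta^{[k]} + \nu\,(X^{\top}X + \lambda P)^{-1}X^{\top}\bigl(y - X\beta^{[k]}\bigr)
  = (I - \nu S_\lambda)\,\beta^{[k]} + \nu\,\beta^{PLS},\\
\beta^{[k]} &= \nu \sum_{j=0}^{k-1} (I - \nu S_\lambda)^{j}\,\beta^{PLS}
  = \bigl(I - (I - \nu S_\lambda)^{k}\bigr)\,S_\lambda^{-1}\beta^{PLS}
  = \bigl(I - (I - \nu S_\lambda)^{k}\bigr)\,\beta^{OLS}.
\end{aligned}
```

The last two equalities require full column rank of $X$, so that $S_\lambda$ is invertible and $S_\lambda^{-1}\beta^{PLS} = \beta^{OLS}$. For positive semi-definite $P$, the eigenvalues of $S_\lambda$ lie in $(0,1]$, hence those of $I - \nu S_\lambda$ lie in $[0,1)$ for $\nu \in (0,1]$, and $\beta^{[k]} \to \beta^{OLS}$: in this setting the penalty shapes the path but not the limit, which matches the behavior shown in Figures 2 and 4.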

Figures (16)

  • Figure 1: Left: Paths of boosted linear models with ridge penalty (ridge boosting) for different penalty parameters $\lambda$ (colored lines according to the legend), together with the ridge regression path (blue). The path of linear model boosting is the limiting case of ridge boosting with $\lambda=0$ (black). Black contour lines represent the loss surface. Right: Same plot, zoomed in.
  • Figure 2: Estimated logarithmic Covid-19 prevalence in San Francisco (SF) via BAMs. Left: Analytical B-spline (purple) and P-spline solution (blue). Right: P-spline boosting iterates converge to the unpenalized (B-spline) solution.
  • Figure 3: Smallest non-zero eigenvalue of $Q$ (left) and convergence rate $\gamma$ as given by the paper's convergence bound for quadratic problems (right) for component-wise boosting of a linear model (hence a quadratic problem), with varying pairwise correlation $\rho$ between predictor variables (color), fixed $n=200$, $\nu=1$, and varying $p$ (x-axis).
  • Figure 4: Mean-centered spatial effect of Covid-19 prevalence in the United States obtained with BAMs. From left to right: penalized least squares (PLS); early-stopped BAM; BAM with a large number of iterations; unpenalized least-squares fit (OLS), to which boosting converges.
  • Figure 5: Loss path for Poisson (top) and Binomial (bottom) BAMs with different learning rates (colors) showing potential convergence issues for Poisson BAMs.
  • ...and 11 more figures
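The convergence of penalized boosting to the unpenalized fit (Proposition 1, Figures 2 and 4) can be checked numerically. This is a sketch under assumptions of our own: a generic Gaussian design instead of the B-spline bases behind the paper's figures, a second-order difference penalty $P = D^{\top}D$ of the kind used for P-splines, $\beta^{[0]} = 0$, and joint penalized least-squares updates. The helper names `difference_penalty`, `penalized_l2_boosting`, and `closed_form_iterate` are hypothetical. The script compares the boosting iterates with the unrolled recursion $\beta^{[k]} = (I - (I - \nu S_\lambda)^k)\,\beta^{OLS}$, where $S_\lambda = (X^{\top}X + \lambda P)^{-1}X^{\top}X$ (derived under these assumptions, not quoted from the paper), and with the PLS and OLS solutions.

```python
import numpy as np


def difference_penalty(p: int, order: int = 2) -> np.ndarray:
    """P-spline-style difference penalty P = D^T D (assumed choice of P)."""
    D = np.diff(np.eye(p), n=order, axis=0)
    return D.T @ D


def penalized_l2_boosting(X, y, P, lam, nu, n_iter):
    """Hypothetical helper: L2-boosting with a joint penalized LS base learner.

    Returns the coefficient path beta^[0], ..., beta^[n_iter], starting at zero.
    """
    p = X.shape[1]
    # Coefficient map of the base learner: b = (X'X + lam P)^{-1} X' r
    base_learner = np.linalg.solve(X.T @ X + lam * P, X.T)
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_iter):
        residual = y - X @ beta
        beta = beta + nu * (base_learner @ residual)  # damped joint update
        path.append(beta.copy())
    return np.array(path)


def closed_form_iterate(X, y, P, lam, nu, k):
    """beta^[k] = (I - (I - nu S)^k) beta_OLS with S = (X'X + lam P)^{-1} X'X."""
    XtX = X.T @ X
    S = np.linalg.solve(XtX + lam * P, XtX)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    M = np.eye(X.shape[1]) - nu * S
    return (np.eye(X.shape[1]) - np.linalg.matrix_power(M, k)) @ beta_ols


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)
P = difference_penalty(p)
lam, nu = 50.0, 0.1

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_pls = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
path = penalized_l2_boosting(X, y, P, lam, nu, n_iter=5000)

print("||beta^[5000] - beta_PLS||:", np.linalg.norm(path[-1] - beta_pls))
print("||beta^[5000] - beta_OLS||:", np.linalg.norm(path[-1] - beta_ols))
```

In this sketch the final iterate sits at the OLS solution rather than at the PLS solution, so the amount of regularization is governed by where the path is stopped, not by $\lambda$ alone; intermediate entries of `path` play the role of the early-stopped fit in Figure 4.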

Theorems & Definitions (23)

  • Proposition 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Corollary 1
  • Proposition 2
  • Remark 4
  • Theorem 2
  • Remark 5
  • Corollary 2
  • ...and 13 more