Functional Frank-Wolfe Boosting for General Loss Functions

Chu Wang, Yingfei Wang, Weinan E, Robert Schapire

TL;DR

Boosting's risk of overfitting, especially in regression, motivates an $l_1$-constrained approach. The authors develop FWBoost, a functional Frank-Wolfe boosting algorithm for general loss functions, yielding an $l_1$-regularized framework that reduces to a variant of AdaBoost under exponential loss and has an $O(1/t)$ convergence rate. They establish Rademacher-based generalization bounds that do not depend on the number of boosting iterations and provide an away-steps variant for sparsity. Empirical results on UCI datasets show that FWBoost maintains test performance as the number of rounds increases and can outperform regularized gradient boosting, consistent with the theory. The work offers a practical, principled boosting paradigm with theoretical guarantees for a broad class of loss functions.

Abstract

Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses grows, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. With exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods in that they make calls to a base learning algorithm, but with a different weight update. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost does not degrade as the number of boosting rounds increases, which is consistent with the theoretical analysis.
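To make the connection between Frank-Wolfe and boosting concrete, the following is a minimal sketch, not the authors' FWBoost. It assumes a finite dictionary of base hypotheses stored as the columns of a matrix `H`, squared loss, an exhaustive search standing in for the base learner, and the standard Frank-Wolfe step size $2/(t+2)$. The names `fw_boost_sketch`, `H`, `C`, and `rounds` are hypothetical.

```python
import numpy as np

def fw_boost_sketch(H, y, C=1.0, rounds=200):
    """Frank-Wolfe over the l1 ball {alpha : ||alpha||_1 <= C}, squared loss.

    H is an (m, n) array; column j holds base hypothesis h_j evaluated on
    the m training points (a finite dictionary, assumed for illustration).
    """
    m, n = H.shape
    alpha = np.zeros(n)
    risks = []
    for t in range(rounds):
        residual = H @ alpha - y              # derivative of 0.5 * (f - y)^2 in f
        grad = H.T @ residual / m             # gradient of the empirical risk in alpha
        # Stand-in for the base learner call: pick the hypothesis most
        # correlated with the current residual.
        j = int(np.argmax(np.abs(grad)))
        s = np.zeros(n)
        s[j] = -C * np.sign(grad[j])          # l1-ball vertex minimizing the linearization
        gamma = 2.0 / (t + 2.0)               # standard Frank-Wolfe step size
        alpha = (1.0 - gamma) * alpha + gamma * s
        risks.append(0.5 * np.mean((H @ alpha - y) ** 2))
    return alpha, risks
```

Each round adds at most one base hypothesis, as in ordinary boosting. Every iterate is a convex combination of points in the ball, so the $l_1$ norm of `alpha` never exceeds `C`, however many rounds are run. With this fixed step size the empirical risk need not decrease monotonically from round to round.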

Paper Structure

This paper contains 15 sections, 4 theorems, 13 equations, 3 figures, 6 algorithms.

Key Result

Theorem 1

Let $\mathcal{F}$ be a set of real-valued functions. Assume the loss function $l$ is $L_l$-Lipschitz continuous with respect to its first argument and that $l(y, y') \le M$, $\forall y, y' \in \mathcal{Y}$. For any $\delta > 0$ and with probability at least $1- \delta$ over a sample $S$ of size $m$,

Figures (3)

  • Figure 1: Comparison of boosting methods on UCI datasets. The x-axis is the number of boosting iterations. The first row is the averaged empirical risk and the second row is the MSE on the test set.
  • Figure 2: Comparison of boosting methods on UCI datasets. The x-axis is the number of boosting iterations. The first (second) row is the averaged training (test) error over 20 runs.
  • Figure : Frank-Wolfe

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4