Reweighting Improves Conditional Risk Bounds

Yikai Zhang, Jiahe Lin, Fengpei Li, Songzhu Zheng, Anant Raj, Anderson Schneider, Yuriy Nevmyvaka

TL;DR

It is shown that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound.

Abstract

In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
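
To make the idea of weighted ERM concrete, the following is a minimal sketch for heteroscedastic linear regression with squared loss. It is not the paper's estimator: the function name `weighted_erm_linear`, the synthetic data, the noise model `sigma`, and the oracle inverse-variance weights are all assumptions chosen for illustration, whereas the paper works with an estimated, data-dependent weight function.

```python
import numpy as np


def weighted_erm_linear(X, y, weights):
    """Minimize (1/n) * sum_i w_i * (y_i - x_i @ theta)**2 over theta.

    Hypothetical helper: scaling each row by sqrt(w_i) turns the weighted
    problem into an ordinary least-squares problem.
    """
    sw = np.sqrt(weights)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta


# Synthetic heteroscedastic data (assumed setup, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])      # intercept + slope features
theta_true = np.array([0.5, 2.0])
sigma = 0.1 + 2.0 * np.abs(x)             # noise std grows with |x| (assumed)
y = X @ theta_true + sigma * rng.normal(size=n)

# Standard ERM is the special case of uniform weights.
theta_erm = weighted_erm_linear(X, y, np.ones(n))

# Weighted ERM with oracle inverse-variance weights (an assumption; in
# practice the weight function would have to be estimated from data).
theta_weighted = weighted_erm_linear(X, y, 1.0 / sigma**2)

print("ERM estimate:         ", theta_erm)
print("weighted ERM estimate:", theta_weighted)
```

In this sketch the weights emphasize the low-variance region near `x = 0`, which is the kind of sub-region the abstract refers to for heteroscedastic regression.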

Paper Structure (32 sections, 12 theorems, 124 equations, 2 figures, 1 table)

Key Result

Theorem 4.1

Suppose that we have $\widehat{\omega}(\cdot) \in {\mathcal{W}}$ such that $\mathbb{E}_{{\boldsymbol x}}[(\widehat{\omega}({\boldsymbol x}) - \omega^*({\boldsymbol x}))^2] \leq \varepsilon$ holds. Let $S_n = \{(\boldsymbol x_i,y_i)\}_{i=1}^{n}$ be i.i.d. samples drawn according to the DGP described. Provided that the sample size $n$ satisfies $n \gtrsim \frac{ d_{VC}({\mathcal{F}}) \log(\frac{1}{\

Figures (2)

  • Figure 1: Regression setting: underlying true data, estimates from ERM and weighted ERM, and the selective risk
  • Figure 2: Classification setting: underlying true data, estimates from ERM and weighted ERM, and the selective risk

Theorems & Definitions (30)

  • Definition 1: Empirical risk and the ERM estimator
  • Definition 2: Weighted empirical risk and the weighted ERM estimator
  • Theorem 4.1: Risk Bound for the case of Classification
  • Theorem 4.2
  • Remark 1: On the bounds established
  • Theorem 4.3: Risk bound for estimating $\omega^*$
  • Remark 2
  • Corollary 1
  • Theorem 4.4
  • Theorem 4.5
  • ...and 20 more