Robust Regression with Ensembles Communicating over Noisy Channels

Yuval Ben-Hur; Yuval Cassuto

TL;DR

This work studies robust regression for ensembles whose members communicate over channels with additive noise. It formulates a distributed regression model in which the aggregator receives the noisy outputs $\tilde{\varphi}_t=\varphi_t+n_t$ and forms the prediction $\tilde{f}=\boldsymbol{\alpha}^{\top}\tilde{\boldsymbol{\varphi}}$. It develops noise-aware aggregation strategies for bagging and gradient boosting that optimize general loss functions, including MSE and MAE, by trading off model error against aggregated channel noise via $\tilde{J}_{2}^{(\lambda)}(\boldsymbol{\alpha})=J_2(\boldsymbol{\alpha})+\lambda\boldsymbol{\alpha}^{\top}\boldsymbol{\Sigma}\boldsymbol{\alpha}$ and related formulations. For bagging with MSE loss, it yields a closed-form (or easily computed) optimal $\boldsymbol{\alpha}$ and, in the constrained form, uses a Neumann-series-based approximation to determine the noise-budget parameter. For bagging with MAE loss, it provides a gradient-based optimizer with analytical gradients and develops performance bounds. For gradient boosting, it introduces a noise-informed training procedure that assigns coefficients to minimize the expected noisy loss, with closed-form updates for MSE. Empirical results on synthetic and real-world datasets show substantial robustness gains, including large MSE reductions and stable performance as the ensemble grows, demonstrating the practicality of robust distributed regression under communication noise.
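The trade-off in $\tilde{J}_{2}^{(\lambda)}$ can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: $\boldsymbol{\Sigma}$ is read as the covariance of zero-mean channel noise independent of the data, $J_2$ is estimated by the empirical mean squared error, and $\boldsymbol{\alpha}$ is left unconstrained, so the minimizer reduces to a ridge-like linear solve. The paper's own closed form may differ (for example, by constraining the coefficients). The function name `noise_aware_weights`, the toy base regressors, and the noise profile are all hypothetical.

```python
import numpy as np

def noise_aware_weights(Phi, y, Sigma, lam=1.0):
    """Minimize mean((y - Phi @ a)**2) + lam * a @ Sigma @ a over a.

    Assumption: unconstrained coefficients, so setting the gradient to zero
    gives the linear system (G + lam * Sigma) a = b.
    """
    n_samples = Phi.shape[0]
    G = Phi.T @ Phi / n_samples   # empirical second-moment matrix of base outputs
    b = Phi.T @ y / n_samples     # empirical correlation of base outputs with target
    return np.linalg.solve(G + lam * Sigma, b)

rng = np.random.default_rng(0)
n_samples, n_regressors = 2000, 5
x = rng.uniform(-np.pi, np.pi, n_samples)
y = np.sin(x)

# Hypothetical base regressors: the target plus independent model error.
model_err_std = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
Phi = y[:, None] + rng.normal(size=(n_samples, n_regressors)) * model_err_std

# Hypothetical channel-noise covariance: the most accurate regressor
# sits behind the noisiest channel.
Sigma = np.diag([1.0, 0.01, 0.01, 0.01, 0.01])

alpha_plain = noise_aware_weights(Phi, y, Sigma, lam=0.0)   # ignores the noise
alpha_robust = noise_aware_weights(Phi, y, Sigma, lam=1.0)  # accounts for it

def expected_noisy_mse(alpha):
    # For zero-mean noise independent of the data, the expected squared error
    # of alpha @ (phi + n) is the noiseless MSE plus alpha @ Sigma @ alpha.
    return np.mean((y - Phi @ alpha) ** 2) + alpha @ Sigma @ alpha

print("plain :", alpha_plain.round(3), expected_noisy_mse(alpha_plain))
print("robust:", alpha_robust.round(3), expected_noisy_mse(alpha_robust))
```

In this toy setup, $\lambda=1$ makes the objective equal to the expected noisy MSE under the stated assumptions, so the noise-aware coefficients cannot do worse on it than the noise-agnostic ones, and they shift weight away from the regressor with the noisiest channel.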

Abstract

As machine-learning models grow in size, their implementation requirements cannot be met by a single computer system. This observation motivates distributed settings, in which intermediate computations are performed across a network of processing units, while the central node only aggregates their outputs. However, distributing inference tasks across low-precision or faulty edge devices, operating over a network of noisy communication channels, gives rise to serious reliability challenges. We study the problem of an ensemble of devices, implementing regression algorithms, that communicate through additive noisy channels in order to collaboratively perform a joint regression task. We define the problem formally, and develop methods for optimizing the aggregation coefficients for the parameters of the noise in the channels, which can potentially be correlated. Our results apply to the leading state-of-the-art ensemble regression methods: bagging and gradient boosting. We demonstrate the effectiveness of our algorithms on both synthetic and real-world datasets.

Paper Structure (14 sections, 10 theorems, 52 equations, 9 figures, 2 algorithms)

Introduction
Distributed noisy regression
Model formulation
Motivation for robust prediction
Robust Bagging
Robust bagging for MSE loss
Robust bagging for MAE loss
Performance bounds for robust MAE
Robust Gradient Boosting
Experimental results
Robust MSE-optimal bagging ensembles
Robust MAE-optimized bagging ensembles
Robust training using gradient boosting
Conclusion

Key Result

Theorem 1

For any coefficient vector $\boldsymbol{\alpha}$ and $p\in\mathbb{N}$, where $\boldsymbol{n}$ is the noise random vector and $J_{\ell_p}(\boldsymbol{\alpha})$ is the $\ell_p$ loss of the noiseless predictor $\boldsymbol{\alpha}^\top \boldsymbol{\varphi}(\cdot)$.

Figures (9)

Figure 1: Block diagram of a noisy ensemble regression system.
Figure 2: Illustration of noisy and noiseless prediction for a sine target function, with the corresponding (root) MSE and MAE values. Note that $x$ is centered at $0$.
Figure 3: MSE reduction of TEM bagging ensembles relative to the prior method GEM, as a function of $\mathrm{SNR}$.
Figure 4: Histogram of TEM (MSE-optimal) aggregation coefficients for the sinusoidal sum dataset with equi-variance noise (left) and $m=2$ noisier-subset profile (right). The colors/patterns depict different folds of the training data.
Figure 5: TEM (MSE-optimal) aggregation coefficients (right) calculated with heterogeneous base-regressor model errors (left) for three equi-variance noise settings: noiseless (blue '/'), weak noise (orange '\'), and high noise (green '|').
...and 4 more figures

Theorems & Definitions (23)

Definition 1: regression ensemble
Definition 2: noisy ensemble prediction
Definition 3
Theorem 1
proof
Proposition 2
proof
Definition 4
Theorem 3
proof
...and 13 more
