Active Learning for Regression based on Wasserstein distance and GroupSort Neural Networks

Benjamin Bobbia; Matthias Picard

TL;DR

The study empirically shows the pertinence of a combined representativeness-uncertainty approach: it provides good estimates throughout the query procedure, often achieves more precise estimates, and tends to improve accuracy faster than the other models.

Abstract

This paper presents a new active learning strategy for regression problems. The proposed Wasserstein active regression model relies on distribution matching to measure the representativeness of the labeled dataset. The Wasserstein distance is computed using GroupSort neural networks. These networks provide theoretical foundations that make it possible to quantify the estimation error with explicit bounds on the networks' size and depth. To complete the query strategy, this representativeness criterion is combined with an uncertainty-based approach that is more tolerant of outliers. Finally, the method is compared with other classical and recent solutions. The study empirically shows the pertinence of such a representativeness-uncertainty approach, which provides good estimates throughout the query procedure. Moreover, Wasserstein active regression often achieves more precise estimates and tends to improve accuracy faster than the other models.
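The abstract relies on GroupSort networks to compute the Wasserstein distance. As a minimal sketch of the general construction (not the authors' architecture), the pure-Python code below implements the GroupSort activation, which sorts pre-activations within consecutive groups, and stacks it between affine layers. To keep the whole network 1-Lipschitz for the $\ell_\infty$ norm, it assumes each weight row is rescaled to $\ell_1$ norm at most 1; the helper names (`project_rows`, `groupsort_net`), this rescaling scheme, and the example weights are hypothetical choices made for illustration, and the paper's own constraint may differ.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Sequence[float]]  # (weights, bias)


def group_sort(x: Sequence[float], group_size: int = 2) -> Vector:
    """Sort consecutive groups of `group_size` entries in ascending order.

    group_size=2 gives the MaxMin activation; group_size=len(x) gives FullSort.
    The output permutes entries within each group, and the map is 1-Lipschitz
    for the l-infinity norm.
    """
    if group_size <= 0 or len(x) % group_size != 0:
        raise ValueError("len(x) must be a positive multiple of group_size")
    out: Vector = []
    for start in range(0, len(x), group_size):
        out.extend(sorted(x[start:start + group_size]))
    return out


def project_rows(w: Matrix) -> Matrix:
    """Hypothetical constraint: rescale each row to l1 norm <= 1.

    This bounds ||W||_inf = max_i sum_j |w_ij| by 1, so x -> W x + b is
    1-Lipschitz for the l-infinity norm.
    """
    projected: Matrix = []
    for row in w:
        norm = sum(abs(v) for v in row)
        scale = 1.0 / norm if norm > 1.0 else 1.0
        projected.append([v * scale for v in row])
    return projected


def dense(w: Matrix, b: Sequence[float], x: Sequence[float]) -> Vector:
    """Affine map W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def groupsort_net(layers: Sequence[Layer], x: Sequence[float], group_size: int = 2) -> float:
    """Scalar-output network: constrained affine layers with GroupSort between them."""
    h: Vector = list(x)
    for i, (w, b) in enumerate(layers):
        h = dense(project_rows(w), b, h)
        if i < len(layers) - 1:  # no activation after the output layer
            h = group_sort(h, group_size)
    return h[0]


# Hypothetical weights: a 2 -> 2 hidden layer and a 2 -> 1 output layer.
layers = [([[0.5, -0.5], [1.0, 2.0]], [0.0, 0.1]), ([[0.3, 0.9]], [0.0])]
print(group_sort([3.0, 1.0, 2.0, 4.0]))  # [1.0, 3.0, 2.0, 4.0]
print(groupsort_net(layers, [0.2, -0.4]))
```

Because every layer is 1-Lipschitz, a network of this kind is a natural candidate for the potential $\varphi$ in the dual form of $W_1$; training it to maximize the gap between its mean values on two samples is not shown here.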

Paper Structure

This paper contains 20 sections, 5 theorems, 33 equations, 5 figures, and 2 tables.

Introduction
Active Learning framework and notations
The Wasserstein distance
Wasserstein Active Regression
Theoretical foundations
Training the estimator
Minimizing the Wasserstein distance
Minimizing the predictions uncertainty and query procedure
GroupSort Neural Networks
Presentation
Network Architecture
Numerical experiments
Models and datasets used
Implementation
Results
...and 5 more sections

Key Result

Proposition 1

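Let $P$ and $Q$ be two probability measures on the same space $\mathcal{X}$, and let $c$ be a cost function. The 1-order Wasserstein distance admits the dual representation

$$W_1(P,Q) = \sup_{\varphi} \left( \int_{\mathcal{X}} \varphi \, dP - \int_{\mathcal{X}} \varphi \, dQ \right),$$

where the supremum is taken over all $1$-Lipschitz functions $\varphi : \mathcal{X} \to \mathbb{R}$ with respect to the cost $c$, namely $|\varphi(x)-\varphi(y)|\leq c(x,y)$ for all $x,y\in \mathcal{X}$.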
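To make the duality concrete, the sketch below assumes $\mathcal{X} = \mathbb{R}$ and $c(x,y) = |x-y|$, a one-dimensional setting similar in spirit to Figure 1. For empirical measures on the real line, $W_1$ equals the integral of $|F_P - F_Q|$, where $F_P$ and $F_Q$ are the cumulative distribution functions, and a piecewise-linear potential with slope $\operatorname{sign}(F_Q - F_P)$ is $1$-Lipschitz and attains the supremum. The function names, the sample values, and the choice of $P$ as the unlabelled sample and $Q$ as the labelled one are hypothetical; this illustrates the proposition and is not the paper's estimator.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: t -> (1/n) * #{x_i <= t}."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def _grid(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Sorted union of support points; both CDFs are constant between them."""
    return sorted(set(p) | set(q))


def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """W_1 with cost |x - y| between two empirical measures on the real line,
    computed exactly as the integral of |F_P - F_Q|."""
    f_p, f_q = ecdf(p), ecdf(q)
    grid = _grid(p, q)
    return sum(abs(f_p(a) - f_q(a)) * (b - a) for a, b in zip(grid, grid[1:]))


def optimal_potential(p: Sequence[float], q: Sequence[float]) -> Callable[[float], float]:
    """A 1-Lipschitz phi with slope sign(F_Q - F_P); it attains the supremum."""
    f_p, f_q = ecdf(p), ecdf(q)
    grid = _grid(p, q)
    # Slope of phi on each interval [a, b) of the grid: -1, 0 or +1.
    pieces = []
    for a, b in zip(grid, grid[1:]):
        diff = f_q(a) - f_p(a)
        pieces.append((a, b, (diff > 0) - (diff < 0)))

    def phi(t: float) -> float:
        total = 0.0
        for a, b, slope in pieces:
            if t <= a:
                break
            total += slope * (min(t, b) - a)
        return total

    return phi


def dual_objective(phi: Callable[[float], float], p: Sequence[float], q: Sequence[float]) -> float:
    """Empirical value of int phi dP - int phi dQ."""
    return sum(map(phi, p)) / len(p) - sum(map(phi, q)) / len(q)


# Hypothetical samples in [0, 1]: P (unlabelled) and Q (labelled).
p = [0.1, 0.4, 0.9]
q = [0.2, 0.3, 0.5]
phi = optimal_potential(p, q)
print(wasserstein_1d(p, q), dual_objective(phi, p, q))  # both approximately 0.2
```

Any other $1$-Lipschitz function, such as $t \mapsto |t - 0.35|$, gives a dual value no larger than $W_1$. In higher dimensions no such closed form is generally available, which is why the supremum is instead approximated over a class of Lipschitz networks.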

Figures (5)

Figure 1: Values of $\varphi$ on a labelled and an unlabelled distribution in $[0,1]$
Figure 2: Evolution of each model's mean RMSE as the training set size increases, for the Concrete Slump dataset
Figure 3: Model hyperparameters
Figure 4: Dataset details
Figure 5: Evolution of each model's mean RMSE as the training set size increases

Theorems & Definitions (16)

Definition 1
Proposition 1: Kantorovich-Rubinstein duality V09
Theorem 1
proof
Remark 1
Corollary 1
proof
Remark 2
Remark 3
Remark 4
...and 6 more
