Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers

Qingyuan Yang; Hu Ding

TL;DR

This work tackles the computation of a $k$-sparse Wasserstein barycenter in the presence of outliers, a problem that blends optimal transport with robust clustering. The authors connect $k$-sparse WB with outliers to $k$-means clustering and develop two main approaches: a clustering-based LP method that yields constant-factor guarantees and a low-dimensional coreset technique that achieves a $(1+\epsilon)$-approximation when the ambient dimension is manageable. They provide rigorous approximation bounds, including a $(2+\sqrt{\alpha})^2$-type guarantee when using an $\alpha$-approximate clustering method, and extend the results to doubling metrics via anchor-based coresets. Empirical results on synthetic and real data, including MNIST, demonstrate practical effectiveness and robustness to outliers. Overall, the paper advances robust WB with provable guarantees and scalable techniques, enabling practical use in noisy settings.

Abstract

Wasserstein Barycenter (WB) is one of the most fundamental optimization problems in optimal transportation. Given a set of distributions, the goal of WB is to find a new distribution that minimizes the average Wasserstein distance to them. The problem becomes even harder if we restrict the solution to be ``$k$-sparse''. In this paper, we study the $k$-sparse WB problem in the presence of outliers, which is a more practical setting since real-world data often contains noise. Existing WB algorithms cannot be directly extended to handle outliers, so new ideas are needed. First, we investigate the relation between $k$-sparse WB with outliers and clustering (with outliers) problems. In particular, we propose a clustering-based LP method that yields a constant approximation factor for the $k$-sparse WB with outliers problem. Further, we use the coreset technique to achieve a $(1+\epsilon)$-approximation factor for any $\epsilon>0$, provided the dimensionality is not high. Finally, we conduct experiments on our proposed algorithms and illustrate their efficiency in practice.
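The link between $k$-sparse WB and clustering mentioned above can be illustrated with a simple lower bound: whatever weights a barycenter places on $k$ fixed locations, each unit of mass in an input distribution must travel at least as far as its nearest location. The sketch below computes that nearest-location cost. It is not the paper's algorithm. It assumes a squared Euclidean ground cost and models outliers as up to `z` units of discardable mass per distribution, which may differ from the paper's Definition 1. The function names and the toy data are hypothetical.

```python
from math import dist

def nearest_cost(points, weights, centers, z=0.0):
    """Weighted sum of squared distances to the nearest center.

    Up to `z` units of mass are discarded from the farthest points first
    (an assumed outlier model, not necessarily the paper's definition).
    """
    # Pair each point's squared distance to its nearest center with its
    # weight, then sort so the farthest points come first.
    costs = sorted(
        ((min(dist(p, c) ** 2 for c in centers), w)
         for p, w in zip(points, weights)),
        reverse=True,
    )
    total = 0.0
    budget = z
    for d2, w in costs:
        drop = min(w, budget)  # mass treated as outlier and ignored
        budget -= drop
        total += (w - drop) * d2
    return total

def average_lower_bound(distributions, centers, z=0.0):
    """Average nearest-location cost over all input distributions."""
    per_dist = [nearest_cost(p, w, centers, z) for p, w in distributions]
    return sum(per_dist) / len(per_dist)

# Toy data: two distributions, each with one far-away point of mass 0.1.
P1 = ([(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)], [0.45, 0.45, 0.1])
P2 = ([(0.0, 1.0), (1.0, 1.0), (-10.0, 10.0)], [0.5, 0.4, 0.1])
candidate_locations = [(0.0, 0.5), (1.0, 0.5)]  # k = 2

bound_without_trimming = average_lower_bound([P1, P2], candidate_locations)
bound_with_trimming = average_lower_bound([P1, P2], candidate_locations, z=0.1)
```

In this toy setup the far-away points dominate the untrimmed cost, and allowing `z=0.1` of mass to be discarded removes their contribution entirely, which is the behavior a robust formulation is meant to capture.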

Paper Structure (25 sections, 17 theorems, 64 equations, 5 figures, 3 tables)

Introduction
Related Works
Preliminaries
Our Clustering based LP Algorithm
The clustering based LP algorithm.
Improvement in Low-dimensional Space
Experiments
Conclusions
Omitted Proofs for Theorem \ref{the-result1-2}
The Proof of Lemma \ref{lem-3}
The Proof of Lemma \ref{lem-4}
Omitted Proofs for Theorem \ref{the-result2}
The Proof of Lemma \ref{lem-5}
The Proof of Lemma \ref{lem-6}
Extension in Doubling Metric
...and 10 more sections

Key Result

Theorem 1

Our clustering based LP Algorithm returns a solution $\tilde{T}_{j_0}$ for $k$-sparse WB with outliers and achieves the following quality guarantee:

Figures (5)

Figure 1: The obtained costs on real datasets.
Figure 2: $k$-sparse WB obtained by Our_$\mathcal{A}$ for $k=40$.
Figure 3: The obtained costs on real datasets.
Figure 4: $k$-sparse WB obtained by Our_$\mathcal{A}$ for $k=40$.
Figure 5: $k$-sparse WB obtained by Our_$\mathcal{B}$ for $k=40$.

Theorems & Definitions (47)

Remark 1
Definition 1: Wasserstein distance with $z$ outliers
Remark 2
Claim 1
Definition 2: $k$-sparse WB with $z$ outliers
Remark 3
Claim 2
Claim 3
Theorem 1
Lemma 1
...and 37 more
