Double-Bounded Optimal Transport for Advanced Clustering and Classification

Liangliang Shi; Zhaoqi Shen; Junchi Yan

TL;DR

This paper introduces Double-Bounded Optimal Transport (DB-OT), a variant of optimal transport that replaces the target mass equality with a double-bounded range, enabling controllable transport outcomes for uncertain targets. It develops three entropic-regularized, Sinkhorn-style algorithms to solve DB-OT: a Bregman-iteration Schrödinger form, a Sinkhorn-Knopp factorization with triple scaling, and a dual-coordinate ascent method. DB-OT is applied to barycenter-based clustering to regulate cluster sizes, and to long-tailed (LT) classification by connecting training to Inverse OT and testing to OT-based inference, with Balanced Softmax as a special case. Empirical results on Gaussian mixtures, MNIST, and LT vision benchmarks demonstrate improved clustering controllability and robust LT performance, highlighting DB-OT’s practical impact for data with varying target distributions and class imbalance.

Abstract

Optimal transport (OT) is attracting increasing attention in machine learning. It aims to transport a source distribution to a target one at minimal cost. In its vanilla form, the source and target distributions are predetermined, which contrasts with real-world cases involving undetermined targets. In this paper, we propose Double-Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of being fixed, thus giving the transport more freedom to find solutions. Based on the entropic regularization of DB-OT, three scaling-based algorithms are devised for calculating the optimal solution. We also show that DB-OT is helpful for barycenter-based clustering, where it can avoid the excessive concentration of samples in a single cluster. We then further develop DB-OT techniques for long-tailed classification, which is an emerging and open problem. We first propose a connection between OT and classification: training involves optimizing the Inverse OT to learn the representations, while testing involves optimizing the OT for predictions. With this OT perspective, we apply DB-OT to improve the loss, and Balanced Softmax is shown to be a special case. We then apply DB-OT for inference in the testing process. Even with features trained by vanilla Softmax, our extensive experimental results show that our method can achieve good results with our improved inference scheme in the testing stage.
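To make the idea of a bounded target concrete, the following is a minimal sketch of an entropic scaling iteration in which row masses are fixed and column masses are only required to lie in a box. It is not claimed to be any of the paper's three algorithms; it follows the generic alternating-scaling pattern used for OT with marginal constraints. The function name `db_ot_scaling`, the symbols `a`, `lo`, `hi`, `eps`, and the toy data are all assumptions introduced for illustration.

```python
import numpy as np

def db_ot_scaling(C, a, lo, hi, eps=0.2, n_iter=2000):
    """Entropic OT sketch: row masses fixed to `a`, column masses in [lo, hi].

    Hypothetical helper for illustration; requires sum(lo) <= sum(a) <= sum(hi).
    """
    K = np.exp(-C / eps)              # Gibbs kernel K_ij = exp(-C_ij / eps)
    u = np.ones_like(a)
    v = np.ones_like(lo)
    for _ in range(n_iter):
        s = K.T @ u                   # column masses before rescaling
        v = np.clip(s, lo, hi) / s    # move column masses into the box [lo, hi]
        u = a / (K @ v)               # rescale rows so that row masses equal a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((6, 3))                # toy cost matrix: 6 sources, 3 targets
C[:, 0] *= 0.2                        # make the first target cheap for everyone
a = np.full(6, 1 / 6)                 # source masses, summing to 1
lo = np.full(3, 0.2)                  # assumed lower bounds on target masses
hi = np.full(3, 0.5)                  # assumed upper bounds on target masses

P = db_ot_scaling(C, a, lo, hi)
print(P.sum(axis=0))                  # target masses received by each column
```

With a fixed target, the column update would instead be `v = b / s` for a given vector `b`; clipping `s` to the box is the only change, and it leaves a column untouched whenever its mass already lies within the bounds.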

Paper Structure (33 sections, 6 theorems, 65 equations, 4 figures, 5 tables)

Introduction
Preliminaries and Related Work
Basics of Optimal Transport
Optimal Transport w/ Inequality Constraints
Unbalanced Optimal Transport
Unbalanced Image Recognition
Double-Bounded Optimal Transport
Formulation of DB-OT
Sinkhorn Algorithm Variants for DB-OT
DB-OT for Clustering and Classification
Barycenter-based Clustering
DB-OT for Long-tailed Classification
Experiments
Experiments on Size-controlled Clustering
Experiments on Long-tailed Classification
...and 18 more sections

Key Result

Proposition 1

Redefine a general KL divergence in line with [benamou2015iterative]: $\widetilde{KL}(\mathbf{P}|\mathbf{K})=\sum_{ij}\mathbf{P}_{ij}\log\frac{\mathbf{P}_{ij}}{\mathbf{K}_{ij}}-\mathbf{P}_{ij}+\mathbf{K}_{ij}.$ With $\mathbf{K}_{ij}=e^{-\mathbf{C}_{ij}/\epsilon}$, the optimization in Eq. (eq:EDB-OT) is equ
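The generalized KL divergence and Gibbs kernel defined above can be checked numerically. The sketch below evaluates $\widetilde{KL}(\mathbf{P}|\mathbf{K})$ and verifies the algebraic identity $\epsilon\,\widetilde{KL}(\mathbf{P}|\mathbf{K})=\langle\mathbf{C},\mathbf{P}\rangle+\epsilon\sum_{ij}\mathbf{P}_{ij}(\log\mathbf{P}_{ij}-1)+\epsilon\sum_{ij}\mathbf{K}_{ij}$, which follows directly from $\log\mathbf{K}_{ij}=-\mathbf{C}_{ij}/\epsilon$. The entropic objective referenced as Eq. (eq:EDB-OT) is not shown in this excerpt, so the form of the entropy term used here is an assumption, as are the function name `generalized_kl` and the toy matrices.

```python
import numpy as np

def generalized_kl(P, K):
    """Sum_ij P_ij * log(P_ij / K_ij) - P_ij + K_ij, for strictly positive P and K."""
    return float(np.sum(P * np.log(P / K) - P + K))

rng = np.random.default_rng(0)
eps = 0.1
C = rng.random((4, 3))                # toy cost matrix (assumed data)
K = np.exp(-C / eps)                  # Gibbs kernel K_ij = exp(-C_ij / eps)
P = rng.random((4, 3)) + 0.1          # arbitrary strictly positive plan

# Left side: eps times the generalized KL divergence to the kernel.
lhs = eps * generalized_kl(P, K)
# Right side: transport cost plus an assumed entropy term plus a constant in P.
rhs = np.sum(C * P) + eps * np.sum(P * (np.log(P) - 1.0)) + eps * np.sum(K)
print(np.isclose(lhs, rhs))
```

Since the last term does not depend on $\mathbf{P}$, minimizing the left side over any constraint set selects the same plan as minimizing the cost plus the entropy term over that set.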

Figures (4)

Figure 1: Illustration of the difference between vanilla OT and our DB-OT, using the example of mines and factories as source and target, respectively. Vanilla OT assumes that supply equals demand. In our DB-OT, we assume that the demand of each factory is not a fixed value but lies within a range given by upper and lower bounds.
Figure 2: The results of Barycenter-based clustering, which is performed on data points sampled from 5 Gaussian distributions. The colors represent the cluster assignments of the samples, and the red crosses denote the centroids/barycenters. Note that in both OT-LDA and our method without reweighting the barycenter weights, the calculated centroids exhibit a noticeable bias.
Figure 3: Clustering distribution and the pixel-wise mean centroids (forming digits) on MNIST. Our results are well controlled within the bounds, whereas k-means cannot satisfy this property, resulting in more scattered clusters of varying size.
Figure 4: The result of clustering with different bounds. The top-1 accuracy is 72.50, 70.83, 75.00, 70.00, 70.00, and 68.33, respectively. The six histograms indicate the number of samples in each class, and the black dotted line is the bound in each case.

Theorems & Definitions (11)

Proposition 1: Static Schrödinger Form
Proposition 2: Solution Property
Proposition 3: Dual Formulation
Proposition 4: Static Schrödinger Form
Proof 1
Proof 2
Proposition 5: Solution Property
Proof 3
Proof 4
Proposition 6: Dual Formulation
...and 1 more
