Submodular Framework for Structured-Sparse Optimal Transport

Piyushi Manupriya; Pratik Jawanpuria; Karthik S. Gurumoorthy; SakethaNath Jagarlapudi; Bamdev Mishra

TL;DR

This work introduces sparsity-constrained unbalanced optimal transport (UOT) by recasting sparsity requirements as matroid constraints and leveraging a weakly submodular surrogate. By showing that the induced set function is $\alpha$-weakly submodular, the authors design gradient-based discrete greedy algorithms with approximation guarantees for both general sparsity and column-wise sparsity, backed by a convex inner optimization via MMD-UOT and an explicit dual analysis. The approach yields sparse transport plans that are interpretable and diverse, and it is demonstrated on topology design, word alignment, and mixture-of-experts gating, where it often outperforms strong baselines and achieves tight duality gaps. The results highlight a principled link between OT and submodularity, offering scalable tools for structured sparsity in transport problems with unnormalized measures.

Abstract

Unbalanced optimal transport (UOT) has recently gained much attention due to its flexible framework for handling un-normalized measures and its robustness properties. In this work, we explore learning (structured) sparse transport plans in the UOT setting, i.e., transport plans with an upper bound on the number of non-zero entries in each column (structured sparse pattern) or in the whole plan (general sparse pattern). We propose novel sparsity-constrained UOT formulations building on the recently explored maximum mean discrepancy based UOT. We show that the proposed optimization problem is equivalent to the maximization of a weakly submodular function over a uniform matroid or a partition matroid. We develop efficient gradient-based discrete greedy algorithms and provide the corresponding theoretical guarantees. Empirically, we observe that our proposed greedy algorithms select a diverse support set, and we illustrate the efficacy of the proposed approach in various applications.
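To make the matroid view concrete, here is a minimal Python sketch of greedy support selection under either a global cardinality cap (uniform matroid) or a per-column cap (partition matroid). It is an illustration under stated assumptions, not the paper's algorithm: the paper's methods are gradient-based and use the MMD-UOT objective, whereas this sketch evaluates exact marginal gains of a toy monotone submodular function. The names `greedy_support`, `toy_value`, `k_total`, and `k_per_col` are hypothetical.

```python
import math


def greedy_support(value, m, n, k_total=None, k_per_col=None):
    """Greedily build a support set of (row, col) index pairs.

    value(S) is any monotone set function on sets of (i, j) pairs.
    k_total caps |S| (uniform matroid); k_per_col caps the number of
    selected entries in each column (partition matroid).
    """
    support, col_count = set(), [0] * n
    remaining = [(i, j) for i in range(m) for j in range(n)]
    while remaining:
        if k_total is not None and len(support) >= k_total:
            break
        # Keep only candidates whose addition leaves the set feasible.
        feasible = [e for e in remaining
                    if k_per_col is None or col_count[e[1]] < k_per_col]
        if not feasible:
            break
        base = value(support)
        # Pick the candidate with the largest marginal gain.
        best = max(feasible, key=lambda e: value(support | {e}) - base)
        support.add(best)
        col_count[best[1]] += 1
        remaining.remove(best)
    return support


def toy_value(weights):
    """Toy stand-in objective (an assumption, not the paper's F):
    sum over rows of sqrt(selected mass), which has diminishing returns."""
    def value(S):
        row_mass = {}
        for i, j in S:
            row_mass[i] = row_mass.get(i, 0.0) + weights[i][j]
        return sum(math.sqrt(v) for v in row_mass.values())
    return value


weights = [[4.0, 3.0], [1.0, 2.0]]
general = greedy_support(toy_value(weights), 2, 2, k_total=2)
columnwise = greedy_support(toy_value(weights), 2, 2, k_per_col=1)
print(sorted(general), sorted(columnwise))
```

On this toy input both constraint types select the diagonal entries, because the diminishing-returns objective penalizes placing a second entry in an already used row. This mirrors, in a much simpler setting, the diverse supports the abstract reports.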

Paper Structure (39 sections, 9 theorems, 37 equations, 8 figures, 9 tables, 4 algorithms)

Introduction
Preliminaries
Optimal Transport
Submodularity
Restricted Strong Concavity and Restricted Smoothness
Proposed Method
Learning (General) Sparse Transport Plan
Learning Column-wise Sparse Transport Plan
Gradient Computation & Computational Cost
Dual Analysis of (\ref{eqn:sparse-uotmmd}) and (\ref{eqn:reform})
Related Works
Experimental Results
General Sparsity
Designing Topology
Monolingual Word Alignment
...and 24 more sections

Key Result

Lemma 3.1

$F(\cdot)$ is a monotone, non-negative, and $\alpha$-weakly submodular function with the submodularity ratio $\alpha\geq \frac{u_{2K}}{\tilde{U}_1}>0$, where $K$ denotes the sparsity level of the transport plan $\bm{\gamma}$. Here, $K=K_1$ for $\mathcal{M}=\mathcal{M}_1$ and $K=nK_2$ for $\mathcal{M}=\ldots$
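
For readers unfamiliar with the term, the standard notion of weak submodularity from the literature can be stated as follows. This is the textbook definition, given here as background under the assumption that the lemma uses it in this form; the sets $L$ and $S$ and the ground set $V$ are notation introduced for this note. A monotone set function $F$ is $\alpha$-weakly submodular, with $\alpha \in (0, 1]$, if for all disjoint $L, S \subseteq V$:

$$
\sum_{j \in S} \big[ F(L \cup \{j\}) - F(L) \big] \;\geq\; \alpha \, \big[ F(L \cup S) - F(L) \big].
$$

Setting $\alpha = 1$ recovers ordinary submodularity, so $\alpha$ measures how far $F$ is from having diminishing returns. For plain greedy selection under a cardinality constraint $|S| \leq K$, the classical guarantee for such functions is

$$
F(S_{\text{greedy}}) \;\geq\; \big(1 - e^{-\alpha}\big) \, \max_{|S| \leq K} F(S),
$$

which is why a strictly positive lower bound on $\alpha$, as in the lemma, matters. The guarantees proved in the paper for its own algorithms and matroid constraints may take a different form.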

Figures (8)

Figure 1: Example of a word alignment matrix obtained by our GenSparseUOT approach. Since the sentences convey similar information, most words in either sentence have a semantic counterpart, and our approach aligns them (almost) correctly. E.g., it correctly aligns 'powerful'$\leftrightarrow$'best' and 'abilities'$\leftrightarrow$'power' and (correctly) does not map 'powerful'$\leftrightarrow$'power' even though this pair is semantically close. Words without a semantic counterpart are left unaligned (null alignment).
Figure 2: (a)-(c) t-SNE mappings of the experts learned by different approaches. 'Expert$i\_\textup{C}j$' denotes the embeddings learnt by expert $i$ for samples belonging to class $j$. The embeddings learned with the proposed approach not only distinguish the instances from the two classes but also exhibit more diversity in the knowledge acquired by every expert. (d) The accuracy obtained on the test set.
Figure A3: (Best viewed in color.) The source samples are shown in blue and the target samples are shown in red. We show an edge between source point $i$ and target point $j$ if $\bm{\gamma}_{i, j}>0$. The intensity of the color represents the magnitude of $\bm{\gamma}_{i, j}$. (a) SSOT (Blondel et al., 2018) results in 4 non-zero entries in $\bm{\gamma}$. (b) The top-4 entries of the MMD-UOT transport plan. (c) Proposed GenSparseUOT transport plan obtained with sparsity constraint $K=4$. We can see that the support points of the transport plan obtained by GenSparseUOT are the most diverse, resulting in a one-to-one mapping between the source and the target.
Figure A4: (Best viewed in color.) The source samples are shown in blue and the target samples are shown in red. We show an edge between source point $i$ and target point $j$ if $\bm{\gamma}_{i, j}>0$. The intensity of the color represents the magnitude of $\bm{\gamma}_{i, j}$. (a) SSOT (Blondel et al., 2018) results in 3 non-zero entries in $\bm{\gamma}$. (b) The top-3 entries of the MMD-UOT transport plan. (c) Proposed GenSparseUOT transport plan obtained with sparsity constraint $K=3$. We can see that the support points of the transport plan obtained by GenSparseUOT are the most diverse, resulting in a one-to-one mapping between the source and the target.
Figure A5: (Best viewed in color.) (a) Initial source points (rainbow color) on the left and target points (in blue) on the right. (b) Gradient flow results of MMD-UOT. (c) Gradient flow results of the proposed GenSparseUOT solved with Algorithm \ref{alg:gensparseOT_dash}.
...and 3 more figures

Theorems & Definitions (17)

Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.4
Proposition 3.5
Lemma A2.1
proof
Lemma A2.2
proof
Lemma A2.3
...and 7 more
