On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets

Giannis Nikolentzos, Konstantinos Skianis

Abstract

The Lipschitz constant of a neural network is connected to several important properties of the network such as its robustness and generalization. It is thus useful in many settings to estimate the Lipschitz constant of a model. Prior work has focused mainly on estimating the Lipschitz constant of multi-layer perceptrons and convolutional neural networks. Here we focus on data modeled as sets or multisets of vectors and on neural networks that can handle such data. These models typically apply some permutation-invariant aggregation function, such as the sum, mean or max operator, to the input multisets to produce a single vector for each input sample. In this paper, we investigate whether these aggregation functions, along with an attention-based aggregation function, are Lipschitz continuous with respect to three distance functions for unordered multisets, and we compute their Lipschitz constants. In the general case, we find that each aggregation function is Lipschitz continuous with respect to only one of the three distance functions, while the attention-based function is not Lipschitz continuous with respect to any of them. Then, we build on these results to derive upper bounds on the Lipschitz constant of neural networks that can process multisets of vectors, while we also study their stability to perturbations and generalization under distribution shifts. To empirically verify our theoretical analysis, we conduct a series of experiments on datasets from different domains.
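To make the question in the abstract concrete, the following minimal sketch computes the sum, mean and max aggregations of random multisets and the ratio between the Euclidean distance of the aggregated vectors and the Hausdorff distance of the inputs. It is an illustration under stated assumptions, not the authors' code: the helper names (`aggregate`, `hausdorff`, `lipschitz_ratio`) are hypothetical, the Hausdorff distance uses a Euclidean ground distance, and the $\sqrt{d}$ value mentioned in the comments comes from an elementary coordinate-wise argument rather than from the paper. The largest observed ratio is only an empirical lower bound on a Lipschitz constant.

```python
import numpy as np

def aggregate(X, how):
    """Permutation-invariant aggregation of a multiset stored as the rows of X."""
    if how == "sum":
        return X.sum(axis=0)
    if how == "mean":
        return X.mean(axis=0)
    if how == "max":
        return X.max(axis=0)  # coordinate-wise maximum
    raise ValueError(f"unknown aggregation: {how}")

def hausdorff(X, Y):
    """Hausdorff distance between two multisets (Euclidean ground distance)."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def lipschitz_ratio(X, Y, how):
    """Output distance divided by input distance for one pair of multisets."""
    return np.linalg.norm(aggregate(X, how) - aggregate(Y, how)) / hausdorff(X, Y)

rng = np.random.default_rng(0)
d = 3  # dimension of the vectors (arbitrary choice for this sketch)
ratios = {"sum": 0.0, "mean": 0.0, "max": 0.0}
for _ in range(200):
    # Multisets of different cardinalities (1 to 5 elements each).
    X = rng.normal(size=(int(rng.integers(1, 6)), d))
    Y = rng.normal(size=(int(rng.integers(1, 6)), d))
    for how in ratios:
        ratios[how] = max(ratios[how], lipschitz_ratio(X, Y, how))

# For max, each coordinate of the output moves by at most the Hausdorff
# distance, so the ratio cannot exceed sqrt(d). No such guarantee is
# asserted here for sum or mean.
print(ratios)
```

Comparing the three printed ratios across different cardinalities gives a rough feel for why an aggregation function may be well behaved under one distance and not under another.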

Paper Structure

This paper contains 78 sections, 11 theorems, 104 equations, 7 figures, 3 tables.

Key Result

Proposition 2.2

The matching distance is a metric on $\mathcal{S}(\mathbb{R}^d \setminus \{ \mathbf{0}\})$ where $d \in \mathbb{N}$ and $\mathbf{0}$ is the zero vector. It is a pseudometric on $\mathcal{S}(\mathbb{R}^d)$.

Figures (7)

  • Figure 1: Each dot corresponds to a pair of point clouds from the test set of ModelNet40. Each subfigure compares the distance between the latent representations of pairs of point clouds computed by a distance function for multisets (i.e., EMD, Hausdorff distance or matching distance) with the Euclidean distance between the representations of the pairs obtained after applying an aggregation function (i.e., mean, sum or max). The correlation between the two distances is also computed and visualized. The Lipschitz bounds are illustrated with dashed lines.
  • Figure 2: Each dot corresponds to a pair of point clouds from the test set of ModelNet40. Each subfigure compares the distance between the pairs of point clouds computed by EMD, Hausdorff distance or matching distance with the Euclidean distance between the representations of the pairs that emerge at the second-to-last layer of $\textsc{NN}_\textsc{mean}$, $\textsc{NN}_\textsc{sum}$ or $\textsc{NN}_\textsc{max}$.
  • Figure 3: Size Generalization of $\textsc{NN}_\textsc{mean}$ and $\textsc{NN}_\textsc{max}$ models. For illustration purposes, the Wasserstein distances $\mathcal{W}_1$ are normalized to make the maximal distance equal to the greatest performance drops. The models in the left plots are trained on the first bucket, while those in the right plots are trained on the last bucket.
  • Figure 4: Each dot corresponds to a pair of documents from the test set of Polarity that is represented as a multiset of word vectors. Each subfigure compares the distance between the latent representations of pairs of documents computed by a distance function for multisets (i.e., EMD, Hausdorff distance or matching distance) with the Euclidean distance between the representations of the pairs obtained after applying an aggregation function (i.e., mean, sum or max). The correlation between the two distances is also computed and visualized. The Lipschitz bounds are illustrated with dashed lines.
  • Figure 5: Each dot corresponds to a pair of documents from the test set of Polarity that is represented as a multiset of word vectors. Each subfigure compares the distance between the pairs of documents computed by a distance function for multisets (i.e., EMD, Hausdorff distance or matching distance) with the Euclidean distance between the representations of the pairs that emerge at the second-to-last layer of $\textsc{NN}_\textsc{mean}$, $\textsc{NN}_\textsc{sum}$ or $\textsc{NN}_\textsc{max}$. The correlation between the two distances is also computed and visualized. The Lipschitz bounds are illustrated with dashed lines.
  • ...and 2 more figures

Theorems & Definitions (14)

  • Definition 2.1
  • Proposition 2.2: Proof in Appendix \ref{sec:proof_prop1}
  • Proposition 2.3: Proof in Appendix \ref{sec:proof_prop2}
  • Theorem 3.1: Proof in Appendix \ref{sec:proof_thm1}
  • Lemma 3.2: Proof in Appendix \ref{sec:proof_lem1}
  • Proposition 3.3: Proof in Appendix \ref{sec:proof_prop3}
  • Theorem 3.4: Proof in Appendix \ref{sec:proof_thm2}
  • Lemma 3.5: Proof in Appendix \ref{sec:proof_lem2}
  • Proposition 3.6: Proof in Appendix \ref{sec:proof_prop4}
  • Theorem 3.7: shen2018wasserstein
  • ...and 4 more