Polynomial Width is Sufficient for Set Representation with High-dimensional Features

Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li

TL;DR

This work studies the expressive power of DeepSets-style architectures for set functions with high-dimensional features ($D>1$), proving that an intermediate embedding width $L$ that grows only polynomially in the set size $N$ and feature dimension $D$ is sufficient. It introduces two embedding layers, linear + power activation (LP) and linear + exponential activation (LE), and gives constructive proofs of poly$(N, D)$ bounds on $L$ for both, extending the classic $D=1$ results to the high-dimensional setting. The authors also extend the theory to permutation-equivariant functions and the complex domain, and report experiments consistent with polynomial scaling of the required width. In practice, this suggests that DeepSets-based modules, including those used within GNNs and related architectures, can remain expressive with polynomially sized embeddings.

Abstract

Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
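
The three-stage pipeline described in the abstract (per-element embedding into dimension $L$, sum pooling, output map) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random weights `W_phi` and `w_rho`, the `tanh` activation, and the sizes chosen for `N`, `D`, and `L` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, L = 5, 3, 16  # set size, feature dimension, embedding width (illustrative)

# Hypothetical random weights; a trained model would learn these.
W_phi = rng.normal(size=(D, L))
w_rho = rng.normal(size=L)

def phi(X):
    """Embed each set element: (N, D) -> (N, L)."""
    return np.tanh(X @ W_phi)

def rho(z):
    """Map the pooled whole-set embedding (L,) to a scalar output."""
    return float(z @ w_rho)

def deepsets(X):
    """Sum-decomposition rho(sum_i phi(x_i)); rows of X are the set elements."""
    return rho(phi(X).sum(axis=0))

X = rng.normal(size=(N, D))
perm = rng.permutation(N)
out_original = deepsets(X)
out_permuted = deepsets(X[perm])  # same set, different element order
```

Because the pooling is a sum over rows, `out_original` and `out_permuted` agree up to floating-point rounding, which is the order-insensitivity the abstract refers to.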

Paper Structure (41 sections, 42 theorems, 65 equations, 3 figures, 1 table)

This paper contains 41 sections, 42 theorems, 65 equations, 3 figures, 1 table.

Key Result

Theorem 2.4

A continuous function $f: \mathbb{R}^{N} \rightarrow \mathbb{R}$ is permutation-invariant (i.e., a set function) if and only if there exist continuous functions $\phi: \mathbb{R} \rightarrow \mathbb{R}^{L}$ and $\rho: \mathbb{R}^L \rightarrow \mathbb{R}$ such that $f({\boldsymbol{X}}) = \rho\left(\sum_{i=1}^{N} \phi(x_i)\right)$, where $x_i$ denotes the $i$-th entry of ${\boldsymbol{X}}$.

Figures (3)

  • Figure 1: Illustration of the proposed linear + power mapping embedding layer (LP) and linear + exponential activation embedding layer (LE).
  • Figure 2: The relationship among the critical width $L$, set size $N$, and feature dimension $D$. The phenomenon that $\log(L)$ scales linearly with $\log(N)$ and $\log(D)$ validates our theory.
  • Figure 3: (a) illustrates the overall idea behind constructing the LP and LE embedding layers and proving their injectivity. In the forward pass, LP and LE 1) construct an anchor, together with redundant non-anchor channels, through a linear layer ${\boldsymbol{A}} = {\boldsymbol{\alpha}}_1\cdots{\boldsymbol{\alpha}}_{K_1}$ (Lemma \ref{lem:anchor_con}), and 2) couple each feature channel with both the anchor and non-anchor channels, each using its own coupling scheme. The injectivity proof follows the construction in reverse: 1) by the properties of the coupling schemes specified by the LP (Lemma \ref{lem:lin_coupling}) and LE (Lemma \ref{lem:mono_coupling}) layers, we obtain pairwise equivalence with the anchors, and 2) by the union alignment lemma (Lemma \ref{lem:union_align}), we recover global equivalence. (b) and (c) depict the detailed construction inside the LP and LE embedding layers, respectively. The LP embedding layer uses a linear combination plus a power mapping to couple feature channels with the anchor(s) and non-anchors, while LE adopts a linear combination plus an exponential mapping, which is essentially an exponential function followed by a bivariate monomial. The components marked in gray are the redundant pairs between feature channels and non-anchor channels, which are not used in the chain of implications that proves injectivity.
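
The two embedding layers named in the captions above can be sketched roughly as below, reading "linear + power" as a linear projection followed by elementwise powers $1, \dots, N$, and "linear + exponential" as a linear projection followed by `exp`. This is a sketch under stated assumptions: the helper names `lp_embed` and `le_embed`, the random matrices `W` and `V`, and the sizes are hypothetical. In particular, the random weights stand in for the paper's anchor construction, so no injectivity guarantee is claimed for this code.

```python
import numpy as np

def lp_embed(X, W, N):
    """Linear + power: project with W (D, K), then raise every projected
    scalar to the powers 1..N. Output shape: (num_elements, K * N)."""
    Z = X @ W                                  # (n, K)
    powers = np.arange(1, N + 1)               # (N,)
    return (Z[:, :, None] ** powers).reshape(X.shape[0], -1)

def le_embed(X, V):
    """Linear + exponential: exp of each linear projection. V is (D, L)."""
    return np.exp(X @ V)

rng = np.random.default_rng(0)
N, D, K = 4, 3, 6                              # illustrative sizes
X = rng.uniform(-1.0, 1.0, size=(N, D))        # one set of N elements
W = rng.normal(size=(D, K))                    # placeholder for the LP linear layer
V = rng.normal(size=(D, K))                    # placeholder for the LE linear layer

lp_sum = lp_embed(X, W, N).sum(axis=0)         # pooled LP embedding, width K * N
le_sum = le_embed(X, V).sum(axis=0)            # pooled LE embedding, width K
```

In this sketch the LP width is the number of projections times $N$, while the LE width equals the number of projections. Both pooled vectors are unchanged when the rows of `X` are reordered.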

Theorems & Definitions (93)

  • Definition 2.1: Equivalence Class
  • Definition 2.2
  • Definition 2.3
  • Theorem 2.4: DeepSets (Zaheer et al., 2017), $D=1$
  • Remark 2.5
  • Definition 2.6: Power mapping
  • Definition 2.7: Injectivity
  • Lemma 2.8: Existence of Continuous Inverse of Sum-of-Power (Zaheer et al., 2017; Wagstaff et al., 2019)
  • Theorem 3.1: The main result
  • Remark 3.2
  • ...and 83 more
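
For the $D=1$ case, the power mapping (Definition 2.6) and the invertibility of the sum-of-power map (Lemma 2.8) can be illustrated numerically. The sketch below assumes the power mapping sends $x$ to $(x, x^2, \dots, x^N)$ and inverts the pooled power sums with Newton's identities followed by polynomial root-finding. This is a classical argument used here for illustration and not necessarily the paper's proof; the helper names `power_sums` and `recover_multiset` are hypothetical.

```python
import numpy as np

def power_sums(x):
    """Sum over the set of (x_i, x_i^2, ..., x_i^N), with N = len(x)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    return np.array([np.sum(x ** k) for k in range(1, N + 1)])

def recover_multiset(p):
    """Recover the sorted elements from the power sums p_1..p_N."""
    N = len(p)
    e = np.zeros(N + 1)   # elementary symmetric polynomials e_0..e_N
    e[0] = 1.0
    for k in range(1, N + 1):
        # Newton's identity: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i - 1]
                   for i in range(1, k + 1)) / k
    # prod_i (t - x_i) = sum_k (-1)^k e_k t^(N-k), highest degree first
    coeffs = [(-1) ** k * e[k] for k in range(N + 1)]
    return np.sort(np.roots(coeffs).real)

x = np.array([0.5, -1.0, 2.0])
recovered = recover_multiset(power_sums(x[::-1]))  # element order is irrelevant
```

The example uses distinct values on purpose: root-finding becomes numerically sensitive when elements repeat, so this is a demonstration of the idea and not a robust inverse.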