Polynomial Width is Sufficient for Set Representation with High-dimensional Features

Peihao Wang; Shenghao Yang; Shu Li; Zhangyang Wang; Pan Li

TL;DR

This work addresses the expressiveness of DeepSets-style architectures for set functions with high-dimensional features ($D>1$) by proving that an intermediate embedding width $L$ that grows only polynomially with the set size $N$ and feature dimension $D$ is sufficient. It introduces two embedding schemes, LP (linear + power activation) and LE (linear + exponential activation), and gives constructive proofs that a polynomial $L$ suffices for both, extending the classic $D=1$ results to the high-dimensional setting. The authors also extend the theory to permutation-equivariant functions and the complex domain, and they report empirical validation consistent with polynomial scaling of the required embedding width. The findings have practical implications for DeepSets-based modules within GNNs and related architectures, where set functions can be represented expressively with polynomial resources.

Abstract

Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being $\operatorname{poly}(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound on $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
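The three-stage pipeline described in the abstract (per-element embedding into $\mathbb{R}^L$, sum pooling, output map) can be sketched in a few lines of plain Python. This is a minimal illustration only: the `tanh` embedding, the sum-of-squares output map, and the weight values are placeholder assumptions chosen for the example, not the layers studied in the paper.

```python
import math
from itertools import permutations

def phi(x, weights):
    """Embed one element x (a length-D list) into R^L: linear map, then tanh.
    The tanh activation is a placeholder assumption for this sketch."""
    return [math.tanh(sum(w_d * x_d for w_d, x_d in zip(w, x))) for w in weights]

def sum_pool(embeddings):
    """Sum the per-element embeddings coordinate-wise into one length-L vector."""
    return [sum(channel) for channel in zip(*embeddings)]

def rho(z):
    """Map the whole-set embedding to a scalar output (placeholder choice)."""
    return sum(z_l * z_l for z_l in z)

def deepsets(X, weights):
    """DeepSets-style set function: rho(sum_i phi(x_i))."""
    return rho(sum_pool([phi(x, weights) for x in X]))

# N = 3 elements with D = 2 features each, embedded with width L = 4.
X = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.1]]
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]

# Evaluate the function on every ordering of the same set.
outputs = [deepsets(list(p), WEIGHTS) for p in permutations(X)]
```

Because the only interaction between elements is the sum, every ordering of `X` yields the same value up to floating-point rounding. Here the width $L$ is the number of rows in `WEIGHTS`.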

Paper Structure (41 sections, 42 theorems, 65 equations, 3 figures, 1 table)

Introduction
Practical Implications.
Preliminaries
Notations and Problem Setup
DeepSets and The Proof for the One-Dimensional Case ($D=1$)
Curse of High-dimensional Features ($D \ge 2$)
Main Results
Empirical Validation.
Importance of Continuity.
Comparison with Prior Results.
Proof Sketch
Injectivity
Anchor
Injectivity of LP
Construction.
...and 26 more sections

Key Result

Theorem 2.4

A continuous function $f: \mathbb{R}^{N} \rightarrow \mathbb{R}$ is permutation-invariant (i.e., a set function) if and only if there exist continuous functions $\phi: \mathbb{R} \rightarrow \mathbb{R}^{L}$ and $\rho: \mathbb{R}^L \rightarrow \mathbb{R}$ such that $f({\boldsymbol{X}}) = \rho\left(\sum_{i=1}^{N} \phi(x_i)\right)$, where $x_i$ denotes the $i$-th entry of ${\boldsymbol{X}}$.
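For the one-dimensional case, the sum-pooled power mapping (Definition 2.6, Lemma 2.8) gives a concrete feel for why a width of $N$ can separate multisets of $N$ scalars. The sketch below is a brute-force check on a small integer grid, not a proof; the grid, the helper names, and the choice $N = 3$ are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

def power_mapping(x, width):
    """psi(x) = [x, x^2, ..., x^width] for a scalar x."""
    return [x ** k for k in range(1, width + 1)]

def sum_of_powers(multiset, width):
    """Sum-pool the power mapping: the k-th entry is sum_i x_i^k."""
    embedded = (power_mapping(x, width) for x in multiset)
    return tuple(sum(channel) for channel in zip(*embedded))

N = 3
VALUES = range(-2, 3)  # small integer grid (an assumption) so arithmetic is exact
multisets = list(combinations_with_replacement(VALUES, N))

def count_collisions(width):
    """Count multisets whose embedding equals that of an earlier, different multiset."""
    seen = set()
    collisions = 0
    for m in multisets:
        e = sum_of_powers(m, width)
        if e in seen:
            collisions += 1
        seen.add(e)
    return collisions

collisions_full = count_collisions(N)       # width L = N
collisions_short = count_collisions(N - 1)  # width L = N - 1
```

On this grid, width $N$ produces no collisions, while width $N-1$ already confuses $\{-2, 1, 1\}$ with $\{-1, -1, 2\}$: both have first and second power sums $(0, 6)$ and differ only in the third.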

Figures (3)

Figure 1: Illustration of the proposed linear + power mapping embedding layer (LP) and linear + exponential activation embedding layer (LE).
Figure 2: The relationship among the critical width $L$, set size $N$, and feature dimension $D$. The phenomenon that $\log(L)$ scales linearly with $\log(N)$ and $\log(D)$ validates our theory.
Figure 3: (a) illustrates the overall idea behind constructing the LP and LE embedding layers and proving their injectivity. In the forward pass, LP and LE 1) construct an anchor together with redundant non-anchor channels through a linear layer ${\boldsymbol{A}} = [{\boldsymbol{\alpha}}_1 \cdots {\boldsymbol{\alpha}}_{K_1}]$ (Lemma \ref{lem:anchor_con}), and 2) couple each feature channel with both the anchor and non-anchor channels, each using its own coupling scheme. The injectivity proof runs the construction in reverse: 1) by the properties of the coupling schemes specified by the LP (Lemma \ref{lem:lin_coupling}) and LE (Lemma \ref{lem:mono_coupling}) layers, we obtain pairwise equivalence with the anchors, and 2) by the union alignment lemma (Lemma \ref{lem:union_align}), we recover global equivalence. (b) and (c) depict the detailed construction inside the LP and LE embedding layers, respectively. The LP embedding layer uses a linear combination plus a power mapping to couple feature channels with the anchor(s) and non-anchors, while LE adopts a linear combination plus an exponential mapping, which is essentially an exponential function followed by a bivariate monomial. The components marked in gray represent the redundant pairs between feature channels and non-anchor channels, which are not used in the chain of implications that proves injectivity.
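The two layer types named in the captions can be sketched as "linear combination, then power mapping" (LP) and "linear combination, then exponential" (LE), each followed by sum pooling. The code below is a toy illustration under loud assumptions: the weight rows are hand-picked, there is no anchor construction, and the function names are hypothetical, so it should not be read as the paper's construction. It only shows why mixing feature channels in the linear step matters when $D \ge 2$.

```python
import math

def dot(w, x):
    return sum(w_d * x_d for w_d, x_d in zip(w, x))

def lp_embed(x, W, degree):
    """Linear + power: for each row w of W, emit (w.x)^1, ..., (w.x)^degree."""
    return [dot(w, x) ** k for w in W for k in range(1, degree + 1)]

def le_embed(x, V):
    """Linear + exponential: for each row v of V, emit exp(v.x)."""
    return [math.exp(dot(v, x)) for v in V]

def pool(X, embed):
    """Sum pooling of the per-element embeddings."""
    return [sum(channel) for channel in zip(*(embed(x) for x in X))]

# Two different sets (N = 2, D = 2) whose individual feature channels agree:
# channel 1 is {0, 1} and channel 2 is {0, 1} in both.
X1 = [[0, 0], [1, 1]]
X2 = [[0, 1], [1, 0]]
N = 2

CHANNEL_WISE = [[1, 0], [0, 1]]    # each row reads a single feature channel
COUPLED = CHANNEL_WISE + [[1, 1]]  # extra row mixes the two channels (hand-picked)

# LP with channel-wise rows cannot tell the two sets apart ...
lp_plain_1 = pool(X1, lambda x: lp_embed(x, CHANNEL_WISE, N))
lp_plain_2 = pool(X2, lambda x: lp_embed(x, CHANNEL_WISE, N))
# ... but one mixing row is enough to separate them in this example.
lp_mixed_1 = pool(X1, lambda x: lp_embed(x, COUPLED, N))
lp_mixed_2 = pool(X2, lambda x: lp_embed(x, COUPLED, N))

# LE with a single mixing row also separates them here.
le_mixed_1 = pool(X1, lambda x: le_embed(x, [[1, 1]]))
le_mixed_2 = pool(X2, lambda x: le_embed(x, [[1, 1]]))
```

In this example the channel-wise LP embedding of both sets is `[1, 1, 1, 1]`, whereas the mixing row contributes `[2, 4]` for `X1` and `[2, 2]` for `X2`. The width of the LP sketch is the number of rows times `degree`, which is the quantity whose growth in $N$ and $D$ the paper bounds.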

Theorems & Definitions (93)

Definition 2.1: Equivalence Class
Definition 2.2
Definition 2.3
Theorem 2.4: DeepSets (Zaheer et al., 2017), $D=1$
Remark 2.5
Definition 2.6: Power mapping
Definition 2.7: Injectivity
Lemma 2.8: Existence of Continuous Inverse of Sum-of-Power (Zaheer et al., 2017; Wagstaff et al., 2019)
Theorem 3.1: The main result
Remark 3.2
...and 83 more
