On permutation-invariant neural networks
Masanari Kimura, Ryotaro Shimizu, Yuki Hirakawa, Ryosuke Goto, Yuki Saito
TL;DR
This work surveys neural networks that process sets by enforcing permutation invariance, covering foundational architectures (e.g., Deep Sets, PointNet, Set Transformer) and newer variants (SetNorm-based, DSPN/iDSPN, SetVAE, PointCLIP, Slot Attention, Perceiver) across a broad spectrum of tasks. It formalizes the connecting theory around sum- and max-decomposability, presents Janossy pooling as a unifying framework, and shows how the choice of aggregation shapes expressivity; these insights motivate a generalized class called Hölder's Power Deep Sets, which interpolates between common aggregations. The paper catalogs datasets and theoretical results, showing how permutation-invariant models enable set-function approximation in domains ranging from 3D point clouds to visual reasoning, while highlighting limitations and practical considerations. Overall, it positions permutation-invariant architectures as a versatile toolkit for extracting structured information from unordered collections, with implications for scalability, interpretability, and cross-domain applicability.
Abstract
Conventional machine learning algorithms have traditionally been designed under the assumption that input data is in vector form. However, as demand for tasks involving set-based inputs has grown, the research community has shifted its attention toward these problems. In recent years, neural network architectures such as Deep Sets and Transformers have significantly advanced the treatment of set-based data. These architectures are designed to accept sets as input directly, enabling more effective representation and processing of set structures. Consequently, a surge of research has explored these architectures for various tasks involving the approximation of set functions. This survey provides an overview of the diverse problem settings and ongoing research efforts pertaining to neural networks that approximate set functions. By examining these approaches and the challenges associated with them, it aims to give readers a thorough understanding of the field, including its potential applications, inherent limitations, and future directions. From this survey we gain two insights: i) Deep Sets and its variants can be generalized by differences in the aggregation function, and ii) the behavior of Deep Sets is sensitive to the choice of the aggregation function. Based on these observations, we show that Deep Sets, one of the best-known permutation-invariant neural networks, can be generalized in the sense of a quasi-arithmetic mean.
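To make the role of the aggregation function concrete, the following is a minimal Python sketch of a Deep Sets-style map in which the pooling step is a power (Hölder) mean, one example of a quasi-arithmetic mean with generator f(x) = x^p. It is an illustration under stated assumptions, not the paper's implementation: the names `power_mean`, `phi`, `rho`, and `set_function` are hypothetical, and the encoder and readout are fixed toy functions rather than learned networks.

```python
import math


def power_mean(values, p):
    """Power (Hölder) mean of strictly positive numbers, for p != 0.

    p = 1 gives the arithmetic mean; as p grows, the result approaches max(values).
    """
    if p == 0:
        raise ValueError("p must be nonzero in this sketch")
    # math.fsum reduces order-dependent rounding in the accumulated sum.
    return (math.fsum(v ** p for v in values) / len(values)) ** (1.0 / p)


def phi(x):
    """Hypothetical per-element encoder; both outputs are strictly positive."""
    return [1.0 + x * x, math.exp(-abs(x))]


def rho(pooled):
    """Hypothetical readout applied to the pooled feature vector."""
    return pooled[0] - pooled[1]


def set_function(xs, p):
    """Deep Sets-style map: encode each element, pool per feature, then read out."""
    features = [phi(x) for x in xs]
    # zip(*features) groups the k-th feature of every element together.
    pooled = [power_mean(column, p) for column in zip(*features)]
    return rho(pooled)
```

Because the pooling step treats its inputs symmetrically, `set_function` should return the same value, up to floating-point rounding, for any reordering of `xs`. Varying `p` moves the pooled features between mean-like and max-like behavior, which is one way to see why the output can be sensitive to the choice of aggregation.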
