Identifiability of Deep Polynomial Neural Networks
Konstantin Usevich, Ricardo Borsoi, Clara Dérand, Marianne Clausel
TL;DR
This work addresses the identifiability of deep polynomial neural networks (PNNs) with monomial activations. It establishes a localization principle that reduces the identifiability of a deep network to that of its shallow two-layer subnetworks, and it leverages a close connection to partially symmetric CP decompositions and Kruskal-type uniqueness theorems. The constructive proofs show that an $L$-layer homogeneous PNN (hPNN) is finitely identifiable if every 2-layer block is identifiable on a subset of inputs. They also yield architectural conditions under which identifiability holds, including for pyramidal and encoder–decoder networks, with activation-degree thresholds that scale linearly with the layer widths. The authors extend identifiability to biased PNNs via a homogenization technique, which links the biased case to the homogeneous theory and allows results to transfer. These contributions settle an open conjecture about neurovariety dimensions and yield practical bounds on activation degrees. They also suggest a pathway to tensor-based training and model stitching, with implications for interpretability and network compression. Overall, the paper provides a rigorous, tensor-algebraic framework for understanding when deep PNNs admit finite or unique representations and how architecture shapes identifiability and the geometry of neurovarieties.
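The homogenization idea can be illustrated numerically with a minimal sketch. The sketch assumes a biased PNN of the form $f(x) = W_L\,\rho(\cdots\rho(W_1 x + b_1)\cdots) + b_L$. Here the hidden layers use elementwise monomial activations $t \mapsto t^{d_l}$ and the output layer is affine. The function names (`biased_pnn`, `homogenize`, `homogeneous_pnn`) are hypothetical, and the paper's exact construction may differ.

Each bias is folded into an augmented weight matrix acting on $[h; s]$, where an auxiliary coordinate $s$ is carried through every hidden layer. The resulting bias-free network is homogeneous of degree $D = d_1 \cdots d_{L-1}$ in $(x, s)$. It reduces to the biased network at $s = 1$.

```python
import numpy as np

def biased_pnn(x, weights, biases, degrees):
    """Biased PNN: hidden layers h <- (W @ h + b) ** d, last layer affine.

    weights, biases: lists of length L; degrees: list of length L - 1.
    """
    h = x
    for W, b, d in zip(weights[:-1], biases[:-1], degrees):
        h = (W @ h + b) ** d  # elementwise monomial activation
    return weights[-1] @ h + biases[-1]

def homogenize(weights, biases):
    """Fold each bias into a weight matrix acting on the augmented vector [h; s]."""
    hom = []
    for W, b in zip(weights[:-1], biases[:-1]):
        # Top block computes W h + b * (last coordinate of this layer's input).
        top = np.hstack([W, b[:, None]])
        # Bottom row copies the last coordinate forward; the activation then
        # raises it to this layer's degree, so it stays a power of s.
        bottom = np.zeros((1, W.shape[1] + 1))
        bottom[0, -1] = 1.0
        hom.append(np.vstack([top, bottom]))
    W_L, b_L = weights[-1], biases[-1]
    hom.append(np.hstack([W_L, b_L[:, None]]))  # output layer: no extra row
    return hom

def homogeneous_pnn(z, hom_weights, degrees):
    """Bias-free PNN evaluated on the augmented input z = [x; s]."""
    h = z
    for W, d in zip(hom_weights[:-1], degrees):
        h = (W @ h) ** d
    return hom_weights[-1] @ h

# Usage: random weights for widths (3, 4, 2, 2) and hidden degrees (2, 3).
rng = np.random.default_rng(0)
widths, degrees = [3, 4, 2, 2], [2, 3]
weights = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
biases = [rng.standard_normal(widths[i + 1]) for i in range(3)]
hom = homogenize(weights, biases)

x = rng.standard_normal(3)
z = np.append(x, 1.0)
# Setting s = 1 recovers the biased network.
assert np.allclose(biased_pnn(x, weights, biases, degrees),
                   homogeneous_pnn(z, hom, degrees))
# Homogeneity of total degree D = 2 * 3 = 6 in (x, s).
t = 1.7
assert np.allclose(homogeneous_pnn(t * z, hom, degrees),
                   t ** 6 * homogeneous_pnn(z, hom, degrees))
```

The homogenized weights are structured: the last row of each hidden matrix is fixed to $[0, \dots, 0, 1]$. This sketch therefore only illustrates the reduction itself, not any identifiability guarantee for the augmented network.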
Abstract
Polynomial Neural Networks (PNNs) possess a rich algebraic and geometric structure. However, their identifiability, a key property for ensuring interpretability, remains poorly understood. In this work, we present a comprehensive analysis of the identifiability of deep PNNs, including architectures with and without bias terms. Our results reveal an intricate interplay between activation degrees and layer widths in achieving identifiability. As special cases, we show that architectures with non-increasing layer widths are generically identifiable under mild conditions. Encoder-decoder networks are identifiable when the decoder widths do not grow too rapidly compared to the activation degrees. Our proofs are constructive and center on a connection between deep PNNs and low-rank tensor decompositions, together with Kruskal-type uniqueness theorems. We also settle an open conjecture on the dimension of PNN neurovarieties and provide new bounds on the activation degrees required for the neurovariety to reach its expected dimension.
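The tensor connection can be made concrete in the two-layer case. Consider a homogeneous PNN $f(x) = W_2 (W_1 x)^{\circ d}$ with hidden width $m$, and let $w_j$ be the $j$-th row of $W_1$. Its outputs are $f_i(x) = \sum_{j=1}^{m} (W_2)_{ij} (w_j^\top x)^d$. The coefficient tensor $T = \sum_j W_2[:, j] \otimes w_j^{\otimes d}$ is therefore a rank-$m$ CP decomposition that is symmetric in its last $d$ modes.

The minimal NumPy sketch below does three things, and all function names in it are hypothetical:

- It checks this correspondence.
- It shows the trivial permutation and rescaling symmetries that leave $T$ unchanged: $w_j \mapsto c_j w_j$ and $W_2[:, j] \mapsto c_j^{-d} W_2[:, j]$.
- It computes an expected neurovariety dimension.

For the expected dimension, the sketch assumes the parameter-count formula common in the PNN literature. For widths $n_0, \dots, n_L$ and $D = d_1 \cdots d_{L-1}$, it takes the number of parameters minus one scaling per hidden neuron, capped at the ambient dimension $n_L \binom{n_0 + D - 1}{D}$. It may differ in detail from the paper's definition.

```python
import numpy as np
from math import comb, prod

def hpnn2(x, W1, W2, d):
    """Two-layer homogeneous PNN f(x) = W2 (W1 x)^{o d}, elementwise power."""
    return W2 @ (W1 @ x) ** d

def cp_tensor(W1, W2, d):
    """T[i, a_1, ..., a_d] = sum_j W2[i, j] * W1[j, a_1] * ... * W1[j, a_d]."""
    p, m = W2.shape
    n = W1.shape[1]
    T = np.zeros((p,) + (n,) * d)
    for j in range(m):
        term = W2[:, j]
        for _ in range(d):
            term = np.multiply.outer(term, W1[j])  # append one symmetric mode
        T += term  # add the j-th rank-one term
    return T

def contract(T, x, d):
    """Contract the last d modes of T with x, which evaluates f(x)."""
    out = T
    for _ in range(d):
        out = out @ x  # matmul with a 1-D vector contracts the last axis
    return out

def expected_dim(widths, degrees):
    """Parameter count minus scalings, capped by the ambient dimension.

    widths = [n_0, ..., n_L], degrees = [d_1, ..., d_{L-1}].
    """
    n_params = sum(widths[i] * widths[i + 1] for i in range(len(widths) - 1))
    n_scalings = sum(widths[1:-1])  # one rescaling per hidden neuron
    D = prod(degrees)  # total degree of each output polynomial
    ambient = widths[-1] * comb(widths[0] + D - 1, D)
    return min(n_params - n_scalings, ambient)

rng = np.random.default_rng(1)
n, m, p, d = 3, 4, 2, 3
W1 = rng.standard_normal((m, n))
W2 = rng.standard_normal((p, m))
x = rng.standard_normal(n)
T = cp_tensor(W1, W2, d)
assert np.allclose(hpnn2(x, W1, W2, d), contract(T, x, d))

# Trivial symmetries: permute hidden neurons and rescale each one.
perm = rng.permutation(m)
c = rng.uniform(0.5, 2.0, size=m)
W1s = c[:, None] * W1[perm]
W2s = W2[:, perm] / c ** d
assert np.allclose(T, cp_tensor(W1s, W2s, d))

print(expected_dim([n, m, p], [d]))  # prints 16 (20 params - 4 scalings; ambient 2 * C(5, 3) = 20)
```

Identifiability is typically understood modulo these symmetries. Kruskal-type theorems give sufficient conditions under which a CP decomposition such as $T$ is unique up to exactly such permutations and rescalings of its rank-one terms.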
