On a group of invariances in a class of functions

Shravan Mohan

TL;DR

This work analyzes a class of compositional models built from alternating polynomial layers and rectified monomial activations, revealing a concrete invariance group generated by input linear reparameterizations and inter-layer permutation/diagonal scalings. It provides a constructive description of how these invariances can be exploited for canonical representations, efficient regularized optimization via geometric programming, and parameter range minimization to improve quantization and robustness. The author develops obfuscation protocols for private inference and secure remote training, leveraging the invariance structure to hide inputs and parameters while preserving functional behavior. The invariance analysis is also extended to self-attention, identifying a bilinear query–key invariance and a linear value–output invariance, along with permutation equivariance of Pre-LN Transformer blocks. Overall, the work offers a principled toolkit for model compression, privacy-preserving computation, and deeper understanding of nonlinear compositional architectures.

Abstract

A class of parametric functions formed by alternating compositions of multivariate polynomials and rectification-style monomial maps is studied (the layer-wise exponents are treated as fixed hyperparameters and are not optimized). For this family, nontrivial parametric invariances are identified and characterized, i.e., distinct parameter settings that induce identical input-output maps. A constructive description of the invariance structure is provided, enabling sparse function representations, parameter obfuscation, and potential dimensionality reduction for optimization.
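To make the idea of "distinct parameter settings that induce identical input-output maps" concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single hidden layer, an affine (degree-1) polynomial layer, and the activation $\max(0, z)^p$ with a fixed integer exponent $p$. Because this activation is positively homogeneous of degree $p$, scaling hidden unit $j$ by $s_j > 0$, dividing the corresponding output weights by $s_j^p$, and permuting the hidden units leaves the function unchanged. All names (`model`, `reparameterize`, `scales`, `perm`) are hypothetical, and exact rational arithmetic is used so the equality check is not blurred by rounding.

```python
from fractions import Fraction as F

def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rect_monomial(z, p):
    # Rectified monomial activation: max(0, z)^p, applied elementwise.
    return [max(F(0), zi) ** p for zi in z]

def model(W1, b1, W2, x, p):
    # Assumed toy model: f(x) = W2 * max(0, W1 x + b1)^p.
    hidden = [h + b for h, b in zip(matvec(W1, x), b1)]
    return matvec(W2, rect_monomial(hidden, p))

def reparameterize(W1, b1, W2, scales, perm, p):
    # New hidden unit i is old unit perm[i], scaled by scales[perm[i]] > 0.
    # The output weights absorb the factor scales^p produced by the activation.
    W1_new = [[scales[j] * w for w in W1[j]] for j in perm]
    b1_new = [scales[j] * b1[j] for j in perm]
    W2_new = [[row[j] / scales[j] ** p for j in perm] for row in W2]
    return W1_new, b1_new, W2_new

p = 2
W1 = [[F(1), F(-2)], [F(3), F(1)], [F(-1), F(4)]]
b1 = [F(1, 2), F(-1), F(0)]
W2 = [[F(2), F(-1), F(1, 3)]]
scales = [F(2), F(1, 3), F(5)]   # positive diagonal scaling
perm = [2, 0, 1]                 # permutation of the hidden units

W1_new, b1_new, W2_new = reparameterize(W1, b1, W2, scales, perm, p)

test_inputs = [[F(1), F(-1)], [F(0), F(2)], [F(-3), F(1, 2)], [F(5), F(7)]]
same_function = all(
    model(W1, b1, W2, x, p) == model(W1_new, b1_new, W2_new, x, p)
    for x in test_inputs
)
print("parameters differ:", W1_new != W1)
print("outputs agree on all test inputs:", same_function)
```

The scales must be strictly positive: a negative scale would move a pre-activation across the rectification threshold and break the identity.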

Paper Structure (31 sections, 3 theorems, 88 equations)

Key Result

Lemma 1

Let $P \in \mathbb{R}^{d_k \times d_k}$ be any invertible matrix. Define transformed parameters Then the attention scores and attention weights are unchanged, i.e.,
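The displayed equations of the lemma are not reproduced here, so the sketch below checks one standard form of this invariance rather than the paper's exact statement. It assumes a row-vector convention, $Q = X W_Q$ and $K = X W_K$, with the transformation $W_Q' = W_Q P$ and $W_K' = W_K P^{-\top}$, so that $Q' K'^{\top} = X W_Q P P^{-1} W_K^{\top} X^{\top} = Q K^{\top}$. The convention, the helper names, and the choice $d_k = 2$ (which permits a closed-form inverse) are all assumptions made for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    # Product of two list-of-rows matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(P):
    # Closed-form inverse; raises if P is singular.
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def attention_scores(X, W_q, W_k):
    # Unscaled scores Q K^T; the 1/sqrt(d_k) factor and the softmax are
    # functions of these scores, so equal scores imply equal attention weights.
    Q = matmul(X, W_q)
    K = matmul(X, W_k)
    return matmul(Q, transpose(K))

# 3 tokens, model width 3, d_k = 2 (toy sizes chosen for this sketch).
X = [[F(1), F(0), F(2)], [F(-1), F(3), F(1)], [F(2), F(2), F(-1)]]
W_q = [[F(1), F(2)], [F(0), F(-1)], [F(3), F(1)]]
W_k = [[F(2), F(0)], [F(1), F(1)], [F(-1), F(4)]]
P = [[F(2), F(1)], [F(1), F(3)]]   # any invertible d_k x d_k matrix

W_q_new = matmul(W_q, P)                           # W_Q' = W_Q P
W_k_new = matmul(W_k, transpose(inverse_2x2(P)))   # W_K' = W_K P^{-T}

scores_old = attention_scores(X, W_q, W_k)
scores_new = attention_scores(X, W_q_new, W_k_new)
print("parameters differ:", W_q_new != W_q)
print("scores agree:", scores_old == scores_new)
```

Under a column-vector convention the transposes move to the other side, but the cancellation $P P^{-1} = I$ inside the bilinear form is the same.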

Theorems & Definitions (6)

  • Lemma 1: Query–Key Bilinear Invariance
  • proof
  • Lemma 2: Value–Output Linear Invariance
  • proof
  • Theorem 1: Permutation Equivariance
  • proof
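The statement of Lemma 2 is not reproduced on this page, so the following is only a sketch of the usual value–output invariance, under assumptions that may differ from the paper's formulation: a row-vector convention with $V = X W_V$, a head output $A V W_O$ for fixed attention weights $A$, and the transformation $W_V' = W_V M$, $W_O' = M^{-1} W_O$ for an invertible $M$. The inserted factor cancels because $W_V M M^{-1} W_O = W_V W_O$. All names and sizes are hypothetical.

```python
from fractions import Fraction as F

def matmul(A, B):
    # Product of two list-of-rows matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    # Closed-form inverse; raises if M is singular.
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("M must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def head_output(A, X, W_v, W_o):
    # Assumed single-head output: A (X W_V) W_O, with A held fixed.
    return matmul(matmul(A, matmul(X, W_v)), W_o)

# 3 tokens, model width 3, value width 2 (toy sizes chosen for this sketch).
X = [[F(1), F(0), F(2)], [F(-1), F(3), F(1)], [F(2), F(2), F(-1)]]
A = [[F(1, 2), F(1, 2), F(0)], [F(1, 4), F(1, 4), F(1, 2)], [F(0), F(1), F(0)]]
W_v = [[F(1), F(-1)], [F(2), F(0)], [F(0), F(3)]]
W_o = [[F(1), F(2), F(0)], [F(-1), F(1), F(4)]]
M = [[F(3), F(1)], [F(2), F(1)]]   # any invertible d_v x d_v matrix

W_v_new = matmul(W_v, M)                 # W_V' = W_V M
W_o_new = matmul(inverse_2x2(M), W_o)    # W_O' = M^{-1} W_O

out_old = head_output(A, X, W_v, W_o)
out_new = head_output(A, X, W_v_new, W_o_new)
print("parameters differ:", W_v_new != W_v)
print("outputs agree:", out_old == out_new)
```

The attention weights `A` are held fixed here because this transformation touches only the value and output projections, so it composes freely with the query–key transformation of Lemma 1.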