Efficient Model-Agnostic Multi-Group Equivariant Networks

Razan Baltaji; Sourya Basu; Lav R. Varshney

TL;DR

Model-agnostic group equivariant networks become computationally expensive for large product groups and for multiple inputs. The paper introduces two efficient designs: (i) a multi-input architecture with an invariant-symmetric (IS) fusion layer, derived from a characterization of the linear equivariant space and extended to nonlinear models, and (ii) a single-input design for large product groups that achieves equivariance with complexity $O(|G_1|+\cdots+|G_N|)$ instead of the $O(|G_1| \times \cdots \times |G_N|)$ cost of the naive approach. The IS layer is shown to be a universal approximator of invariant-symmetric functions, and the large-product-group design performs comparably to equitune at substantially lower compute. The methods are evaluated on multi-image classification, SCAN-II compositional language tasks, intersectional fairness in NLG, and robust CLIP-based classification, with results competitive with equitune and its variants.

Abstract

Constructing model-agnostic group equivariant networks, such as equitune (Basu et al., 2023b) and its generalizations (Kim et al., 2023), can be computationally expensive for large product groups. We address this problem by providing efficient model-agnostic equivariant designs for two related problems: one where the network has multiple inputs each with potentially different groups acting on them, and another where there is a single input but the group acting on it is a large product group. For the first design, we initially consider a linear model and characterize the entire equivariant space that satisfies this constraint. This characterization gives rise to a novel fusion layer between different channels that satisfies an invariance-symmetry (IS) constraint, which we call an IS layer. We then extend this design beyond linear models, similar to equitune, consisting of equivariant and IS layers. We also show that the IS layer is a universal approximator of invariant-symmetric functions. Inspired by the first design, we use the notion of the IS property to design a second efficient model-agnostic equivariant design for large product groups acting on a single input. For the first design, we provide experiments on multi-image classification where each view is transformed independently with transformations such as rotations. We find equivariant models are robust to such transformations and perform competitively otherwise. For the second design, we consider three applications: language compositionality on the SCAN dataset to product groups; fairness in natural language generation from GPT-2 to address intersectionality; and robust zero-shot image classification with CLIP. Overall, our methods are simple and general, competitive with equitune and its variants, while also being computationally more efficient.
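The second design described in the abstract can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two commuting groups (cyclic row shifts and cyclic column shifts of a square array), the base model `model`, the weights `W`, and the helper names `average`, `branch`, and `product_equivariant`. Each branch makes the input invariant to one group, group-averages the base model over the other group (equitune-style), and then symmetrizes the output, so the base model is evaluated $|G_1| + |G_2|$ times rather than $|G_1| \cdot |G_2|$ times.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.normal(size=(n, n))

# Two commuting groups acting on an n x n array (illustrative assumption):
# G1 = cyclic shifts of rows, G2 = cyclic shifts of columns, each of size n.
G1 = [lambda x, k=k: np.roll(x, k, axis=0) for k in range(n)]
G1_inv = [lambda x, k=k: np.roll(x, -k, axis=0) for k in range(n)]
G2 = [lambda x, k=k: np.roll(x, k, axis=1) for k in range(n)]
G2_inv = [lambda x, k=k: np.roll(x, -k, axis=1) for k in range(n)]

calls = {"count": 0}


def model(x):
    """Hypothetical non-equivariant base model; counts its forward passes."""
    calls["count"] += 1
    return np.tanh(W @ x + x @ W.T)


def average(group, x):
    """Average x over all transformations in `group`."""
    return np.mean([g(x) for g in group], axis=0)


def branch(x, group, group_inv, other):
    """Equivariant to `group`; invariant-symmetric with respect to `other`."""
    x_inv = average(other, x)  # input made invariant to `other`
    y = np.mean([gi(model(g(x_inv))) for g, gi in zip(group, group_inv)], axis=0)
    return average(other, y)   # output made symmetric under `other`


def product_equivariant(x):
    """Sum of one branch per group: |G1| + |G2| calls to the base model."""
    return branch(x, G1, G1_inv, G2) + branch(x, G2, G2_inv, G1)


x = rng.normal(size=(n, n))
y = product_equivariant(x)
passes = calls["count"]  # forward passes used by one call

# Transform the input by an element (g1, g2) of the product group and compare.
g1, g2 = G1[2], G2[3]
z = product_equivariant(g2(g1(x)))
print(passes, np.allclose(z, g2(g1(y))))
```

In this toy setting, one call uses 10 forward passes of the base model, whereas averaging over the full product group would use 25. The averaging steps discard information about the input, so the sketch should be read as an illustration of the cost structure and of the equivariance check, not as a statement about accuracy.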

Paper Structure (48 sections, 5 theorems, 13 equations, 10 figures, 8 tables)

Introduction
Background and Related Works
Group equivariance and invariance-symmetry
Model-agnostic group equivariant networks
Additional related works
Method
Problem Formulation and Proof of Equivariance
Characterization of the Linear Equivariant Space
Beyond Linear Equivariant Space
Equivariant Network for Large Discrete Product Groups
Applications
Multi-Image Classification
Compositional Generalization in Languages
Intersectional Fairness in Natural Language Generation
Robust Image Classification using CLIP
...and 33 more sections

Key Result

Theorem 1

The multi-group equivariant layer $L_{G_1, G_2} ([X_1, X_2])$ defined in the paper's linear multi-equivariant layer equation (equation label: linear_multi_equi_layer) is equivariant to $(G_1, G_2)$ applied to $(X_1, X_2)$, respectively.
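The property stated in Theorem 1 can be checked numerically on a toy two-input network. The sketch below is not the linear layer from the paper, whose defining equation is not reproduced here. It is a nonlinear analogue built from group-averaged (equitune-style) equivariant branches plus invariant-symmetric fusion terms, in the spirit of the architecture described for Figure 1(a). The groups (90-degree rotations for the first input, horizontal flips for the second), the base model, the weights, and all function names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W1, W2, W12, W21 = (rng.normal(size=(n, n)) for _ in range(4))

# Group actions on n x n arrays (illustrative choices, not from the paper):
# G1 = rotations by multiples of 90 degrees, G2 = {identity, horizontal flip}.
G1 = [lambda x, k=k: np.rot90(x, k) for k in range(4)]
G1_inv = [lambda x, k=k: np.rot90(x, -k) for k in range(4)]
G2 = [lambda x: x, np.fliplr]
G2_inv = G2  # a flip is its own inverse


def model(w):
    """Hypothetical non-equivariant base model: a fixed nonlinear map."""
    return lambda x: np.tanh(w @ x + x @ w.T)


def equivariant(m, group, group_inv):
    """Group-averaged wrapper: the mean over g of g^{-1} m(g x)."""
    return lambda x: np.mean(
        [gi(m(g(x))) for g, gi in zip(group, group_inv)], axis=0
    )


def invariant_symmetric(m, g_in, g_out):
    """Average the input over g_in, apply m, then average the output over g_out."""
    def layer(x):
        x_inv = np.mean([g(x) for g in g_in], axis=0)
        y = m(x_inv)
        return np.mean([g(y) for g in g_out], axis=0)
    return layer


eq1 = equivariant(model(W1), G1, G1_inv)
eq2 = equivariant(model(W2), G2, G2_inv)
is21 = invariant_symmetric(model(W21), G2, G1)  # fuses X2 into the first output
is12 = invariant_symmetric(model(W12), G1, G2)  # fuses X1 into the second output


def network(x1, x2):
    """Two-input network: one equivariant branch per input plus IS fusion."""
    return eq1(x1) + is21(x2), eq2(x2) + is12(x1)


x1, x2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
y1, y2 = network(x1, x2)

# Transform each input independently with its own group element.
g1, g2 = G1[1], G2[1]
z1, z2 = network(g1(x1), g2(x2))
print(np.allclose(z1, g1(y1)), np.allclose(z2, g2(y2)))
```

Both comparisons should hold up to floating-point tolerance: transforming $X_1$ by an element of $G_1$ and $X_2$ by an element of $G_2$ transforms the two outputs by the same elements, because each fusion term ignores transformations of its input and is unchanged by transformations of its output.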

Figures (10)

Figure 1: (a) A multi-input group equivariant network (defined in the Method section), where groups $G_1, G_2$ act on the inputs $X_1, X_2$. Here $M^{Eq}_{i, G_i}$ denotes a layer equivariant to $G_i$ and $M^{IS}_{ij, G_i, G_j}$ denotes a layer invariant-symmetric to groups $G_i, G_j$. (b) A model equivariant to $G_1\rtimes G_2$ (defined in the section "Equivariant Network for Large Discrete Product Groups") with a computational complexity of only $O(|G_1| + |G_2|)$. Here $X^{Inv}_{G}$ denotes that the input features are invariant to $G$ and $Y^{Sym}_G$ denotes that the output features are symmetric with respect to $G$.
Figure 2: Multi-Equituning for SCAN for (a) LSTM (b) GRU (c) RNN Models. Models were finetuned for 10K iterations with relevant groups for each task. Comparisons are done with pretrained and equi-tuned models. Results are over three random seeds.
Figure 3: Plots (a), (b), and (c) show the distribution of regard scores for the respect task for the demographic groups gender, race, and an intersection of gender, race, and sexual orientation, respectively. For GPT2 we observe a clear disparity in regard scores among the demographic groups. Each bar in the plots corresponds to 500 generated samples. Equitune and Multi-Equitune reduce the disparity in the regard scores.
Figure 4: (a) shows that CLIP is not robust to $90^{\circ}$ rotations (rot90) and flips. (b) and (c) show that multi-equitune and multi-equizero are competitive with equitune and equizero, respectively, for zero-shot classification using the RN101 and ViT-B/16 encoders of CLIP under the product of the rot90 and flip transformations, while using much less compute.
Figure 5: Plots (a), (b), and (c) show the distribution of regard scores for the occupation task for the demographic groups gender, race, and an intersection of gender, race, and sexual orientation, respectively. For GPT2 we observe a clear disparity in regard scores among the demographic groups. Each bar in the plots corresponds to 500 generated samples. Equitune and Multi-Equitune reduce the disparity in the regard scores.
...and 5 more figures

Theorems & Definitions (11)

Theorem 1
Lemma 1
Theorem 2
Definition 1
Theorem 3
Theorem 4
Proof of Theorem 1 (linear multi-group equivariant layer)
Proof of Lemma 1 (dimension of the IS layer)
Proof of the characterization theorem
Proof of the universality theorem for invariant-symmetric functions
...and 1 more
