Soft Task-Aware Routing of Experts for Equivariant Representation Learning

Jaebyeong Jeon, Hyeonseo Jang, Jy-yong Sohn, Kibok Lee

TL;DR

The paper tackles redundancy in joint invariant and equivariant representation learning by treating projection heads as a set of soft-routed experts. It introduces Soft Task-Aware Routing (STAR), which either adds a shared projection or employs an MMoE-based projection to separate shared from task-specific information during self-supervised pretraining, while using a shift-predictor for equivariant learning. Empirical results across image classification, object detection, and few-shot benchmarks show STAR reduces redundant feature learning (lower canonical correlations) and improves transfer performance, with analyses confirming meaningful expert specialization and stronger equivariance metrics. The approach offers a principled, transfer-friendly way to harness both invariant and equivariant signals, improving generalization while revealing the internal dynamics of feature routing.
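The MMoE-based projection mentioned above can be pictured with a small forward pass. The sketch below is a minimal, dependency-free illustration and not the authors' implementation; the linked repository is the reference for that. A shared pool of experts projects the encoder feature, and each objective (invariant, equivariant) has its own softmax gate that mixes all expert outputs. Several choices are assumptions made for illustration: the class name `SoftRoutedProjection`, single linear maps as experts (the paper's heads are MLPs), gates computed from the encoder feature, and random Gaussian initialization.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, x):
    """Multiply a row-major matrix (out_dim x in_dim) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SoftRoutedProjection:
    """Hypothetical MMoE-style head: shared experts, one soft gate per task."""

    def __init__(self, in_dim, out_dim, num_experts=3, tasks=("inv", "eq"), seed=0):
        rng = random.Random(seed)
        # Assumption: each expert is a single linear map (real heads are MLPs).
        self.experts = [random_matrix(rng, out_dim, in_dim) for _ in range(num_experts)]
        # One gating matrix per task maps the feature to one logit per expert.
        self.gates = {t: random_matrix(rng, num_experts, in_dim) for t in tasks}

    def routing_weights(self, h, task):
        """Soft routing weights of one task over all experts (sum to 1)."""
        return softmax(matvec(self.gates[task], h))

    def forward(self, h):
        expert_outputs = [matvec(w, h) for w in self.experts]
        embeddings = {}
        for task in self.gates:
            weights = self.routing_weights(h, task)
            # Soft routing: a convex combination of every expert's output.
            embeddings[task] = [
                sum(g * out[i] for g, out in zip(weights, expert_outputs))
                for i in range(len(expert_outputs[0]))
            ]
        return embeddings


head = SoftRoutedProjection(in_dim=8, out_dim=4)
z = head.forward([0.5] * 8)
print(len(z["inv"]), len(z["eq"]))  # 4 4
```

Because softmax weights are strictly positive, both objectives send gradient to every expert in this sketch. Any specialization into shared and task-specific experts therefore has to emerge from how the gates learn to weight them.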

Abstract

Equivariant representation learning aims to capture variations induced by input transformations in the representation space, whereas invariant representation learning encodes semantic information by disregarding such transformations. Recent studies have shown that jointly learning both types of representations is often beneficial for downstream tasks, typically by employing separate projection heads. However, this design overlooks information shared between invariant and equivariant learning, which leads to redundant feature learning and inefficient use of model capacity. To address this, we introduce Soft Task-Aware Routing (STAR), a routing strategy for projection heads that models them as experts. STAR induces the experts to specialize in capturing either shared or task-specific information, thereby reducing redundant feature learning. We validate this effect by observing lower canonical correlations between invariant and equivariant embeddings. Experimental results show consistent improvements across diverse transfer learning tasks. The code is available at https://github.com/YonseiML/star.
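The abstract uses canonical correlation between invariant and equivariant embeddings as a measure of redundancy. The sketch below computes one simple, dependency-free proxy under stated assumptions. It orthonormalizes the mean-centered columns of each embedding matrix (modified Gram-Schmidt) into bases Q_x and Q_y. The canonical correlations are the singular values of Q_x^T Q_y, so their squares sum to the squared Frobenius norm of that matrix. The function therefore returns the mean squared canonical correlation, not the mean canonical correlation plotted in the paper, which would need the individual singular values (an SVD). The paper's exact protocol is not described in this summary, and the function names here are hypothetical. The inputs need more samples than embedding dimensions.

```python
import math


def center_columns(rows):
    """Turn row-major samples into mean-centered columns (one list per dim)."""
    n = len(rows)
    columns = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        columns.append([v - mean for v in col])
    return columns


def orthonormal_basis(columns, eps=1e-10):
    """Modified Gram-Schmidt; drops (near-)linearly dependent columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > eps:
            basis.append([a / norm for a in v])
    return basis


def mean_squared_canonical_correlation(x_rows, y_rows):
    """Mean of squared canonical correlations between two embedding sets."""
    qx = orthonormal_basis(center_columns(x_rows))
    qy = orthonormal_basis(center_columns(y_rows))
    k = min(len(qx), len(qy))  # number of canonical pairs
    if k == 0:
        return 0.0
    # Sum of squared singular values of Qx^T Qy = its squared Frobenius norm.
    frob_sq = sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in qx for v in qy)
    return frob_sq / k


x = [[1, 2], [3, 1], [0, 5], [4, 4], [2, 0]]
y = [[a + b, a - b] for a, b in x]  # spans the same subspace as x
print(round(mean_squared_canonical_correlation(x, y), 6))  # 1.0
```

A value near 1 means the two embeddings span largely the same directions (redundant features). A value near 0 means they carry linearly unrelated information.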

Paper Structure

This paper contains 54 sections, 14 equations, 16 figures, 11 tables.

Figures (16)

  • Figure 1: Crater Illusion. A lunar image that appears as a dome (left) or a crater (right) depending on its orientation (nasa_crater_illusion).
  • Figure 2: Overview of Proposed Routing Strategy: Soft Task-Aware Routing (STAR) of Experts. Given two augmented views $T(x; a)$ and $T(x; a')$, the encoder $f$ extracts features, which are then projected by the single shared projection (with three experts) or the MMoE projection module into invariant ($z^{\text{inv}}$) and equivariant ($z^{\text{eq}}$) embeddings. For equivariant learning, the projected augmentation parameter $\psi(a)$ and the equivariant embedding $z^{\text{eq}}$ are fed into a predictor $\phi_T$ to predict the target embedding $z^{\text{eq}}$. In practice, we implement $\psi$ as a single-layer MLP, and $\phi_T$ as a 3-layer MLP.
  • Figure 3: $k$-NN Retrieval on STL10. Query image (left); retrievals from the invariance-only model (top-right) and the equivariance-only model (bottom-right).
  • Figure 4: Analysis of Expert Specialization. (a) Routing weights averaged over test data in STL10, with experts reordered based on their roles. The min/max ratio (purple) measures the balance between how much each expert is utilized by the invariant and equivariant objectives. (b) $k$-NN retrieval results using the output embeddings of individual experts.
  • Figure 5: Analysis of Redundant Feature Learning. (a) Pairwise canonical correlation between expert outputs. (b) Mean canonical correlation and (c) mean classification accuracy on 11 out-of-domain datasets across different numbers of experts in our proposed method. For (a), the numerical values of the diagonal elements are omitted for better visualization.
  • ...and 11 more figures
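The min/max ratio in Figure 4 can be read as a per-expert balance score. The sketch below is a minimal illustration, assuming routing weights are available as one probability vector per test sample for each objective. It averages them per expert, then divides the smaller average usage by the larger one. A ratio near 1 suggests the expert is used about equally by both objectives (shared); a ratio near 0 suggests one objective dominates (task-specific). The function names, the 0.5 threshold in `label_experts`, and the routing values in the example are hypothetical, not taken from the paper.

```python
def average_routing_weights(per_sample_weights):
    """Average per-sample routing vectors (one probability per expert)."""
    n = len(per_sample_weights)
    num_experts = len(per_sample_weights[0])
    return [sum(w[e] for w in per_sample_weights) / n for e in range(num_experts)]


def min_max_ratio(inv_weights, eq_weights):
    """Per-expert min/max ratio of average usage by the two objectives."""
    ratios = []
    for w_inv, w_eq in zip(inv_weights, eq_weights):
        hi = max(w_inv, w_eq)
        ratios.append(min(w_inv, w_eq) / hi if hi > 0 else 0.0)
    return ratios


def label_experts(ratios, shared_threshold=0.5):
    """Hypothetical rule: high ratio means shared, low ratio means task-specific."""
    return ["shared" if r >= shared_threshold else "task-specific" for r in ratios]


# Made-up routing weights for two samples and three experts (illustration only).
inv = average_routing_weights([[0.7, 0.1, 0.2], [0.5, 0.1, 0.4]])
eq = average_routing_weights([[0.1, 0.6, 0.3], [0.1, 0.8, 0.1]])
print(label_experts(min_max_ratio(inv, eq)))
```

Using min/max rather than a difference keeps the score in [0, 1] regardless of how much total weight an expert receives, so experts with small overall usage can still be compared.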