$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

Simone Ricci, Niccolò Biondi, Federico Pernici, Ioannis Patras, Alberto Del Bimbo

TL;DR

This paper tackles the challenge of backward-compatible representations in retrieval systems when independently trained models diverge. It introduces a framework that jointly learns a backward orthogonal map and a forward affine transform, augmented by a novel $\lambda$-Orthogonality regularization that smoothly enforces near-orthogonality with a tunable threshold $\lambda$. A supervised contrastive loss further improves alignment, and a partial backfilling strategy enables efficient gallery updates. Empirically, the method preserves zero-shot performance while achieving strong backward compatibility and improved downstream task accuracy across diverse architectures and datasets. The approach provides a practical, architecture-agnostic solution for maintaining consistent retrieval performance during model updates and can be extended to broader representation-adaptation settings.

Abstract

Retrieval systems rely on representations learned by increasingly powerful models. However, due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations and ensuring compatibility across independently trained neural networks. In the literature, two primary approaches are commonly used to adapt different learned representations: affine transformations, which adapt well to specific distributions but can significantly alter the original representation, and orthogonal transformations, which preserve the original structure with strict geometric constraints but limit adaptability. A key challenge is adapting the latent spaces of updated models to align with those of previous models on downstream distributions while preserving the newly learned representation spaces. In this paper, we impose a relaxed orthogonality constraint, namely $\lambda$-Orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. Extensive experiments across various architectures and datasets validate our approach, demonstrating that it preserves the model's zero-shot performance and ensures compatibility across model updates. Code available at: https://github.com/miccunifi/lambda_orthogonality.
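To make the idea of a relaxed orthogonality constraint concrete, the following is a minimal sketch of one way such a penalty could look. It is not the paper's equation: the function names, the gating form, and the default `alpha` are assumptions chosen for illustration. The sketch measures how far a weight matrix $W$ is from orthogonal via $\|W^\top W - I\|_F$ and multiplies that deviation by a sigmoid centered at the threshold $\lambda$, so the penalty is nearly inactive while the deviation stays below $\lambda$.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gram_deviation(W):
    """Frobenius norm of W^T W - I, with W given as a list of rows."""
    n_rows, n_cols = len(W), len(W[0])
    total = 0.0
    for i in range(n_cols):
        for j in range(n_cols):
            # Entry (i, j) of W^T W is the dot product of columns i and j.
            dot = sum(W[r][i] * W[r][j] for r in range(n_rows))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return math.sqrt(total)


def lambda_orth_penalty(W, lam, alpha=10.0):
    """Hypothetical relaxed penalty (an assumption, not the paper's formula).

    The deviation from orthogonality is scaled by a sigmoid gate that
    switches on once the deviation exceeds the threshold `lam`;
    `alpha` controls how sharp that switch is.
    """
    dev = gram_deviation(W)
    return sigmoid(alpha * (dev - lam)) * dev


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    scaled = [[2.0, 0.0], [0.0, 2.0]]  # far from orthogonal
    for lam in (1.0, 6.0):
        print(lam, lambda_orth_penalty(identity, lam), lambda_orth_penalty(scaled, lam))
```

Under this assumed form, an orthogonal matrix incurs no penalty for any threshold, while a strongly non-orthogonal matrix is penalized only when its deviation exceeds $\lambda$. A larger $\lambda$ therefore leaves the affine transformation more freedom to adapt, and a smaller one pushes it closer to a strictly orthogonal map.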

Paper Structure

This paper contains 28 sections, 11 equations, 6 figures, 14 tables.

Figures (6)

  • Figure 1: Overview of the proposed approach for achieving representation compatibility during retrieval system updates. A newly independently trained model is aligned to the old representation space via an orthogonal transformation $B_{\perp}$, which preserves geometric structure. A forward transformation $F$ maps the old representations to the backward-aligned space of the new model. Only the transformation parameters are optimized during training, while model parameters remain fixed.
  • Figure 2: Impact of $\lambda$-Orthogonality regularization on affine transformations. Fig. \ref{fig:lambda} shows the variation of Eq. \ref{eq:orth_smooth} for different values of $\lambda$, demonstrating the influence of the threshold in the regularization. Fig. \ref{fig:alpha} illustrates the effect of varying $\alpha$ while keeping $\lambda=6$, highlighting its behavior in the sigmoid function. Fig. \ref{fig:angle} presents the kernel density estimation (KDE) of angles between the columns of matrix $W$ for different values of $\lambda$, showcasing the impact of regularization on orthogonality preservation.
  • Figure 3: Effects of affine (Fig. \ref{fig:affine}), strictly orthogonal (Fig. \ref{fig:sorth}), and $\lambda$-orthogonality (with $\lambda=1$) regularized (Fig. \ref{fig:near_orth_mnist}) transformations trained to align a source representation space (Fig. \ref{fig:source}) learned with a LeNet model (embedding dimension = 2) on the complete MNIST dataset, with a target representation space (Fig. \ref{fig:target}) learned on the first five classes of MNIST using the same architecture.
  • Figure 4: Partial backfilling results for the Extending Classes setting (top figures) of Tab. \ref{table:imagenet_ext}, and the Independently Pretrained Models setting (bottom figures) of Tab. \ref{table:imagenet_arch}. We use features from the new model $\phi_{\text{new}}$ for the query set (otherwise $B_{\perp}(\phi_{\text{new}})$ if trained). For the gallery set, we begin with forward-adapted old features $F(\phi_{\text{old}})$ and incrementally replace them with new features.
  • Figure 5: Ablation of our $\lambda$-orthogonal regularization on the CUB dataset. Displayed are the compatibility metrics on CUB and the zero-shot (ZS) improvement on ImageNet1K at different values of $\lambda$. Results correspond to those in Tab. \ref{table:lambda_ablation}.
  • ...and 1 more figure
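The partial backfilling procedure described in the Figure 4 caption (start from forward-adapted old gallery features and incrementally replace them with new features) can be sketched as follows. This is an illustrative sketch only: the function name, the `order` argument, and the use of plain Python lists in place of feature vectors are assumptions, and how the paper chooses the replacement order is not stated here.

```python
def partial_backfill(adapted_old, new, order, fraction):
    """Build a mixed gallery for a given backfilling fraction.

    adapted_old: forward-adapted old features, one entry per gallery item.
    new:         new-model features for the same items, in the same order.
    order:       hypothetical priority list of gallery indices to replace first.
    fraction:    share of the gallery (0.0 to 1.0) already backfilled.
    """
    if len(adapted_old) != len(new):
        raise ValueError("both feature lists must cover the same gallery items")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")

    n_replace = int(fraction * len(order))
    gallery = list(adapted_old)  # copy so the input is left untouched
    for idx in order[:n_replace]:
        gallery[idx] = new[idx]  # swap in the new-model feature
    return gallery


if __name__ == "__main__":
    # Strings stand in for feature vectors to keep the example readable.
    old = ["F(old_0)", "F(old_1)", "F(old_2)", "F(old_3)"]
    new = ["new_0", "new_1", "new_2", "new_3"]
    print(partial_backfill(old, new, order=[2, 0, 3, 1], fraction=0.5))
```

Evaluating retrieval at increasing values of `fraction` would trace out a curve like those the caption describes, running from a fully forward-adapted gallery at 0.0 to a fully backfilled gallery at 1.0.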

Theorems & Definitions (1)

  • Definition 3.1: Backward-Compatibility