LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders

Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli

TL;DR

LORE introduces a Lagrangian-duality–based proximity constraint to a frozen reference embedding for unsupervised adversarial fine-tuning of vision encoders. By enforcing per-sample embedding proximity while optimizing worst-case perturbations, it achieves superior robustness with minimal loss of clean accuracy across CLIP and DINOv2 backbones, and it improves out-of-distribution robustness and embedding interpretability. The framework provides a tunable parameter ρ to control the robustness–fidelity trade-off and demonstrates strong performance in zero-shot and in-domain classification, with extensive ablations validating adaptive margins and dual-network elasticity. The work also shows the method's viability for supervised extensions and segmentation tasks, highlighting its broad applicability to foundation models. Overall, LORE offers a principled, scalable approach to stabilizing adversarial fine-tuning while preserving semantic fidelity in visual encoders.
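The constrained problem described above can be sketched in symbols. This is one plausible reading of the summary, not the paper's exact statement: the distance $d$, the per-sample margin scale $m(x)$, and the multiplier function $\lambda(x)$ are notation assumed here for illustration.

```latex
% Assumed notation: \phi_\theta is the fine-tuned encoder, \phi_{\theta_0} the frozen
% reference encoder, d a distance on embeddings, m(x) a per-sample margin scale.
\[
\begin{aligned}
\min_{\theta}\quad & \mathbb{E}_{x}\Big[\max_{\|\delta\|_\infty \le \varepsilon}
    d\big(\phi_\theta(x+\delta),\, \phi_{\theta_0}(x)\big)\Big] \\
\text{s.t.}\quad & d\big(\phi_\theta(x),\, \phi_{\theta_0}(x)\big) \le \rho\, m(x)
    \quad \text{for every sample } x .
\end{aligned}
\]
% Lagrangian relaxation with a nonnegative per-sample multiplier \lambda(x):
\[
\mathcal{L}(\theta,\lambda) = \mathbb{E}_{x}\Big[
    \max_{\|\delta\|_\infty \le \varepsilon} d\big(\phi_\theta(x+\delta),\, \phi_{\theta_0}(x)\big)
    + \lambda(x)\,\Big(d\big(\phi_\theta(x),\, \phi_{\theta_0}(x)\big) - \rho\, m(x)\Big)\Big],
\qquad \lambda(x) \ge 0 .
\]
```

Under this reading, a small $\rho$ keeps clean embeddings close to the reference, while a larger $\rho$ loosens the constraint and leaves more room for the robustness objective.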

Abstract

Visual encoders have become fundamental components in modern computer vision pipelines. However, ensuring robustness against adversarial perturbations remains a critical challenge. Recent efforts have explored both supervised and unsupervised adversarial fine-tuning strategies. We identify two key limitations in these approaches: (i) they often suffer from instability, especially during the early stages of fine-tuning, resulting in suboptimal convergence and degraded performance on clean data, and (ii) they exhibit a suboptimal trade-off between robustness and clean data accuracy, hindering the simultaneous optimization of both objectives. To overcome these challenges, we propose Lagrangian-Optimized Robust Embeddings (LORE), a novel unsupervised adversarial fine-tuning framework. LORE utilizes constrained optimization, which offers a principled approach to balancing competing goals, such as improving robustness while preserving nominal performance. By enforcing embedding-space proximity constraints, LORE effectively maintains clean data performance throughout adversarial fine-tuning. Extensive experiments show that LORE significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy. Furthermore, we demonstrate the effectiveness of the adversarially fine-tuned CLIP image encoder in out-of-distribution generalization and enhancing the interpretability of image embeddings.
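To make the idea of constrained adversarial fine-tuning concrete, here is a minimal NumPy sketch of a primal-dual loop on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: the encoder is a linear map `W`, the distance is squared Euclidean, the margin is `rho * ||W0 x||^2`, and a plain per-sample dual vector `lam` stands in for whatever dual parameterization the paper uses. The function names `pgd_linf` and `primal_dual_finetune` are hypothetical.

```python
import numpy as np


def pgd_linf(W, W0, X, eps, steps, rng):
    """Approximate worst-case l_inf perturbations for a linear encoder (toy PGD)."""
    delta = rng.uniform(-eps, eps, size=X.shape)  # random start inside the eps-ball
    step = eps / 2.0
    for _ in range(steps):
        resid = (X + delta) @ W.T - X @ W0.T      # (n, k): adversarial minus reference
        grad = 2.0 * resid @ W                    # (n, d): gradient of ||resid||^2 w.r.t. delta
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta


def primal_dual_finetune(X, W0, rho=0.1, eps=0.3, lr=0.01, dual_lr=0.1,
                         iters=200, seed=0):
    """Toy primal-dual loop: robust loss plus per-sample proximity constraints."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = W0.copy()                                 # start from the frozen reference
    lam = np.zeros(n)                             # one dual variable per sample (assumption)
    ref = X @ W0.T                                # frozen reference embeddings
    margin = rho * np.sum(ref ** 2, axis=1)       # assumed per-sample constraint budget
    for _ in range(iters):
        X_adv = X + pgd_linf(W, W0, X, eps, steps=3, rng=rng)
        r_adv = X_adv @ W.T - ref                 # robustness residual
        r_cln = X @ W.T - ref                     # clean-proximity residual
        # Primal step: gradient of mean_i[ ||r_adv_i||^2 + lam_i * ||r_cln_i||^2 ] in W.
        grad_W = 2.0 * (r_adv.T @ X_adv + (lam[:, None] * r_cln).T @ X) / n
        W = W - lr * grad_W
        # Dual step: projected gradient ascent on the constraint violation.
        violation = np.sum((X @ W.T - ref) ** 2, axis=1) - margin
        lam = np.maximum(0.0, lam + dual_lr * violation)
    return W, lam


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                      # toy inputs
W0 = rng.normal(size=(4, 8)) / np.sqrt(8)         # toy frozen encoder
W, lam = primal_dual_finetune(X, W0)
print("active constraints:", int(np.sum(lam > 0)))
```

The dual step raises `lam` only for samples whose clean embedding has drifted past its budget, so the proximity penalty is applied where it is needed rather than uniformly, which is the intuition behind replacing a fixed regularization weight with a constraint.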

Paper Structure

This paper contains 36 sections, 3 theorems, 58 equations, 34 figures, 17 tables, 1 algorithm.

Key Result

Theorem 6.2

Let: Since $\mathcal{H}_\rho\subset\mathcal{H}$, we have $R_\rho\ge R$. Moreover, the suboptimality gap satisfies: Here $\phi^*$ and $\phi_\rho^*$ are the minimizers in $\mathcal{H}$ and $\mathcal{H}_\rho$, respectively; $L^*_\rho$ and $L'$ are their associated Lipschitz constants; and adversarial perturbations are $\ell_\infty$-bounded by $\varepsilon$ in $\mathbb{R}^k$. (Proof in Appendix proo

Figures (34)

  • Figure 1: (a) Clean data accuracy during adversarial fine-tuning with different training perturbation strengths $\varepsilon$, using the loss from Eq. (\ref{eq:fare}). Larger $\varepsilon$ values result in substantial drops in clean data accuracy and early training instability. LORE (with $\rho=0.1$) mitigates this effect, even at $\varepsilon = 10$, maintaining stable and higher clean data accuracy. (b) Pareto frontier comparison between naive regularization of Eq. (\ref{eq:fare}) (blue, varying $\lambda$) and LORE (orange, varying $\rho$). LORE yields a strictly better empirical Pareto front, demonstrating superior trade-offs.
  • Figure 2: Lagrangian-Optimized Robust Embeddings (LORE)
  • Figure 3: Influence of constraint threshold $\rho$ on model behavior. As $\rho$ increases, robustness improves at the cost of clean data accuracy, cosine alignment, and embedding fidelity, showing that tuning $\rho$ in LORE controls the trade-off between robustness and fidelity.
  • Figure 4: (a) Robustness to common corruptions on ImageNet-C as an OOD evaluation. (b) Embedding interpretability assessment based on average cosine similarity between clean image embeddings and 150 corresponding GPT4-generated text templates.
  • Figure 5: Comparison of LORE and FARE across different training and evaluation perturbations $(\varepsilon)$. LORE consistently outperforms FARE, achieving higher robust accuracy while maintaining better clean performance, with the largest gains at higher $\varepsilon$ values.
  • ...and 29 more figures

Theorems & Definitions (6)

  • Theorem 6.2: Robustness Suboptimality Bounds
  • proof
  • proof
  • Theorem E.1: Generalization Gap in Adversarial Training
  • Proposition F.2
  • proof