LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli
TL;DR
LORE is an unsupervised adversarial fine-tuning method for vision encoders. It constrains each sample's fine-tuned embedding to stay close to the embedding produced by a frozen reference encoder, and it enforces this proximity constraint through Lagrangian duality. By keeping embeddings close per sample while training against worst-case perturbations, it improves robustness with minimal loss of clean accuracy across CLIP and DINOv2 backbones, and it also improves out-of-distribution robustness and embedding interpretability. A tunable parameter ρ controls the robustness–fidelity trade-off. The method performs strongly in zero-shot and in-domain classification, and extensive ablations validate its adaptive margins and dual-network elasticity. The work also shows that the method extends to supervised settings and segmentation tasks, which suggests broad applicability to foundation models. Overall, LORE offers a principled, scalable approach to stabilizing adversarial fine-tuning while preserving semantic fidelity in visual encoders.
Abstract
Visual encoders have become fundamental components in modern computer vision pipelines. However, ensuring robustness against adversarial perturbations remains a critical challenge. Recent efforts have explored both supervised and unsupervised adversarial fine-tuning strategies. We identify two key limitations in these approaches: (i) they often suffer from instability, especially during the early stages of fine-tuning, resulting in suboptimal convergence and degraded performance on clean data, and (ii) they exhibit a suboptimal trade-off between robustness and clean data accuracy, hindering the simultaneous optimization of both objectives. To overcome these challenges, we propose Lagrangian-Optimized Robust Embeddings (LORE), a novel unsupervised adversarial fine-tuning framework. LORE utilizes constrained optimization, which offers a principled approach to balancing competing goals, such as improving robustness while preserving nominal performance. By enforcing embedding-space proximity constraints, LORE effectively maintains clean data performance throughout adversarial fine-tuning. Extensive experiments show that LORE significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy. Furthermore, we demonstrate that the adversarially fine-tuned CLIP image encoder is effective for out-of-distribution generalization and for enhancing the interpretability of image embeddings.
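The constrained-optimization idea described above can be illustrated with a deliberately tiny sketch. It is not the authors' algorithm: there is no encoder, no adversarial attack, and no dual network. A single scalar `theta` stands in for a fine-tuned embedding, `reference` stands in for the frozen reference embedding, `target` is a hypothetical point that a robustness objective would pull toward, and `rho` is the proximity budget. All names, losses, and step sizes are assumptions chosen for illustration. The sketch minimizes a stand-in robust loss subject to a squared-distance constraint by alternating a gradient step on `theta` with a projected ascent step on the Lagrange multiplier `lam`.

```python
def primal_dual_toy(target=2.0, reference=0.0, rho=0.25,
                    lr_theta=0.05, lr_lambda=0.5, steps=2000):
    """Toy primal-dual loop (illustrative assumption, not the paper's method).

    Solves: minimize (theta - target)^2
            subject to (theta - reference)^2 <= rho
    using the Lagrangian
        L(theta, lam) = (theta - target)^2
                        + lam * ((theta - reference)^2 - rho).
    """
    theta = reference  # start at the reference, where the constraint holds
    lam = 0.0          # Lagrange multiplier, kept non-negative
    for _ in range(steps):
        # Primal step: gradient descent on the Lagrangian w.r.t. theta.
        grad_theta = 2.0 * (theta - target) + lam * 2.0 * (theta - reference)
        theta -= lr_theta * grad_theta
        # Dual step: gradient ascent on lam, projected onto lam >= 0.
        violation = (theta - reference) ** 2 - rho
        lam = max(0.0, lam + lr_lambda * violation)
    return theta, lam


theta, lam = primal_dual_toy()
print(f"theta={theta:.4f}, lam={lam:.4f}")
```

In this toy setting the multiplier grows only while the proximity constraint is violated, so the iterate settles on the constraint boundary (`theta` near `sqrt(rho)` from the reference) when the unconstrained optimum lies outside the budget. When `rho` is large enough that the constraint never binds, `lam` stays at zero and `theta` moves to `target`, which mirrors the role of ρ as a robustness–fidelity dial.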
