Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang

TL;DR

Fine-tuning zero-shot foundation models often degrades their robustness to distribution shifts because the model overfits to non-core features. The paper introduces Dual Risk Minimization (DRM), which couples empirical risk minimization (ERM) with worst-case risk minimization (WRM) and uses LLM-generated concept descriptions to build a proxy for the worst-case (core-feature) risk. This yields a tractable optimization that preserves core features while maintaining good average performance. DRM achieves new state-of-the-art out-of-distribution (OOD) results across ImageNet variants, iWildCam, and FMoW, both with and without WiSE-FT, at the cost of modest computational overhead. The combination of dual prompts, robust p_c(y|x) estimation, and dual-inference blending yields practical, scalable robustness improvements for fine-tuning zero-shot CLIP models.

Abstract

Fine-tuning foundation models often compromises their robustness to distribution shifts. To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features. However, not all pre-trained features are robust, and those methods are largely indifferent to which ones to preserve. We propose dual risk minimization (DRM), which combines empirical risk minimization with worst-case risk minimization, to better preserve the core features of downstream tasks. In particular, we utilize core-feature descriptions generated by LLMs to induce core-based zero-shot predictions, which then serve as proxies to estimate the worst-case risk. DRM balances two crucial aspects of model robustness, expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks. DRM significantly improves the out-of-distribution performance of CLIP ViT-L/14@336 on ImageNet (75.9 to 77.1), WILDS-iWildCam (47.1 to 51.8), and WILDS-FMoW (50.7 to 53.1), opening up new avenues for robust fine-tuning. Our code is available at https://github.com/vaynexie/DRM.
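
The dual objective described in the abstract can be illustrated with a toy sketch. It assumes, as one plausible reading rather than the authors' exact formulation, that the empirical risk is a cross-entropy against the ground-truth label and that the worst-case risk is approximated by a cross-entropy against soft labels derived from core-based zero-shot predictions. The names `dual_risk_loss`, `core_probs`, and the weight `lam` are hypothetical and do not come from the paper or its repository.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy H(target, probs) for two distributions over classes."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def dual_risk_loss(logits, label, core_probs, lam=1.0):
    """Toy per-example dual-risk loss (illustrative assumption).

    logits     -- fine-tuned model's class scores for one image
    label      -- index of the ground-truth class
    core_probs -- soft labels standing in for core-based zero-shot predictions
    lam        -- hypothetical trade-off weight between the two terms
    """
    probs = softmax(logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(len(logits))]
    empirical_term = cross_entropy(one_hot, probs)      # expected-performance term
    core_proxy_term = cross_entropy(core_probs, probs)  # worst-case proxy term
    return empirical_term + lam * core_proxy_term

# Usage: a 3-class example where the core-based proxy agrees with the label.
loss = dual_risk_loss([2.0, 0.5, -1.0], label=0, core_probs=[0.8, 0.15, 0.05], lam=0.5)
```

Setting `lam=0` reduces the sketch to plain ERM, while a large `lam` pushes the model toward the core-based predictions; how the actual method weights the two terms is specified in the paper, not here.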

Paper Structure

This paper contains 54 sections, 3 theorems, 19 equations, 3 figures, 13 tables.

Key Result

Theorem 1

Strong duality holds between IDRM and the following dual problem:

Figures (3)

  • Figure 1: Dual risk minimization (DRM) combines empirical risk minimization (ERM) and worst-case risk minimization (WRM) to complement their weaknesses. In this binary classification task predicting if there are skis in a given image, (i) ERM underperforms when the core features of skis are clear but the non-core features such as background/context are spurious (i.e. negatively correlated with ski), and (ii) WRM underperforms when the core features are unclear but the non-core features are robust (i.e. positively correlated with ski). DRM outperforms ERM and WRM when the core features are not always clear and the non-core features are more often robust than not.
  • Figure 2: Concept descriptions better capture core features than default prompts. The affinities between images and default prompts (df) are not stable w.r.t. changes in image background (BG) containing non-core features and are insensitive to changes in image foreground (FG) containing core features, as indicated by the relative changes (gray numbers in parentheses) w.r.t. the affinities of the original images. In contrast, the affinities between images and concept descriptions (cd) are stable w.r.t. changes in BG while being highly responsive to changes in FG, making them a good detector for core features. See the appendix for more examples and a full quantitative study.
  • Figure 3: Concept description prompts (cd) yield affinities that are more robust to changes in context information than the affinities yielded by the default text prompts (df).
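
The relative changes in affinity mentioned in the captions for Figures 2 and 3 can be illustrated with a small sketch. It assumes that an affinity is the cosine similarity between an image embedding and a prompt embedding, as is common for CLIP-style models; the captions do not state the exact definition. The function names and the toy vectors are hypothetical.

```python
import math

def cosine_affinity(image_emb, text_emb):
    """Cosine similarity between two embedding vectors (assumed affinity measure)."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    return dot / (norm_img * norm_txt)

def relative_change(original, edited):
    """Change of an affinity relative to the original image's affinity."""
    return (edited - original) / abs(original)

# Toy embeddings (made up for illustration): one prompt, an original image,
# and the same image after its background has been edited.
prompt = [1.0, 0.0, 1.0]
original_image = [0.9, 0.2, 0.8]
background_edited = [0.9, 0.6, 0.8]

base = cosine_affinity(original_image, prompt)
shifted = cosine_affinity(background_edited, prompt)
change = relative_change(base, shifted)  # negative if the edit lowered the affinity
```

Under this reading, a prompt that captures core features would show a relative change close to zero for background edits and a large one for foreground edits.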

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 1
  • proof