DoubleCCA: Improving Foundation Model Group Robustness with Random Sentence Embeddings

Hong Liu, Yitong Lu

TL;DR

A simple yet effective method that uses random sentences and Canonical Correlation Analysis (CCA) to enrich the text embeddings of a foundation model, applying CCA twice to align the representations and reconstruct them in the original representation space.

Abstract

This paper presents a novel method to improve the robustness of foundation models to group-based biases. We propose a simple yet effective method, called DoubleCCA, that leverages random sentences and Canonical Correlation Analysis (CCA) to enrich the text embeddings of the foundation model. First, we generate a variety of random sentences that augment the original prompts by extending them with random words or character sequences. Second, we use an additional sentence embedding model to generate different text embeddings for these random sentences. We then apply CCA twice to align the representations and reconstruct them in the original representation space. We demonstrate the effectiveness of our method on a variety of tasks and datasets, showing that it outperforms existing methods in terms of both performance and robustness. Our method is simple to implement and can be easily integrated into existing models, making it a practical solution for improving the robustness of foundation models to group-based biases.
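The abstract does not give the equations, so the following is only a minimal sketch of a single linear CCA step followed by a map back to the original space, written with NumPy. The function names (`fit_cca`, `fuse_and_reconstruct`), the ridge term `reg`, the averaging of the two projected views, and the pseudo-inverse reconstruction are all assumptions made for illustration. How DoubleCCA composes its two CCA applications is not specified here and is not reproduced. The synthetic matrices `X` and `Y` stand in for embeddings from the original text encoder and from the additional sentence embedding model.

```python
import numpy as np

def inv_sqrt(S):
    # Inverse matrix square root of a symmetric positive-definite matrix.
    w, V = np.linalg.eigh(S)
    return (V * w ** -0.5) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    # Linear CCA via whitening + SVD; reg is a small ridge term (an assumption).
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :k]       # projection for view X, shape (dx, k)
    Wy = Ky @ Vt.T[:, :k]    # projection for view Y, shape (dy, k)
    return mx, my, Wx, Wy, s[:k]  # s[:k] are the top-k canonical correlations

def fuse_and_reconstruct(X, Y, mx, my, Wx, Wy):
    # Hypothetical fusion: average both views in the shared canonical space,
    # then map back to X's original space with the pseudo-inverse of Wx.
    shared = 0.5 * ((X - mx) @ Wx + (Y - my) @ Wy)
    return shared @ np.linalg.pinv(Wx) + mx

# Synthetic stand-ins for two sets of text embeddings sharing a 3-dim latent.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(200, 8))
Y = Z @ rng.normal(size=(3, 6)) + 0.05 * rng.normal(size=(200, 6))

mx, my, Wx, Wy, corr = fit_cca(X, Y, k=3)
X_rec = fuse_and_reconstruct(X, Y, mx, my, Wx, Wy)
```

The reconstructed matrix `X_rec` has the same shape as `X`, so in this sketch it could be used wherever the original embeddings were used. Whether the paper fuses the views by averaging is not stated in this section.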

Paper Structure

This paper contains 18 sections, 8 equations, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The pipeline of our proposed DoubleCCA. We leverage random words to augment semantic descriptions and introduce an additional sentence embedding model to complement the semantic limitations of the original VLM text encoder. We apply the classical CCA technique twice to merge the different semantic information, which helps to improve the group robustness of the CLIP model.
  • Figure 2: We compare the performance of different prompts with different backbone models on the Waterbirds dataset. "Ori" denotes the original prompt of CLIP, i.e., "a photo of a $\langle$class name$\rangle$". "Waffle-1" denotes the combination of the original prompt and random words, i.e., "a photo of a $\langle$class name$\rangle$, which has $\langle$random word$\rangle$". "Waffle-2" also combines the original prompt with random content, but uses a different template, i.e., "a photo of a $\langle$class name$\rangle$, $\langle$random characters$\rangle$".
  • Figure 3: Visualization of the image embeddings of the Waterbirds dataset. We also visualize the text embedding features extracted by the CLIP text encoder. "Ori prompt" denotes the original prompt, i.e., "a photo of a $\langle$class name$\rangle$". "Waffle prompt" denotes the prompts with random words and characters.
  • Figure 4: Combination of Contrastive Adapter (CA) and our proposed DoubleCCA. We report the average accuracy and worst-group robustness on the Waterbirds dataset. The backbone models are ViT-L/14 and ResNet-50.
  • Figure 5: Ablation study results on the Waterbirds dataset.
  • ...and 1 more figure
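The three prompt templates named in the Figure 2 caption ("Ori", "Waffle-1", "Waffle-2") can be sketched as plain string formatting. This is an illustrative sketch only: the function name `build_prompts`, the small word list `vocab`, the random-string length, and the seeding are assumptions and do not come from the paper.

```python
import random
import string

def build_prompts(class_name, n_random=2, seed=0, vocab=None, n_chars=6):
    # Returns the original CLIP prompt followed by n_random Waffle-1 and
    # Waffle-2 variants, using the templates quoted in the Figure 2 caption.
    rng = random.Random(seed)
    # Hypothetical word list; the paper's actual random-word source is not given here.
    vocab = vocab or ["stone", "velvet", "orbit", "lantern", "maple"]
    prompts = [f"a photo of a {class_name}"]  # "Ori"
    for _ in range(n_random):
        word = rng.choice(vocab)
        chars = "".join(rng.choice(string.ascii_lowercase) for _ in range(n_chars))
        prompts.append(f"a photo of a {class_name}, which has {word}")  # "Waffle-1"
        prompts.append(f"a photo of a {class_name}, {chars}")           # "Waffle-2"
    return prompts

prompts = build_prompts("waterbird", n_random=2, seed=0)
```

Each returned string would then be passed to a text encoder. Fixing `seed` keeps the random suffixes reproducible across runs.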