Model Reprogramming Outperforms Fine-tuning on Out-of-distribution Data in Text-Image Encoders
Andrew Geng, Pin-Yu Chen
TL;DR
This work addresses the problem that standard fine-tuning of CLIP-like text-image encoders can degrade both out-of-distribution (OOD) generalization and OOD detection. It introduces Reprogrammer, a lightweight input-transformation approach that reuses the pre-trained parameters, and Residual Reprogrammer, which adds a residual connection to better preserve pre-training representations. Empirical results on CIFAR-10 and ImageNet-1k show that the Reprogrammer methods consistently outperform traditional fine-tuning on ID classification, OOD generalization, and OOD detection, with Residual Reprogrammer achieving the strongest holistic gains. The study highlights the importance of maintaining pre-training representations for robust downstream performance and suggests reprogramming as a practical, efficient alternative to fine-tuning for multi-modal text-image encoders.
Abstract
When evaluating the performance of a pre-trained model transferred to a downstream task, it is imperative to assess not only the in-distribution (ID) accuracy of the downstream model but also its capacity to generalize to and identify out-of-distribution (OOD) samples. In this paper, we unveil the hidden costs associated with intrusive fine-tuning techniques. Specifically, we demonstrate that commonly used fine-tuning methods distort not only the representations necessary for generalizing to covariate-shifted OOD samples (OOD generalization) but also the representations necessary for detecting semantically shifted OOD samples (OOD detection). To address these challenges, we introduce a new model reprogramming approach for fine-tuning, which we name Reprogrammer. Reprogrammer aims to improve the holistic performance of the downstream model across ID, OOD generalization, and OOD detection tasks. Our empirical evidence reveals that Reprogrammer is less intrusive and yields superior downstream models. Furthermore, we demonstrate that by appending an additional representation residual connection to Reprogrammer, we can further preserve pre-training representations, resulting in an even safer and more robust downstream model capable of excelling in many ID classification, OOD generalization, and OOD detection settings.
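To make the idea of a representation residual connection concrete, the following is a minimal sketch and not the authors' implementation. It assumes the residual takes the form of a convex combination of the frozen pre-trained embedding and the reprogrammed embedding. The function names `l2_normalize` and `residual_blend` and the blend weight `alpha` are hypothetical and do not come from the text above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def residual_blend(pretrained, reprogrammed, alpha):
    """Blend a frozen pre-trained embedding with a reprogrammed one.

    Assumption: alpha in [0, 1] is a hypothetical weight. alpha = 1 keeps
    only the pre-trained representation, alpha = 0 keeps only the
    reprogrammed one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(pretrained) != len(reprogrammed):
        raise ValueError("embeddings must have the same dimension")
    p = l2_normalize(pretrained)
    r = l2_normalize(reprogrammed)
    # Convex combination of the two unit-length embeddings.
    mixed = [alpha * pi + (1.0 - alpha) * ri for pi, ri in zip(p, r)]
    # Re-normalize so the result can be compared by cosine similarity.
    return l2_normalize(mixed)


if __name__ == "__main__":
    frozen = [1.0, 0.0]   # stand-in for a pre-trained embedding
    adapted = [0.0, 1.0]  # stand-in for a reprogrammed embedding
    print(residual_blend(frozen, adapted, alpha=0.5))
```

Under this assumption, a larger `alpha` keeps the output closer to the pre-trained representation, which is the intuition behind using a residual connection to limit distortion.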
