TwinTURBO: Semi-Supervised Fine-Tuning of Foundation Models via Mutual Information Decompositions for Downstream Task and Latent Spaces
Guillaume Quétant, Pavlo Molchanov, Slava Voloshynovskiy
TL;DR
TwinTURBO tackles the challenge of fine-tuning foundation models with extremely limited labels by exploiting mutual information decomposition. It derives two lower bounds: one on the downstream task space $I(X;Y)$ and another on latent representations $I(X;Z^*)$, implemented via density parameterisations and a discriminator to manage the KL term, all within a lightweight adapter-based setup. The method realises practical losses (Categorical, Binary, and InfoNCE variants) and a discriminator-based critic to leverage unlabelled data, plus latent-space alignment losses to stabilise representations. Empirical results on MNIST, CIFAR-10, and SVHN under low-label regimes show substantial accuracy gains and reduced variance, underscoring the value of information-theoretic objectives for semi-supervised fine-tuning and hinting at extensions to multimodal settings.
Abstract
We present a semi-supervised fine-tuning framework for foundation models that utilises mutual information decomposition to address the challenges of training with a limited amount of labelled data. Our approach derives two distinct lower bounds: i) for the downstream task space, such as classification, optimised using conditional and marginal cross-entropy alongside Kullback-Leibler divergence, and ii) for the latent space representation, regularised and aligned using a contrastive-like decomposition. This fine-tuning strategy retains the pre-trained structure of the foundation model, modifying only a specialised projector module comprising a small transformer and a token aggregation technique. Experiments on several datasets demonstrate significant improvements in classification tasks under extremely low-label conditions by effectively leveraging unlabelled data.
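To make the first bound more concrete, the sketch below numerically checks one textbook way in which a conditional cross-entropy, a marginal cross-entropy, and a KL term can combine into a lower bound on $I(X;Y)$: for any variational $q(y|x)$ and $q(y)$, $I(X;Y) \geq H(p_Y, q_Y) - D_{KL}(p_Y \| q_Y) - H(p_{Y|X}, q_{Y|X})$, with equality when $q(y|x) = p(y|x)$. This is an illustrative assumption about the general shape of such a decomposition, not necessarily the exact bound derived in the paper; the function name `mi_and_lower_bound` and the toy distributions are hypothetical.

```python
import numpy as np


def mi_and_lower_bound(p_xy, q_y_given_x, q_y):
    """Return (I(X;Y), variational lower bound) in nats.

    p_xy:        joint distribution, shape (n_x, n_y), strictly positive.
    q_y_given_x: variational conditional, rows sum to 1, strictly positive.
    q_y:         variational marginal, sums to 1, strictly positive.
    NOTE: an illustrative decomposition, assumed here; not the paper's code.
    """
    p_x = p_xy.sum(axis=1)  # marginal p(x)
    p_y = p_xy.sum(axis=0)  # marginal p(y)

    # Exact mutual information: sum p(x,y) log[p(x,y) / (p(x) p(y))].
    mi = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

    # Conditional cross-entropy: -E_{p(x,y)}[log q(y|x)] >= H(Y|X).
    ce_cond = -np.sum(p_xy * np.log(q_y_given_x))
    # Marginal cross-entropy: -E_{p(y)}[log q(y)].
    ce_marg = -np.sum(p_y * np.log(q_y))
    # KL(p(y) || q(y)); ce_marg - kl equals the entropy H(Y).
    kl = np.sum(p_y * np.log(p_y / q_y))

    return mi, ce_marg - kl - ce_cond


if __name__ == "__main__":
    # Toy joint over two inputs and two classes (hypothetical numbers).
    p_xy = np.array([[0.30, 0.10],
                     [0.05, 0.55]])
    q_cond = np.array([[0.7, 0.3],
                       [0.2, 0.8]])
    q_marg = np.array([0.4, 0.6])
    mi, bound = mi_and_lower_bound(p_xy, q_cond, q_marg)
    print(f"I(X;Y) = {mi:.4f} nats, lower bound = {bound:.4f} nats")
```

In this sketch the gap between the bound and the true mutual information comes entirely from the mismatch between $q(y|x)$ and $p(y|x)$. When the true marginal $p(y)$ is unavailable, as with unlabelled data, the KL term cannot be computed directly, which is consistent with the TL;DR's mention of a discriminator to handle it.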
