Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu; Shengcao Cao; Yu-Xiong Wang

TL;DR

This work tackles the uneven transfer of knowledge from diverse Vision Foundation Models (VFMs) by revealing their task-specific biases and proposing a bias-preserving distillation framework. The Swiss Army Knife (SAK) combines a Teacher-Agnostic Stem with per-teacher Adapter Paths and a Mixture-of-Representations Router that dynamically fuses representations for multiple tasks; it is trained in two stages, first on ImageNet and then on downstream data. Empirically, SAK achieves state-of-the-art multi-task gains on PASCAL-Context and NYUD-v2, surpassing prior multi-teacher distillation approaches while maintaining efficiency, and ablations support the importance of bias preservation and adaptive fusion. The approach offers a scalable way to harness multiple VFMs for multi-task vision, making it easier to extend to new teachers and tasks while reducing inference overhead.

Abstract

Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks. However, due to inherent representation biases originating from their different training paradigms, VFMs exhibit advantages and disadvantages across distinct vision tasks. Although amalgamating the strengths of multiple VFMs for downstream tasks is an intuitive strategy, effectively exploiting these biases remains a significant challenge. In this paper, we propose a novel and versatile "Swiss Army Knife" (SAK) solution, which adaptively distills knowledge from a committee of VFMs to enhance multi-task learning. Unlike existing methods that use a single backbone for knowledge transfer, our approach preserves the unique representation bias of each teacher by pairing lightweight Teacher-Specific Adapter Path modules with a Teacher-Agnostic Stem. Through dynamic selection and combination of representations with Mixture-of-Representations Routers, SAK is capable of synergizing the complementary strengths of multiple VFMs. Extensive experiments show that SAK remarkably outperforms the prior state of the art in multi-task learning by 10% on the NYUD-v2 benchmark, while also providing a flexible and robust framework that can readily accommodate more advanced model designs. Project page: https://innovator-zero.github.io/SAK/ .
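The idea of dynamically selecting and combining representations can be illustrated with a toy sketch. The abstract does not specify how the router is built, so everything below is an assumption made for illustration: the softmax gate over per-task logits, the function names `softmax` and `route`, and the three-dimensional feature vectors are all hypothetical and are not taken from the paper or its released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(representations, gate_logits):
    """Fuse feature vectors from several sources with softmax gate weights.

    Assumption: the gate is a softmax over one logit per source. The
    actual router design in SAK may differ.
    """
    if len(representations) != len(gate_logits):
        raise ValueError("need exactly one gate logit per representation")
    weights = softmax(gate_logits)
    dim = len(representations[0])
    fused = [0.0] * dim
    for weight, rep in zip(weights, representations):
        if len(rep) != dim:
            raise ValueError("all representations must share one dimension")
        for i, value in enumerate(rep):
            fused[i] += weight * value
    return fused, weights


# Hypothetical toy features: one from the shared stem, one per teacher path.
stem = [1.0, 0.0, 2.0]
teacher_a = [0.0, 4.0, 2.0]
teacher_b = [2.0, 2.0, 2.0]
reps = [stem, teacher_a, teacher_b]

# Hypothetical per-task gate logits (these would be learned in practice).
task_logits = {
    "segmentation": [0.0, 0.0, 0.0],  # uniform mix of all sources
    "depth": [0.0, 2.0, -1.0],        # leans on teacher_a
}

fused = {task: route(reps, logits) for task, logits in task_logits.items()}
```

With uniform logits the fused vector is the plain average of the three sources, while the skewed logits for the second task pull the result toward `teacher_a`. This is the sense in which a per-task gate can exploit different teachers' biases for different tasks.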
