Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

TL;DR

The paper tackles redundancy and inefficiency in multi-model fusion by introducing Adversarial Complementary Representation Learning (ACoRL), which uses a gradient reversal mechanism to compel a new alliance model to learn representations that are distinct from, and complementary to, those of a set of pre-trained models. The approach defines a min–max objective that jointly optimizes a task loss and an adversarial loss to expand the latent representation space, promoting diversity across models. Empirical results on image classification (ImageNet-100) and speaker verification (VoxCeleb1/2) show that ACoRL-based fusion outperforms traditional MMF, with attribution analyses confirming the emergence of complementary knowledge in the learned representations. Overall, ACoRL offers a general, efficient framework for robust multi-model fusion with potential applicability across diverse domains and tasks.

Abstract

Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification: they rely heavily on partial prior knowledge during decision-making, which results in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in the learned representations may limit improvements. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We offer three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.
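
The gradient reversal mechanism mentioned in the TL;DR can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the scalar "encoder" weight `w`, the adversarial head weight `v`, the squared-error adversarial loss, and every function name below are assumptions chosen only to make the idea concrete. The task-loss gradient that the full min–max objective would add to the encoder update is omitted. A gradient reversal layer acts as the identity in the forward pass and flips the sign of the gradient in the backward pass, so one ordinary gradient-descent step lowers the adversarial loss for the adversary while raising it for the encoder.

```python
# Toy gradient reversal layer (GRL) with hand-written backward passes.
# All names and the loss form are hypothetical, for illustration only.

def grl_forward(z):
    """Forward pass: the GRL is the identity."""
    return z

def grl_backward(grad_out, lam=1.0):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return -lam * grad_out

def adv_loss(w, v, x, t):
    """Assumed adversarial loss: squared error of the adversary's prediction."""
    z = grl_forward(w * x)          # encoder feature passed through the GRL
    return (v * z - t) ** 2

def adv_grads(w, v, x, t, lam=1.0):
    """Gradients of adv_loss as seen by the encoder (w) and adversary (v)."""
    z = w * x
    pred = v * grl_forward(z)
    d_pred = 2.0 * (pred - t)                # dL/d(pred)
    grad_v = d_pred * z                      # adversary gets the true gradient
    grad_z = grl_backward(d_pred * v, lam)   # encoder gets the reversed one
    grad_w = grad_z * x
    return grad_w, grad_v

w, v, x, t, lr = 0.5, 1.0, 2.0, 3.0, 0.01
grad_w, grad_v = adv_grads(w, v, x, t)

before = adv_loss(w, v, x, t)
after_encoder_step = adv_loss(w - lr * grad_w, v, x, t)    # descent on reversed grad
after_adversary_step = adv_loss(w, v - lr * grad_v, x, t)  # descent on true grad

print(before, after_encoder_step, after_adversary_step)
```

In this toy setting the encoder's step increases the adversarial loss and the adversary's step decreases it, which is the min–max behaviour the reversal is meant to produce with a single backward pass.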

Paper Structure (10 sections, 1 equation, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overview of the ACoRL framework.
  • Figure 2: Illustration of the process in Explanation 3.
  • Figure 3: Attribution analysis of image samples on ImageNet-100 using the GradCAM method (Chattopadhyay et al., 2017). From top to bottom: the newly trained model A, model B, and model B trained under ACoRL on pre-trained model A. Red regions indicate where the model focuses.
  • Figure 4: Attribution analysis of audio samples on VoxCeleb using the integrated gradients method (Sundararajan et al., 2017). The three rows are the same as in Figure 3. Blue and red regions represent positive and negative gradients, respectively.