Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning

Jiaqi Wu, Junbiao Pang, Qingming Huang

TL;DR

DSA tackles robustness gaps in semi-supervised learning by replacing decorrelation losses with a structured adapter-based mechanism that decorrelates multiple heads. By inserting unique adapters before each head, DSA maps shared features into diverse spaces, reducing inter-head correlation without additional loss terms. Theoretical analysis shows DSA achieves lower head correlation and variance than single-head baselines and loss-based decorrelation methods, improving reliability under noisy labels and data corruption. Empirically, DSA yields significant gains in SSL classification (CIFAR-10/100) and SSL pose estimation (Sniffing, FLIC, LSP) while maintaining competitive computational costs.

Abstract

In computer vision, traditional ensemble learning methods either train inefficiently or offer only limited gains in the reliability of deep neural networks. In this paper, we propose Decorrelating Structure via Adapters (DSA), a lightweight, loss-function-free, and architecture-agnostic ensemble learning method for various visual tasks. Concretely, DSA uses structurally diverse adapters to decorrelate multiple prediction heads without any tailored regularization or loss. This makes DSA easy to extend to different network architectures across a range of computer vision tasks. Importantly, the theoretical analysis shows that DSA has lower bias and variance than the single-head approach adopted by most state-of-the-art methods. Consequently, DSA makes deep networks reliable and robust against real-world challenges, e.g., data corruption and label noise. In extensive experiments, combining DSA with FreeMatch improved accuracy by 5.35% on CIFAR-10 with 40 labeled samples and by 0.71% on CIFAR-100 with 400 labeled samples. Combining DSA with DualPose improved the Percentage of Correct Keypoints (PCK) by 2.08% on the Sniffing dataset with 100 samples (30 labeled), by 5.2% on the FLIC dataset with 100 samples (50 labeled), and by 2.35% on the LSP dataset with 200 samples (100 labeled).
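
The adapter-before-head layout described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the names `Branch` and `ensemble_predict`, the bottleneck-with-residual adapter design, the use of different bottleneck widths as the source of structural diversity, and averaging as the way head outputs are combined.

```python
import math
import random

def init_linear(rng, n_in, n_out):
    """Random dense layer: a list of weight rows plus a zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(x, w, b):
    """Apply a dense layer to the feature vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class Branch:
    """One adapter followed by one prediction head (hypothetical layout)."""

    def __init__(self, rng, n_feat, hidden, n_classes):
        # Assumption: structural diversity = a different bottleneck width per adapter.
        self.down = init_linear(rng, n_feat, hidden)
        self.up = init_linear(rng, hidden, n_feat)
        self.head = init_linear(rng, n_feat, n_classes)

    def __call__(self, feat):
        h = [max(0.0, v) for v in linear(feat, *self.down)]           # ReLU bottleneck
        adapted = [f + u for f, u in zip(feat, linear(h, *self.up))]  # residual add
        return softmax(linear(adapted, *self.head))

def ensemble_predict(branches, feat):
    """Average the class probabilities of all heads (assumed aggregation rule)."""
    probs = [branch(feat) for branch in branches]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]

rng = random.Random(0)
n_feat, n_classes = 8, 3
# Three branches share one backbone feature but use adapters of different widths.
branches = [Branch(rng, n_feat, hidden, n_classes) for hidden in (2, 4, 6)]
shared_feat = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
avg = ensemble_predict(branches, shared_feat)
print(avg)
```

The point of the sketch is only that no extra loss term appears anywhere: any diversity among the heads comes from the differing adapter structures applied to the same shared feature.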

Paper Structure

This paper contains 28 sections, 2 theorems, 15 equations, 5 figures, 9 tables.

Key Result

Lemma 1

Since $\Delta\mathcal{H}_1 \ll \mathcal{H}$ and $\Delta\mathcal{H}_2 \ll \mathcal{H}$, we approximate $\tilde{H} \approx \mathcal{H} + \Delta\mathcal{H}_1 \approx \mathcal{H} + \Delta\mathcal{H}_2$. The relationship between $\mathcal{C}_{\mathcal{F}^{DSA}}$ and $\mathcal{C}_{\mathcal{F}^{SDoAs}}$ is … where $\mathbb{E}(\cdot)$ and $\text{var}(\cdot)$ denote the expectation and variance functions, respectively.

Figures (5)

  • Figure 1: A structural comparison between classical ensemble methods and DSA. The term "Low Decorrelation Parts" in the figure denotes the components within each branch that exhibit low correlation due to the influence of the network structure or loss function. In the branch structure, a higher proportion of low-correlation parts helps improve the accuracy of ensemble predictions.
  • Figure 2: The neural network structure of DSA. The basic CBE, depicted in (b), reduces correlation through a Low Bias loss. The proposed AdapterCB module, depicted in (a), further reduces correlation by incorporating a set of adapters.
  • Figure 3: Samples from the Sniffing (top row), FLIC (middle row), and LSP (bottom row) datasets.
  • Figure 4: Comparison of the feature maps output by the five adapters in FreeMatch+DSA on the same channel. The displayed features are 0.25x scaled versions of the original features.
  • Figure 5: Comparing the similarity among the prediction heads of FreeMatch+MHE, FreeMatch+CBE, and FreeMatch+DSA.
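
One plausible way to quantify the head similarity compared in Figure 5 is the mean pairwise Pearson correlation of the heads' outputs on the same inputs. The sketch below is an assumption about how such a comparison could be computed, not the metric the paper necessarily uses. The helper names and the toy score lists are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def mean_pairwise_correlation(head_outputs):
    """Average Pearson correlation over all unordered pairs of heads."""
    n = len(head_outputs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(pearson(head_outputs[i], head_outputs[j]) for i, j in pairs) / len(pairs)

# Toy scores: each inner list is one head's output on the same four inputs.
similar_heads = [[0.1, 0.4, 0.8, 0.9], [0.2, 0.5, 0.9, 1.0], [0.0, 0.3, 0.7, 0.8]]
diverse_heads = [[0.1, 0.4, 0.8, 0.9], [0.9, 0.2, 0.6, 0.3], [0.5, 0.7, 0.1, 0.6]]

rho_similar = mean_pairwise_correlation(similar_heads)
rho_diverse = mean_pairwise_correlation(diverse_heads)
print(rho_similar, rho_diverse)
```

A lower value indicates heads whose outputs vary more independently of each other, which is the property the adapters are meant to encourage.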

Theorems & Definitions (2)

  • Lemma 1: DSA exhibits lower correlation than CBE.
  • Lemma 2: DSA exhibits lower correlation than SDoAs.
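
Why lower inter-head correlation matters can be seen from a textbook identity, which is not one of the paper's lemmas. Assume, purely for illustration, that all $M$ heads have the same output variance $\sigma^2$ and the same pairwise correlation $\rho$. The variance of their average is then $\sigma^2 (1 + (M-1)\rho) / M$. The function name `ensemble_variance` is hypothetical.

```python
def ensemble_variance(sigma2, rho, m):
    """Variance of the mean of m equally correlated heads.

    Assumes every head has variance sigma2 and every pair of heads has
    correlation rho (a simplification, not the paper's setting).
    """
    return sigma2 * (1.0 + (m - 1) * rho) / m

# Five heads with unit variance at decreasing levels of correlation.
fully_correlated = ensemble_variance(1.0, 1.0, 5)  # no gain over a single head
partly_correlated = ensemble_variance(1.0, 0.5, 5)
uncorrelated = ensemble_variance(1.0, 0.0, 5)      # variance shrinks by 1/m
print(fully_correlated, partly_correlated, uncorrelated)
```

Under these assumptions, perfectly correlated heads give no variance reduction over a single head, while each decrease in $\rho$ lowers the variance of the averaged prediction.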