Multi-Task Learning with Additive U-Net for Image Denoising and Classification
Vikram Lakkavalli, Neelam Sinha
TL;DR
The paper addresses how concatenative skip connections in U-Net can inflate representational capacity and complicate joint optimization for denoising and recognition. It proposes AddUNet, which uses additive skip fusion with fixed dimensionality and learnable nonnegative gates to regulate encoder–decoder information flow, optimized via a joint denoising/classification objective $\mathcal{L} = \mathcal{L}_{\mathrm{den}} + \lambda \mathcal{L}_{\mathrm{cls}}$. Key findings include improved training stability and generalization in denoising-centric MTL, competitive denoising performance under constrained capacity, and a task-aware redistribution of skip weights that decouples reconstruction from discrimination. The learned gates also offer an inference-time mechanism to modulate fusion. Experiments with baselines and frequency analyses support AddUNet as a principled architectural regularizer for stable, scalable multi-task learning without increasing parameter count.
Abstract
We investigate additive skip fusion in U-Net architectures for image denoising and denoising-centric multi-task learning (MTL). By replacing concatenative skips with gated additive fusion, the proposed Additive U-Net (AddUNet) constrains shortcut capacity while preserving fixed feature dimensionality across depth. This structural regularization induces controlled encoder-decoder information flow and stabilizes joint optimization. Across single-task denoising and joint denoising-classification settings, AddUNet achieves competitive reconstruction performance with improved training stability. In MTL, learned skip weights exhibit systematic task-aware redistribution: shallow skips favor reconstruction, while deeper features support discrimination. Notably, reconstruction remains robust even under limited classification capacity, indicating implicit task decoupling through additive fusion. These findings show that simple constraints on skip connections act as an effective architectural regularizer for stable and scalable multi-task learning without increasing model complexity.
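To make the contrast between concatenative and gated additive skips concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions made only for illustration: each skip level carries a single scalar gate, nonnegativity is enforced with a softplus, the denoising loss is mean squared error, and the classification loss is cross-entropy. The function names (`concat_skip`, `additive_skip`, `joint_loss`) are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is nonnegative for every x.
    return np.logaddexp(0.0, x)


def concat_skip(decoder_feat, encoder_feat):
    # Conventional U-Net skip: stack along the channel axis, so the
    # channel count grows from C to 2C at every fusion point.
    return np.concatenate([decoder_feat, encoder_feat], axis=0)


def additive_skip(decoder_feat, encoder_feat, raw_gate):
    # Gated additive skip (assumed form): a nonnegative scalar gate scales
    # the encoder features before they are added, so the shape is unchanged.
    gate = softplus(raw_gate)
    return decoder_feat + gate * encoder_feat


def joint_loss(denoised, clean, logits, label, lam):
    # L = L_den + lambda * L_cls, with MSE and cross-entropy as assumed choices.
    l_den = np.mean((denoised - clean) ** 2)
    log_probs = logits - np.logaddexp.reduce(logits)  # log-softmax
    l_cls = -log_probs[label]
    return l_den + lam * l_cls


rng = np.random.default_rng(0)
enc = rng.standard_normal((8, 16, 16))  # (channels, height, width)
dec = rng.standard_normal((8, 16, 16))

print(concat_skip(dec, enc).shape)                  # (16, 16, 16)
print(additive_skip(dec, enc, raw_gate=0.0).shape)  # (8, 16, 16)
```

Under these assumptions, driving `raw_gate` strongly negative pushes the gate toward zero and effectively switches the skip off, which is one way the inference-time modulation of fusion mentioned above could be exercised.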
