AdaResNet: Enhancing Residual Networks with Dynamic Weight Adjustment for Improved Feature Integration
Hong Su
TL;DR
AdaResNet tackles the fixed 1:1 skip-connection fusion in ResNet by introducing a trainable weight $weight_{tfd}^{ipd}$ that dynamically balances the input data ($ipd$) and the transformed data ($tfd$) during training. The weight is updated by gradient descent within the ordinary forward and backward passes, and it can potentially be assigned per layer or per stage. Empirical results on CIFAR-10 with ResNet-50 show substantial accuracy gains over traditional ResNet, and analysis reveals that the optimal weights are layer- and task-dependent, supporting adaptive skip connections as a generalizable improvement. The approach offers a flexible mechanism for improving deep network training and generalization across architectures and datasets.
Abstract
In very deep neural networks, gradients can become extremely small during backpropagation, making it challenging to train the early layers. ResNet (Residual Network) addresses this issue by enabling gradients to flow directly through the network via skip connections, facilitating the training of much deeper networks. However, in these skip connections, the input $ipd$ is directly added to the transformed data $tfd$, treating $ipd$ and $tfd$ equally, without adapting to different scenarios. In this paper, we propose AdaResNet (Auto-Adapting Residual Network), which automatically adjusts the ratio between $ipd$ and $tfd$ based on the training data. We introduce a variable, $weight_{tfd}^{ipd}$, to represent this ratio. This variable is dynamically adjusted during backpropagation, allowing it to adapt to the training data rather than remaining fixed. Experimental results demonstrate that AdaResNet achieves a maximum accuracy improvement of over 50% compared to traditional ResNet.
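
The idea of a trainable skip-connection weight can be illustrated with a minimal sketch in plain Python. It assumes the fusion takes the form output = tfd + weight * ipd, with a single scalar weight and a squared-error loss; the function names `fuse`, `weight_grad`, and `train_weight` are hypothetical and chosen for illustration only, not taken from the paper's implementation.

```python
def fuse(ipd, tfd, weight):
    """Weighted skip connection (assumed form): out[i] = tfd[i] + weight * ipd[i].

    With weight == 1.0 this reduces to the standard ResNet addition.
    """
    return [t + weight * x for x, t in zip(ipd, tfd)]


def weight_grad(ipd, grad_out):
    """Chain rule: d(loss)/d(weight) = sum_i grad_out[i] * ipd[i]."""
    return sum(g * x for g, x in zip(grad_out, ipd))


def train_weight(ipd, tfd, target, weight=1.0, lr=0.05, steps=200):
    """Fit the scalar weight by gradient descent on 0.5 * sum((out - target)^2)."""
    for _ in range(steps):
        out = fuse(ipd, tfd, weight)
        # Gradient of the squared-error loss with respect to the fused output.
        grad_out = [o - y for o, y in zip(out, target)]
        weight -= lr * weight_grad(ipd, grad_out)
    return weight


if __name__ == "__main__":
    ipd = [1.0, 2.0]
    tfd = [0.5, -0.5]
    # Toy target generated with a "true" weight of 0.3 (illustrative value).
    target = [0.8, 0.1]
    print(train_weight(ipd, tfd, target))
```

Starting from 1.0, the standard ResNet setting, the weight moves toward whatever value best fits the data, which is the behavior the abstract describes. In a real network the same gradient would be computed by the framework's automatic differentiation rather than by hand.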
