Dynamic Continual Learning: Harnessing Parameter Uncertainty for Improved Network Adaptation

Christopher Angelini, Nidhal Bouaynaya

TL;DR

This work proposes using parameter-based uncertainty to determine which parameters are relevant to a network’s learned function and regularize training to prevent change in these important parameters, using a Bayesian Moment Propagation framework.

Abstract

When Deep Neural Networks (DNNs) are fine-tuned on new data, they are prone to overwriting network parameters required for task-specific functionality on previously learned tasks, resulting in a loss of performance on those tasks. We propose using parameter-based uncertainty to determine which parameters are relevant to a network's learned function and to regularize training to prevent change in these important parameters. We approach this regularization in two ways: (1) we constrain critical parameters from significant changes by associating more critical parameters with lower learning rates, thereby limiting alterations in those parameters; (2) we restrict important parameters from change by imposing a higher regularization weighting, causing parameters to revert to their states prior to the learning of subsequent tasks. We leverage a Bayesian Moment Propagation framework that learns network parameters concurrently with their associated uncertainties while allowing each parameter to contribute uncertainty to the network's predictive distribution, avoiding the pitfalls of existing sampling-based methods. The proposed approach is evaluated on common sequential benchmark datasets and compared to existing published approaches from the Continual Learning community. Ultimately, we show improved Continual Learning performance for Average Test Accuracy and Backward Transfer metrics compared to sampling-based methods and other non-uncertainty-based approaches.
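The two regularization routes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the choice to scale the learning rate by the normalized standard deviation, and the inverse-variance penalty weight are all assumptions made for illustration. The only idea taken from the text is that low-uncertainty (more critical) parameters should change less.

```python
import numpy as np


def uncertainty_scaled_step(mu, sigma, grad, base_lr=0.1):
    """Route (1), sketched: per-parameter learning rates.

    Assumption: a parameter's learning rate is proportional to its
    standard deviation, normalized by the largest sigma, so that
    low-uncertainty (critical) parameters take smaller steps.
    """
    scale = sigma / sigma.max()
    return mu - base_lr * scale * grad


def uncertainty_weighted_penalty(mu, mu_prev, sigma_prev, eps=1e-8):
    """Route (2), sketched: a penalty pulling parameters back to their
    values before the new task.

    Assumption: the weight is the inverse variance learned on the
    previous task, so critical parameters are penalized more heavily.
    """
    weight = 1.0 / (sigma_prev ** 2 + eps)
    return 0.5 * np.sum(weight * (mu - mu_prev) ** 2)


# Hypothetical two-parameter example: the first parameter is "critical"
# (small sigma), the second is not.
mu = np.array([1.0, 1.0])
sigma = np.array([0.1, 1.0])
grad = np.array([1.0, 1.0])

stepped = uncertainty_scaled_step(mu, sigma, grad)
penalty = uncertainty_weighted_penalty(mu + 0.1, mu, sigma)
```

With identical gradients, the critical parameter moves a tenth as far as the uncertain one, and an identical displacement of both parameters is penalized roughly a hundred times more for the critical one.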

Paper Structure (21 sections, 9 equations, 2 figures, 1 table, 1 algorithm)


Figures (2)

  • Figure 1: Analysis of parameter uncertainty in a fully connected network with two 800-node hidden layers. (A) Cumulative Distribution Function of the Signal-to-Noise Ratio (SNR), showing that $95\%$ of the parameters have an SNR of approximately -400 dB. (B) Cumulative Distribution Function of the variance, showing that $95\%$ of the parameters have a variance of 1 or greater.
  • Figure 2: Loss in performance from the original validation accuracy as a result of various pruning methods. Moment Propagation performance is presented in warm colors, Bayes-by-Backprop (BBB) performance in cool colors, and deterministic performance in grey and black.
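
The SNR and pruning quantities mentioned in the two captions can be sketched as follows. The captions do not give the SNR definition or the pruning criterion, so this sketch assumes the per-parameter ratio $|\mu| / \sigma$ expressed in decibels and prunes the parameters with the lowest SNR; both choices, and the function names, are illustrative assumptions rather than the paper's method.

```python
import numpy as np


def snr_db(mu, sigma, eps=1e-12):
    """Per-parameter signal-to-noise ratio in dB.

    Assumption: SNR = |mu| / sigma, converted with 20 * log10.
    eps guards against division by zero and log of zero.
    """
    return 20.0 * np.log10(np.abs(mu) / (sigma + eps) + eps)


def prune_lowest_snr(mu, sigma, fraction):
    """Zero out the given fraction of parameters with the lowest SNR."""
    k = int(fraction * mu.size)
    pruned = mu.copy()
    if k == 0:
        return pruned
    # Indices of the k smallest SNR values over the flattened array.
    idx = np.argsort(snr_db(mu, sigma), axis=None)[:k]
    pruned.flat[idx] = 0.0
    return pruned


# Hypothetical parameter means and standard deviations.
mu = np.array([1.0, 0.01, 0.5, 2.0])
sigma = np.array([0.1, 1.0, 0.5, 0.1])

snr = snr_db(mu, sigma)
pruned = prune_lowest_snr(mu, sigma, fraction=0.5)
```

Here the second and third parameters have the lowest SNR under the assumed definition, so pruning half of the parameters sets those two means to zero and leaves the others unchanged.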