Less-forgetting Learning in Deep Neural Networks

Heechul Jung, Jeongwoo Ju, Minju Jung, Junmo Kim

TL;DR

Catastrophic forgetting in DNNs during domain adaptation is a major challenge, especially when source data are unavailable during target learning. The authors propose Less-forgetting Learning (LF), which freezes the classifier boundary and minimizes a joint loss combining cross-entropy on target data with a Euclidean penalty that aligns hidden features with those of the source network. They further extend the approach to general SGD training by addressing forgetting between mini-batches and introducing an alternating training scheme. Empirical results on CIFAR-10, MNIST, and SVHN show that LF improves retention of source-domain information and enhances generalization compared with traditional transfer learning and activation-function-based methods.

Abstract

The catastrophic forgetting problem causes deep neural networks to forget previously learned information when they learn from data collected in new environments, for example by different sensors or under different lighting conditions. This paper presents a new method for alleviating the catastrophic forgetting problem. Unlike previous research, our method does not use any information from the source domain. Surprisingly, our method is very effective at forgetting less of the information in the source domain, and we demonstrate its effectiveness in several experiments. Furthermore, we observed that the forgetting problem occurs between mini-batches during general training with stochastic gradient descent methods, and that this problem is one of the factors that degrade the generalization performance of the network. We also try to solve this problem using the proposed method. Finally, we show that our less-forgetting learning method also helps improve the performance of deep neural networks in terms of recognition rates.

Paper Structure

This paper contains 8 sections, 3 equations, 4 figures, 2 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Visualization of the feature space for ten classes using t-SNE (van der Maaten and Hinton, 2008). Each color represents one class. Filled circles denote features of the source training data extracted by the source network. Circles represent features of the source training data extracted by the target network. (a) Traditional transfer learning method. (b) Proposed method.
  • Figure 2: Conceptual diagram for describing a less-forgetting method. Our learning method uses the trained weights of the source network as the initial weights of the target network and minimizes two loss functions simultaneously.
  • Figure 3: Graphs for observing forgetting in the general learning approach. The x-axis shows the iteration number, and the y-axis shows the training loss and training accuracy of a particular batch.
  • Figure 4: Source accuracy versus target accuracy. On the left side is the object recognition source accuracy versus the target accuracy in CIFAR-10. On the right side is the digit recognition source (MNIST) accuracy versus the target (SVHN) accuracy. The accuracy curve is generated according to the value of $\lambda_e$ in Equation \ref{eq:total_loss}.
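
The two loss terms mentioned in Figure 2 and the weight $\lambda_e$ mentioned in Figure 4 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the joint loss is a weighted sum of a cross-entropy term on target data and a Euclidean penalty between hidden features of the target and source networks. The function names, the weight `lambda_c`, the factor of 0.5, and the batch averaging are all assumptions made for illustration. Freezing the classifier weights, which the method also requires, is not shown here.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, given integer class labels."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def euclidean_feature_penalty(target_feats, source_feats):
    """Half the squared Euclidean distance between hidden features,
    averaged over the batch (the 0.5 factor is an assumed convention)."""
    diff = target_feats - source_feats
    return 0.5 * (diff ** 2).sum(axis=1).mean()


def less_forgetting_loss(logits, labels, target_feats, source_feats,
                         lambda_c=1.0, lambda_e=1.0):
    """Weighted sum of the two terms.

    logits:       target-network outputs on a target-domain batch
    target_feats: hidden features from the target network (being trained)
    source_feats: hidden features from the fixed source network, same inputs
    lambda_c:     hypothetical weight on the cross-entropy term
    lambda_e:     weight on the feature-alignment term
    """
    ce = softmax_cross_entropy(logits, labels)
    penalty = euclidean_feature_penalty(target_feats, source_feats)
    return lambda_c * ce + lambda_e * penalty
```

Under these assumptions, a larger `lambda_e` keeps the target network's features closer to those of the source network, which would favor source accuracy at the expense of target accuracy. This matches the trade-off that the Figure 4 caption describes as being traced by varying $\lambda_e$.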