Task-conditioned Ensemble of Expert Models for Continuous Learning

Renu Sharma, Debasmita Pal, Arun Ross

TL;DR

This work tackles continual learning under distribution shifts by introducing a task-conditioned ensemble of expert models augmented with an in-domain model that estimates task membership. The in-domain component uses a Vision Transformer feature extractor trained with center and mean-shifted intra-class losses, plus a LOF-based distance measure to generate membership scores, enabling a dynamic fusion of task-specific experts via $s = s_1 m_1 + s_2 m_2$. Across LivDet iris datasets and Split MNIST, the approach delivers strong retention of old-task performance and competitive accuracy under various shifts while reducing memory demands compared with replay-based methods. The results highlight effective membership allocation, robust representations, and scalable paths through distillation or model merging to handle growing task pools.
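The fusion rule $s = s_1 m_1 + s_2 m_2$ can be illustrated with a minimal sketch. The function names are hypothetical. The mapping from a single membership value $m \in [0, 1]$ to the pair $(m_1, m_2) = (1 - m, m)$ is an assumption, chosen so that values near 0 favor the task 1 expert and values near 1 favor the task 2 expert. It is not taken from the authors' code.

```python
def memberships_from_scalar(m):
    """Hypothetical helper: split one membership value m in [0, 1] into
    (m1, m2) = (1 - m, m). ASSUMPTION: the two weights sum to 1."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("membership must lie in [0, 1]")
    return 1.0 - m, m


def fuse_scores(s1, s2, m1, m2):
    """Weighted fusion s = s1*m1 + s2*m2 of the two expert scores."""
    return s1 * m1 + s2 * m2


# Illustrative scores only; these are not results from the paper.
s1, s2 = 0.9, 0.2          # outputs of the task 1 and task 2 experts
for m in (0.0, 0.5, 1.0):  # membership value assigned to a probe sample
    m1, m2 = memberships_from_scalar(m)
    print(m, fuse_scores(s1, s2, m1, m2))
```

With this weighting, a probe judged to belong entirely to task 1 receives the task 1 expert's score unchanged, and intermediate memberships interpolate linearly between the two experts.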

Abstract

One of the major challenges in machine learning is maintaining the accuracy of the deployed model (e.g., a classifier) in a non-stationary environment. The non-stationary environment results in distribution shifts and, consequently, a degradation in accuracy. Continuous learning of the deployed model with new data could be one remedy. However, the question arises as to how we should update the model with new training data so that it retains its accuracy on the old data while adapting to the new data. In this work, we propose a task-conditioned ensemble of models to maintain the performance of the existing model. The method involves an ensemble of expert models based on task membership information. The in-domain models, which are based on the local outlier concept and are distinct from the expert models, provide task membership information for each probe sample dynamically at run-time. To evaluate the proposed method, we experiment with three setups: the first represents distribution shift between tasks (LivDet-Iris-2017), the second represents distribution shift both between and within tasks (LivDet-Iris-2020), and the third represents disjoint distribution between tasks (Split MNIST). The experiments highlight the benefits of the proposed method. The source code is available at https://github.com/iPRoBe-lab/Continuous_Learning_FE_DM.

Paper Structure

This paper contains 10 sections, 7 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The overall idea of the task-conditioned ensemble of models for continuous learning. Task 1 inference utilizes only its own expert model, whereas task 2 inference utilizes both expert models with the help of in-domain models that provide membership information. The in-domain model consists of two components: Feature Extractor (FE) and Distance Measure (DM).
  • Figure 2: Illustration of the local outlier concept, which motivates how the feature space is defined. Blue-colored data points belong to one training set; C is the center of the training set; and the red-colored data point P is a probe sample. There are two classes (Class 1 and Class 2) in the blue-colored training set. If we consider the global outlier concept, the red-colored probe sample would be an inlier. However, if the local outlier concept is used, the probe sample is an outlier to both Class 1 and Class 2 as well as to the blue-colored training set. The figure is better viewed in color.
  • Figure 3: The histogram of membership scores assigned to all test samples (tasks 1 and 2) corresponding to (a) Clarkson, (b) Warsaw, (c) Notre Dame, and (d) IIIT-WVU subsets of the LivDet-Iris-2017 setup. In the case of Warsaw and Notre Dame, 'Known' test splits are used for illustration. Membership values toward '0' on the x-axis symbolize higher priority given to the task 1 expert model, whereas membership values toward '1' on the x-axis denote higher priority given to the task 2 expert model. The figure is better viewed in color.
  • Figure 4: 3-D t-SNE plots corresponding to the five sub-tasks of the Split MNIST dataset using (a) pre-trained ViT and (b) our trained ViT embeddings. Pre-trained ViT embeddings of different classes overlap with each other, whereas trained ViT embeddings form clusters of different classes. The figure is better viewed in color.
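The contrast between global and local outliers described in the Figure 2 caption can be reproduced on toy data. The sketch below uses the textbook Local Outlier Factor (LOF) definition in pure Python. It is not the authors' Distance Measure. The 2-D points, the neighborhood size `k=3`, and all function names are assumptions made for illustration.

```python
import math


def knn(point, data, k):
    """Return the k nearest (distance, neighbor) pairs of `point` in `data`."""
    return sorted((math.dist(point, q), q) for q in data)[:k]


def k_distance(point, data, k):
    """Distance from `point` to its k-th nearest neighbor, excluding itself."""
    others = [q for q in data if q != point]
    return knn(point, others, k)[-1][0]


def lrd(point, data, k):
    """Local reachability density: inverse of the mean reachability distance."""
    others = [q for q in data if q != point]
    reach = [max(k_distance(o, data, k), d) for d, o in knn(point, others, k)]
    return len(reach) / sum(reach)


def lof(point, data, k=3):
    """LOF: mean neighbor density divided by the point's own density.
    Values near 1 indicate an inlier; values well above 1 indicate an outlier."""
    others = [q for q in data if q != point]
    neighbors = knn(point, others, k)
    return sum(lrd(o, data, k) for _, o in neighbors) / (len(neighbors) * lrd(point, data, k))


# Toy training set with two tight classes (illustrative values only).
class_1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
class_2 = [(4.0, 0.0), (4.1, 0.0), (4.0, 0.1), (4.1, 0.1), (4.05, 0.05)]
train = class_1 + class_2

# Center C of the training set and a probe P placed next to it.
center = tuple(sum(c) / len(train) for c in zip(*train))
probe = (2.05, 0.05)

# Global view: P is closer to C than any training point is.
probe_to_center = math.dist(probe, center)
nearest_train_to_center = min(math.dist(p, center) for p in train)

# Local view: P sits in a far sparser region than its nearest neighbors.
probe_lof = lof(probe, train)
inlier_lof = lof((0.1, 0.0), train)

print("distance of P to C:", probe_to_center)
print("closest training point to C:", nearest_train_to_center)
print("LOF of P:", probe_lof)
print("LOF of a training point:", inlier_lof)
```

In this toy setup a distance-to-center test would accept the probe, while its LOF is far above 1 because both classes are much denser than the region around the probe. A training point, by comparison, scores close to 1.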