Meta-forests: Domain generalization on random forests with meta-learning

Yuyang Sun, Panagiotis Kosmas

TL;DR

Domain shift limits how well models generalize to unseen domains. Meta-forests integrate meta-learning and a maximum mean discrepancy (MMD) regularizer into random forests to reduce the correlation among trees while increasing their strength. On two object recognition datasets and a glucose monitoring dataset, meta-forests outperform state-of-the-art domain generalization methods. The approach targets settings where data is limited, difficult, or expensive to collect, such as biomedicine.

Abstract

Domain generalization is a popular machine learning technique that enables models to perform well on an unseen target domain by learning from multiple source domains. It is useful where data is limited, difficult, or expensive to collect, such as in object recognition and biomedicine. In this paper, we propose a novel domain generalization algorithm called "meta-forests", which builds upon the basic random forests model by incorporating a meta-learning strategy and the maximum mean discrepancy (MMD) measure. The aim of meta-forests is to enhance the generalization ability of classifiers by reducing the correlation among trees and increasing their strength. More specifically, meta-forests performs meta-learning optimization during each meta-task and uses the maximum mean discrepancy as a regularization term to penalize poor generalization performance in the meta-test process. To evaluate the effectiveness of our algorithm, we test it on two publicly available object recognition datasets and a glucose monitoring dataset that we used in a previous study. Our results show that meta-forests outperforms state-of-the-art approaches in terms of generalization performance on both the object recognition and glucose monitoring datasets.
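
The abstract names maximum mean discrepancy as the regularizer but does not give its form here. As a rough illustration of what the measure computes, the following is a minimal sketch of the standard biased estimate of squared MMD between two samples. The RBF kernel, the bandwidth parameter `gamma`, and the function names are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Pairwise RBF kernel matrix between rows of a and rows of b.

    The kernel choice and gamma are illustrative assumptions.
    """
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2_biased(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(0)
same_a = rng.normal(size=(200, 2))
same_b = rng.normal(size=(200, 2))
shifted = rng.normal(loc=3.0, size=(200, 2))

# Samples from the same distribution should give a much smaller value
# than samples from a shifted distribution.
print(mmd2_biased(same_a, same_b), mmd2_biased(same_a, shifted))
```

A larger value indicates a larger discrepancy between the two samples, which is the property a regularizer that penalizes poor meta-test generalization would rely on.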

Paper Structure

This paper contains 13 sections, 3 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: An overview of our proposed domain generalization algorithm meta-forests.
  • Figure 2: Structure of the meta-forests domain generalization algorithm, detailing the 'Meta-learning' box in Figure 1. The source domains are denoted $D_s$ and the target domain $D_t$. At each meta-task training iteration, one source domain is randomly selected as the meta-test set, while the remaining source domains serve as the meta-train set. For instance, at the $N^{th}$ iteration $D_{s_2}$ is selected as the meta-test set, while in previous iterations the meta-test set was randomly selected from the other domains. Within each meta-task iteration, meta-learning and a weight update are performed, resulting in $(M-2)$ weighted forests. The meta-task is repeated $N$ times, and all $N(M-2)$ weighted forests are collected to construct the meta-forests model. Finally, the resulting meta-forests model is evaluated on the target domain $D_t$.
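
The meta-task split described in the Figure 2 caption can be sketched as follows. This covers only the random choice of a meta-test domain per iteration; the forest training and weight update are omitted because the caption does not specify them. The function name, the domain labels, and the seed are hypothetical.

```python
import random


def meta_task_splits(source_domains, n_iterations, seed=0):
    """Return one (meta_train, meta_test) pair per meta-task iteration.

    In each iteration one source domain is drawn at random as the
    meta-test set and the remaining source domains form the meta-train set.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_iterations):
        meta_test = rng.choice(source_domains)
        meta_train = [d for d in source_domains if d != meta_test]
        splits.append((meta_train, meta_test))
    return splits


# Hypothetical labels for three source domains; the target domain D_t
# is held out entirely and never appears in any split.
splits = meta_task_splits(["D_s1", "D_s2", "D_s3"], n_iterations=4)
for meta_train, meta_test in splits:
    print(meta_train, "->", meta_test)
```

With three source domains, each meta-train set holds two domains. If $M$ is read as the total number of domains including the target, this matches the caption's count of $(M-2)$ forests per iteration, though that reading is an assumption.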