
CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning

Eloy Geenjaar, Lie Lu

TL;DR

CiTrus introduces a convolution-transformer hybrid for bio-signal transfer learning that is particularly effective in low-data regimes. It combines a residual CNN encoder with a channel-independent PatchTST transformer and is pre-trained with masked auto-encoding, a frequency-based masked auto-encoding task, and multimodal data, yielding strong transfer performance across diverse bio-signals. A key contribution is a resampling-based transfer technique that aligns a downstream dataset with a different temporal length and sampling rate to the pre-training dataset. The study shows that the convolution-only part of the model already reaches state-of-the-art performance on some low-data tasks, that transformer-based models gain the most from pre-training, and that frequency-based pre-training performs best on average in the lowest and highest data regimes, with multimodal pre-training often adding further gains.
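The channel-independent patching and masking idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the patch length, the mask ratio, and the choice to zero out masked patches are all assumptions made for illustration. Each channel is split into patches and masked on its own, which is what "channel-independent" is taken to mean here.

```python
import random

def make_patches(signal, patch_len):
    """Split one channel into non-overlapping patches, dropping any remainder."""
    n_patches = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]

def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patches (assumed masking scheme).

    Returns the masked patches and the sorted indices that were masked.
    """
    n_masked = int(round(mask_ratio * len(patches)))
    masked_idx = set(rng.sample(range(len(patches)), n_masked))
    masked = [
        [0.0] * len(p) if i in masked_idx else list(p)
        for i, p in enumerate(patches)
    ]
    return masked, sorted(masked_idx)

def channel_independent_masking(multichannel, patch_len=4, mask_ratio=0.5, seed=0):
    """Patch and mask every channel separately (hypothetical helper).

    A masked auto-encoder would be trained to reconstruct the original
    values of the masked patches from the visible ones.
    """
    rng = random.Random(seed)
    result = []
    for channel in multichannel:
        patches = make_patches(channel, patch_len)
        result.append(mask_patches(patches, mask_ratio, rng))
    return result

# Usage: two channels of 16 samples each, 4 patches per channel, 2 masked.
signal = [[float(t) + 1.0 for t in range(16)], [float(-t) - 1.0 for t in range(16)]]
masked_channels = channel_independent_masking(signal)
```

Because every channel is handled separately, the same encoder can in principle be applied to recordings with different channel counts, which is one plausible reason the design suits transfer across bio-signal modalities.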

Abstract

Transfer learning for bio-signals has recently become an important technique to improve prediction performance on downstream tasks with small bio-signal datasets. Recent works have shown that pre-training a neural network model on a large dataset (e.g., EEG) with a self-supervised task, replacing the self-supervised head with a linear classification head, and fine-tuning the model on different downstream bio-signal datasets (e.g., EMG or ECG) can dramatically improve the performance on those datasets. In this paper, we propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning, introduce a frequency-based masked auto-encoding task, employ a more comprehensive evaluation framework, and evaluate how much and when (multimodal) pre-training improves fine-tuning performance. We also introduce a substantially better-performing method of aligning a downstream dataset with a different temporal length and sampling rate to the original pre-training dataset. Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks, and that the full model often improves performance even further. For transformer-based models in particular, we find that pre-training improves performance on downstream datasets, that multimodal pre-training often increases those gains further, and that our frequency-based pre-training performs best on average in the lowest and highest data regimes.
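To make the alignment problem concrete, the following is a minimal sketch of bringing a downstream recording to the sampling rate and temporal length of a pre-training dataset. The abstract does not describe the actual procedure, so everything here is an assumption: the function names are hypothetical, linear interpolation stands in for whatever resampling the authors use, and cropping or zero-padding is just one simple way to match the length. Linear interpolation applies no anti-aliasing filter, so a real pipeline that downsamples would likely need a proper resampler.

```python
def resample_linear(signal, src_rate, dst_rate):
    """Linearly interpolate a 1-D signal from src_rate to dst_rate (both in Hz)."""
    if not signal:
        raise ValueError("signal must not be empty")
    duration = len(signal) / src_rate
    n_out = max(1, int(round(duration * dst_rate)))
    last = len(signal) - 1
    out = []
    for k in range(n_out):
        t = k * src_rate / dst_rate      # position in source-sample units
        i = min(int(t), last)            # left neighbour
        j = min(i + 1, last)             # right neighbour, clamped at the end
        frac = t - i
        out.append(signal[i] * (1.0 - frac) + signal[j] * frac)
    return out

def fit_length(signal, target_len):
    """Crop or zero-pad at the end so the signal has exactly target_len samples."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return signal + [0.0] * (target_len - len(signal))

def align_to_pretraining(signal, src_rate, pretrain_rate, pretrain_len):
    """Hypothetical helper: match the pre-training sampling rate, then its length."""
    resampled = resample_linear(signal, src_rate, pretrain_rate)
    return fit_length(resampled, pretrain_len)

# Usage: a 2-second ramp recorded at 2 Hz, aligned to a 4 Hz, 10-sample input.
ramp = [0.0, 1.0, 2.0, 3.0]
aligned = align_to_pretraining(ramp, src_rate=2, pretrain_rate=4, pretrain_len=10)
```

Resampling before fixing the length keeps the time scale of the downstream signal consistent with what the pre-trained model saw, so that a patch covers the same duration in both datasets.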

Paper Structure

This paper contains 46 sections, 1 figure, and 20 tables.

Figures (1)

  • Figure 1: Subfigure a) shows the general transfer learning framework, comprising the pre-training and fine-tuning structures. Subfigure b) shows the pre-training structure of our proposed model, and subfigure c) shows how the structure of the model is adapted to accommodate multimodal pre-training data.