Self-Supervised Siamese Autoencoders

Friederike Baier; Sebastian Mair; Samuel G. Fadel

TL;DR

This work addresses the challenge of limited labeled data in image classification by developing SidAE (Siamese denoising autoencoder), a self-supervised model that merges Siamese networks with a denoising autoencoder. SidAE processes two augmented views of each input and learns via a dual loss that combines Siamese alignment with reconstruction, balancing the two terms with a tunable weight $w$. Across CIFAR-10, MNIST, Fashion-MNIST, and STL-10, SidAE consistently outperforms SimSiam and a denoising autoencoder, particularly when only a small fraction of labeled data is available, and maintains strong performance under fine-tuning. The results indicate that integrating representation alignment with denoising reconstruction yields more stable and informative features for downstream classification, with practical implications for data-scarce regimes and transfer learning.
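The weighting of the two loss terms can be illustrated with a minimal sketch. The exact form of the SidAE loss is not given in this summary, so the code below makes several assumptions: the two terms are mixed as a convex combination `w * siamese + (1 - w) * reconstruction`, the Siamese term is a SimSiam-style negative cosine similarity, and the reconstruction term is a mean squared error against the clean input. All function and argument names are hypothetical.

```python
import math


def negative_cosine(p, z):
    """Negative cosine similarity between a prediction p and a target z.

    In a SimSiam-style setup z would be treated as a constant
    (stop-gradient); plain Python has no gradients, so this only
    computes the value.
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def mse(x_hat, x):
    """Mean squared error between a reconstruction and the clean input."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def combined_loss(p1, z2, p2, z1, x_hat1, x_hat2, x, w=0.5):
    """Assumed SidAE-style loss: a convex combination of two terms.

    p1, p2: predictor outputs for view 1 and view 2
    z1, z2: encoder outputs for view 1 and view 2
    x_hat1, x_hat2: decoder reconstructions from each view
    x: the clean (un-augmented) input
    w: weight in [0, 1]; w=1 keeps only the Siamese term,
       w=0 keeps only the reconstruction term (assumption)
    """
    siamese = 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
    reconstruction = 0.5 * mse(x_hat1, x) + 0.5 * mse(x_hat2, x)
    return w * siamese + (1.0 - w) * reconstruction
```

Under these assumptions, setting `w` close to 1 makes the objective behave like a pure Siamese alignment loss, while `w` close to 0 reduces it to a denoising autoencoder objective.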

Abstract

In contrast to fully-supervised models, self-supervised representation learning only needs a fraction of data to be labeled and often achieves the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task, making them able to extract meaningful features from raw input data afterwards. Previously, autoencoders and Siamese networks have been successfully employed as feature extractors for tasks such as image classification. However, both have their individual shortcomings and benefits. In this paper, we combine their complementary strengths by proposing a new method called SidAE (Siamese denoising autoencoder). Using an image classification downstream task, we show that our model outperforms two self-supervised baselines across multiple data sets and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available. Empirically, the Siamese component has more impact, but the denoising autoencoder is nevertheless necessary to improve performance.

Paper Structure (25 sections, 3 equations, 7 figures)

Introduction
Self-Supervised Representation Learning
Siamese networks.
Denoising autoencoders.
A Siamese Denoising Autoencoder
Motivation
Architecture
Input.
Encoder.
Decoder.
Predictor.
Loss.
Experiments
Experimental Setup
Data.
...and 10 more sections

Figures (7)

Figure 1: Siamese networks (left) and (denoising) autoencoders (middle), compared to our proposed model SidAE (right). The left and right illustrations should be symmetric, but we show only one side of the symmetry for brevity. The middle illustration shows two views due to our experimental setting (for a fair comparison), although a vanilla denoising autoencoder usually has only one.
Figure 2: The components of the models used in our experiments.
Figure 3: Classification accuracy on CIFAR-10 (averaged over 5 runs incl. std. err.) after different pre-training stages using a frozen pre-trained backbone. For downstream training, 1% (left) and 100% (right) of training data are used.
Figure 4: SidAE: The influence of the weight $w$ on CIFAR-10.
Figure 5: Classification accuracies on MNIST (left) and Fashion-MNIST (right) using 1% of the training data for downstream training.
...and 2 more figures
