Self-Supervised Siamese Autoencoders
Friederike Baier, Sebastian Mair, Samuel G. Fadel
TL;DR
This work addresses the challenge of limited labeled data in image classification by developing SidAE, a self-supervised model that merges Siamese networks with a denoising autoencoder. SidAE processes two augmented views of each input, is trained with a combined loss that pairs Siamese alignment with reconstruction, and balances the two signals with a tunable weight $w$. Across CIFAR-10, MNIST, Fashion-MNIST, and STL-10, SidAE consistently outperforms SimSiam and a denoising autoencoder, particularly when only a small fraction of labeled data is available, and maintains strong performance under fine-tuning. The results demonstrate that integrating representation alignment with denoising reconstruction yields more stable and informative features for downstream classification, with practical implications for data-scarce regimes and transfer learning.
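To make the idea of a weighted two-part objective concrete, here is a minimal sketch in plain Python of how a Siamese alignment term and a denoising reconstruction term could be balanced by $w$. It assumes a SimSiam-style symmetric negative-cosine loss between predictor outputs p and projector outputs z of the two views, a mean-squared reconstruction error against the clean input x, and a convex combination w * L_siam + (1 - w) * L_rec. The function names and this exact weighting are hypothetical illustrations, not the paper's stated formulation, and the stop-gradient used in SimSiam is only noted in a comment because plain floats carry no gradients.

```python
import math
from typing import Sequence


def neg_cosine(p: Sequence[float], z: Sequence[float]) -> float:
    """Negative cosine similarity D(p, z), as in SimSiam.

    In a real training loop z would be wrapped in a stop-gradient
    (e.g. detached); plain floats have no gradients, so that step is omitted.
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def symmetric_siamese_loss(p1, z1, p2, z2) -> float:
    """Symmetric alignment loss over two augmented views (assumed SimSiam-style)."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


def reconstruction_loss(x_hat: Sequence[float], x: Sequence[float]) -> float:
    """Mean squared error between a decoder output and the clean input."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def sidae_style_loss(p1, z1, p2, z2, x_hat1, x_hat2, x, w: float) -> float:
    """Hypothetical combined objective: w * Siamese term + (1 - w) * reconstruction term.

    x_hat1 and x_hat2 are assumed to be reconstructions of the same clean
    input x, decoded from the two corrupted/augmented views.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    rec = 0.5 * (reconstruction_loss(x_hat1, x) + reconstruction_loss(x_hat2, x))
    return w * symmetric_siamese_loss(p1, z1, p2, z2) + (1.0 - w) * rec


if __name__ == "__main__":
    # Perfectly aligned views (Siamese term = -1.0); one perfect and one poor
    # reconstruction (reconstruction term = 0.5); equal weighting w = 0.5.
    v = [1.0, 0.0]
    print(sidae_style_loss(v, v, v, v, [1.0, 1.0], [0.0, 0.0], [1.0, 1.0], w=0.5))
```

Under this assumed convex weighting, $w = 1$ reduces the objective to a pure Siamese loss and $w = 0$ to a pure denoising-autoencoder loss, so sweeping $w$ interpolates between the two baselines the model is compared against.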
Abstract
In contrast to fully supervised models, self-supervised representation learning needs only a fraction of the data to be labeled and often achieves the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task so that they can subsequently extract meaningful features from raw input data. Previously, autoencoders and Siamese networks have been successfully employed as feature extractors for tasks such as image classification. However, each has its own shortcomings and benefits. In this paper, we combine their complementary strengths by proposing a new method called SidAE (Siamese denoising autoencoder). Using an image classification downstream task, we show that our model outperforms two self-supervised baselines across multiple data sets and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available. Empirically, the Siamese component has more impact, but the denoising autoencoder is nevertheless necessary to improve performance.
