
Unsupervised Deep Learning Image Verification Method

Enoch Solomon, Abraham Woubie, Eyael Solomon Emiru

TL;DR

This work tackles unsupervised face verification by training an autoencoder to reconstruct neighbor face vectors, thereby capturing session variability without labels. Neighbors are selected via cosine similarity, creating $n\times(k-1)$ training samples, and embeddings extracted from the autoencoder are evaluated with cosine scoring. The approach yields up to a 56% relative reduction in $EER$ on LFW compared to a baseline cosine system, and score-level fusion with PLDA results in a highly competitive $EER$ (as low as 6.23%) while using only about 200K unlabeled images. The method demonstrates strong performance under limited labeled data, highlighting practical potential for verification tasks with scarce supervision.
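The neighbor-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_neighbor_pairs` is hypothetical, the face vectors are assumed to be already extracted as rows of a NumPy array, and $k$ is assumed to count the vector itself, so each vector contributes $k-1$ neighbors and the training set has $n\times(k-1)$ input/target pairs.

```python
import numpy as np


def build_neighbor_pairs(vectors, k):
    """Return (inputs, targets) where each target is a cosine neighbor of its input.

    Assumption: k includes the vector itself, so k - 1 neighbors are kept
    per vector, giving n * (k - 1) training pairs in total.
    """
    x = np.asarray(vectors, dtype=float)

    # Length-normalise so that a dot product equals the cosine similarity.
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    scores = unit @ unit.T

    # Exclude each vector from its own neighbor list.
    np.fill_diagonal(scores, -np.inf)

    # Indices of the k - 1 highest-scoring neighbors for every row.
    neighbor_idx = np.argsort(-scores, axis=1)[:, : k - 1]

    # Repeat each input k - 1 times so it lines up with its neighbors.
    inputs = np.repeat(x, k - 1, axis=0)
    targets = x[neighbor_idx.reshape(-1)]
    return inputs, targets
```

An autoencoder trained on these pairs would take `inputs` as its input and `targets` as its reconstruction target, instead of reconstructing the input itself.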

Abstract

Although deep learning is commonly employed for image recognition, it usually requires a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by using an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This narrows the performance gap between cosine and PLDA scoring systems.
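For readers unfamiliar with the metric, the sketch below shows one common way to approximate the equal error rate (EER) from verification scores and to express a relative improvement over a baseline. The function names and the threshold sweep are illustrative assumptions, not the evaluation code used in the paper, and the numbers in the usage note are made up for the arithmetic only.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: genuine pairs scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: impostor pairs scored at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_improvement(eer_baseline, eer_system):
    """Relative EER reduction of a system with respect to a baseline."""
    return (eer_baseline - eer_system) / eer_baseline
```

With hypothetical values, a baseline EER of 10.0% and a system EER of 4.4% give `relative_improvement(10.0, 4.4)`, which is approximately 0.56, i.e. a 56% relative improvement.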

Paper Structure (14 sections, 2 figures, 4 tables)