Chest X-Rays Image Classification from beta-Variational Autoencoders Latent Features

Leonardo Crespi, Daniele Loiacono, Arturo Chiti

TL;DR

The paper investigates extracting high-level latent features from chest X-rays using beta-VAE models trained on CheXpert and uses lightweight tree-based classifiers to perform multi-label disease classification. By employing embeddings from multiple CNN backbones and two ensemble strategies, the approach achieves meaningful AUROC performance, albeit below task-specific CNN baselines, while offering a computationally efficient alternative. The results suggest that disentangled latent features capture informative image characteristics suitable for classification and that simple ensembling can boost performance without additional training or feature engineering. The work points to future directions in generalization to other datasets, optimizing embedding dimensionality, and exploring more sophisticated ensemble techniques to close the gap with specialized CNN models.
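
As a rough illustration of the kind of ensembling and evaluation summarized above, the sketch below averages the predicted probabilities of several classifiers and scores the result with AUROC. It is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say which two ensemble strategies are used, so plain probability averaging is an assumption, and the function names (`average_ensemble`, `auroc`) and the toy scores are hypothetical.

```python
from typing import List, Sequence


def average_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the predicted probabilities of several models, sample by sample."""
    n_models = len(per_model_probs)
    return [sum(sample) / n_models for sample in zip(*per_model_probs)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): one disease label, six images, three classifiers,
# e.g. one tree-based model per encoder backbone.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.2, 0.7, 0.5]
model_b = [0.6, 0.3, 0.8, 0.7, 0.5, 0.2]
model_c = [0.7, 0.4, 0.6, 0.3, 0.2, 0.6]

ensemble = average_ensemble([model_a, model_b, model_c])
for name, scores in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(f"model {name}: AUROC = {auroc(labels, scores):.3f}")
```

Averaging needs no extra training, which matches the claim that ensembling adds no further training cost. For multi-label classification, the same computation would be repeated once per disease label.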

Abstract

Chest X-ray (CXR) imaging is one of the most common diagnostic techniques in everyday clinical practice worldwide. We investigate the use of Deep Learning (DL) techniques to extract information from CXR images and classify them, keeping our methodology as general as possible so that it could later be applied in a real-world scenario with little effort. To this end, we trained several beta-Variational Autoencoder (beta-VAE) models on the CheXpert dataset, one of the largest publicly available collections of labeled CXR images. Latent features extracted from these models were then used to train other Machine Learning models that classify the original images. Lastly, the tree-based models were combined into ensembles to improve the results without further training or model engineering. Although we expected some drop in raw performance with respect to state-of-the-art, classification-specific models, we obtained encouraging results, which show the viability of our approach and the usability of the high-level features extracted by the autoencoders for classification tasks.
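
For readers unfamiliar with the beta-VAE objective, the sketch below computes the standard form of the loss for a single sample: a reconstruction term plus a KL divergence term weighted by beta. It is only an illustration under assumptions. The abstract does not state the reconstruction loss, the value of beta, or the prior, so the squared-error term, the default `beta=4.0`, the standard normal prior, and both function names are hypothetical choices.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    beta: float = 4.0,  # hypothetical default; the abstract gives no value
) -> float:
    """Squared-error reconstruction term plus the beta-weighted KL term."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


# Toy example: a 4-pixel "image" and a 2-dimensional latent code.
x = [0.0, 0.5, 1.0, 0.5]
x_hat = [0.1, 0.4, 0.9, 0.5]
mu = [0.3, -0.2]
log_var = [0.0, -0.5]
print(f"loss = {beta_vae_loss(x, x_hat, mu, log_var):.4f}")
```

With `beta=1.0` this reduces to the usual VAE objective; larger values weight the KL term more heavily, which is the mechanism generally credited with encouraging disentangled latent features. In the pipeline described above, the encoder mean `mu` would be the natural candidate for the latent feature vector passed to the downstream classifiers, although the abstract does not specify this.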

Paper Structure

This paper contains 16 sections, 3 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: A block scheme of the $\beta$-VAE built for our work
  • Figure 2: Output images from $\beta$-VAEs with latent spaces of 100 ($1^{st}$ and $3^{rd}$ rows) and 200 ($2^{nd}$ and $4^{th}$ rows) units, given the same two input images. The label identifies the backbone used for the encoder. The images differ subtly from one another, but all are clearly blurred, less detailed versions of the original image