Covariance Descriptors Meet General Vision Encoders: Riemannian Deep Learning for Medical Image Classification

Josef Mayr, Anna Reithmeir, Maxime Di Folco, Julia A. Schnabel

TL;DR

This work addresses medical image classification on the SPD manifold using covariance descriptors derived from powerful general vision encoders (GVEs). It proposes a three-stage pipeline that extracts handcrafted and GVE features, computes SPD covariance descriptors, and classifies them with manifold-aware methods, including a learning-based SPDNet. The authors demonstrate that GVE-based covariance descriptors outperform handcrafted ones, with DINOv2-derived features and SPDNet yielding the best performance across 11 MedMNIST datasets, approaching or surpassing state-of-the-art baselines. The findings highlight a scalable, geometry-aware framework that leverages pretrained encoders for robust medical imaging analysis and pave the way for extensions to 3D data and multimodal covariance representations.

Abstract

Covariance descriptors capture second-order statistics of image features. They have shown strong performance in general computer vision tasks, but remain underexplored in medical imaging. We investigate their effectiveness for both conventional and learning-based medical image classification, with a particular focus on SPDNet, a classification network specifically designed for symmetric positive definite (SPD) matrices. We propose constructing covariance descriptors from features extracted by pre-trained general vision encoders (GVEs) and compare them with handcrafted descriptors. Two GVEs, DINOv2 and MedSAM, are evaluated across eleven binary and multi-class datasets from the MedMNIST benchmark. Our results show that covariance descriptors derived from GVE features consistently outperform those derived from handcrafted features. Moreover, SPDNet yields superior performance to state-of-the-art methods when combined with DINOv2 features. Our findings highlight the potential of combining covariance descriptors with powerful pretrained vision encoders for medical image analysis.
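
To make the notion of a covariance descriptor concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the encoder output for one image is available as an (n, d) array of n feature vectors of dimension d; the function names, the regularization constant eps, and the random stand-in features are illustrative assumptions. It computes the sample covariance, adds a small multiple of the identity so the result is strictly positive definite, and applies the matrix logarithm, a common way to map SPD matrices to a flat space for conventional classifiers.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-5):
    """Return a d x d SPD covariance descriptor for an (n, d) feature array.

    eps is an assumed regularization constant (hypothetical value); it keeps
    the matrix strictly positive definite even when n is small.
    """
    features = np.asarray(features, dtype=np.float64)
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)  # unbiased sample covariance
    return cov + eps * np.eye(d)


def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by the log of its eigenvalue.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


# Random stand-in for encoder features: 256 vectors of dimension 8.
rng = np.random.default_rng(0)
patch_features = rng.standard_normal((256, 8))

C = covariance_descriptor(patch_features)  # SPD matrix, shape (8, 8)
L = spd_log(C)                             # symmetric matrix, shape (8, 8)
```

The upper triangle of L could then be flattened into a vector for a standard classifier, whereas a network such as SPDNet takes the SPD matrix C itself as input.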

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed method. First, general vision encoder (GVE) features are extracted from an image. Then, covariance descriptors are computed from these features and used as input to a learning-based Riemannian classifier.
  • Figure 2: Handcrafted (HC) features capture local edge and gradient properties while features extracted from pretrained general vision encoders (GVEs) capture high-level semantics. Here, the first three principal components are shown for the GVE features.
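
The kind of visualization described for Figure 2 can be sketched as follows. This is an illustrative example only: it assumes the GVE features form a 16 x 16 grid of 32-dimensional vectors (both sizes are hypothetical, and random values stand in for real encoder output), projects them onto their first three principal components, and rescales each component to [0, 1] so the result can be displayed as an RGB image.

```python
import numpy as np


def project_top3(features):
    """Project (n, d) features onto their first three principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


# Random stand-in for a 16 x 16 grid of 32-dimensional encoder features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16 * 16, 32))

pcs = project_top3(tokens)  # shape (256, 3)

# Rescale each component to [0, 1] and restore the spatial grid.
rgb = (pcs - pcs.min(axis=0)) / (pcs.max(axis=0) - pcs.min(axis=0))
rgb_map = rgb.reshape(16, 16, 3)
```

The array rgb_map can be passed to any image display routine; with real encoder features, regions with similar semantics tend to receive similar colors.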