Covariance Descriptors Meet General Vision Encoders: Riemannian Deep Learning for Medical Image Classification
Josef Mayr, Anna Reithmeir, Maxime Di Folco, Julia A. Schnabel
TL;DR
This work addresses medical image classification on the manifold of symmetric positive definite (SPD) matrices using covariance descriptors derived from general vision encoders (GVEs). It proposes a three-stage pipeline that extracts handcrafted and GVE features, computes SPD covariance descriptors from them, and classifies the descriptors with manifold-aware methods, including the learning-based SPDNet. The authors demonstrate that GVE-based covariance descriptors outperform handcrafted ones, with DINOv2-derived features and SPDNet yielding the best performance across 11 MedMNIST datasets, approaching or surpassing state-of-the-art baselines. The findings point to a scalable, geometry-aware framework that leverages pretrained encoders for medical image analysis, and they pave the way for extensions to 3D data and multimodal covariance representations.
Abstract
Covariance descriptors capture second-order statistics of image features. They have shown strong performance in general computer vision tasks, but remain underexplored in medical imaging. We investigate their effectiveness for both conventional and learning-based medical image classification, with a particular focus on SPDNet, a classification network specifically designed for symmetric positive definite (SPD) matrices. We propose constructing covariance descriptors from features extracted by pretrained general vision encoders (GVEs) and compare them with handcrafted descriptors. Two GVEs, DINOv2 and MedSAM, are evaluated across eleven binary and multi-class datasets from the MedMNIST benchmark. Our results show that covariance descriptors derived from GVE features consistently outperform those derived from handcrafted features. Moreover, SPDNet yields superior performance to state-of-the-art methods when combined with DINOv2 features. Our findings highlight the potential of combining covariance descriptors with powerful pretrained vision encoders for medical image analysis.
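To make the notion of a covariance descriptor concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each image is represented by n local feature vectors of dimension d (for example, patch-level encoder outputs), forms their sample covariance, adds a small ridge term so the matrix is strictly positive definite, and then maps it to a flat vector via the matrix logarithm (a log-Euclidean embedding) so that a conventional classifier could consume it. The function names, the ridge value `eps`, and the random stand-in features are illustrative assumptions.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-5):
    """Sample covariance of n local feature vectors, shape (n, d) -> (d, d).

    eps * I is added (an assumed regularization) so the result is strictly
    positive definite even when n is small relative to d.
    """
    features = np.asarray(features, dtype=np.float64)
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)
    return cov + eps * np.eye(d)


def log_euclidean_vector(spd):
    """Matrix logarithm of an SPD matrix, flattened to its upper triangle.

    Off-diagonal entries are scaled by sqrt(2) so the Euclidean norm of the
    vector equals the Frobenius norm of log(spd).
    """
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_spd = (eigvecs * np.log(eigvals)) @ eigvecs.T
    rows, cols = np.triu_indices(spd.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_spd[rows, cols] * weights


# Stand-in for encoder output: 256 patch features of dimension 8 (random here).
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(256, 8))

C = covariance_descriptor(patch_features)  # (8, 8) SPD matrix
v = log_euclidean_vector(C)                # vector of length 8 * 9 / 2 = 36
```

The vector `v` could be passed to any standard classifier, whereas a network such as SPDNet would operate on the SPD matrix `C` directly.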
