Automatic rating of incomplete hippocampal inversions evaluated across multiple cohorts

Lisa Hemforth; Baptiste Couvy-Duchesne; Kevin De Matos; Camille Brianceau; Matthieu Joulot; Tobias Banaschewski; Arun L. W. Bokde; Sylvane Desrivières; Herta Flor; Antoine Grigis; Hugh Garavan; Penny Gowland; Andreas Heinz; Rüdiger Brühl; Jean-Luc Martinot; Marie-Laure Paillère Martinot; Eric Artiges; Dimitri Papadopoulos; Herve Lemaitre; Tomas Paus; Luise Poustka; Sarah Hohmann; Nathalie Holz; Juliane H. Fröhner; Michael N. Smolka; Nilakshi Vaidya; Henrik Walter; Robert Whelan; Gunter Schumann; Christian Büchel; JB Poline; Bernd Itterman; Vincent Frouin; Alexandre Martin; IMAGEN study group; Claire Cury; Olivier Colliot

TL;DR

This study introduces automatic rating of incomplete hippocampal inversion (IHI) by predicting four anatomical criteria from 3D MRI cropped around the hippocampus and summing them into an interpretable IHI score. It compares three deep learning architectures (Conv5-FC3, 3D ResNet, SECNN) and a ridge regression baseline across four cohorts (IMAGEN, QTIM, QTAB, UKBiobank), using three training strategies to test generalization. Deep learning consistently outperforms ridge regression, and left-hemisphere predictions approach human inter- and intra-rater reliability. The right hemisphere remains challenging because IHI is less prevalent there. Multi-cohort training substantially improves generalization, particularly for the right hemisphere. The work provides a scalable framework for large-scale IHI annotation, enabling population studies and potentially genome-wide association studies (GWAS), and discusses limitations and avenues for further improvement, such as data augmentation and semi-automatic annotation tools.
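The three training strategies differ only in which cohorts' training sets are pooled, while evaluation always uses the pooled independent test sets of all four cohorts. The sketch below illustrates that protocol with a closed-form ridge regression standing in for the models. The strategy labels, the functions (`fit_ridge`, `pool`, `run_strategies`), the RMSE placeholder metric and the synthetic data are hypothetical. The ridge inputs are assumed to be generic feature vectors, not the authors' actual input representation.

```python
import numpy as np

# Which cohorts' training sets each strategy pools (keys are hypothetical labels).
STRATEGIES = {
    "IMAGEN": ["IMAGEN"],
    "IMAGEN+QTIM+QTAB": ["IMAGEN", "QTIM", "QTAB"],
    "ALL": ["IMAGEN", "QTIM", "QTAB", "UKBiobank"],
}


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(model, X):
    w, b = model
    return X @ w + b


def pool(splits, cohorts):
    """Concatenate the (X, y) splits of the listed cohorts."""
    X = np.concatenate([splits[c][0] for c in cohorts])
    y = np.concatenate([splits[c][1] for c in cohorts])
    return X, y


def run_strategies(train, test, alpha=1.0):
    """Fit one model per strategy and score it on the pooled test sets of all cohorts.

    RMSE is only a placeholder metric for this sketch.
    """
    X_test, y_test = pool(test, list(test))
    scores = {}
    for name, cohorts in STRATEGIES.items():
        model = fit_ridge(*pool(train, cohorts), alpha=alpha)
        residuals = predict(model, X_test) - y_test
        scores[name] = float(np.sqrt(np.mean(residuals ** 2)))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data: 20 features per subject, one continuous target.
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=20)

    def make_split(n):
        X = rng.normal(size=(n, 20))
        return X, X @ true_w + rng.normal(scale=0.1, size=n)

    train = {c: make_split(50) for c in STRATEGIES["ALL"]}
    test = {c: make_split(20) for c in STRATEGIES["ALL"]}
    print(run_strategies(train, test))
```

Keeping the test set fixed across strategies is what makes the strategies comparable: only the composition of the training data changes.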

Abstract

Incomplete Hippocampal Inversion (IHI), sometimes called hippocampal malrotation, is an atypical anatomical pattern of the hippocampus found in about 20% of the general population. IHI can be visually assessed on coronal slices of T1-weighted MR images, using a composite score that combines four anatomical criteria. IHI has been associated with several brain disorders, such as epilepsy and schizophrenia, but these studies were based on small samples. Furthermore, the genetic or environmental factors that contribute to the genesis of IHI are largely unknown. Large-scale studies are thus needed to further understand IHI and its potential relationships to neurological and psychiatric disorders. However, visual evaluation is time-consuming and tedious, which justifies the need for an automatic method. In this paper, we propose, for the first time, to automatically rate IHI. We proceed by predicting four anatomical criteria, which are then summed to form the IHI score, so that the resulting score remains interpretable. We provide an extensive experimental investigation of different machine learning methods and training strategies. We performed automatic rating using several deep learning models (Conv5-FC3, ResNet and SECNN) as well as ridge regression. We studied the generalization of our models across cohorts and performed multi-cohort learning. We relied on a large population of 2,008 participants from the IMAGEN study, 993 and 403 participants from the QTIM and QTAB studies, respectively, and 985 participants from the UKBiobank. We showed that deep learning models outperformed ridge regression. We demonstrated that the Conv5-FC3 network performed at least as well as more complex networks while keeping complexity and computation time low. We also showed that training on a single cohort may not provide enough variability, whereas training on several cohorts improves generalization.
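Because the composite IHI score is the sum of four criterion ratings, each score can be traced back to the criteria that produced it. The short sketch below illustrates this aggregation for one hemisphere. The `CriteriaRating` class, the `ihi_score` function and the example values are hypothetical, and no particular rating scale for each criterion is implied. The criterion labels C1, C2, C3 and C5 follow Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaRating:
    """Ratings of the four IHI criteria for one hemisphere.

    Criterion names follow Figure 1; the values used below are illustrative only.
    """
    c1: float  # verticality and roundness of the hippocampal body
    c2: float  # verticality and depth of the collateral sulcus
    c3: float  # medial position of the hippocampus
    c5: float  # depth of the collateral and occipito-temporal sulci

    def contributions(self) -> dict:
        """Per-criterion breakdown, which keeps the composite score interpretable."""
        return {"C1": self.c1, "C2": self.c2, "C3": self.c3, "C5": self.c5}


def ihi_score(rating: CriteriaRating) -> float:
    """Composite IHI score: the sum of the four criteria."""
    return sum(rating.contributions().values())


# Hypothetical predicted criteria for one left hippocampus.
left = CriteriaRating(c1=1.0, c2=2.0, c3=0.5, c5=1.0)
print(ihi_score(left))       # 4.5
print(left.contributions())  # shows which criteria drive the score
```

Predicting the criteria separately, rather than regressing the composite score directly, is what allows this breakdown.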


Paper Structure (34 sections, 10 figures, 2 tables)

Introduction
Data and pre-processing
Manual IHI rating protocol
Cohorts description
Subjects:
MRI acquisition:
MRI preprocessing
IHI annotation on cohorts
Methods and analysis
Test/train sets
Training strategies
Deep learning models
Linear models
Statistical analysis
Results
...and 19 more sections

Figures (10)

Figure 1: Schematic of the visual criteria. 1: Verticality and roundness of the hippocampal body. 2: Verticality and depth of the collateral sulcus. 3: Medial position of the hippocampus. 5: Depth of the collateral sulcus and occipito-temporal sulcus. Reproduced from [1] (CC BY).
Figure 2: Schematic representation of the deep learning models used for prediction.
Figure 3: Prediction results for the composite scores on the pooled independent test sets of the IMAGEN, QTIM, QTAB and UKB cohorts. We show the mean intraclass correlation coefficient (ICC) and 95% confidence intervals obtained through bootstrapping. Results are shown for the three assessed deep learning models (Conv5-FC3, ResNet and SECNN) and ridge regression, alongside inter-rater and intra-rater performances. Three training strategies are compared: the IMAGEN strategy, the IMAGEN+QTIM+QTAB strategy and the ALL strategy. Results are shown for predictions in the left (panel a) and right (panel b) hemispheres.
Figure 4: Prediction results for the individual criteria on the pooled independent test sets of the IMAGEN, QTIM, QTAB and UKB cohorts. We show the mean metrics (weighted kappas for C1, C2 and C3, and an unweighted kappa for C5) and 95% confidence intervals obtained through bootstrapping. Results are shown for the Conv5-FC3 model, alongside inter-rater and intra-rater performances. Three training strategies are compared: using only the training set of the IMAGEN cohort, using the training sets of the IMAGEN, QTIM and QTAB cohorts, and using the training sets of all cohorts (IMAGEN, QTIM, QTAB, UKBiobank).
Figure 5: Saliency maps extracted from the Conv5-FC3 model's predictions on the UKBiobank cohort in the left hemisphere. Plots are shown for all training strategies, for the individual criteria and for the composite scores. Saliency maps were thresholded to show only the highest weights and overlaid on a T1-weighted MRI image.
...and 5 more figures
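Figures 3 and 4 report ICCs for the composite scores and Cohen's kappas for the individual criteria (weighted for C1, C2 and C3, unweighted for C5), each with a 95% confidence interval obtained by bootstrapping. The NumPy sketch below implements these metrics under several assumptions that are choices made for illustration, not necessarily the authors' settings:

- the ICC form is ICC(A,1): two-way random effects, absolute agreement, single rater;
- the kappa weighting is linear or quadratic;
- bootstrapping resamples subjects with replacement and uses percentile intervals;
- the predicted criteria have already been discretized to the rating levels before computing kappa.

The function names are hypothetical.

```python
import numpy as np


def icc_a1(ratings):
    """ICC(A,1), a.k.a. ICC(2,1): two-way random effects, absolute agreement,
    single rater. `ratings` has shape (n_subjects, n_raters)."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    rows = Y.mean(axis=1, keepdims=True)
    cols = Y.mean(axis=0, keepdims=True)
    msr = k * np.sum((rows - grand) ** 2) / (n - 1)  # between-subject mean square
    msc = n * np.sum((cols - grand) ** 2) / (k - 1)  # between-rater mean square
    mse = np.sum((Y - rows - cols + grand) ** 2) / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def cohen_kappa(a, b, weights=None, categories=None):
    """Cohen's kappa between two label vectors.

    weights: None (unweighted), "linear" or "quadratic". `categories` must be
    sorted; by default it is the sorted union of the observed labels.
    """
    a, b = np.asarray(a), np.asarray(b)
    cats = np.unique(np.concatenate([a, b])) if categories is None else np.asarray(categories)
    K = len(cats)
    observed = np.zeros((K, K))
    np.add.at(observed, (np.searchsorted(cats, a), np.searchsorted(cats, b)), 1.0)
    observed /= observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((K, K))
    if weights is None:
        w = (i != j).astype(float)
    elif weights == "linear":
        w = np.abs(i - j) / max(K - 1, 1)
    elif weights == "quadratic":
        w = (i - j) ** 2 / max(K - 1, 1) ** 2
    else:
        raise ValueError(f"unknown weights: {weights!r}")
    denom = np.sum(w * expected)
    # Undefined when only one category occurs (no expected disagreement).
    return float("nan") if denom == 0 else 1.0 - np.sum(w * observed) / denom


def bootstrap_ci(metric, *arrays, n_boot=1000, seed=0):
    """Mean and 95% percentile interval of metric(*arrays) over subject resamples."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(x) for x in arrays]
    n = len(arrays[0])
    stats = np.empty(n_boot)
    for t in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats[t] = metric(*(x[idx] for x in arrays))
    low, high = np.nanpercentile(stats, [2.5, 97.5])
    return float(np.nanmean(stats)), float(low), float(high)
```

For example, `bootstrap_ci(lambda m, p: icc_a1(np.column_stack([m, p])), manual, predicted)` would give a mean ICC and interval for composite scores. Similarly, `bootstrap_ci(lambda m, p: cohen_kappa(m, p, weights="quadratic"), manual_c1, predicted_c1)` would give a weighted kappa for one criterion. Here `manual`, `predicted`, `manual_c1` and `predicted_c1` are hypothetical arrays.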
