Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers

Alexander H. Berger; Laurin Lux; Suprosanna Shit; Ivan Ezhov; Georgios Kaissis; Martin J. Menten; Daniel Rueckert; Johannes C. Paetzold

TL;DR

This work tackles direct image-to-graph transformation under data scarcity by proposing cross-domain and cross-dimension inductive transfer learning for transformer-based image-to-graph models. It introduces three key innovations: a regularized edge sampling loss $\mathcal{L}_{\mathrm{Resln}}$ to handle varying graph densities across domains, a supervised domain adaptation framework with image- and graph-level adversaries, and a simple projection-based framework $\Pi$ that enables training on 2D data for 3D image-to-graph tasks. The methods are evaluated on six diverse datasets spanning 2D road networks and 3D vascular graphs, including challenging cross-domain and cross-dimension transfer scenarios. They show consistent improvements over no pretraining and self-supervised pretraining, and in some cases over supervised pretraining. The results demonstrate that inductive transfer learning can substantially reduce data requirements for complex geometric prediction tasks and enable direct image-to-graph inference in previously intractable 3D domains, with practical implications for urban planning and biomedical imaging.

Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.
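The abstract does not spell out how the regularized edge sampling loss selects edges, so the following is only a minimal sketch of the general idea: sample node pairs for the edge loss at a fixed ratio of background to active edges, with an upper bound on the total, so that domains with very different edge counts yield comparably balanced training signals. The function name `sample_edge_pairs` and the parameters `ratio` and `max_edges` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_edge_pairs(num_nodes, active_edges, ratio=3, max_edges=20, rng=None):
    """Sample node pairs with a fixed background:active ratio (illustrative only).

    num_nodes    -- number of nodes in the graph
    active_edges -- iterable of (i, j) pairs that are connected
    ratio        -- assumed number of background pairs per active pair
    max_edges    -- assumed upper bound on the total number of sampled pairs
    """
    rng = np.random.default_rng() if rng is None else rng

    # Treat edges as undirected and remove duplicates.
    active = sorted({tuple(sorted(e)) for e in active_edges})
    active_set = set(active)

    # Every unordered node pair that is not an active edge is background.
    background = [
        (i, j)
        for i in range(num_nodes)
        for j in range(i + 1, num_nodes)
        if (i, j) not in active_set
    ]

    # Cap the positives so that positives + ratio * positives <= max_edges.
    n_pos = min(len(active), max_edges // (1 + ratio))
    n_neg = min(len(background), ratio * n_pos)

    pos_idx = rng.permutation(len(active))[:n_pos]
    neg_idx = rng.permutation(len(background))[:n_neg]

    pairs = [active[i] for i in pos_idx] + [background[i] for i in neg_idx]
    labels = np.array([1] * n_pos + [0] * n_neg)
    return pairs, labels
```

The returned `pairs` and `labels` would then feed a standard binary classification loss on the edge predictions; the loss term itself is omitted here because its exact form is not stated in this summary.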

Paper Structure (44 sections, 11 equations, 11 figures, 14 tables)

Introduction
Our contribution.
Related Works
Image-to-graph transformation.
Transfer learning for transformers.
Cross-domain transfer learning.
Cross-dimension transfer learning.
Methodology
Regularized Edge Sampling Loss
Supervised Domain Adaptation
Combined Training Loss
Framework for 2D-to-3D Transfer Learning
Experiments and Results
Datasets.
Training.
...and 29 more sections

Figures (11)

Figure 1: Direct image-to-graph transformation. Whole brain vessel (top) and Agadez road dataset (bottom). The predicted graph is visualized as an overlay on the real image.
Figure 2: Conceptual overview of our framework. We use a transformer for single-stage image-to-graph transformation. Our three methodological contributions enable knowledge transfer between vastly different domains in 2D and 3D.
Figure 3: Qualitative results. From left to right: image, ground truth graph, no-pretraining baseline, and our method. The dataset in each row is indicated by the same letter as in the main results table. Our method consistently outperforms the no-pretraining baseline, which overpredicts edges and nodes in all datasets except OCTA-500, where the fine-tuning set is uncharacteristically large.
Figure 4: Training loss curves. The orange line depicts the training loss without our regularized edge sampling loss $\mathcal{L}_{Resln}$ and the blue line the training loss with $\mathcal{L}_{Resln}$. Training with $\mathcal{L}_{Resln}$ converges faster from the start.
Figure 5: CKA similarity (Kornblith et al., 2019) (y-axis) between the feature representations of the source and target domains during pretraining. $\alpha$ must be sufficiently large for the similarity to increase during training. Beyond a certain threshold, the similarity does not increase further. We associate a high similarity between both domains with the model learning domain-invariant features.
...and 6 more figures
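
The CKA similarity referenced in Figure 5 can be illustrated with the linear variant of centered kernel alignment. This is a minimal sketch, assuming two feature matrices with the same number of rows; which layer the features come from and how source and target samples are paired are not stated in this summary, and the function name `linear_cka` is chosen here for illustration.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices x (n, d1) and y (n, d2).

    Both matrices must have one row per sample, in the same order.
    The result lies in [0, 1]; 1 means the representations match up to
    an orthogonal transform and isotropic scaling.
    """
    # Center each feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)
```

Tracking this value for source and target features over training steps would produce a curve of the kind the caption describes, where a rising value suggests the two domains are being mapped to increasingly similar representations.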