Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers

Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhov, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold

TL;DR

This work tackles direct image-to-graph transformation under data scarcity by proposing cross-domain and cross-dimension inductive transfer learning for transformer-based image-to-graph models. It introduces three contributions: a regularized edge sampling loss $\mathcal{L}_{Resln}$ that handles varying graph densities across domains, a supervised domain adaptation framework with image- and graph-level adversaries, and a simple projection-based framework $\Pi$ that enables training on 2D data for 3D image-to-graph tasks. The methods are evaluated on six diverse datasets spanning 2D road networks and 3D vascular graphs, showing consistent improvements over no pretraining, self-supervised pretraining, and in some cases supervised pretraining, including in challenging cross-domain and cross-dimension transfer scenarios. The results demonstrate that inductive transfer learning can substantially reduce data requirements for complex geometric prediction tasks and enable direct image-to-graph inference in previously intractable 3D domains, with practical implications for urban planning and biomedical imaging.

Abstract

Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. This data sparsity necessitates transfer learning strategies akin to the state-of-the-art in general computer vision. In this work, we introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers. We propose (1) a regularized edge sampling loss to effectively learn object relations in multiple domains with different numbers of edges, (2) a domain adaptation framework for image-to-graph transformers aligning image- and graph-level features from different domains, and (3) a projection function that allows using 2D data for training 3D transformers. We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we utilize labeled data from 2D road networks for simultaneous learning in vastly different target domains. Our method consistently outperforms standard transfer learning and self-supervised pretraining on challenging benchmarks, such as retinal or whole-brain vessel graph extraction.

Paper Structure (44 sections, 11 equations, 11 figures, 14 tables)

Figures (11)

  • Figure 1: Direct image-to-graph transformation. Whole brain vessel (top) and Agadez road dataset (bottom). The predicted graph is visualized as an overlay on the real image.
  • Figure 2: Conceptual overview of our framework. We use a transformer for single-stage image-to-graph transformation. Our three methodological contributions enable knowledge transfer between vastly different domains in 2D and 3D.
  • Figure 3: Qualitative results. From left to right: image, ground truth graph, no-pretraining baseline, and our method. The dataset in each row is indicated by the same letter as in the main results table. Our method consistently outperforms the no-pretraining baseline, which overpredicts edges and nodes in all datasets except OCTA-500, where the fine-tuning set is uncharacteristically large.
  • Figure 4: Training loss curves. The orange line shows the training loss without our regularized edge sampling loss $\mathcal{L}_{Resln}$, and the blue line shows it with $\mathcal{L}_{Resln}$. Training with $\mathcal{L}_{Resln}$ converges faster from the start.
  • Figure 5: CKA similarity (Kornblith et al., 2019) (y-axis) between the feature representations of the source and target domains during pretraining. $\alpha$ must be sufficiently large for the similarity to increase during training. Beyond a certain threshold, the similarity does not increase further. We associate a high similarity between both domains with the model learning domain-invariant features.
  • ...and 6 more figures
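
The CKA similarity plotted in Figure 5 can be illustrated with a minimal sketch of linear CKA. This is not the authors' evaluation code: the function name `linear_cka`, the use of the linear (rather than kernel) variant, and the synthetic feature matrices are all assumptions made for illustration. How source and target samples are paired in the paper is not stated here, so the sketch simply assumes two feature matrices with one row per sample and the same number of rows.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Hypothetical helper for illustration; both inputs must share n_samples.
    """
    # Center every feature dimension so the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for source- and target-domain features (assumed data).
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(64, 16))

# An orthogonal transform of the same features: linear CKA is invariant to it.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
target_rotated = source_feats @ rotation

# Independently drawn features: similarity should be clearly lower.
target_independent = rng.normal(size=(64, 16))

cka_rotated = linear_cka(source_feats, target_rotated)
cka_independent = linear_cka(source_feats, target_independent)
print(f"rotated: {cka_rotated:.3f}, independent: {cka_independent:.3f}")
```

Under this reading, a value close to 1 means the two feature sets agree up to rotation and isotropic scaling, which is the sense in which a high similarity is taken to indicate domain-invariant features.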