Transformer Causality Regularization for Dynamic Inverse Problems

Gesa Sarnighausen; Anne Wald; Andreas Hauptmann

Abstract

We study how to include the causality principle as a regularizer in the solution of linear time-dependent inverse problems. This is achieved by combining transformer-based predictions with classical variational regularization, resulting in what we call transformer causality regularization (TCR). The causality principle states that the state of an object at time $t'$ depends only on its previous states at times $t < t'$ and is independent of future states at $t > t'$. Since the transformer architecture represents sequence-to-sequence functions and can be equipped with a causal attention mask, transformers are the natural choice for a learned causality function that predicts the state of an object at time $t'$ from its previous states at $t < t'$. We combine this with the inductive bias of convolutional neural networks (CNNs) for imaging tasks to treat the spatial variable. The output of the spatial-temporal transformer is then used as a prior for variational regularization, so that classical results on regularization and on the convergence of solution methods carry over directly to our setting. Using dynamic computerized tomography as an example, we compare TCR with static and dynamic versions of the previously introduced unrolled adversarial regularizer (UAR) on simulated and measured data. The results show that using TCR within a variational framework improves reconstruction quality and data consistency.
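The abstract describes the transformer output as a prior inside a variational problem, and the figure captions refer to $L^1$ transformer-based reconstructions. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes a per-frame functional $\min_x \tfrac12\|A_t x - y_t\|_2^2 + \alpha\|x - p_t\|_1$, where $p_t = f(x_0,\dots,x_{t-1})$ is the causal prediction, and solves it with ISTA (proximal gradient). The names `soft_threshold`, `l1_prior_reconstruction`, `placeholder_predictor` and `causal_reconstruction` are hypothetical, and the predictor is plain linear extrapolation standing in for the learned transformer.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def l1_prior_reconstruction(A, y, p, alpha, n_iter=500):
    """ISTA for the ASSUMED functional 0.5*||A x - y||^2 + alpha*||x - p||_1."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1/L with L = ||A||_2^2
    x = p.copy()  # warm start at the causal prediction
    for _ in range(n_iter):
        v = x - tau * A.T @ (A @ x - y)  # gradient step on the data-fidelity term
        x = p + soft_threshold(v - p, tau * alpha)  # prox of the L1 term shifted by p
    return x


def placeholder_predictor(history):
    """HYPOTHETICAL stand-in for the learned causality function f(x_0, ..., x_{t-1}).

    Uses linear extrapolation of the last two frames; it is NOT a transformer.
    """
    if len(history) == 1:
        return history[-1].copy()
    return 2.0 * history[-1] - history[-2]


def causal_reconstruction(operators, data, x_init, alpha):
    """Reconstruct frames sequentially; the prior for frame t only uses frames < t."""
    frames = list(x_init)  # initial frames, e.g. from a denser measurement
    for t in range(len(x_init), len(data)):
        p_t = placeholder_predictor(frames)
        frames.append(l1_prior_reconstruction(operators[t], data[t], p_t, alpha))
    return frames
```

When two exact initial frames are given and the object evolves linearly in time, the stand-in prediction is already data-consistent, so the iteration leaves each frame at its prediction. In a TCR-style setup, the learned spatial-temporal transformer would take the place of `placeholder_predictor`.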

Paper Structure

This paper contains 31 sections, 21 equations, 6 figures, and 3 tables.

Introduction
Dynamic Inverse Problems
Causality regularization
Minimizing the variational problem
Unrolled adversarial regularizer (UAR) for the dynamic case
Transformer causality function
Architecture of spatial-temporal transformer
Training schemes
Experiments
Model of dynamic computerized tomography
Training data
Training details
Models for UAR
Models for TCR
Training of the refinement model
...and 16 more sections

Figures (6)

Figure 1: Left: architecture of the temporal transformer encoder with rotary position embedding (RoPE), $h = 8$ heads and $6$ transformer layers. Right: architecture of the next-frame prediction/refinement model. Number of patches $N = 356$, transformer model dimension $D = 512$, length of the input sequence $T$, batch size for training $B = 8$.
Figure 2: Results for a test phantom, using 20 measurement angles for the initial time steps 0 and 1, for filtered backprojection (FBP), the $L^1$ transformer-based reconstruction (causality reconstruction) and the corresponding prediction (causality prediction). For the remaining time steps, measurements from 3 equidistantly rotating angles are taken. For the UAR reconstructions trained on static 2D phantoms (UAR static) and on dynamic 3D phantoms (UAR dynamic), measurements from 3 angles are taken for all time steps.
Figure 3: Results for the last time step of several test phantoms with 3 measurement angles. From top to bottom: ground truth, the $L^1$ transformer-based reconstruction obtained with the learned causality function, the associated prediction, and the prediction obtained autoregressively using only the transformer with two input frames (Landweber reconstructions obtained from 20 measurement angles).
Figure 4: From top to bottom: reference solution for the rolling stones data set (reference), $L^1$ transformer-based reconstruction for 10 initial angles and $3$ angles for the remaining time steps ($L^1$-reco), corresponding transformer prediction ($L^1$-pred), $L^1$-TV transformer-based reconstruction for 10 initial angles and $3$ angles for the remaining time steps ($L^1$-TV-reco), and corresponding transformer prediction ($L^1$-TV-pred).
Figure 5: Last time step of the test phantom in Figure \ref{fig:Phantom4}, as output by the initial baseline generator of UAR optimized using (\ref{eq:lossGenDiscr}), in the static case (trained on 2D phantoms) and the dynamic case (trained on 3D phantoms).
...and 1 more figure
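Figure 1 names rotary position embedding and multiple attention heads for the temporal encoder, and the abstract relies on a causal attention mask so that the prediction for time $t'$ only sees times $t < t'$. The sketch below illustrates these two ingredients in a single NumPy self-attention layer. It is an assumption-laden illustration, not the authors' model: it treats each frame as one token of dimension $D$, uses random weights, assumes the common RoPE base of 10000, and omits layer normalization, the feed-forward block, the stacking into 6 layers, and the CNN spatial branch. The functions `rope` and `causal_multihead_attention` are hypothetical names.

```python
import numpy as np


def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (T, d), d even.

    Each feature pair (2i, 2i+1) at position t is rotated by angle t * base**(-2i/d).
    """
    T, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,) rotation frequencies
    angles = np.arange(T)[:, None] * inv_freq[None, :]  # (T, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def causal_multihead_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """One causal self-attention layer with RoPE applied to queries and keys per head."""
    T, D = x.shape
    assert D % n_heads == 0 and (D // n_heads) % 2 == 0, "head dim must be even for RoPE"
    dh = D // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where key time > query time
    heads = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = rope(q[:, sl]) @ rope(k[:, sl]).T / np.sqrt(dh)
        scores = np.where(future, -np.inf, scores)  # block attention to future frames
        scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)
        heads.append(w @ v[:, sl])
    return np.concatenate(heads, axis=1) @ Wo
```

Because of the mask, changing the input frames at later times leaves the outputs at earlier times unchanged, which is the causality property the abstract describes. RoPE makes the query-key score depend on the time offset between two frames rather than on their absolute positions.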