Improving the Generalisation of Learned Reconstruction Frameworks

Emilien Valat, Ozan Öktem

TL;DR

The paper tackles the generalization challenge in learned CT reconstruction by reframing sinogram processing as a graph-structured problem that encodes acquisition geometry. It introduces GLM, a hybrid network that fuses grid-based CNN filtering with graph-based aggregation, enabling locality in the detector dimension while capturing geometry-dependent relationships across source positions. Empirically, GLM achieves higher reconstruction quality (SSIM/PSNR) than CNN baselines with far fewer trainable parameters and generalizes robustly to unseen acquisition geometries, including sparse-view scenarios, while requiring less memory and training time. These results suggest that geometry-aware graph representations coupled with targeted aggregation can substantially improve the robustness and efficiency of learned reconstruction in CT, with potential extensions to more complex geometries and weighting schemes.

Abstract

Ensuring proper generalization is a critical challenge in applying data-driven methods for solving inverse problems in imaging, as neural networks reconstructing an image must perform well across varied datasets and acquisition geometries. In X-ray Computed Tomography (CT), convolutional neural networks (CNNs) are widely used to filter the projection data but are ill-suited for this task as they apply grid-based convolutions to the sinogram, which inherently lies on a line manifold, not a regular grid. The CNNs, unaware of the geometry, are implicitly tied to it and require an excessive number of parameters as they must infer the relations between measurements from the data rather than from prior information. The contribution of this paper is twofold. First, we introduce a graph data structure to represent CT acquisition geometries and tomographic data, providing a detailed explanation of the graph's structure for circular, cone-beam geometries. Second, we propose GLM, a hybrid neural network architecture that leverages both graph and grid convolutions to process tomographic data. We demonstrate that GLM outperforms CNNs when performance is quantified in terms of structural similarity and peak signal-to-noise ratio, despite the fact that GLM uses only a fraction of the trainable parameters. Compared to CNNs, GLM also requires significantly less training time and memory, and its memory requirements scale better. Crucially, GLM demonstrates robust generalization to unseen variations in the acquisition geometry, such as when training only on fully sampled CT data and then testing on sparse-view CT data.
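
To make the idea of combining a grid convolution along the detector with a graph aggregation across source positions more concrete, the following is a minimal NumPy sketch. It is not the authors' GLM: the neighbourhood rule (each source position linked to its nearest angular neighbours on the circle), the mean aggregation, the fixed smoothing kernel, and the function names `angular_graph`, `detector_filter` and `aggregate` are all assumptions made for illustration.

```python
import numpy as np


def angular_graph(angles, k=1):
    """Boolean adjacency linking each source position to its k nearest
    neighbours on either side of the circle (assumed connectivity)."""
    n = len(angles)
    order = np.argsort(angles)  # walk the source positions in angular order
    adj = np.zeros((n, n), dtype=bool)
    for rank, i in enumerate(order):
        for step in range(1, k + 1):
            j = order[(rank + step) % n]  # wrap around the circle
            adj[i, j] = adj[j, i] = True
    return adj


def detector_filter(features, kernel):
    """Grid step: 1D convolution along the detector axis, one node at a time."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in features])


def aggregate(features, adj):
    """Graph step: replace each node feature by the mean over the node
    and its neighbours (one simple message-passing rule)."""
    weights = adj.astype(features.dtype) + np.eye(len(adj), dtype=features.dtype)
    return (weights @ features) / weights.sum(axis=1, keepdims=True)


# Toy sinogram: 8 source positions (nodes), 5 detector pixels per node.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
sinogram = np.random.default_rng(0).random((8, 5))
kernel = np.array([0.25, 0.5, 0.25])  # stands in for a learned detector kernel

full = aggregate(detector_filter(sinogram, kernel), angular_graph(angles))

# Sparse-view case: keep every other angle. The kernel is unchanged;
# only the graph, which is rebuilt from the geometry, differs.
sparse = aggregate(detector_filter(sinogram[::2], kernel), angular_graph(angles[::2]))
```

In this sketch the only quantity that would be learned is the detector kernel, whose size does not depend on the number of source positions, while the angular sampling enters solely through the adjacency matrix. That separation is one way to read the claim that a geometry-aware graph lets the same weights be applied to a sampling not seen during training.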

Paper Structure

This paper contains 19 sections, 22 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Representing 2D tomography geometry: the source moves on the circle $S^1$ and the detector $\mathcal{D}$ is a line detector. The sampling on $S^1$ commonly varies in applications, whereas the sampling on $\mathcal{D}$ only varies when the detector is replaced. This is illustrated by the dotted curve with large black dots at given angular positions and fixed dashed lines on $\mathcal{D}$. A CNN-based approach assumes that the sampling of $S^1$ and $\mathcal{D}$ is the same in training and inference, which is usually not the case, thus leading to poor generalisation.
  • Figure 2: Source positions on the first quadrant of a polar grid. For each node, the feature vector corresponds to the data acquired at the associated source position.
  • Figure 3: A GLM module architecture. $f_{0}$ is a plain convolutional layer and $f_1$ is a residual convolutional layer, both followed by a ReLU activation function. The kernel dimension is equal to the detector's dimension. The $\bigoplus$ operator designates the $G$-dependent message-passing step.
  • Figure 4: Comparison of the same slice reconstructed using the baseline CNN approach, the proposed graph-based approach, and the target. We can see that the CNN-based approach produces a grey background compared to the target and the GLM.
  • Figure 5: Evolution of the PSNR and SSIM against the angular subsampling. The two graphs share the same $x$ axis and the same legend. The solid lines represent the performance of the GLM and the dashed lines the CNNs. The networks with $16$ kernels use a $\times$ marker and the ones with $24$ use a $+$ marker.
  • ...and 3 more figures