Layer Separation: Adjustable Joint Space Width Images Synthesis in Conventional Radiography

Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima

TL;DR

The paper addresses data quality and annotation bottlenecks in RA JSW analysis by introducing Layer Separation Networks (LSN) to separate soft tissue and bone layers in finger radiographs and synthesize adjustable JSW images with ground-truth annotations. LSN combines a generation network, segmentation supervision, soft-tissue discrimination, random shifting, and radiography-consistent reconstruction, enabling two-stage training with pseudo-images and yielding layer images $L=\bigl\{L_0,L_1,L_2\bigr\}$. The approach achieves realistic reconstructions, robust layer separation even with overlaps, and improves downstream tasks (JSN progress, JSW quantification, SvdH-like scoring) through synthetic-data pre-training, while maintaining clinical plausibility validated by a Visual Turing Test. The work demonstrates that synthetic data can address imbalanced JSW distributions and annotation scarcity, potentially accelerating RA-CAD development and achieving more robust disease monitoring. Code and dataset availability are anticipated, amplifying practical impact for clinical radiology and machine learning research.

Abstract

Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by joint inflammation and progressive structural damage. Joint space width (JSW) is a critical indicator in conventional radiography for evaluating disease progression, which has become a prominent research topic in computer-aided diagnostic (CAD) systems. However, deep learning-based radiological CAD systems for JSW analysis face significant challenges in data quality, including data imbalance, limited variety, and annotation difficulties. This work introduced a challenging image synthesis scenario and proposed Layer Separation Networks (LSN) to accurately separate the soft tissue layer, the upper bone layer, and the lower bone layer in conventional radiographs of finger joints. Using these layers, the adjustable JSW images can be synthesized to address data quality challenges and achieve ground truth (GT) generation. Experimental results demonstrated that LSN-based synthetic images closely resemble real radiographs, and significantly enhanced the performance in downstream tasks. The code and dataset will be available.

Paper Structure

This paper contains 24 sections, 13 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: The adjustable JSW synthetic images are generated by producing layer images, randomly shifting the bone layers, and reconstructing the result with the soft tissue layer. Original Images: imbalanced distribution of JSW, limited semantic variety, and difficulty in manual annotation. Synthetic Data: balanced distribution, enhanced semantic variety, and generative ground truth (GT) annotations.
  • Figure 2: Layer Separation: the LSN generates layer images of the upper bone, the lower bone, and the soft tissue from a single image. The LSN consists of five main components: a generation network $\mathcal{N_G}$, a supervision network $\mathcal{N_S}$, a discrimination network $\mathcal{N_D}$, a random shifting function $f_s$, and a reconstruction function $f_r$. The generation process is performed as follows: (i) $\mathcal{N_G}$ takes the original joint images and the corresponding bone masks as input and generates the layer images. (ii) The layer images are passed through $f_r$ to obtain a reconstruction image. (iii) The layer images are passed through $f_r$ and $f_s$ to generate a shifted reconstruction image, which serves as input to $\mathcal{N_S}$, yielding segmentation masks for the upper bone, the lower bone, and the soft tissue. (iv) The soft tissue layer image is extracted and input into $\mathcal{N_D}$, producing a regional segmentation mask for bone shadows. (v) A hybrid loss function is constructed from the discrepancy $\mathcal{L}_1$ between the $\mathcal{N_S}$ mask and the GT, the discrepancy $\mathcal{L}_0$ between the reconstructed image and the original image, and the dual discrepancy $\mathcal{L}_2$ between the $\mathcal{N_D}$ mask of the soft tissue layer and the GT. (vi) The LSN is trained in two stages. In the first training stage, which uses pseudo-images, the discrepancy $\mathcal{L}_3$ between the pseudo and reconstructed bone layers (with and without random shifting) is added to the original loss function. A second training stage is then performed on both real and pseudo-images using the original loss function.
  • Figure 3: Visualization results of our ablation study. (A) Real Joint Image; (B) Shifted Reconstruction Joint Image; (C) Upper Bone Layer; (D) Lower Bone Layer; (E) Soft Tissue Layer; (F) MSE Spectrum (Reconstruction Joint Image vs. A).
  • Figure 4: Visualization results of our ablation study. (A) Real Joint Image; (B) Shifted Reconstruction Joint Image; (C) Upper Bone Layer; (D) Lower Bone Layer; (E) Soft Tissue Layer; (F) MSE Spectrum (Reconstruction Joint Image vs. A).
  • Figure 5: Original and Synthetic images in Visual Turing Test.
  • ...and 1 more figure
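
The synthesis step described in the captions of Figures 1 and 2 (shift the bone layers, then reconstruct with the soft tissue layer) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not define $f_s$ or $f_r$, so the sketch assumes that $f_s$ is an integer vertical shift and that $f_r$ simply adds the three layers as attenuation maps. The names `shift_rows`, `reconstruct`, `joint_gap`, and `synthesize` are hypothetical, and the toy layers stand in for the outputs of $\mathcal{N_G}$.

```python
import numpy as np

def shift_rows(layer, dy):
    """Shift a 2-D layer vertically by dy pixels (positive = down), zero-filling vacated rows."""
    out = np.zeros_like(layer)
    h = layer.shape[0]
    if abs(dy) >= h:          # layer shifted completely out of frame
        return out
    if dy > 0:
        out[dy:] = layer[:h - dy]
    elif dy < 0:
        out[:h + dy] = layer[-dy:]
    else:
        out[:] = layer
    return out

def reconstruct(soft, upper, lower):
    """Assumed stand-in for f_r: additive composition of per-layer attenuation maps."""
    return soft + upper + lower

def joint_gap(upper, lower):
    """Rows between the last upper-bone row and the first lower-bone row.
    A negative value means the two bone layers overlap."""
    upper_rows = np.flatnonzero(upper.any(axis=1))
    lower_rows = np.flatnonzero(lower.any(axis=1))
    return int(lower_rows.min() - upper_rows.max() - 1)

def synthesize(soft, upper, lower, dy_upper=0, dy_lower=0):
    """Assumed stand-in for f_s followed by f_r; returns the image and its gap label."""
    shifted_upper = shift_rows(upper, dy_upper)
    shifted_lower = shift_rows(lower, dy_lower)
    image = reconstruct(soft, shifted_upper, shifted_lower)
    return image, joint_gap(shifted_upper, shifted_lower)

# Toy 8x4 layers: uniform soft tissue, upper bone in rows 0-2, lower bone in rows 5-7.
soft = np.full((8, 4), 0.1)
upper = np.zeros((8, 4))
upper[0:3] = 1.0
lower = np.zeros((8, 4))
lower[5:8] = 1.0

original, gap_original = synthesize(soft, upper, lower)
narrowed, gap_narrowed = synthesize(soft, upper, lower, dy_lower=-1)
overlapped, gap_overlapped = synthesize(soft, upper, lower, dy_lower=-3)
```

Because the gap is computed from the shifted layers themselves, each synthetic image comes with its own label, which is the sense in which the synthetic data carries generated GT annotations. Sampling the shift from a chosen distribution is one plausible way to rebalance the JSW distribution; how the authors sample shifts is not stated in this summary.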