Transformer with Selective Shuffled Position Embedding and Key-Patch Exchange Strategy for Early Detection of Knee Osteoarthritis

Zhe Wang; Aladine Chetouani; Mohamed Jarraya; Didier Hans; Rachid Jennane

TL;DR

Early detection of knee osteoarthritis (KOA) in radiographs is hampered by scarce labeled data. The authors propose a ViT-based framework with Selective Shuffled Position Embedding (SSPE), a key-patch exchange augmentation, and a hybrid loss to discriminate between KL-0 and KL-2 radiographs. On knee X-rays from the Osteoarthritis Initiative (OAI), the method outperforms baselines and, according to Grad-CAM analyses, concentrates on OA-relevant regions; it is also shown to apply across ViT backbones. The approach offers a data-efficient, scalable route toward computer-aided KOA screening with potential clinical impact.

Abstract

Knee osteoarthritis (KOA) is a widespread musculoskeletal disorder that can severely impair the mobility of older individuals. Because labelling medical data is costly, the shortage of annotated data is a significant obstacle to training models effectively. Deep learning models currently rely heavily on data augmentation to improve generalization and alleviate overfitting. However, conventional augmentation techniques operate on the original data and fail to introduce substantial diversity into the dataset. In this paper, we propose a novel approach based on the Vision Transformer (ViT) that uses an original Selective Shuffled Position Embedding (SSPE) and a key-patch exchange strategy to obtain different input sequences as a form of data augmentation for early detection of KOA (KL-0 vs KL-2). More specifically, we fix the position embeddings of key patches and shuffle those of non-key patches. Then, for each target image, we randomly select candidate images from the training set and exchange their key patches with those of the target to obtain different input sequences. Finally, a hybrid loss function is developed by combining multiple loss functions for the different types of sequences. Experimental results show that the generated data are valid, as they lead to a notable improvement in the model's classification performance.
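The fix-and-shuffle step described in the abstract can be sketched as a permutation of position indices. This is a minimal illustration, not the authors' implementation: the function name `sspe_position_indices`, the 0-based patch numbering, the 9-patch sequence, and the omission of any class token are all assumptions made for the example. The key patch indices 4 and 6 echo the Figure 4 caption.

```python
import random


def sspe_position_indices(num_patches, key_patches, rng=None):
    """Return one position index per patch slot (hypothetical helper).

    Key patches keep their own position index; the position indices of
    all non-key patches are shuffled among themselves.
    """
    rng = rng or random.Random()
    key = set(key_patches)
    # Slots whose position index is allowed to move.
    non_key = [i for i in range(num_patches) if i not in key]
    shuffled = non_key[:]
    rng.shuffle(shuffled)

    positions = list(range(num_patches))  # identity = classical embedding order
    for slot, new_pos in zip(non_key, shuffled):
        positions[slot] = new_pos
    return positions


# Assumed toy setting: 9 patches, key patches 4 and 6 (0-based indexing assumed).
positions = sspe_position_indices(9, key_patches=[4, 6], rng=random.Random(0))
```

In a ViT, these indices would select rows of the position-embedding table before they are added to the patch embeddings; drawing a fresh permutation per sample yields a different input sequence each time while the key patches keep their spatial identity.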

Paper Structure (22 sections, 8 equations, 9 figures, 5 tables)

Introduction
Proposed Method
Classical ViT model
Selective Shuffled Position Embedding
Key-patch exchange strategy
Hybrid loss strategy
Experiments
Public knee database
Data preprocessing
Experimental details
Results and discussion
Selection of position embedding and key patches
Settings of the proposed key-patch exchange strategy
Selection of the hyper-parameters
Effects of the match number
...and 7 more sections

Figures (9)

Figure 1: The global flowchart of this study. The data flow is illustrated using black arrows. Green and purple blocks represent the fixed and shuffled embedding positions, respectively. For each target image $I_t$, after applying our proposed SSPE (Section "Selective Shuffled Position Embedding"), $N$ candidate images ($I_{c1}, I_{c2}, \ldots, I_{cN}$) are introduced for the proposed key-patch exchange operation (Section "Key-patch exchange strategy"). To simplify the representation, only one candidate image $I_{cn}$ and four resulting sequences are displayed in this flowchart. The encoder module learns these sequences along with their corresponding defined labels using the hybrid loss (Section "Hybrid loss strategy").
Figure 2: The structure of the classical ViT network.
Figure 3: The structure of the encoder module.
Figure 4: Classical position embedding (row 1), SSPE strategy and highlighted key patches (row 2), and the final position embedding of this sequence as input of ViT (row 3). Key patches $\#4$ and $\#6$ remain unchanged during the process.
Figure 5: Proposed key-patch exchange strategy. For convenience, only the image $I_{c1}$ among the $N$ candidates is shown during the key-patch exchange operation along with the target image $I_t$. As shown, blue and red patches are the key ones of the candidate and target images, respectively. After the key-patch exchange operation, four different sequences are obtained from each candidate image. The label for each obtained sequence is defined in Section "Hybrid loss strategy".
...and 4 more figures
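The key-patch exchange described in the Figure 5 caption can be sketched as a swap of the patches at the key positions between a target and a candidate sequence. This is a minimal sketch under stated assumptions: the function name `exchange_key_patches`, the string placeholders standing in for patch embeddings, and the key positions 4 and 6 (0-based) are hypothetical. It produces only the two directly swapped sequences; how the paper composes its four sequences per candidate, and how each is labeled, is not specified in this summary.

```python
def exchange_key_patches(target, candidate, key_patches):
    """Swap the patches at the key positions between two sequences.

    Hypothetical helper: returns (target with the candidate's key patches,
    candidate with the target's key patches). Inputs are left unmodified.
    """
    if len(target) != len(candidate):
        raise ValueError("target and candidate must have the same number of patches")
    target_swapped = list(target)
    candidate_swapped = list(candidate)
    for i in key_patches:
        target_swapped[i] = candidate[i]
        candidate_swapped[i] = target[i]
    return target_swapped, candidate_swapped


# Toy sequences: strings stand in for patch embeddings (assumption).
target = [f"t{i}" for i in range(9)]
candidate = [f"c{i}" for i in range(9)]
t_swapped, c_swapped = exchange_key_patches(target, candidate, key_patches=[4, 6])
```

Repeating the swap with each of the $N$ randomly selected candidates gives a different set of input sequences for the same target image.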
