Joint Learning of Blind Super-Resolution and Crack Segmentation for Realistic Degraded Images

Yuki Kondo; Norimichi Ukita

TL;DR

This paper proposes crack segmentation augmented by super-resolution (SR) with deep neural networks, together with two extra paths that further encourage mutual optimization between SR and segmentation.

Abstract

This paper proposes crack segmentation augmented by super-resolution (SR) with deep neural networks. In the proposed method, an SR network is trained jointly with a binary segmentation network in an end-to-end manner. This joint learning allows the SR network to be optimized to improve the segmentation results. For realistic scenarios, the SR network is extended from non-blind to blind so that it can process a low-resolution image degraded by unknown blurs. The joint network is further improved by two proposed extra paths that encourage mutual optimization between SR and segmentation. Comparative experiments with state-of-the-art (SoTA) segmentation methods demonstrate the superiority of our joint learning, and various ablation studies confirm the effect of each contribution.

Paper Structure (18 sections, 8 equations, 15 figures, 6 tables)

Introduction
Related Work
Image Segmentation
Super Resolution (SR)
Joint Learning of SR and Other Tasks
Joint Learning of Blind SR and Crack Segmentation
Joint Learning
Boundary Combo Loss
Segmentation-aware Weights for SR
Blur Skip for Blur-reflected Task Learning
Training Strategy
Experimental Results
Pre-training and Training Details
Synthetically-degraded Crack Images
Comparison with SoTA segmentation methods
...and 3 more sections

Figures (15)

Figure 1: Crack segmentation challenges for synthetically degraded images produced by low resolution and anisotropic Gaussian blur (same experimental conditions as in Sec. \ref{subsection:exp_detail}). From a degraded LR input image (a), the High-Resolution (HR) segmentation results (c), (d), and (e) are obtained. (b) is the manually annotated ground-truth (GT) HR segmentation image. (c) Independent and (d) Multi-task show the results on images enlarged by non-blind SR "trained independently of segmentation" and "trained with segmentation in a multi-task learning manner," respectively. In (c), the independently optimized non-blind SR model does not enhance the image enough for the segmentation model to infer from it, and the segmentation model detects no cracks. (d) detects some cracks but misses others. Our method (e) detects the cracks in the greatest detail.
Figure 2: Combinations of SR and segmentation. (a) Independent learning with non-blind SR \cite{bib:SrcNet}. (b) Multi-task learning with non-blind SR \cite{bib:SrcNet}. (c) Our joint learning with blind SR and extra paths, called CSBSR. Orange arrows indicate data flows, while arrows leading out of the loss functions (i.e., $\mathcal{L}_{S}$ and $\mathcal{L}_{C}$) indicate the back-propagation paths for training. Blue and green arrows indicate the back-propagation given by the SR and segmentation tasks, respectively. Each ellipse indicates a loss or the weights given to a certain loss. Our CSBSR is illustrated in more detail in Fig. \ref{fig:network}.
Figure 3: Proposed joint learning network with blind SR and segmentation. See the caption of Fig. \ref{fig:joint_learning} for explanations of the arrows and ellipses. $\odot$ indicates a pixelwise multiplication operator. Although $K^{S}$ (i.e., the blur kernel remaining in $I^{S}$) is unavailable and unused in our method, it is shown to explain our blur skip scheme proposed in Sec. \ref{subsection:method_skip}.
Figure 4: The structure of our blur skip module using SFT \cite{DBLP:conf/cvpr/WangYDL18}. Each 3D box depicts a feature set and each rectangle depicts a process. $\odot$ indicates a pixelwise multiplication operator. Conv denotes a convolution layer.
Figure 5: Sample images in the Khanhha dataset \cite{bib:Khanhha}. The top row shows the RGB images treated as $\boldsymbol{I}^H$ in this paper, and the bottom row shows the GT segmentation.
...and 10 more figures
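
The caption of Figure 1 describes input images degraded by low resolution and anisotropic Gaussian blur. As a rough illustration of that kind of degradation, the sketch below builds a normalized anisotropic Gaussian kernel, blurs a grayscale image with it, and subsamples the result. It is only a sketch under assumptions: the function names (`anisotropic_gaussian_kernel`, `degrade`), the parameters (`sigma_x`, `sigma_y`, `theta`, `scale`), the edge-replication padding, and the plain subsampling step are all hypothetical choices made here, not the degradation pipeline used in the paper.

```python
import math


def anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """Return a size x size kernel (list of lists) whose weights sum to 1.

    sigma_x and sigma_y are the standard deviations along the kernel's two
    principal axes, and theta (radians) rotates those axes. All of these
    names are illustrative assumptions.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for row in range(size):
        y = row - half
        weights = []
        for col in range(size):
            x = col - half
            # Express the pixel offset in the rotated principal axes.
            u = cos_t * x + sin_t * y
            v = -sin_t * x + cos_t * y
            weights.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
        kernel.append(weights)
    total = sum(sum(weights) for weights in kernel)
    return [[w / total for w in weights] for weights in kernel]


def degrade(image, kernel, scale):
    """Blur a 2D grayscale image, then keep every scale-th pixel.

    Borders are handled by replicating edge pixels (an assumption). The
    Gaussian kernel is point-symmetric, so this sliding-window sum equals
    a convolution.
    """
    height, width = len(image), len(image[0])
    size = len(kernel)
    half = size // 2
    blurred = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for i in range(size):
                rr = min(max(r + i - half, 0), height - 1)
                for j in range(size):
                    cc = min(max(c + j - half, 0), width - 1)
                    acc += kernel[i][j] * image[rr][cc]
            blurred[r][c] = acc
    return [row[::scale] for row in blurred[::scale]]


if __name__ == "__main__":
    kernel = anisotropic_gaussian_kernel(5, sigma_x=2.0, sigma_y=1.0, theta=0.0)
    hr_image = [[float((r + c) % 2) for c in range(8)] for r in range(8)]
    lr_image = degrade(hr_image, kernel, scale=4)
    print(len(lr_image), len(lr_image[0]))
```

In a blind SR setting such as the one the abstract describes, the kernel parameters would be unknown to the network at inference time; the sketch only shows how such a degraded input could be synthesized for experiments.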
