A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning

Yuelin Zhang, Pengyu Zheng, Wanquan Yan, Chengyu Fang, Shing Shin Cheng

TL;DR

This paper tackles defocus blur in microscopy by addressing two core challenges: the need for long-range cross-scale attention and data deficiency. It introduces a unified framework that combines a Multi-Pyramid Transformer (MPT), built from cross-scale window attention (CSWA), intra-scale channel attention (ISCA), and a feature-enhancing feed-forward network (FEFN), with Extended Frequency Contrastive Regularization (EFCR), which learns from separate frequency bands and enables cross-domain knowledge transfer. The method is validated on cell and surgical microscopy datasets, including the new CaDISBlur and CataBlur datasets, where it achieves state-of-the-art performance in supervised and unsupervised settings and improves downstream tasks such as cell detection and surgical scene segmentation. The results indicate that explicit multi-scale pyramids and frequency-domain contrastive learning benefit both restoration quality and downstream applicability.

Abstract

Defocus blur is a persistent problem in microscope imaging that hampers pathology interpretation and medical intervention in cell microscopy and microscope surgery. To address it, a unified framework comprising the multi-pyramid transformer (MPT) and extended frequency contrastive regularization (EFCR) is proposed to tackle two outstanding challenges in microscopy deblur: the need for a longer attention span and data deficiency. The MPT employs an explicit pyramid structure at each network stage that integrates cross-scale window attention (CSWA), intra-scale channel attention (ISCA), and a feature-enhancing feed-forward network (FEFN) to capture long-range cross-scale spatial interaction and global channel context. The EFCR addresses data deficiency by exploring latent deblur signals in different frequency bands. It also enables deblur knowledge transfer, learning cross-domain information from extra data to improve deblur performance on both labeled and unlabeled data. Extensive experiments and downstream task validation show that the framework achieves state-of-the-art performance across multiple datasets. Project page: https://github.com/PieceZhang/MPT-CataBlur.
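
The abstract describes EFCR only at a high level, so the following is a minimal sketch of the general idea of a contrastive regularizer computed per frequency band, not the authors' formulation. It assumes a difference-of-box-blurs band split (a stand-in for whatever decomposition the paper uses), an L1 distance, and a ratio-style loss that pulls the restored image toward the sharp target and away from the blurred input. All function names (`box_blur`, `split_bands`, `band_contrastive_loss`) and the radii are hypothetical.

```python
def box_blur(img, radius):
    """Mean filter with edge clamping; used here as a crude low-pass filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
                    count += 1
            out[y][x] = acc / count
    return out


def subtract(a, b):
    """Element-wise difference of two equally sized 2D lists."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_bands(img, radii=(1, 3)):
    """Return [high, mid, low] bands that sum back to img (assumed split)."""
    small = box_blur(img, radii[0])
    large = box_blur(img, radii[1])
    return [subtract(img, small), subtract(small, large), large]


def l1(a, b):
    """Sum of absolute differences between two 2D lists."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def band_contrastive_loss(restored, sharp, blurred, eps=1e-8):
    """Sum over bands of d(restored, sharp) / d(restored, blurred)."""
    loss = 0.0
    bands = zip(split_bands(restored), split_bands(sharp), split_bands(blurred))
    for r, s, b in bands:
        loss += l1(r, s) / (l1(r, b) + eps)
    return loss


# Toy data: a single bright pixel and a blurred copy of it.
sharp = [[0.0] * 5 for _ in range(5)]
sharp[2][2] = 1.0
blurred = box_blur(sharp, 1)
halfway = [[(p + q) / 2 for p, q in zip(rs, rb)] for rs, rb in zip(sharp, blurred)]
```

In this toy setup the loss is zero when the restored image equals the sharp target and grows as the restored image drifts toward the blurred input. A real implementation would operate on network features or Fourier-domain bands rather than raw pixel lists.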

Paper Structure (33 sections, 7 equations, 12 figures, 10 tables)

Figures (12)

  • Figure 1: Normalized average attention distance of different datasets. The distance of real-world datasets (shown in blue) is significantly smaller than that of microscopy datasets (shown in red), showing the inter-domain feature difference.
  • Figure 2: Overview of MPT. MPT constructs an explicit pyramid block at each stage. Inside the pyramid block, CSWAs constitute a coarse-to-fine pyramid, exploring cross-scale spatial interaction for each scale. The ISCA is built beside each CSWA to provide global channel context. Information from CSWA and ISCA is aggregated by FEFN using the asymmetric activation mechanism.
  • Figure 3: Qualitative evaluation on microscopy deblur. Our method achieves the best restoration of different types of defocus blur.
  • Figure 4: Qualitative evaluation on unsupervised deblur with WNLO (top) and CataBlur (bottom).
  • Figure 5: Downstream task results on BBBC006 (top) and CaDISBlur (bottom). Our method leads to fewer false segmentations.
  • ...and 7 more figures
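
Figure 1 compares datasets by normalized average attention distance. The exact definition used in the paper is not given here, so the sketch below assumes a common one: the attention-weighted mean Euclidean distance between query and key positions on the token grid, averaged over queries and divided by the grid diagonal. The function name `mean_attention_distance` and the choice of normalizer are assumptions.

```python
import math


def mean_attention_distance(attn, height, width):
    """Attention-weighted mean query-key distance, divided by the grid diagonal.

    attn is an (H*W) x (H*W) nested list whose rows are attention weights
    (each row is expected to sum to 1) over tokens in row-major order.
    """
    n = height * width
    if len(attn) != n or any(len(row) != n for row in attn):
        raise ValueError("attn must have shape (H*W) x (H*W)")
    # Assumed normalizer: longest possible distance on the grid.
    diagonal = math.hypot(height - 1, width - 1) or 1.0
    total = 0.0
    for q, row in enumerate(attn):
        qy, qx = divmod(q, width)  # query position on the grid
        for k, weight in enumerate(row):
            ky, kx = divmod(k, width)  # key position on the grid
            total += weight * math.hypot(qy - ky, qx - kx)
    return total / n / diagonal


# Two tokens side by side: purely local vs. uniform attention.
local = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(mean_attention_distance(local, 1, 2))    # 0.0
print(mean_attention_distance(uniform, 1, 2))  # 0.5
```

Under this definition, a larger value means attention mass sits farther from each query, which matches the figure's reading that microscopy data calls for a longer attention span than real-world data.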