Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy

Yair Ben Sahel, Yonina C. Eldar

TL;DR

The proposed method exceeds the performance of its supervised counterparts, thus allowing for robust, dynamic imaging well below the diffraction limit without any labeled training samples, and can be utilized to enhance generalization in any sparse recovery framework, without the need for external training data.

Abstract

The use of fluorescent molecules to create long sequences of low-density, diffraction-limited images enables highly precise molecule localization. However, this methodology requires lengthy imaging times, which limits the ability to view dynamic interactions of live cells on short time scales. Many techniques have been developed to reduce the number of frames needed for localization, from classic iterative optimization to deep neural networks. In particular, deep algorithm unrolling utilizes both the structure of iterative sparse recovery algorithms and the performance gains of supervised deep learning. However, the robustness of this approach is highly dependent on having sufficient training data. In this paper we introduce deep unrolled self-supervised learning, which alleviates the need for such data by training a sequence-specific, model-based autoencoder that learns only from given measurements. Our proposed method exceeds the performance of its supervised counterparts, thus allowing for robust, dynamic imaging well below the diffraction limit without any labeled training samples. Furthermore, the suggested model-based autoencoder scheme can be utilized to enhance generalization in any sparse recovery framework, without the need for external training data.
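
To make the idea of a model-based autoencoder that "learns only from given measurements" more concrete, the following is a minimal 1D sketch and not the authors' implementation. It assumes a fixed Gaussian kernel as a stand-in for the point spread function, uses plain non-negative ISTA iterations as the unrolled encoder, and uses a mean-squared reconstruction error as the loss; the function names, kernel, step size and threshold are all hypothetical. In the paper the filters and activation parameters are trainable, whereas here nothing is trained. The sketch only shows that the loss compares the measurement with its own re-synthesis, so no ground-truth emitter positions are needed.

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian; a hypothetical stand-in for the microscope PSF."""
    k = [math.exp(-0.5 * (t / sigma) ** 2) for t in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def convolve_same(x, kernel):
    """Zero-padded 'same' convolution. The kernel is symmetric, so this
    operator equals its own transpose (used below for the gradient step)."""
    r = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            m = n + j - r
            if 0 <= m < len(x):
                acc += w * x[m]
        out.append(acc)
    return out

def encode(y, kernel, n_iters=50, step=1.0, threshold=0.01):
    """Unrolled non-negative ISTA: gradient step on the data term,
    then a one-sided (positive) soft threshold."""
    x = [0.0] * len(y)
    for _ in range(n_iters):
        residual = [a - b for a, b in zip(y, convolve_same(x, kernel))]
        grad = convolve_same(residual, kernel)  # A^T (y - A x)
        x = [max(0.0, xi + step * gi - threshold) for xi, gi in zip(x, grad)]
    return x

def decode(x, kernel):
    """Decoder that mimics the measurement process: blur the sparse code."""
    return convolve_same(x, kernel)

def reconstruction_loss(y, y_hat):
    """Mean squared error between the measurement and its re-synthesis."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Toy measurement: two nearby emitters blurred by the assumed PSF.
n = 32
kernel = gaussian_kernel(radius=4, sigma=1.5)
x_true = [0.0] * n
x_true[10], x_true[14] = 1.0, 0.8
y = convolve_same(x_true, kernel)

x_hat = encode(y, kernel)
loss_zero = reconstruction_loss(y, decode([0.0] * n, kernel))
loss_code = reconstruction_loss(y, decode(x_hat, kernel))
print(loss_zero, loss_code)
```

Note that `x_true` is used only to synthesize the toy measurement; the encoder, decoder and loss never see it. In a trainable version, the gradient of `reconstruction_loss` with respect to the filters and thresholds would drive the updates.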

Paper Structure (15 sections, 7 equations, 7 figures, 2 algorithms)

Figures (7)

  • Figure 1: Architecture of the model-based autoencoder. First, the input $\mathbf{y}_i$ is upsampled according to the given scale factor. Then, it is fed into a LISTA-based encoder, which performs sparse recovery to approximate $\mathbf{x}_i$. The encoder comprises convolutional layers with trainable weights $\{\mathbf{W_{0}}^{(i)}, \mathbf{W}^{(i)}\}_{i=1,2}$, and activation layers $S^{+}_{\alpha_0, \beta_0}(\cdot)$ with trainable parameters $\{\alpha_{0}^{(i)}, \beta_{0}^{(i)}\}_{i=1,\dots,3}$. The first three layers (shown in blue) produce an initial approximation $\mathbf{\hat{x}^{(0)}}_i$, which is then iteratively refined via the remaining four layers (shown in orange). After $k_{max}$ iterations, the approximated sparse code $\mathbf{\hat{x}^{(k_{max})}}_i$ is fed through a decoder that mimics the physical measurement process via convolution with learned filters. The decoder has two blocks of convolutional layers with trainable weights $\{\mathbf{W_{D}}^{(i)}\}_{i=1,2}$, followed by ReLU activations. It outputs $\mathbf{\hat{y}}_i$, which is an approximation of the input image.
  • Figure 2: Super-resolved reconstruction of a simulated microtubules dataset smlm_rev, composed of 361 high density frames. (a) Ground truth. (b) Self-STORM reconstruction. (c) Deep-STORM reconstruction. (d) DECODE reconstruction. (e) ZSSR reconstruction. (f) LSPARCOM reconstruction. (g) SPARCOM reconstruction executed over 100 iterations with $\lambda = 0.0105$. SNR is shown in the upper-left corner of each reconstructed image. It is evident that LSPARCOM gives the best results, as it was trained on data with identical ground-truth structure. Self-STORM and Deep-STORM achieve similar visual reconstruction quality, while all other methods yield far less accurate results.
  • Figure 3: Super-resolved reconstruction of a simulated microtubules dataset smlm_rev, composed of 2500 high density frames. (a) Ground truth. (b) Self-STORM reconstruction. (c) Deep-STORM reconstruction. (d) DECODE reconstruction. (e) ZSSR reconstruction. (f) LSPARCOM reconstruction. (g) SPARCOM reconstruction executed over 100 iterations with $\lambda = 0.003$. SNR is shown in the upper-left corner of each reconstructed image. In this case, Self-STORM provides an accurate reconstruction of the ground truth, while all other methods fail to achieve similar reconstruction quality.
  • Figure 4: Super-resolved reconstruction of a simulated microtubules dataset smlm_rev, composed of 36 ultra-high density frames, generated by summing every 10 consecutive frames of the original dataset. (a) Ground truth. (b) Self-STORM reconstruction. (c) Deep-STORM reconstruction. (d) DECODE reconstruction. (e) ZSSR reconstruction. (f) LSPARCOM reconstruction. (g) SPARCOM reconstruction executed over 100 iterations with $\lambda = 0.00105$. SNR is shown in the upper-left corner of each reconstructed image. Even with as few as 36 frames of ultra-high emitter density, Self-STORM manages to successfully reconstruct the underlying structure of the image without using any prior training data. Similar to Figure 2, it outperforms every other method besides LSPARCOM, which was trained on data with identical ground-truth structure.
  • Figure 5: Super-resolved reconstruction of a simulated microtubules dataset smlm_rev, composed of 250 ultra-high density frames, generated by summing every 10 consecutive frames of the original dataset. (a) Ground truth. (b) Self-STORM reconstruction. (c) Deep-STORM reconstruction. (d) DECODE reconstruction. (e) ZSSR reconstruction. (f) LSPARCOM reconstruction. (g) SPARCOM reconstruction executed over 100 iterations with $\lambda = 0.0003$. SNR is shown in the upper-left corner of each reconstructed image. Despite the ultra-high density of emitters, Self-STORM provides a fairly accurate reconstruction of the ground truth, suffering from slight degradation in quality compared to the results in Figure 3. All other methods perform far worse, both in terms of SNR and visual resemblance to the ground-truth image.
  • ...and 2 more figures
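
The ultra-high density sequences in Figures 4 and 5 are described as being generated by summing every 10 consecutive frames of the original dataset. A minimal sketch of that step is given below, using nested lists as frames. The function name is hypothetical, and dropping an incomplete trailing group is an assumption, chosen because it is consistent with the stated frame counts (361 frames yielding 36, and 2500 yielding 250).

```python
def sum_consecutive_frames(frames, group_size=10):
    """Sum each run of `group_size` consecutive frames pixel-wise.

    `frames` is a list of 2D frames (lists of rows of pixel values).
    Assumption: frames left over after the last full group are dropped.
    """
    n_groups = len(frames) // group_size
    summed = []
    for g in range(n_groups):
        group = frames[g * group_size:(g + 1) * group_size]
        # zip(*group) yields, for each row index, that row from every frame;
        # zip(*rows) then yields, for each column, the pixel from every frame.
        summed.append([[sum(px) for px in zip(*rows)] for rows in zip(*group)])
    return summed

# Toy sequence: 361 identical 2x2 frames.
frames = [[[1, 0], [0, 2]] for _ in range(361)]
dense = sum_consecutive_frames(frames)
print(len(dense), dense[0])
```

Summing frames raises the emitter density per frame while shortening the sequence, which is why the summed datasets are a harder test of localization from few frames.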