MID: A Self-supervised Multimodal Iterative Denoising Framework

Chang Nie, Tianchen Deng, Zhe Liu, Hesheng Wang

TL;DR

MID addresses denoising in real-world data with complex, non-linear noise without requiring paired clean references. It introduces a self-supervised, multimodal framework with two networks: a Step Prediction Network $\Psi_\vartheta$ and a Noise Prediction Network $\Phi_\theta$, coupled with a first-order Taylor expansion to linearize non-linear noise accumulation and enable iterative noise subtraction via $s_t \rightarrow s_{t-1}$. Across computer vision tasks, biomedical signals, MRI, and bioinformatics, MID demonstrates robust, detail-preserving denoising and improves downstream tasks without clean ground truth, achieving state-of-the-art or strong cross-domain performance. This approach enables scalable noise removal in domains where acquiring clean data is impractical, offering broad impact for imaging, sensing, and sequence analysis.

Abstract

Data denoising is a persistent challenge across scientific and engineering domains. Real-world data is frequently corrupted by complex, non-linear noise, rendering traditional rule-based denoising methods inadequate. To overcome these obstacles, we propose a novel self-supervised multimodal iterative denoising (MID) framework. MID models the collected noisy data as a state within a continuous process of non-linear noise accumulation. By iteratively introducing further noise, MID learns two neural networks: one to estimate the current noise step and another to predict and subtract the corresponding noise increment. For complex non-linear contamination, MID employs a first-order Taylor expansion to locally linearize the noise process, enabling effective iterative removal. Crucially, MID does not require paired clean-noisy datasets, as it learns noise characteristics directly from the noisy inputs. Experiments across four classic computer vision tasks demonstrate MID's robustness, adaptability, and consistent state-of-the-art performance. Moreover, MID exhibits strong performance and adaptability in tasks within the biomedical and bioinformatics domains.
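The inference procedure described in the abstract (estimate the current noise step, then repeatedly predict and subtract one noise increment) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `iterative_denoise`, `toy_predict_step`, `toy_predict_noise`, and `DRIFT` are hypothetical names, the two "networks" are hand-written stand-ins rather than trained models, and the toy noise is a constant per-step drift chosen so the result is easy to check, which is far simpler than the non-linear noise the paper targets.

```python
import numpy as np

def iterative_denoise(s, predict_step, predict_noise):
    """Walk s_t -> s_{t-1} down to step 0, subtracting one predicted increment per step."""
    # Role of the step prediction network: estimate how many noise steps the input carries.
    t_hat = int(predict_step(s))
    state = np.asarray(s, dtype=float).copy()
    for t in range(t_hat, 0, -1):
        # Role of the noise prediction network: estimate the increment added at step t.
        state = state - predict_noise(state, t)
    return state, t_hat

# Toy stand-ins (assumptions for illustration only): each noise step adds a
# constant drift of 0.5 to a signal whose clean version has zero mean.
DRIFT = 0.5

def toy_predict_step(s):
    # With zero-mean clean data, the mean of the input reveals the step count.
    return round(float(np.mean(s)) / DRIFT)

def toy_predict_noise(state, t):
    # The increment is the same at every step in this toy process.
    return np.full_like(state, DRIFT)

clean = np.array([-1.0, 0.0, 1.0])
noisy = clean + 3 * DRIFT  # three noise steps applied
restored, steps = iterative_denoise(noisy, toy_predict_step, toy_predict_noise)
print(steps, restored)
```

In MID both callables would be learned from noisy data alone, by adding further noise to the inputs; the loop structure is the only part this sketch is meant to convey.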

Paper Structure

This paper contains 24 sections, 19 equations, 21 figures, 1 table, 2 algorithms.

Figures (21)

  • Figure 1: Overview of the MID denoising framework. MID processes raw noisy data from diverse modalities and domains. The framework first estimates the noise severity and then executes iterative denoising steps to progressively restore the clean data.
  • Figure 2: The MID training and denoising pipeline, illustrated with the denoising of outliers in a model-fitting task. (a) During self-supervised training, raw data is treated as an initial state in a continuous, non-linear noise addition process. By linearizing this process with a Taylor expansion, MID learns to recognize noise steps and features by repeatedly adding noise. (b) For denoising, the step prediction network estimates the noise step of the input. The reversible linearized process is then used to iteratively predict and subtract noise, effectively denoising the data.
  • Figure 3: Neural network architectures for MID. (a) For image data, a CNN-based architecture is used. The network $\Psi_\vartheta$ (CNN backbone + FC layers) processes the input image to estimate the noise level ($t$). Subsequently, the network $\Phi_\theta$ (CNN encoder-decoder) is applied iteratively, starting from the estimated step $\hat{t}$, to predict and remove noise. (b) For point cloud data and 1D signals, a Transformer-based architecture is employed. The network $\Psi_\vartheta$ (FC layers) estimates the noise level ($t$), and the network $\Phi_\theta$ (Transformer encoder-decoder) performs the iterative noise prediction and removal starting from $\hat{t}$.
  • Figure 4: Denoising performance on the BSD300 dataset. Quantitative evaluation using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) demonstrates that MID effectively denoises the BSD300 dataset and achieves superior performance compared to other methods.
  • Figure 5: Denoising performance on BSD68 grayscale images. This figure presents a quantitative comparison (PSNR and SSIM) of denoising performance for various methods applied to the BSD68 dataset. MID demonstrates significantly superior denoising performance compared to Blind2Unblind.
  • ...and 16 more figures
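
Figures 4 and 5 report Peak Signal-to-Noise Ratio (PSNR). For readers unfamiliar with the metric, the standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, can be sketched as follows. The function name `psnr` and the `data_range` parameter are illustrative choices; the paper's exact evaluation settings (value range, colour handling) are not stated in this summary. A clean reference is needed only to compute the score, not to train MID.

```python
import numpy as np

def psnr(reference, estimate, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Mean squared error over all elements.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Example with values in [0, 1]: a uniform error of 0.1 gives MSE = 0.01.
ref = np.zeros((4, 4))
est = ref + 0.1
print(psnr(ref, est, data_range=1.0))
```

Higher PSNR means the estimate is closer to the reference.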