MID: A Self-supervised Multimodal Iterative Denoising Framework
Chang Nie, Tianchen Deng, Zhe Liu, Hesheng Wang
TL;DR
MID targets denoising of real-world data corrupted by complex, non-linear noise, without requiring paired clean references. It is a self-supervised, multimodal framework built on two networks: a Step Prediction Network $\Psi_\vartheta$ that estimates the current noise step, and a Noise Prediction Network $\Phi_\theta$ that predicts the noise increment to subtract. A first-order Taylor expansion locally linearizes the non-linear noise accumulation, which enables iterative noise subtraction via $s_t \rightarrow s_{t-1}$. MID delivers robust, detail-preserving denoising and improves downstream tasks without clean ground truth, reaching state-of-the-art performance on four computer vision tasks and strong performance on biomedical signals, MRI, and bioinformatics data. The approach makes noise removal practical in domains where acquiring clean data is infeasible, such as imaging, sensing, and sequence analysis.
Abstract
Data denoising is a persistent challenge across scientific and engineering domains. Real-world data is frequently corrupted by complex, non-linear noise, rendering traditional rule-based denoising methods inadequate. To overcome these obstacles, we propose a novel self-supervised multimodal iterative denoising (MID) framework. MID models the collected noisy data as a state within a continuous process of non-linear noise accumulation. By iteratively introducing further noise, MID learns two neural networks: one to estimate the current noise step and another to predict and subtract the corresponding noise increment. For complex non-linear contamination, MID employs a first-order Taylor expansion to locally linearize the noise process, enabling effective iterative removal. Crucially, MID does not require paired clean-noisy datasets, as it learns noise characteristics directly from the noisy inputs. Experiments across four classic computer vision tasks demonstrate MID's robustness, adaptability, and consistent state-of-the-art performance. Moreover, MID exhibits strong performance and adaptability in tasks within the biomedical and bioinformatics domains.
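The iterative removal described above can be illustrated with a minimal sketch of the inference loop only. This is not the authors' implementation: the two trained networks are replaced by hand-written stand-ins, and the corruption is a toy deterministic, signal-dependent process chosen so that the per-step increment is known in closed form. The names `iterative_denoise`, `add_noise_step`, `oracle_noise`, `ALPHA`, and `PATTERN` are all hypothetical.

```python
import numpy as np


def iterative_denoise(s_t, predict_step, predict_noise):
    """Walk a noisy state s_t back toward s_0 by repeated subtraction.

    predict_step  : stand-in for the step network, maps a state to an estimated step t
    predict_noise : stand-in for the noise network, maps (state, step) to the
                    increment s_t - s_{t-1}
    """
    s = np.asarray(s_t, dtype=float).copy()
    t = max(int(round(float(predict_step(s)))), 0)
    for step in range(t, 0, -1):
        # One local, linear correction per step: s_{t-1} = s_t - increment
        s = s - predict_noise(s, step)
    return s


# --- Toy corruption (assumption, for illustration only) ---
ALPHA = 0.05
PATTERN = np.array([1.0, -0.5, 2.0])  # fixed per-element gain


def add_noise_step(s):
    # Signal-dependent update: the accumulated corruption is non-linear in the step count
    return s + ALPHA * PATTERN * (1.0 + s)


def oracle_noise(s, step):
    # Exact increment s_t - s_{t-1} of the toy process, written in terms of s_t.
    # A trained noise network would approximate this quantity instead.
    g = ALPHA * PATTERN
    return (1.0 + s) * g / (1.0 + g)


clean = np.array([0.2, 0.4, 0.6])
noisy = clean.copy()
for _ in range(5):
    noisy = add_noise_step(noisy)

# The step predictor is replaced by a constant because the toy step count is known.
restored = iterative_denoise(noisy, lambda s: 5, oracle_noise)
print(np.allclose(restored, clean))
```

In the framework itself, both stand-ins would be learned by adding further noise to the already noisy inputs, so neither the true step count nor a closed-form increment would be available at inference time.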
