LoFi: Neural Local Fields for Scalable Image Reconstruction

AmirEhsan Khorashadizadeh, Tobías I. Liaudat, Tianlin Liu, Jason D. McEwen, Ivan Dokmanić

TL;DR

LoFi tackles scalable image reconstruction for inverse problems with a coordinate-based local field: an MLP predicts each pixel from a local neighborhood of the observed image. The output is continuous and resolution-agnostic, and memory usage is largely independent of image size, which enables high-resolution training on modest hardware. Architectural components include learnable patch geometry, differentiable patch extraction, coordinate-conditioned patch geometry (CCPG), and a learnable Fourier-domain noise filter, with extensions INR-LoFi and LoFi-ADMM. Empirically, LoFi matches or surpasses CNNs and ViTs on low-dose CT (LDCT) denoising and dark matter mapping, generalizes well to out-of-distribution data, and supports high-resolution reconstruction with reduced memory and flexible plug-and-play priors.

Abstract

Neural fields or implicit neural representations (INRs) have attracted significant attention in computer vision and imaging due to their efficient coordinate-based representation of images and 3D volumes. In this work, we introduce a coordinate-based framework for solving imaging inverse problems, termed LoFi (Local Field). Unlike conventional methods for image reconstruction, LoFi processes local information at each coordinate separately by multi-layer perceptrons (MLPs), recovering the object at that specific coordinate. Similar to INRs, LoFi can recover images at any continuous coordinate, enabling image reconstruction at multiple resolutions. With comparable or better performance than standard deep learning models like convolutional neural networks (CNNs) and vision transformers (ViTs), LoFi achieves excellent generalization to out-of-distribution data with memory usage almost independent of image resolution. Remarkably, training on 1024x1024 images requires less than 200MB of memory -- much below standard CNNs and ViTs. Additionally, LoFi's local design allows it to train on extremely small datasets with 10 samples or fewer, without overfitting and without the need for explicit regularization or early stopping.
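
The per-coordinate processing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`bilinear_sample`, `extract_patches`, `local_field`), the fixed square $5 \times 5$ patch, and the untrained random MLP weights are all assumptions made for illustration. The sketch only shows how a local neighborhood can be sampled around an arbitrary continuous coordinate and mapped to a single output value, so that the working set scales with the number of queried coordinates rather than with the image size.

```python
import numpy as np

def bilinear_sample(image, ys, xs):
    """Sample a 2D image at continuous pixel coordinates (clamped at the border)."""
    h, w = image.shape
    ys = np.clip(ys, 0.0, h - 1.0)
    xs = np.clip(xs, 0.0, w - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

def extract_patches(image, coords, patch_size=5, spacing=1.0):
    """Return a (num_coords, patch_size**2) array of local neighborhoods.

    coords holds (y, x) positions in pixel units; they need not be integers.
    """
    offsets = (np.arange(patch_size) - (patch_size - 1) / 2) * spacing
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    ys = coords[:, 0, None] + dy.ravel()[None, :]
    xs = coords[:, 1, None] + dx.ravel()[None, :]
    return bilinear_sample(image, ys, xs)

def init_mlp(rng, sizes):
    """Random (untrained) weights for a ReLU MLP; illustrative only."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

def local_field(params, image, coords, patch_size=5):
    """Predict one value per coordinate from its local patch only."""
    patches = extract_patches(image, coords, patch_size)
    return mlp(params, patches)[:, 0]

rng = np.random.default_rng(0)
observed = rng.normal(size=(64, 64))           # stand-in for a noisy observation
params = init_mlp(rng, [25, 32, 1])            # 5x5 patch -> 32 hidden -> 1 output
coords = np.array([[10.0, 20.0], [10.5, 20.25], [63.0, 0.0]])
values = local_field(params, observed, coords)  # one prediction per coordinate
```

Because each prediction depends only on its own patch, a training step can draw a random mini-batch of coordinates instead of processing a full image, and the same function can be queried on a denser grid to produce a higher-resolution output.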

Paper Structure

This paper contains 42 sections, 21 equations, 17 figures, 1 table, 1 algorithm.

Figures (17)

  • Figure 1: The memory and time requirements during training for different models; LoFi is significantly faster and more memory-efficient than CNNs and ViTs. Notably, LoFi's memory usage remains almost independent of image resolution, making it an ideal choice for high-dimensional image reconstruction. All experiments are conducted using a single A100 GPU with 80GB memory. Missing data points indicate that the corresponding model exceeds the GPU memory capacity for the given resolution.
  • Figure 2: LoFi; the neural network $\text{NN}_\theta$, typically composed of MLP modules, processes the local information extracted from the observed image around the given pixel $(x,y)$. LoFi's inductive bias for image reconstruction yields strong generalization on OOD data; it also requires less training data and uses little memory when training on high-resolution images.
  • Figure 3: MultiMLP architecture; the input is split into smaller chunks, each processed by a separate MLP, and the extracted features are then mixed by another MLP.
  • Figure 4: Performance comparison on LDCT (30dB noise) at resolution $128 \times 128$ for in-distribution (chest) and OOD (brain) data.
  • Figure 5: Comparative analysis for image denoising at resolution $512 \times 512$, where different models are trained on a tiny dataset with 9 training samples. The PSNR of the reconstructed test samples is shown per training iteration. CNNs, in particular multiscale versions, overfit severely, while LoFi converges robustly and significantly outperforms CNNs thanks to its local design.
  • ...and 12 more figures
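
The MultiMLP layout described in the Figure 3 caption (split the input into chunks, process each chunk with its own MLP, then mix the results with another MLP) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the names `init_multimlp` and `multimlp`, the equal-size contiguous chunking, the layer widths, and the random untrained weights are all assumptions.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (untrained) weights for a ReLU MLP; illustrative only."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

def init_multimlp(rng, in_dim, num_chunks, hidden, chunk_out, out_dim):
    """One small MLP per input chunk plus a mixing MLP (hypothetical sizes)."""
    assert in_dim % num_chunks == 0, "this sketch assumes equal-size chunks"
    chunk_dim = in_dim // num_chunks
    chunk_mlps = [init_mlp(rng, [chunk_dim, hidden, chunk_out])
                  for _ in range(num_chunks)]
    mixer = init_mlp(rng, [num_chunks * chunk_out, hidden, out_dim])
    return chunk_mlps, mixer

def multimlp(params, x):
    chunk_mlps, mixer = params
    # Split the flattened local input into contiguous chunks along the last axis.
    chunks = np.split(x, len(chunk_mlps), axis=-1)
    # Each chunk is processed by its own MLP.
    features = [mlp(p, c) for p, c in zip(chunk_mlps, chunks)]
    # A final MLP mixes the per-chunk features into the output.
    return mlp(mixer, np.concatenate(features, axis=-1))

rng = np.random.default_rng(0)
params = init_multimlp(rng, in_dim=25, num_chunks=5, hidden=16,
                       chunk_out=8, out_dim=1)
x = rng.normal(size=(4, 25))   # four flattened 5x5 patches
y = multimlp(params, x)        # one output per patch
```

Compared with a single MLP over the whole flattened input, this layout keeps the first-layer weight matrices small, since each one only sees its own chunk.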