LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding

Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno, Babak Taati

TL;DR

The paper tackles the problem of encoding diverse signals into continuous representations without task-specific architectures by leveraging implicit neural representations (INRs). It introduces LIFT, a two-stage framework that uses parallel localized implicit functions and a hierarchical latent generator learned through meta-learning to capture local, intermediate, and global information for generation and classification. An enhanced variant, ReLIFT, adds residual connections and frequency scaling to bridge the convergence-capacity gap and better model high-frequency details. Across 2D and 3D benchmarks, LIFT achieves state-of-the-art reconstruction and competitive generation with substantially reduced computational cost and parameter counts, while ReLIFT demonstrates rapid convergence and strong performance for single-task signal representations and inverse problems.

Abstract

Implicit Neural Representations (INRs) are proving to be a powerful paradigm in unifying task modeling across diverse data domains, offering key advantages such as memory efficiency and resolution independence. Conventional deep learning models are typically modality-dependent, often requiring custom architectures and objectives for different types of signals. However, existing INR frameworks frequently rely on global latent vectors or exhibit computational inefficiencies that limit their broader applicability. We introduce LIFT, a novel, high-performance framework that addresses these challenges by capturing multiscale information through meta-learning. LIFT leverages multiple parallel localized implicit functions alongside a hierarchical latent generator to produce unified latent representations that span local, intermediate, and global features. This architecture facilitates smooth transitions across local regions, enhancing expressivity while maintaining inference efficiency. Additionally, we introduce ReLIFT, an enhanced variant of LIFT that incorporates residual connections and expressive frequency encodings. With this straightforward approach, ReLIFT effectively addresses the convergence-capacity gap found in comparable methods, providing an efficient yet powerful solution to improve capacity and speed up convergence. Empirical results show that LIFT achieves state-of-the-art (SOTA) performance in generative modeling and classification tasks, with notable reductions in computational costs. Moreover, in single-task settings, the streamlined ReLIFT architecture proves effective in signal representations and inverse problem tasks.

Paper Structure

This paper contains 52 sections, 40 equations, 18 figures, and 12 tables.

Figures (18)

  • Figure 1: Overview of the LIFT architecture, illustrating its progressive latent modulation stages. (1) A hierarchy of learnable latents—global, intermediate, and local—is initialized to capture multiscale features. (2) These latents are integrated through hierarchical latent modulation, ensuring smooth transitions across local patch boundaries by sharing information between regions. (3) Shift modulations are produced for each Parallel MLP (P-MLP). (4) P-MLPs, modulated by their respective shift parameters and utilizing sinusoidal activation functions, generate high-fidelity reconstructions.
  • Figure 2: 2D latent interpolation showing gradual attribute changes (zoom in for details).
  • Figure 3: Generated samples along with reconstructed and corresponding ground truth images, with a zoomed-in view for detail.
  • Figure 4: 3D latent interpolation showing continuous object shape transformations.
  • Figure 5: Generated, reconstructed, and ground truth 3D shapes.
  • ...and 13 more figures
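
The Figure 1 caption describes per-patch MLPs with sinusoidal activations whose behavior is steered by shift modulations derived from global, intermediate, and local latents. The following is a minimal sketch of that idea, not the authors' implementation. Several things here are assumptions made only for illustration: the latents are combined by plain summation, the layer weights are shared across patches, the shift is added to the pre-activation before the sine, and all names and sizes (`GRID`, `HIDDEN`, `W0`, `shift_for_patch`, `evaluate`) are hypothetical.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
GRID = 4      # the 2D domain [0, 1]^2 is split into GRID x GRID local patches
HIDDEN = 8    # hidden width of the small per-patch MLP
W0 = 30.0     # sine frequency scale (a common SIREN-style value, assumed here)

rng = random.Random(0)


def rand_matrix(rows, cols, scale):
    """Return a rows x cols matrix with entries uniform in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Assumption: all patches share these weights and differ only in their shift.
W_in = rand_matrix(HIDDEN, 2, 1.0)
W_out = rand_matrix(1, HIDDEN, 1.0 / HIDDEN)

# A three-level hierarchy of latents: one global, one per 2x2 block of
# patches (intermediate), and one per patch (local).
global_latent = [rng.gauss(0, 0.1) for _ in range(HIDDEN)]
intermediate_latents = {
    (i, j): [rng.gauss(0, 0.1) for _ in range(HIDDEN)]
    for i in range(GRID // 2) for j in range(GRID // 2)
}
local_latents = {
    (i, j): [rng.gauss(0, 0.1) for _ in range(HIDDEN)]
    for i in range(GRID) for j in range(GRID)
}


def shift_for_patch(patch):
    """Combine the three latent levels into one shift vector (assumed: a sum)."""
    i, j = patch
    mid = intermediate_latents[(i // 2, j // 2)]
    loc = local_latents[patch]
    return [g + m + l for g, m, l in zip(global_latent, mid, loc)]


def patch_of(x, y):
    """Map a coordinate in [0, 1]^2 to the index of the patch that contains it."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))


def evaluate(x, y):
    """Evaluate the shift-modulated sine MLP of the patch containing (x, y)."""
    patch = patch_of(x, y)
    shift = shift_for_patch(patch)
    # Coordinates relative to the patch origin, each in [0, 1].
    local_xy = [x * GRID - patch[0], y * GRID - patch[1]]
    pre = matvec(W_in, local_xy)
    # Shift modulation: the latent-derived shift moves the pre-activation.
    hidden = [math.sin(W0 * (p + s)) for p, s in zip(pre, shift)]
    return matvec(W_out, hidden)[0]
```

Calling `evaluate(0.3, 0.6)` routes the query to patch `(1, 2)` and returns one scalar signal value. In this sketch only the shift vectors would change from signal to signal, which is the property that makes a compact latent representation possible; the weights here are random, so the output is not a meaningful reconstruction.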