PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks

Philip Matthias Winter, Maria Wimmer, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Gaia Romana De Paolis, Johannes Novotny, Sophia Ulonska, Katja Bühler

TL;DR

PARMESAN addresses the need for flexible adaptation of dense prediction models without continual parameter updates. It introduces a parameter-free transduction framework that uses a frozen feature extractor, a memory module, and hierarchical memory search to predict labels, with an intra-query message passing refinement and a novelty-based sparsity scheme for memory efficiency. The approach enables rapid learning and unlearning via memory consolidation, demonstrated in continual learning setups where it achieves 3-4 orders of magnitude faster learning than baselines while maintaining competitive predictive performance and stable knowledge retention. The method generalizes across 1D, 2D, and 3D grid data and is demonstrated on semantic segmentation and depth estimation tasks, highlighting its practical impact for test-time learning, domain adaptation, and few-shot scenarios.

Abstract

This work addresses flexibility in deep learning by means of transductive reasoning. For adaptation to new data and tasks, e.g., in continual learning, existing methods typically involve tuning learnable parameters or complete re-training from scratch, rendering such approaches inflexible in practice. We argue that the notion of separating computation from memory by means of transduction can act as a stepping stone for solving these issues. We therefore propose PARMESAN (parameter-free memory search and transduction), a scalable method which leverages a memory module for solving dense prediction tasks. At inference, hidden representations in memory are searched to find corresponding patterns. In contrast to other methods that rely on continuous training of learnable parameters, PARMESAN learns via memory consolidation simply by modifying stored contents. Our method is compatible with commonly used architectures and canonically transfers to 1D, 2D, and 3D grid-based data. The capabilities of our approach are demonstrated on the complex task of continual learning. PARMESAN learns 3-4 orders of magnitude faster than established baselines while being on par in terms of predictive performance, hardware efficiency, and knowledge retention.
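
To make the idea of learning "simply by modifying stored contents" concrete, the following is a minimal sketch and not the authors' implementation. It assumes features have already been produced by a frozen extractor and are given as plain lists; the `Memory` class, its method names, and the use of cosine similarity with a top-1 match are all illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

class Memory:
    """Hypothetical flat memory of (feature, label) nodes; nothing is trained."""

    def __init__(self):
        self.entries = {}  # sample_id -> list of (feature, label) nodes

    def learn(self, sample_id, features, labels):
        # Learning is a write: store the nodes of one labeled sample.
        self.entries[sample_id] = list(zip(features, labels))

    def unlearn(self, sample_id):
        # Unlearning is a delete: drop every node of that sample.
        self.entries.pop(sample_id, None)

    def predict(self, query_features):
        # Transduction: each query node copies the label of its most
        # similar stored node (top-1 nearest neighbor, an assumption).
        nodes = [node for sample in self.entries.values() for node in sample]
        if not nodes:
            raise ValueError("memory is empty")
        return [max(nodes, key=lambda n: cosine(q, n[0]))[1]
                for q in query_features]

memory = Memory()
memory.learn("a", [[1.0, 0.0], [0.9, 0.1]], ["road", "road"])
memory.learn("b", [[0.0, 1.0]], ["sky"])
query = [[0.8, 0.2], [0.1, 0.9]]
before = memory.predict(query)  # both samples are in memory
memory.unlearn("b")
after = memory.predict(query)   # "sky" can no longer be retrieved
```

In this toy setup, removing sample `"b"` immediately changes the prediction for the second query node, without any gradient step. The hierarchical search, message passing, and sparsity scheme mentioned in the TL;DR are deliberately left out here.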

Paper Structure

This paper contains 15 sections, 4 equations, 7 figures, 4 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of our parameter-free transduction method for dense prediction tasks. A feature extractor $F$ is applied to a query $x_q$ to obtain hidden representations $(h)_q^l$. A memory $M$ stores labeled samples along with their representations $(h)_j^l$. Our approach then performs a hierarchical search in $M$, where the objective is to find globally and locally similar nodes w.r.t. $x_q$. In each level $l$, we keep the top-$k$ most similar nodes and retrieve their child nodes in level $l-1$. Labels from most similar nodes are retrieved to obtain a raw prediction. Then, message passing (MP) is used to get a final prediction $y_q$.
  • Figure 1: Annotations of Cityscapes dataset (original image, depth map, fine semantic map, coarse semantic map). While coarse annotations only outline the most prominent objects via polygons, pixel-exact fine annotations are regarded as the ground truth.
  • Figure 2: Memory sample feature pyramid with $n_{sp}=2$ sparse levels obtained via iterative, local novelty search (indicated by circles). Filled nodes are kept in $M$.
  • Figure 2: Model predictions for all ablation studies with two different query inputs.
  • Figure 3: Model predictions and analysis of PARMESAN. Left: We study top-$1$ nearest neighbors from experiment A1. The idx panel visualizes indices of retrieved nearest neighbor nodes from $M$. Similar colors indicate labels retrieved from close-by memory locations, most likely from within same samples. The sim panel visualizes $s_{acc}^1$, where bright regions refer to high similarities. We also show 3 query pixels and their matches in $M$, indicating both global and local semantic correspondence. Right: Flexible use of PARMESAN in various setups. Bottom: Depth estimation on CITY and semantic segmentation on JSRT.
  • ...and 2 more figures
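
The coarse-to-fine search described in the overview caption (keep the top-$k$ most similar nodes in a level, then retrieve their child nodes in the next finer level) can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: data is 1D, each node has exactly two children (node `i` has children `2*i` and `2*i + 1`), pyramids are lists ordered from coarsest to finest, similarity is cosine, and the final label comes from the single best match. Message passing and the accumulated similarity $s_{acc}$ are omitted, and the name `hierarchical_search` is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def hierarchical_search(query_pyramid, memory_pyramid, memory_labels, k=2):
    """Predict one label per finest-level query node via coarse-to-fine search.

    Both pyramids are lists of levels ordered coarsest -> finest; each level
    is a list of feature vectors and doubles in length relative to the
    previous one. memory_labels holds one label per finest memory node.
    """
    n_levels = len(query_pyramid)
    predictions = []
    for p in range(len(query_pyramid[-1])):
        # Start with every node of the coarsest memory level as a candidate.
        candidates = list(range(len(memory_pyramid[0])))
        top = []
        for lvl in range(n_levels):
            # Ancestor of fine position p at this level (factor-2 pyramid).
            q = query_pyramid[lvl][p >> (n_levels - 1 - lvl)]
            ranked = sorted(candidates,
                            key=lambda i: cosine(q, memory_pyramid[lvl][i]),
                            reverse=True)
            top = ranked[:k]  # keep the k most similar memory nodes
            if lvl < n_levels - 1:
                # Descend: only children of kept nodes are searched next.
                candidates = [c for i in top for c in (2 * i, 2 * i + 1)]
        predictions.append(memory_labels[top[0]])
    return predictions

# Toy two-level memory: 2 coarse nodes, 4 fine nodes with labels.
memory_pyramid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]],
]
memory_labels = ["road", "road", "sky", "sky"]
query_pyramid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.8, 0.2], [0.2, 0.8], [0.0, 1.0]],
]
labels_k1 = hierarchical_search(query_pyramid, memory_pyramid, memory_labels, k=1)
labels_k2 = hierarchical_search(query_pyramid, memory_pyramid, memory_labels, k=2)
```

With `k=1` only the children of the single best coarse match are compared at the fine level, so the fine-level search touches 2 nodes instead of 4. Larger `k` trades more comparisons for a lower risk of discarding the right region too early.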