HyperFlow: Gradient-Free Emulation of Few-Shot Fine-Tuning

Donggyun Kim, Chanwoo Kim, Seunghoon Hong

TL;DR

HyperFlow tackles the high compute and memory cost of test-time fine-tuning in few-shot classification (FSC) by learning a gradient-free surrogate for gradient descent. It trains a conditional drift network to predict the parameter drift from a small support set, then adapts the model through a short ODE integration that updates only the bias parameters of the target model. On the Meta-Dataset and CDFSL benchmarks, the method substantially improves out-of-domain performance over the non-fine-tuned baseline while incurring only 6% of the memory cost and 0.02% of the computation time of standard fine-tuning. This establishes a practical middle ground between direct transfer and full fine-tuning, enabling real-time or resource-constrained adaptation for cross-domain FSC tasks.

Abstract

While test-time fine-tuning is beneficial in few-shot learning, the need for multiple backpropagation steps can be prohibitively expensive in real-time or low-resource scenarios. To address this limitation, we propose an approach that emulates gradient descent without computing gradients, enabling efficient test-time adaptation. Specifically, we formulate gradient descent as an Euler discretization of an ordinary differential equation (ODE) and train an auxiliary network to predict the task-conditional drift using only the few-shot support set. The adaptation then reduces to a simple numerical integration (e.g., via the Euler method), which requires only a few forward passes of the auxiliary network; no gradients or forward passes of the target model are needed. In experiments on cross-domain few-shot classification using the Meta-Dataset and CDFSL benchmarks, our method significantly improves out-of-domain performance over the non-fine-tuned baseline while incurring only 6% of the memory cost and 0.02% of the computation time of standard fine-tuning, thus establishing a practical middle ground between direct transfer and fully fine-tuned approaches.
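
The correspondence the abstract relies on can be made concrete with a toy sketch: a gradient descent update $\theta \leftarrow \theta - \eta \nabla L(\theta)$ is one explicit Euler step of size $\eta$ on the ODE $d\theta/dt = -\nabla L(\theta)$. Everything in the code below is an illustrative assumption rather than the authors' implementation: the quadratic loss, the helper names (`grad_loss`, `drift`, `euler_integrate`), and above all the hand-written `drift`, which simply returns the exact negative gradient. In HyperFlow that role would be played by a trained auxiliary network that sees only the support set.

```python
# Toy illustration only: a quadratic loss and a hand-written drift stand in
# for the real target model and the learned drift network (both assumptions).

def support_mean(support):
    """Per-dimension mean of the support examples."""
    n = len(support)
    return [sum(col) / n for col in zip(*support)]

def grad_loss(theta, center):
    """Gradient of the toy loss L(theta) = 0.5 * ||theta - center||^2."""
    return [t - c for t, c in zip(theta, center)]

def gradient_descent(theta, center, lr, steps):
    """Plain gradient descent: theta <- theta - lr * grad."""
    for _ in range(steps):
        g = grad_loss(theta, center)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def drift(theta, t, support):
    """Hypothetical stand-in for a drift network: (theta, t, support) -> d(theta)/dt.

    Here it is the exact negative gradient of the toy loss; a learned
    network would predict this quantity without computing any gradient.
    """
    center = support_mean(support)
    return [-(th - c) for th, c in zip(theta, center)]

def euler_integrate(theta, support, step_size, steps):
    """Explicit Euler: theta <- theta + step_size * drift(theta, t, support)."""
    t = 0.0
    for _ in range(steps):
        v = drift(theta, t, support)
        theta = [th + step_size * vi for th, vi in zip(theta, v)]
        t += step_size
    return theta

support = [[1.0, 2.0], [3.0, 4.0]]
theta0 = [0.0, 0.0]
gd = gradient_descent(theta0, support_mean(support), lr=0.1, steps=5)
ode = euler_integrate(theta0, support, step_size=0.1, steps=5)
```

With the Euler step size set equal to the learning rate, `gd` and `ode` agree, which is the sense in which adaptation can be recast as numerical integration of a drift.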

Paper Structure

This paper contains 29 sections, 7 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: An overview of HyperFlow. (1) For scalable computation in the parameter space, we select the bias parameters of the target model $f_\theta$ to be updated. (2) We collect $T$-step fine-tuning trajectories by simulating gradient descent on the bias parameters, using the base dataset on which the target model was trained. (3) After collecting the trajectories, we smoothly interpolate them using either a linear or a piecewise-cubic flow objective, then train the conditional drift network $h_\phi$ on the continuous time interval $[0, T]$. (4) At test time, we employ a numerical ODE solver (e.g., the Euler method) with a few forward passes of $h_\phi$ to adapt the bias parameters on the support set.
  • Figure 2: The architecture of the conditional drift network $h_\phi$ of HyperFlow.
  • Figure 3: Intermediate losses on CDFSL tasks (ten tasks for each domain) during the parameter adaptation of HyperFlow-C. Curves with different colors indicate different tasks within each domain, the x-axis corresponds to the number of steps taken in the Euler solver, and the y-axis corresponds to the relative loss normalized by the initial value. This shows that the conditional drift network of HyperFlow generalizes well to the out-of-domain classification tasks by consistently moving the parameters to the regions with lower loss values.
  • Figure 4: Average 5-shot CDFSL performance versus computation cost. For HyperFlow and the fine-tuning baselines, we plot the total time spent on inference and adaptation for three different numbers of update steps: 1, 20, and 50 (Euler solver steps for HyperFlow and gradient descent steps for the baselines). HyperFlow offers an effective middle ground between direct-transfer and fine-tuning approaches.
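
Steps (2) to (4) of the Figure 1 caption can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the paper's procedure: the quadratic loss, the function names (`collect_trajectory`, `linear_flow_sample`, `euler_solve`), and the reading of the "linear flow objective" as piecewise-linear interpolation between consecutive trajectory points (with the segment difference as the regression target) are all hypothetical. No network is trained here; an oracle that looks up the stored targets stands in for $h_\phi$.

```python
# Toy sketch of: collect a T-step trajectory, build drift targets by linear
# interpolation on [0, T], then integrate a drift with an Euler solver.
# All names and the interpolation scheme are illustrative assumptions.

def collect_trajectory(theta0, grad_fn, lr, num_steps):
    """Simulate gradient descent and store every iterate theta_0 .. theta_T."""
    traj = [list(theta0)]
    for _ in range(num_steps):
        g = grad_fn(traj[-1])
        traj.append([t - lr * gi for t, gi in zip(traj[-1], g)])
    return traj

def linear_flow_sample(traj, t):
    """Return (theta_t, target_drift) at continuous time t in [0, T].

    theta_t is linearly interpolated between theta_k and theta_{k+1},
    where k = floor(t); the target drift is theta_{k+1} - theta_k.
    """
    T = len(traj) - 1
    k = min(int(t), T - 1)
    s = t - k
    a, b = traj[k], traj[k + 1]
    theta_t = [(1 - s) * ai + s * bi for ai, bi in zip(a, b)]
    target = [bi - ai for ai, bi in zip(a, b)]
    return theta_t, target

def euler_solve(theta0, drift_fn, t_end, num_steps):
    """Explicit Euler integration of d(theta)/dt = drift_fn(theta, t) on [0, t_end]."""
    dt = t_end / num_steps
    theta, t = list(theta0), 0.0
    for _ in range(num_steps):
        v = drift_fn(theta, t)
        theta = [th + dt * vi for th, vi in zip(theta, v)]
        t += dt
    return theta

center = [1.0, -2.0]  # minimizer of the toy loss 0.5 * ||theta - center||^2

def grad_fn(theta):
    return [t - c for t, c in zip(theta, center)]

traj = collect_trajectory([0.0, 0.0], grad_fn, lr=0.5, num_steps=4)

# One (input, target) pair of the kind a drift network could be regressed on.
theta_t, target = linear_flow_sample(traj, 1.5)

def oracle_drift(theta, t):
    """Stand-in for a trained drift network: returns the stored target at time t."""
    return linear_flow_sample(traj, t)[1]

adapted = euler_solve(traj[0], oracle_drift, t_end=4.0, num_steps=4)
```

In this toy setting, integrating the oracle drift with one Euler step per trajectory segment lands on the final gradient descent iterate. A learned drift network would also condition on the support set and on the current parameters, which the oracle here ignores.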