Test-Time Instance-Specific Parameter Composition: A New Paradigm for Adaptive Generative Modeling

Minh-Tuan Tran, Xuan-May Le, Quan Hung Tran, Mehrtash Harandi, Dinh Phung, Trung Le

Abstract

Existing generative models, such as diffusion and auto-regressive networks, are inherently static, relying on a fixed set of pretrained parameters to handle all inputs. In contrast, humans flexibly adapt their internal generative representations to each perceptual or imaginative context. Inspired by this capability, we introduce Composer, a new paradigm for adaptive generative modeling based on test-time instance-specific parameter composition. Composer generates input-conditioned parameter adaptations at inference time, which are injected into the pretrained model's weights, enabling per-input specialization without fine-tuning or retraining. Adaptation occurs once prior to multi-step generation, yielding higher-quality, context-aware outputs with minimal computational and memory overhead. Experiments show that Composer substantially improves performance across diverse generative models and use cases, including lightweight/quantized models and test-time scaling. By leveraging input-aware parameter composition, Composer establishes a new paradigm for designing generative models that dynamically adapt to each input, moving beyond static parameterization.

Paper Structure

This paper contains 23 sections, 15 equations, 5 figures, 15 tables.

Figures (5)

  • Figure 1: Comparison of static versus adaptive parameterization. Composer dynamically composes instance-specific parameter updates, allowing per-input adaptation without fine-tuning.
  • Figure 2: Overview of Composer. Given any weight matrix $W$ from the backbone, Composer generates a low-rank update $W' = W + AB$ conditioned on the input. Specifically, the query and value matrices $W_Q$ and $W_V$ from the pretrained model are linearly projected from $\mathbb{R}^{d \times d}$ to $\mathbb{R}^{2r \times d_{\text{model}}}$. The projected representations are then separated to initialize tokens $A^0_i$ and $B^0_i \in \mathbb{R}^{1 \times d_{\text{model}}}$. During training, these tokens are combined with prompt tokens $P_i \in \mathbb{R}^{1 \times d_{\text{model}}}$ and processed by a transformer to produce $W^* = AB$. The adapted parameters $W' = W + W^*$ are used for generation. At inference, the first projection layers are removed, while $A^0_i$ and $B^0_i$ are stored for fast instance-specific adaptation.
  • Figure 3: Illustration of the attention scheme. Component tokens attend to prompt tokens for context, maintain local block-wise attention, and the first token of each block captures inter-block correlations.
  • Figure 4: Impact of low-rank dimension $r$ and context-aware sampling parameter $\alpha$ on ImageNet $256\times256$ class-conditional image generation. (a) Inception Score (IS) vs. $r$, (b) FID vs. $r$, (c) IS vs. $\alpha$, (d) FID vs. $\alpha$. Higher IS and lower FID indicate better generation quality.
  • Figure 5: Qualitative comparison on additional text-to-image examples. For each prompt, we show images produced by baseline methods and by our approach.