HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing

Tencent HY Team

TL;DR

HY-WU (Weight Unleashing) is a memory-first adaptation framework that shifts adaptation pressure away from overwriting a single shared parameter point. It implements functional (operator-level) memory as a neural module: a generator that synthesizes weight updates on the fly from the instance condition, yielding instance-specific operators without test-time optimization.

Abstract

Foundation models are transitioning from offline predictors to deployed systems expected to operate over long time horizons. In real deployments, objectives are not fixed: domains drift, user preferences evolve, and new tasks appear after the model has shipped. This elevates continual learning and instant personalization from optional features to core architectural requirements. Yet most adaptation pipelines still follow a static weight paradigm: after training (or after any adaptation step), inference executes a single parameter vector regardless of user intent, domain, or instance-specific constraints. This treats the trained or adapted model as a single point in parameter space. In heterogeneous and continually evolving regimes, distinct objectives can induce separated feasible regions over parameters, forcing any single shared update into compromise, interference, or overspecialization. As a result, continual learning and personalization are often implemented as repeated overwriting of shared weights, risking degradation of previously learned behaviors. We propose HY-WU (Weight Unleashing), a memory-first adaptation framework that shifts adaptation pressure away from overwriting a single shared parameter point. HY-WU implements functional (operator-level) memory as a neural module: a generator that synthesizes weight updates on-the-fly from the instance condition, yielding instance-specific operators without test-time optimization.

Paper Structure (91 sections, 11 equations, 17 figures, 7 tables)

Figures (17)

  • Figure 1: Failure modes of static adaptation vs. conditional parameter generation. (a) Infeasible sharing. When heterogeneous objectives induce separated feasible regions in parameter space, a single shared update (Shared LoRA or SFT) is forced into compromise, instability, or dominance by high-frequency modes. (b) Over-specialization. Training a separate static adapter per domain avoids direct conflict but collapses into a narrow subspace and generalizes poorly under domain shifts. (c) Conditional generation. A generator routes each instance to an update $\Delta\theta(x)=g_{\phi}(c(x))$, enabling inference over a family of parameter points rather than a single point.
  • Figure 2: Comparison of training paradigms for hypernetwork-based parameter generation. (a) Learning to reconstruct parameters from pre-collected checkpoints via reconstruction loss. (b) Learning from pre-collected checkpoints with additional downstream task loss as auxiliary supervision. (c) Our approach: on-the-fly optimization of the parameter generator using only downstream task loss, without relying on pre-collected checkpoints.
  • Figure 3: Overview of the HY-WU pipeline. The framework extracts conditions from the source image and edit prompt, and a trainable Neural Network Transformer processes them to synthesize instance-specific parameter tokens. These tokens are then detokenized into LoRA adapters and integrated into a frozen foundation model with parameters $\theta_1,\dots,\theta_L$, where $\theta_l$ denotes the parameters of the $l$-th layer. The entire pipeline is optimized end-to-end: the generator is updated by backpropagating the diffusion loss.
  • Figure 4: Parameter tokenization and detokenization. LoRA adapters are reorganized into a unified tensor $\mathcal{T}_{w}$. The red path reverses this process to reconstruct parameters from a tensor.
  • Figure 5: Architecture of the Neural Network Transformer. The left panel illustrates the overall pipeline, where parameter embeddings and extracted conditions (text/image) are processed through $N$ transformer blocks to generate LoRA parameters $\mathcal{T}$. The right panel details the internal structure of each block, featuring factorized self-attention to capture structural correlations, and cross-attention for condition injection. The final LoRA $B$ projection is zero-initialized to ensure training stability.
  • ...and 12 more figures
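
The conditional generation idea in Figure 1(c), $\Delta\theta(x)=g_{\phi}(c(x))$, together with the detokenization into LoRA factors and the zero-initialized $B$ projection described for Figures 4 and 5, can be illustrated with a toy sketch. This is not the authors' implementation: the paper's generator is a transformer trained with a diffusion loss, whereas the hypothetical `ToyGenerator` below is a plain linear map, and every name, shape, and the simulated "training" step are assumptions made purely for illustration.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def reshape(flat, rows, cols):
    """Detokenize a flat parameter list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

class ToyGenerator:
    """Hypothetical stand-in for g_phi: maps a condition vector to LoRA factors (A, B)."""

    def __init__(self, cond_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # Projection producing the flattened LoRA A factor (rank x d_in).
        self.proj_a = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
                       for _ in range(rank * d_in)]
        # Projection producing the flattened LoRA B factor (d_out x rank).
        # Zero-initialized, so the generated update starts at exactly zero.
        self.proj_b = [[0.0] * cond_dim for _ in range(d_out * rank)]

    def __call__(self, cond):
        a = reshape(matvec(self.proj_a, cond), self.rank, self.d_in)
        b = reshape(matvec(self.proj_b, cond), self.d_out, self.rank)
        return a, b

def adapted_forward(w_frozen, lora, x):
    """Compute (W + B A) x without ever modifying the frozen weight W."""
    a, b = lora
    base = matvec(w_frozen, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]

# Toy dimensions (assumptions, not values from the paper).
d_in, d_out, rank, cond_dim = 4, 3, 2, 5
rng = random.Random(1)
w_frozen = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
w_snapshot = [row[:] for row in w_frozen]

gen = ToyGenerator(cond_dim, d_out, d_in, rank)
x = [1.0, -2.0, 0.5, 3.0]
cond_1 = [1.0, 0.0, 0.0, 0.0, 0.0]  # stands in for c(x) of one instance
cond_2 = [0.0, 1.0, 0.0, 0.0, 0.0]  # stands in for c(x) of another instance

y_base = matvec(w_frozen, x)
# Before any training, zero-initialized B makes the adapted model match the base model.
y_init = adapted_forward(w_frozen, gen(cond_1), x)

# Simulate a trained generator by giving the B projection nonzero weights.
train_rng = random.Random(2)
gen.proj_b = [[train_rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
              for _ in range(d_out * rank)]
y_1 = adapted_forward(w_frozen, gen(cond_1), x)
y_2 = adapted_forward(w_frozen, gen(cond_2), x)
```

In this sketch the same input `x` passes through the same frozen weight, yet `y_1` and `y_2` differ because each condition yields its own low-rank update. The shared weight itself is never overwritten, which is the contrast with static adaptation that the captions draw.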