ParamReL: Learning Parameter Space Representation via Progressively Encoding Bayesian Flow Networks

Zhangkai Wu, Xuhui Fan, Jin Li, Zhilin Zhao, Hui Chen, Longbing Cao

TL;DR

ParamReL addresses a gap in Bayesian Flow Networks (BFNs): they do not learn semantic representations from the parameter space. It introduces a self-encoder that produces time-aware latent semantics from the parameters and feeds them into the parameter-driven generator, with a mutual information regularizer to promote disentanglement. The approach supports both discrete and continuous data and enables conditional generation, progressive reconstruction, latent interpolation, and disentanglement. Empirical results show that ParamReL yields meaningful, transferable semantic latents, smooth latent spaces, and competitive generation quality across diverse benchmarks, highlighting its potential for unified parameter-space representation learning.

Abstract

The recently proposed Bayesian Flow Networks (BFNs) show great potential in modeling parameter spaces, offering a unified strategy for handling continuous, discretized, and discrete data. However, BFNs cannot learn high-level semantic representations from the parameter space, since common encoders, which encode data into one static representation, cannot capture semantic changes in parameters. This motivates a new direction: learning semantic representations hidden in the parameter spaces to characterize mixed-type noisy data. Accordingly, we propose a representation learning framework named ParamReL, which operates in the parameter space to obtain parameter-wise latent semantics that exhibit progressive structures. Specifically, ParamReL introduces a self-encoder that learns latent semantics directly from parameters rather than from observations. The encoder is then integrated into BFNs, enabling representation learning with various formats of observations. Mutual information terms further promote the disentanglement of latent semantics while capturing meaningful semantics. We illustrate conditional generation and reconstruction in ParamReL by extending BFNs, and extensive quantitative experimental results demonstrate the superior effectiveness of ParamReL in learning parameter representations.

Paper Structure

This paper contains 29 sections, 13 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: The relationships between distributions in BFNs (a) and ParamReL (b).
  • Figure 2: The framework of ParamReL, consisting of a semantic encoder $q_{\phi}$ and a parameter decoder $p_{\Psi}$. During the reverse stage, the data-distribution parameters $\theta_{t}$ are encoded into a time-specific semantic latent $\boldsymbol{z}_{t}$, which is then used to decode the parameters $\theta_{t-1}$ of step $t-1$.
  • Figure 3: The left panel (a-c) shows the high-level semantic latent captured by $\boldsymbol{z}_{\mathrm{sem}}$ from ParamReL's encoders. When $\boldsymbol{z}_{\mathrm{sem}}$ is fixed, the global characteristics of the images are invariant. When the stochastic subcodes $\boldsymbol{x}_{0}$ are varied, local attributes in the corresponding generated images may vary, such as the Narrow_Eyes attribute in (a), the Blond_Hair attribute in (b), and the Mouth_Slightly_Open attribute in (c). The right panel (d-f) illustrates the time-varying changes introduced by ParamReL's progressive encodings. Varying the encodings at time steps 100, 200, and 300 influences more attributes in the reconstruction stage: the Big_Lips and Pointy_Nose attributes in (d), the Blond_Hair and Bald attributes in (e), and the Wavy_Hair and High_Cheekbones attributes in (f).
  • Figure 4: Comparison of latent space interpolation between sample-based and parameter-based models on the CelebA dataset. Only our ParamReL model (e) learns a continuous, smooth latent space while ensuring near-exact image reconstruction. Sample-based generative models learn a latent space that is continuous but not smooth, which leads to incomplete reconstructions. For example, in (a-d), the eyeglasses attribute is frequently omitted. Moreover, VAEs (a, b) tend to produce blurry images. Sample-based models also often compromise reconstruction in favour of representation learning, as evidenced by the failure of the diffusion model variants (c, d) to accurately reconstruct the background characteristics of image B.
  • Figure 5: Latent traversals by ParamReL on CelebA. Interpretable traversal directions are shown by varying each encoding over the range $[-3,3]$.
  • ...and 9 more figures
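
The encode-then-decode step described in the Figure 2 caption can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the random linear maps stand in for the trained networks $q_{\phi}$ and $p_{\Psi}$, and the names `encode`, `decode`, `reverse_pass`, `PARAM_DIM`, and `LATENT_DIM` are hypothetical. It only shows the data flow, in which the parameters at step $t$ yield a time-specific latent that is then used to produce the parameters at step $t-1$.

```python
import numpy as np

rng = np.random.default_rng(0)
PARAM_DIM, LATENT_DIM = 8, 2  # hypothetical sizes, chosen for illustration

# Hypothetical random linear weights standing in for trained networks.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, PARAM_DIM))
W_logvar = rng.normal(scale=0.1, size=(LATENT_DIM, PARAM_DIM))
W_dec = rng.normal(scale=0.1, size=(PARAM_DIM, PARAM_DIM + LATENT_DIM + 1))


def encode(theta_t):
    """Toy stand-in for q_phi: sample a latent z_t from parameters theta_t."""
    mu = W_mu @ theta_t
    logvar = W_logvar @ theta_t
    eps = rng.standard_normal(LATENT_DIM)
    # Reparameterized Gaussian sample: z = mu + sigma * eps.
    return mu + np.exp(0.5 * logvar) * eps


def decode(theta_t, z_t, t):
    """Toy stand-in for p_psi: map (theta_t, z_t, t) to theta_{t-1}."""
    features = np.concatenate([theta_t, z_t, [t]])
    # Residual update of the parameters (an assumption of this sketch).
    return theta_t + W_dec @ features


def reverse_pass(theta_start, num_steps):
    """Walk from step num_steps down to step 1, encoding then decoding."""
    theta = theta_start
    latents = []
    for step in range(num_steps, 0, -1):
        z_t = encode(theta)                          # time-specific latent
        theta = decode(theta, z_t, step / num_steps)  # parameters at step - 1
        latents.append(z_t)
    return theta, latents


theta_final, zs = reverse_pass(np.zeros(PARAM_DIM), num_steps=5)
print(theta_final.shape, len(zs))
```

Each iteration produces its own latent, which is the sense in which the representation is progressive rather than a single static code.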