Learning to Scale Logits for Temperature-Conditional GFlowNets

Minsu Kim; Joohwan Ko; Taeyoung Yun; Dinghuai Zhang; Ling Pan; Woochang Kim; Jinkyoo Park; Emmanuel Bengio; Yoshua Bengio

TL;DR

The paper addresses instability in temperature-conditional GFlowNets by introducing Logit-GFN, which employs a learned logit-scaling network to map inverse temperature $\beta$ to a softmax temperature $T$ that scales the policy logits directly. This architectural change yields more stable training, stronger offline generalization, and improved online mode discovery across biochemical design tasks, while enabling flexible online exploration via varied $P_{\text{exp}}(\beta)$ distributions, including simulated annealing. The approach is backed by TB-based training, an online discovery algorithm, and extensive ablations showing robustness to conditioning choices and temperature distributions. Overall, Logit-GFN advances temperature-conditioned GFlowNets as a practical tool for multi-temperature sampling and scientific discovery in molecular and sequence design tasks.
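As a hedged sketch of what "TB-based training" typically means here, the block below writes the temperature-conditional target and a trajectory balance (TB) loss in the form commonly used in the GFlowNet literature. Only $\beta$ and $P_{\text{exp}}(\beta)$ appear in the summary above. The symbols $R$, $Z_\theta$, $P_F$, $P_B$, and $\tau$ are notation assumed for illustration, and the paper's exact objective may differ (for example, in whether the backward policy is also conditioned on $\beta$).

$$
p(x \mid \beta) \;\propto\; R(x)^{\beta},
\qquad
\mathcal{L}_{\mathrm{TB}}(\tau; \beta)
= \left(
\log \frac{Z_\theta(\beta)\,\prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t; \beta)}
          {R(x)^{\beta}\,\prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1})}
\right)^{2},
\qquad \beta \sim P_{\text{exp}}(\beta).
$$

Here $\tau = (s_0 \to \dots \to s_n = x)$ is a complete trajectory. A larger $\beta$ concentrates the target on high-reward objects, while a smaller $\beta$ flattens it toward broader exploration.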

Abstract

GFlowNets are probabilistic models that sequentially generate compositional structures through a stochastic policy. Among GFlowNets, temperature-conditional GFlowNets can introduce temperature-based controllability for exploration and exploitation. We propose \textit{Logit-scaling GFlowNets} (Logit-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed approaches introduced numerical challenges in deep network training, since different temperatures may give rise to very different gradient profiles as well as magnitudes of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. Logit-GFN also improves the generalization capabilities of GFlowNets in offline learning and their mode discovery capabilities in online learning, which we verify empirically on various biological and chemical tasks. Our code is available at \url{https://github.com/dbsxodud-11/logit-gfn}.
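The core idea, a learned function of the temperature scaling the policy's logits directly, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The helper names `init_scaler`, `softmax_temperature`, and `scaled_policy`, the tiny randomly initialized network, and the softplus used to keep the temperature positive are all assumptions made for illustration.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); used here to keep T positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def init_scaler(hidden=8, seed=0):
    # Hypothetical one-hidden-layer scaling network with random weights.
    # In a real model these weights would be trained jointly with the policy.
    rng = random.Random(seed)
    return {
        "w1": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def softmax_temperature(params, beta):
    # Map the inverse temperature beta to a positive softmax temperature T.
    hidden = [math.tanh(w * beta + b)
              for w, b in zip(params["w1"], params["b1"])]
    raw = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return softplus(raw) + 1e-3  # small offset avoids division by ~0


def scaled_policy(logits, temperature):
    # Divide the logits by T, then apply a max-shifted (stable) softmax.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    params = init_scaler()
    T = softmax_temperature(params, beta=2.0)
    print(scaled_policy([1.0, 2.0, 3.0], T))
```

In this sketch the policy network's logits do not see $\beta$ at all; only the scalar $T$ depends on it. A small $T$ sharpens the action distribution and a large $T$ flattens it, so the conditioning cannot change the ranking of actions, only how strongly the policy commits to it.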

Paper Structure (40 sections, 7 equations, 21 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Temperature conditional GFlowNets
Methodology
Logit scaling
Training objective
Online discovery algorithm with Logit-GFN
Experiments
Evaluation of training stability
Evaluation of offline generalization
Evaluation of online mode seeking capability
Different distribution for sampling temperatures
Ablation studies
Conclusion
...and 25 more sections

Figures (21)

Figure 1: Illustration of GFlowNets and temperature conditional GFlowNets.
Figure 2: Architecture design of vanilla temperature conditional GFN and our Logit-GFN. The vanilla implementation of the temperature-conditional GFN integrates the embedding vector from $\beta$ by concatenating it with the layer embedding of the policy network. In contrast, the proposed method directly modulates the logit Softmax temperature.
Figure 3: Loss of temperature-conditional GFlowNets and unconditional GFlowNets as a function of the number of training steps on the TFBind8 and RNA-Binding tasks. Logit-GFN yields more stable training curves and converges faster. We draw curves with three different random seeds and highlight the mean over seeds.
Figure 4: Performance of Temperature-conditional GFlowNets and unconditional GFlowNet in offline generalization. Shaded regions denote the temperature range used in training. Logit-GFN generates high-rewarding samples that surpass the offline datasets when conditioned on high $\beta$ values.
Figure 5: Reward distribution of samples from Temperature-conditional GFlowNets and unconditional GFlowNet in offline generalization. Logit-GFN dynamically shifts its reward distribution towards a high-reward region when conditioned on high $\beta$ values.
...and 16 more figures
