Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation

Nanxu Gong, Zijun Li, Sixun Dong, Haoyue Bai, Wangyang Ying, Xinyuan Wang, Yanjie Fu

TL;DR

DIFFT reframes feature transformation as a reward-guided generative task that learns a compact embedding of feature sets via a VAE, samples task-relevant embeddings with a Latent Diffusion Model conditioned on tabular structure, and uses a reward-guided evaluator to steer diffusion toward high-performing transformations. A semi-autoregressive decoder then reconstructs discrete feature sets from the embeddings, enabling efficient parallel generation across features. Across 14 benchmark datasets, DIFFT delivers consistent improvements in predictive accuracy and robustness while reducing training and inference time compared with state-of-the-art baselines. By combining global distribution learning with task-specific optimization, it overcomes the limitations of discrete and continuous FT searches and offers a scalable, task-aware solution for feature engineering in tabular data.

Abstract

Feature Transformation (FT) crafts new features from original ones via mathematical operations to enhance dataset expressiveness for downstream models. However, existing FT methods exhibit critical limitations: discrete search struggles with enormous combinatorial spaces, impeding practical use; and continuous search, being highly sensitive to initialization and step sizes, often becomes trapped in local optima, restricting global exploration. To overcome these limitations, DIFFT redefines FT as a reward-guided generative task. It first learns a compact and expressive latent space for feature sets using a Variational Auto-Encoder (VAE). A Latent Diffusion Model (LDM) then navigates this space to generate high-quality feature embeddings, its trajectory guided by a performance evaluator towards task-specific optima. This synthesis of global distribution learning (from LDM) and targeted optimization (reward guidance) produces potent embeddings, which a novel semi-autoregressive decoder efficiently converts into structured, discrete features, preserving intra-feature dependencies while allowing parallel inter-feature generation. Extensive experiments on 14 benchmark datasets show DIFFT consistently outperforms state-of-the-art baselines in predictive accuracy and robustness, with significantly lower training and inference times.
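The reward-guided sampling idea can be illustrated with a small toy, offered as a sketch under stated assumptions rather than as DIFFT's actual implementation. It runs a DDPM-style reverse process over 2-D latents and shifts the predicted noise by the gradient of a reward, in the manner of classifier guidance. Everything named here is hypothetical: `toy_eps_model` is an analytic stand-in for a trained LDM (it is exact only when clean latents follow a standard Gaussian), `toy_reward` is a quadratic stand-in for the learned performance evaluator, and the schedule, `TARGET`, and `guidance_scale` are arbitrary choices.

```python
import numpy as np

TARGET = np.array([2.0, 2.0])  # hypothetical location of the high-reward region


def make_schedule(num_steps=200, beta_min=1e-4, beta_max=0.05):
    """Linear noise schedule (an arbitrary choice for this toy)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(z, alpha_bar):
    """Stand-in for a trained denoiser: exact noise prediction when
    clean latents are distributed as N(0, I)."""
    return np.sqrt(1.0 - alpha_bar) * z


def toy_reward(z):
    """Stand-in for a performance evaluator: higher near TARGET."""
    return -np.sum((z - TARGET) ** 2, axis=-1)


def toy_reward_grad(z):
    """Analytic gradient of toy_reward with respect to z."""
    return -2.0 * (z - TARGET)


def sample(num_samples, guidance_scale, seed=0):
    """DDPM-style ancestral sampling with reward-gradient guidance."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()
    z = rng.standard_normal((num_samples, 2))  # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        sigma = np.sqrt(1.0 - alpha_bars[t])
        eps = toy_eps_model(z, alpha_bars[t])
        # Guidance: shift the noise estimate along the reward gradient.
        eps = eps - guidance_scale * sigma * toy_reward_grad(z)
        mean = (z - betas[t] / sigma * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z


if __name__ == "__main__":
    plain = sample(512, guidance_scale=0.0)
    guided = sample(512, guidance_scale=1.0)
    print("mean reward, unguided:", toy_reward(plain).mean())
    print("mean reward, guided:  ", toy_reward(guided).mean())
```

With `guidance_scale=0.0` the sampler simply reproduces the prior over latents; a positive scale pulls samples toward the high-reward region while the denoiser term keeps them close to the learned distribution, which is the trade-off the abstract describes as combining global distribution learning with targeted optimization.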

Paper Structure

This paper contains 29 sections, 13 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Motivation example. Discrete search methods explore various feature combinations directly, but are often challenged by the sheer scale of the resulting combinatorial space. Continuous search methods, on the other hand, iteratively refine solutions from initial or current points, yet frequently converge to local optima. In contrast, our proposed reward-guided generation paradigm, by leveraging the global sampling ability of diffusion models, aims to discover solutions closer to the global optimum.
  • Figure 2: Framework overview. The framework consists of three key components: 1) a VAE that encodes feature sequences into latent embeddings and reconstructs them with a semi-autoregressive decoder; 2) an LDM trained to model the distribution of effective embeddings conditioned on tabular semantics; and 3) a reward-guided sampling process that leverages gradients from a performance evaluator to steer the generation of high-quality feature embeddings, which are then decoded into final feature sets.
  • Figure 3: Time analysis of generating 2000 tokens using different methods.
  • Figure 4: Comparison of continuous search method and optimized generation method.
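
The semi-autoregressive decoding referred to in Figure 2 (and timed in Figure 3) can be sketched as follows. This is a minimal illustration, not the paper's decoder: it assumes each feature is a short token sequence (postfix notation is used here purely as an example), and `scripted_next_token` is a hypothetical lookup standing in for a neural decoder conditioned on a latent embedding. The point it shows is structural: tokens within a feature are produced one after another, while all features advance together in each step.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"

# Hypothetical target feature set, used only to script the toy decoder.
TARGETS: List[List[str]] = [
    ["f1", "f2", "+"],
    ["f3", "log"],
    ["f1", "f3", "*", "sqrt"],
]


def scripted_next_token(feature_idx: int, prefix: List[str]) -> str:
    """Stand-in for a learned decoder: returns the next token of one
    feature given only that feature's own prefix."""
    target = TARGETS[feature_idx]
    return target[len(prefix)] if len(prefix) < len(target) else EOS


def decode_semi_autoregressive(
    num_features: int,
    next_token: Callable[[int, List[str]], str],
    max_len: int = 16,
) -> Tuple[List[List[str]], int]:
    """Decode all features in lockstep; returns the features and the
    number of sequential decoding steps that were needed."""
    prefixes: List[List[str]] = [[] for _ in range(num_features)]
    done = [False] * num_features
    steps = 0
    while not all(done) and steps < max_len:
        # One step: every unfinished feature receives its next token.
        # In a real model this would be a single batched forward pass.
        new_tokens = [
            None if done[i] else next_token(i, prefixes[i])
            for i in range(num_features)
        ]
        for i, tok in enumerate(new_tokens):
            if tok is None:
                continue
            if tok == EOS:
                done[i] = True
            else:
                prefixes[i].append(tok)
        steps += 1
    return prefixes, steps


if __name__ == "__main__":
    features, steps = decode_semi_autoregressive(len(TARGETS), scripted_next_token)
    total_tokens = sum(len(f) + 1 for f in features)  # +1 for each EOS
    print(features)
    print("sequential steps:", steps, "vs. tokens emitted:", total_tokens)
```

Under these assumptions the number of sequential steps is bounded by the longest feature rather than by the total token count, which is the intuition behind decoding features in parallel while keeping each feature's internal token order intact.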