Two-Stage Feature Generation with Transformer and Reinforcement Learning

Wanfu Gao; Zengyao Man; Zebin He; Yuhao Tang; Jun Gao; Kunpeng Liu

TL;DR

This work tackles automated feature generation by introducing TSFG, a two-stage framework that couples a Transformer-based encoder-decoder with Proximal Policy Optimization (PPO). The pre-training stage learns stable feature-generation strategies, while the PPO fine-tuning stage aligns generation with downstream task rewards, enabling dynamic adaptation across datasets. Empirical results on 13 datasets show that TSFG consistently improves feature quality and downstream performance, with ablations confirming that both stages are essential. The approach offers a scalable, adaptable way to generate high-quality features that improve predictive accuracy while controlling redundancy and exploration overhead.

Abstract

Feature generation is a critical step in machine learning, aiming to enhance model performance by capturing complex relationships within the data and generating meaningful new features. Traditional feature generation methods rely heavily on domain expertise and manual intervention, making the process labor-intensive and hard to adapt to different scenarios. Although automated feature generation techniques address these issues to some extent, they often suffer from feature redundancy, inefficient exploration of the feature space, and limited adaptability to diverse datasets and tasks. To address these problems, we propose a Two-Stage Feature Generation (TSFG) framework, which integrates a Transformer-based encoder-decoder architecture with Proximal Policy Optimization (PPO). The encoder-decoder model in TSFG leverages the Transformer's self-attention mechanism to efficiently represent and transform features, capturing complex dependencies within the data. PPO further enhances TSFG by dynamically adjusting the feature generation strategy based on task-specific feedback, optimizing the process for improved performance and adaptability. TSFG dynamically generates high-quality feature sets, significantly improving the predictive performance of machine learning models. Experimental results demonstrate that TSFG outperforms existing state-of-the-art methods in terms of feature quality and adaptability.
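
To make the PPO stage more concrete, the following is a minimal sketch of the standard PPO clipped surrogate objective, which limits how far a fine-tuned policy can move from the policy that sampled the data. It is not the authors' implementation: the function name `ppo_clipped_objective`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions, and the abstract does not specify how TSFG computes rewards or advantages.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration only.
    new_logps / old_logps: log-probabilities of the sampled actions under the
        current policy and under the policy that generated the samples.
    advantages: per-action advantage estimates (assumed here to come from a
        task-specific reward such as a downstream validation score).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree, the ratio is 1 and the objective reduces to the mean advantage. When the ratio leaves the clipping interval in the direction that would increase the objective, the clipped term takes over, which removes the incentive for large policy updates.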
