FastFT: Accelerating Reinforced Feature Transformation via Advanced Exploration Strategies
Tianqi He, Xiaohan Huang, Yi Du, Qingqing Long, Ziyue Qiao, Min Wu, Yanjie Fu, Yuanchun Zhou, Meng Xiao
TL;DR
FastFT addresses runtime bottlenecks and sparse rewards in automated feature transformation by decoupling evaluation from data generation through a Performance Predictor, and by driving exploration with a Novelty Estimator and a prioritized memory buffer. It leverages cascading reinforcement learning to perform feature cross operations, guided by efficient reward signals that blend empirical performance with novelty, reducing dependence on expensive downstream-task evaluations. Empirical results across 23 datasets show FastFT achieves superior performance and notable time savings, with robust ablations confirming the contributions of the predictor, novelty, and memory replay mechanisms. The framework emphasizes traceability and scalability, offering a practical pathway to data-centric feature engineering in large-scale settings.
Abstract
Feature transformation is crucial for classic machine learning: it aims to generate feature combinations that enhance the performance of downstream tasks from a data-centric perspective. Current methodologies, such as manual expert-driven processes, iterative-feedback techniques, and exploration-generative tactics, have shown promise in automating such data engineering workflows by minimizing human involvement. However, three challenges remain in those frameworks: (1) They depend predominantly on downstream task performance metrics, whose assessment is time-consuming, especially for large datasets. (2) The diversity of feature combinations can hardly be guaranteed once random exploration ends. (3) Significant transformations are rare, so valuable feedback is sparse, which hinders the learning process or leads to less effective results. In response to these challenges, we introduce FastFT, an innovative framework that leverages a trio of advanced strategies. We first decouple the feature transformation evaluation from the outcomes of the generated datasets via a performance predictor. To address the issue of reward sparsity, we develop a method to evaluate the novelty of generated transformation sequences. Incorporating this novelty into the reward function accelerates the model's exploration of effective transformations, thereby improving search productivity. Additionally, we combine novelty and performance to create a prioritized memory buffer, ensuring that essential experiences are effectively revisited during exploration. Our extensive experimental evaluations validate the performance, efficiency, and traceability of our proposed framework, showcasing its superiority in handling complex feature transformation tasks.
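As a rough illustration of the two ideas named above, the following minimal sketch shows one way a reward could blend a predicted performance score with a novelty bonus, and how a memory buffer could replay experiences in proportion to a priority. The abstract does not specify the exact formulas, so everything here is an assumption made for illustration: the additive form of the reward, the `novelty_weight` parameter, the eviction rule, and the names `blended_reward` and `PrioritizedBuffer` are all hypothetical.

```python
import random


def blended_reward(predicted_perf, novelty, novelty_weight=0.1):
    """Hypothetical reward: predicted performance plus a weighted novelty bonus."""
    return predicted_perf + novelty_weight * novelty


class PrioritizedBuffer:
    """Hypothetical replay buffer that samples experiences in proportion to priority."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []  # list of (priority, experience) pairs
        self.rng = random.Random(seed)

    def add(self, experience, priority):
        # Clamp to a small positive value so sampling weights stay valid.
        self.items.append((max(priority, 1e-6), experience))
        if len(self.items) > self.capacity:
            # Assumed eviction rule: drop the lowest-priority entry.
            self.items.remove(min(self.items, key=lambda pair: pair[0]))

    def sample(self, k):
        # Draw k experiences with replacement, weighted by priority.
        weights = [priority for priority, _ in self.items]
        experiences = [experience for _, experience in self.items]
        return self.rng.choices(experiences, weights=weights, k=k)


if __name__ == "__main__":
    buffer = PrioritizedBuffer(capacity=2)
    # Each experience is a transformation sequence; its priority is the blended reward.
    buffer.add("f1*f2", blended_reward(0.70, 0.9))
    buffer.add("log(f3)", blended_reward(0.72, 0.1))
    buffer.add("f1+f4", blended_reward(0.60, 0.2))
    print(buffer.sample(3))
```

In this sketch the blended reward doubles as the replay priority, so a sequence that is both well-scored and novel is revisited more often. That coupling is a design choice of the example, not something the abstract confirms.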
