
Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy

Xiaohan Huang, Dongjie Wang, Zhiyuan Ning, Ziyue Qiao, Qingqing Long, Haowei Zhu, Min Wu, Yuanchun Zhou, Meng Xiao

TL;DR

This paper addresses automatic feature transformation for tabular data by shifting from a feature-centric to a transformation-centric view. It introduces a Flexible Transformation-Centric Tabular Data Optimization framework (TCTO) that builds a dynamic feature-state transformation graph and employs cascading multi-agent reinforcement learning to select clusters, operations, and operands, enabling backtracking via graph pruning. A dual-layer Relational Graph Convolutional Network encodes cluster states, and rewards balance downstream task performance with transformation complexity, guiding efficient exploration. Empirical results show TCTO outperforms baselines across diverse datasets and models, demonstrating robust and adaptable feature engineering with traceability of transformations for insight and reuse.
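The following NumPy sketch is a rough illustration of how relation-aware message passing could summarize a cluster. It applies the standard R-GCN propagation rule twice and mean-pools the result over one cluster. It is not the authors' encoder. The following are all assumptions made for illustration: each transformation operation is treated as a relation type, the weights are random and untrained, mean pooling serves as the cluster-state readout, and the names `rgcn_layer` and `encode_cluster` are hypothetical.

```python
import numpy as np


def rgcn_layer(h, edges, num_relations, w_self, w_rel):
    """One R-GCN layer: h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / c_{i,r}).

    h:      (n, d_in) node features
    edges:  (src, dst, relation) triples; messages flow from src to dst
    w_self: (d_in, d_out) self-connection weight W_0
    w_rel:  (num_relations, d_in, d_out) per-relation weights W_r
    """
    n = h.shape[0]
    out = h @ w_self  # self-connection term
    # c_{i,r}: number of incoming neighbors of node i under relation r
    counts = np.zeros((n, num_relations))
    for _, dst, rel in edges:
        counts[dst, rel] += 1
    for src, dst, rel in edges:
        out[dst] += (h[src] @ w_rel[rel]) / counts[dst, rel]
    return np.maximum(out, 0.0)  # ReLU


def encode_cluster(h, edges, num_relations, cluster, params):
    """Stacked R-GCN layers, then mean pooling over the cluster's nodes (assumed readout)."""
    for w_self, w_rel in params:
        h = rgcn_layer(h, edges, num_relations, w_self, w_rel)
    return h[cluster].mean(axis=0)


# Toy graph: nodes 0 and 1 are raw features, node 2 = sin(f0), node 3 = f0 + f1.
rng = np.random.default_rng(0)
d_in, d_hid, num_rel = 4, 8, 2  # hypothetical relation ids: 0 = sin, 1 = add
h0 = rng.normal(size=(4, d_in))  # e.g. per-node statistical descriptors
edges = [(0, 2, 0), (0, 3, 1), (1, 3, 1)]
params = [  # two layers, matching a dual-layer encoder
    (rng.normal(size=(d_in, d_hid)), rng.normal(size=(num_rel, d_in, d_hid))),
    (rng.normal(size=(d_hid, d_hid)), rng.normal(size=(num_rel, d_hid, d_hid))),
]
state = encode_cluster(h0, edges, num_rel, [2, 3], params)
print(state.shape)  # (8,)
```

Dividing by c_{i,r} averages neighbors within each relation. As a result, a binary operation such as addition contributes the mean of its two operand messages rather than their sum.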

Abstract

Tabular data optimization methods aim to automatically find an optimal feature transformation process that generates high-value features and improves the performance of downstream machine learning tasks. Current frameworks for automated feature transformation rely on iterative sequence generation, optimizing decision strategies through performance feedback from downstream tasks. However, these approaches fail to effectively utilize historical decision-making experience and overlook potential relationships among generated features, which limits the depth of knowledge extraction. Moreover, their decision-making process lacks the granularity to backtrack dynamically on individual features. They therefore adapt poorly when they encounter inefficient pathways, which reduces overall robustness and exploration efficiency. To address these limitations, we introduce a novel method that uses a feature-state transformation graph to preserve the entire feature transformation journey, where each node represents a specific transformation state. During exploration, three cascading agents iteratively select nodes and ideal mathematical operations to generate new transformation states. This strategy leverages the inherent properties of the graph structure, allowing valuable transformations to be preserved and reused. It also enables backtracking through graph pruning techniques, which can rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted comprehensive experiments and detailed case studies, which demonstrate superior performance in diverse scenarios.
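
The following is a minimal sketch of the graph bookkeeping described above, assuming a plain Python representation. Each node stores one feature state, the operation that produced it, and its operand nodes. Node-wise pruning removes a generated state together with everything derived from it, and step-wise backtracking undoes the most recent transformation. The class and method names (`StateNode`, `TransformationGraph`, `apply`, `prune`, `backtrack`) are hypothetical, and the sketch omits the agents, clustering, and rewards.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StateNode:
    name: str
    values: list
    op: Optional[str] = None  # operation that produced this state; None for raw features
    parents: tuple = ()       # names of the operand nodes


class TransformationGraph:
    """Hypothetical feature-state transformation graph (one node per feature state)."""

    def __init__(self, raw_features):
        self.nodes = {k: StateNode(k, list(v)) for k, v in raw_features.items()}
        self.history = []  # generated node names in creation order

    def apply(self, op_name: str, fn: Callable, *operands: str) -> str:
        """Generate a new state by applying fn row-wise to the operand columns."""
        name = f"{op_name}({','.join(operands)})"
        if name in self.nodes:  # reuse a state that was already generated
            return name
        columns = [self.nodes[o].values for o in operands]
        values = [fn(*row) for row in zip(*columns)]
        self.nodes[name] = StateNode(name, values, op_name, tuple(operands))
        self.history.append(name)
        return name

    def descendants(self, name: str) -> set:
        """All states derived, directly or transitively, from `name`."""
        found, frontier = set(), [name]
        while frontier:
            current = frontier.pop()
            for node in self.nodes.values():
                if current in node.parents and node.name not in found:
                    found.add(node.name)
                    frontier.append(node.name)
        return found

    def prune(self, name: str) -> set:
        """Node-wise pruning: drop a generated state and everything built on it."""
        if self.nodes[name].op is None:
            raise ValueError("raw features are kept in this sketch")
        removed = {name} | self.descendants(name)
        for n in removed:
            del self.nodes[n]
        self.history = [h for h in self.history if h not in removed]
        return removed

    def backtrack(self) -> set:
        """Step-wise backtracking: undo the most recent transformation."""
        return self.prune(self.history[-1]) if self.history else set()


g = TransformationGraph({"f1": [0.0, 1.0], "f2": [2.0, 3.0]})
s = g.apply("sin", math.sin, "f1")
t = g.apply("add", lambda a, b: a + b, s, "f2")
print(sorted(g.nodes))  # ['add(sin(f1),f2)', 'f1', 'f2', 'sin(f1)']
g.prune(s)  # removes sin(f1) and add(sin(f1),f2)
print(sorted(g.nodes))  # ['f1', 'f2']
```

Each node stores its operands, so pruning is straightforward to express. A state's descendants are found by following parent links, which lets an unproductive path be cut without discarding unrelated transformations.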

Paper Structure

This paper contains 24 sections, 8 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Motivation of this study. (a) Illustration of classic machine learning versus machine learning with optimized features in diabetes diagnosis. (b) A conceptual view of feature-centric and transformation-centric perspectives.
  • Figure 2: An example of a feature-state transformation graph update: the feature $f_h$ undergoes the $\sin$ operation, generating the feature $f_t$. The embedding of node $v_t$ can be derived from the statistical description of the generated feature $f_t$.
  • Figure 3: An overview of our framework: (a) construct the feature-state transformation graph based on the previous step; (b) cluster the transformation graph and iteratively generate feature transformation decisions via multi-agent reinforcement learning; (c) update the feature-state transformation graph; (d) details of converting the (transformed) tabular data into the feature-state transformation graph; (e) the graph node clustering process that forms cohesive clusters; (f) illustration of the step-wise backtracking and node-wise graph pruning techniques, which support feature space exploration while keeping the pipeline robust.
  • Figure 4: Comparison of TCTO and its variants on regression and classification tasks.
  • Figure 5: Stability comparison of TCTO and ${\textsc{TCTO}}^{-g}$ on four different datasets.
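
Figure 2 states that a node's embedding is derived from a statistical description of its feature. The exact statistics are not listed here, so the sketch below assumes a seven-value summary (mean, standard deviation, minimum, quartiles, maximum), similar to a `describe()` table. The name `node_embedding` is hypothetical. The sketch illustrates that every node, raw or generated, maps to a fixed-length vector regardless of the number of rows, so nodes can be compared, clustered, or fed to a graph encoder.

```python
import numpy as np


def node_embedding(values):
    """Assumed node descriptor: summary statistics of one feature column."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return np.array([x.mean(), x.std(), x.min(), q1, median, q3, x.max()])


# Figure 2 setting: f_t = sin(f_h) becomes a new node v_t.
f_h = np.linspace(0.0, np.pi, 5)
f_t = np.sin(f_h)
v_t = node_embedding(f_t)
print(v_t.shape)  # (7,)
```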