Feature Interaction Aware Automated Data Representation Transformation

Ehtesamul Azim; Dongjie Wang; Kunpeng Liu; Wei Zhang; Yanjie Fu

TL;DR

This work addresses the interpretability and efficiency gaps in automated feature engineering by introducing InHRecon, an Interaction-aware Hierarchical Reinforced Feature Space Reconstruction framework. It formulates feature reconstruction as nested generation and selection managed by three MDPs (one operation agent and two feature agents) within a hierarchical RL setup, and it uses an expanded operator set along with H-statistics to reward informative feature interactions. The approach yields a traceable, explainable feature generation process and demonstrates strong empirical results across 24 datasets, with ablations confirming the value of interaction-aware rewards and hierarchical coordination. Overall, InHRecon advances scalable, interpretable AutoFE by aligning automated feature construction with human-like cognition and statistical guidance, showing robustness across downstream models and tasks.

Abstract

Creating an effective representation space is crucial for mitigating the curse of dimensionality, enhancing model generalization, addressing data sparsity, and leveraging classical models more effectively. Recent advances in automated feature engineering (AutoFE) have made significant progress on several challenges in representation learning, such as heavy reliance on intensive labor and empirical experience, limited explicit explainability, and inflexible feature space reconstruction embedded into downstream tasks. However, these approaches are constrained by: 1) the generation of potentially unintelligible and illogical reconstructed feature spaces, stemming from the neglect of expert-level cognitive processes; and 2) a lack of systematic exploration, which slows model convergence when identifying the optimal feature space. To address these, we introduce an interaction-aware reinforced generation perspective. We redefine feature space reconstruction as a nested process of creating meaningful features and controlling feature set size through selection. We develop a hierarchical reinforcement learning structure with cascading Markov Decision Processes to automate feature and operation selection, as well as feature crossing. By incorporating statistical measures, we reward agents based on the interaction strength between selected features, resulting in intelligent and efficient exploration of the feature space that emulates human decision-making. Extensive experiments are conducted to validate our proposed approach.
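The interaction-strength reward mentioned above can be illustrated with a small sketch. It assumes the statistical measure is the pairwise H-statistic of Friedman and Popescu, computed from centered partial dependence functions; the names `partial_dependence`, `h_statistic_squared`, and the `predict` callable are hypothetical and not taken from the paper, and how the paper turns the statistic into an agent reward is not shown here.

```python
import numpy as np

def partial_dependence(predict, X, cols, values):
    """Average prediction when columns `cols` are fixed to `values` for every row."""
    X_mod = X.copy()
    X_mod[:, cols] = values  # broadcast the fixed values down all rows
    return predict(X_mod).mean()

def h_statistic_squared(predict, X, j, k):
    """Pairwise H^2 between features j and k (assumed Friedman-Popescu form).

    Values near 0 suggest the two features act additively; larger values
    suggest a stronger interaction. Cost is O(n^2) model evaluations.
    """
    n = X.shape[0]
    pd_j = np.array([partial_dependence(predict, X, [j], X[i, [j]]) for i in range(n)])
    pd_k = np.array([partial_dependence(predict, X, [k], X[i, [k]]) for i in range(n)])
    pd_jk = np.array([partial_dependence(predict, X, [j, k], X[i, [j, k]]) for i in range(n)])
    # Center each partial dependence function so it has zero mean over the data.
    pd_j -= pd_j.mean()
    pd_k -= pd_k.mean()
    pd_jk -= pd_jk.mean()
    denom = np.sum(pd_jk ** 2)
    if denom == 0:
        return 0.0
    return float(np.sum((pd_jk - pd_j - pd_k) ** 2) / denom)

# Toy usage with hand-written "models" standing in for a fitted predictor.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
X_demo -= X_demo.mean(axis=0)  # zero-mean columns make the toy case easy to read
h_additive = h_statistic_squared(lambda X: X[:, 0] + X[:, 1], X_demo, 0, 1)
h_product = h_statistic_squared(lambda X: X[:, 0] * X[:, 1], X_demo, 0, 1)
```

In this toy setup the additive model yields a value close to zero and the product model yields a value close to one, which is the kind of signal an interaction-aware reward could build on.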

Paper Structure (17 sections, 7 equations, 10 figures, 1 table)

Introduction
Definitions and Problem Formulation
Methodology
Hierarchical Reinforced Feature Selection and Generation
Feature Generation and Post-processing
Experiments
Experimental Setup
Baseline Algorithms
Overall Performance
Ablation Study
Study of Impact of H-statistics
Robustness check of InHRecon under different ML models
Parameter Sensitivity Analysis of InHRecon
Case Study: Rationality and Interpretability Analysis
Related Work
...and 2 more sections

Figures (10)

Figure 1: We aim to iteratively reconstruct the feature space to obtain an optimal representation space that improves performance in the downstream ML task.
Figure 2: One major drawback of existing AutoFE methods: generation of irrational features.
Figure 3: Overview of the proposed framework. A feature classification step categorizes features into continuous and categorical types and is paired with an enhanced operation set. Hierarchical agents select an operation and two features, followed by statistically aware feature interaction to generate new features. Responsible agents are penalized for invalid operation-feature pairs. The updated feature set is evaluated in a downstream task. Feature selection is applied to control the feature set size, with iterations continuing until the feature set is optimized or a preset limit is reached.
Figure 4: Proposed hierarchical agent structure
Figure 5: Illustration of state representation extraction
...and 5 more figures
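
The generate-then-select loop described in the Figure 3 caption can be sketched roughly as follows. This is only an illustration under strong assumptions: random choices stand in for the three reinforcement learning agents, absolute Pearson correlation with the target stands in for downstream evaluation, and the only invalid operation-feature pair modeled is arithmetic on a categorical feature. The names `BINARY_OPS`, `relevance`, and `reconstruct` are hypothetical.

```python
import numpy as np

# Hypothetical operation set; the paper's enhanced operation set is not reproduced here.
BINARY_OPS = {"add": np.add, "subtract": np.subtract, "multiply": np.multiply}

def relevance(col, y):
    """Stand-in for downstream evaluation: |Pearson correlation| with the target."""
    if np.std(col) == 0:
        return 0.0
    return abs(np.corrcoef(col, y)[0, 1])

def reconstruct(X, y, is_categorical, steps=10, max_features=6, seed=0):
    """Iteratively cross features, then prune to control the feature set size."""
    rng = np.random.default_rng(seed)
    cols = [X[:, i] for i in range(X.shape[1])]
    cat = list(is_categorical)
    penalties = 0
    for _ in range(steps):
        op = rng.choice(list(BINARY_OPS))                    # operation choice (random stand-in)
        a, b = rng.choice(len(cols), size=2, replace=False)  # two feature choices (random stand-in)
        if cat[a] or cat[b]:
            penalties += 1  # invalid operation-feature pair: count a penalty and skip
            continue
        cols.append(BINARY_OPS[op](cols[a], cols[b]))
        cat.append(False)
        if len(cols) > max_features:
            # Selection step: keep only the highest-scoring columns.
            scores = [relevance(c, y) for c in cols]
            keep = sorted(np.argsort(scores)[-max_features:])
            cols = [cols[i] for i in keep]
            cat = [cat[i] for i in keep]
    return np.column_stack(cols), penalties

# Toy usage: two continuous features, one categorical code, target is their product.
rng_demo = np.random.default_rng(1)
X_toy = np.column_stack([rng_demo.normal(size=100),
                         rng_demo.normal(size=100),
                         rng_demo.integers(0, 3, size=100)])
y_toy = X_toy[:, 0] * X_toy[:, 1]
X_new, n_penalties = reconstruct(X_toy, y_toy, is_categorical=[False, False, True])
```

A learned policy would replace the random choices, and the penalty count would feed back into the reward of whichever agent made the invalid choice.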
