FAMOSE: A ReAct Approach to Automated Feature Discovery

Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li

TL;DR

FAMOSE tackles the bottleneck of feature engineering for tabular data by introducing a ReAct-driven autonomous feature discovery framework. It iteratively proposes, tests, and refines candidate features with data-driven evaluation and concludes with a compact, non-redundant feature set selected via mRMR (minimum redundancy maximum relevance). On a benchmark of 20 classification and 7 regression tasks, FAMOSE achieves near-state-of-the-art ROC-AUC on large classification datasets (more than 10K instances, average gain $0.23\%$) and state-of-the-art RMSE reductions of $2.0\%$ on average, with robustness across multiple LLMs and predictive models. The work demonstrates that AI agents can perform inventive, interpretable feature engineering in an automated loop, potentially accelerating end-to-end ML pipelines while reducing reliance on deep domain expertise.

Abstract

Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features from an exponentially large feature space traditionally demands substantial domain expertise. To address this challenge, we introduce FAMOSE (Feature AugMentation and Optimal Selection agEnt), a novel framework that leverages the ReAct paradigm to autonomously explore, generate, and refine features while integrating feature selection and evaluation tools within an agent architecture. To our knowledge, FAMOSE represents the first application of an agentic ReAct framework to automated feature engineering, especially for both regression and classification tasks. Extensive experiments demonstrate that FAMOSE is at or near the state-of-the-art on classification tasks (especially tasks with more than 10K instances, where ROC-AUC increases 0.23% on average), and achieves the state-of-the-art for regression tasks by reducing RMSE by 2.0% on average, while remaining more robust to errors than other algorithms. We hypothesize that FAMOSE's strong performance is because ReAct allows the LLM context window to record (via iterative feature discovery and evaluation steps) what features did or did not work. This is similar to a few-shot prompt and guides the LLM to invent better, more innovative features. Our work offers evidence that AI agents are remarkably effective in solving problems that require highly inventive solutions, such as feature engineering.
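
The selection step named in the TL;DR, mRMR, can be illustrated with a minimal greedy sketch. This is not the authors' implementation: the function names `pearson` and `mrmr_select` are hypothetical, the toy data is synthetic, and absolute Pearson correlation is swapped in as a simple stand-in for the mutual-information terms that mRMR is usually defined with.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def mrmr_select(features, target, k):
    """Greedy mRMR-style selection (hypothetical helper).

    features: dict mapping feature name -> list of values.
    Returns up to k feature names in the order they were picked.
    Relevance and redundancy use |Pearson r| as an assumed proxy.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Mean similarity to the features that were already chosen.
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data: the target depends on a and b; a_copy duplicates a.
random.seed(0)
n = 500
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
features = {
    "a": a,
    "a_copy": [2 * v + 1 for v in a],  # perfectly redundant with a
    "b": b,
    "noise": [random.gauss(0, 1) for _ in range(n)],
}
target = [x + 0.8 * y for x, y in zip(a, b)]

selected = mrmr_select(features, target, k=2)
print(selected)
```

In this toy setup the second pick should be `b` and not the redundant copy of `a`, because the redundancy penalty cancels the copy's high relevance. That is the behavior the TL;DR describes as a compact, non-redundant feature set.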

Paper Structure (15 sections, 2 equations, 6 figures, 12 tables, 1 algorithm)

Figures (6)

  • Figure 1: Types of AutoML. In traditional AutoML, feature engineering would include feature discovery, such as within OpenFE [zhang2023openfe], or iterative feature modification, such as the method of Piramuthu et al. [piramuthu2009iterative]. Although LLM methods, such as CAAFE, often excel at feature generation, iterative feature modification has rarely been explored until now. FAMOSE offers a way to learn through trial and error which features work and which do not, until a better feature is developed.
  • Figure 2: An illustration of FAMOSE applied to the balance-scale task [balance_scale_12]. FAMOSE first invents features and observes their performance, in this case using the difference in arm weights to see whether a scale is balanced, tilting right, or tilting left. If performance is insufficient, FAMOSE then reasons about how to create better features, ultimately discovering the feature, moment, that exactly predicts whether a scale is balanced. Because performance cannot be improved further in this task, the feature discovery step eventually stops. The feature selection step then selects this feature using mRMR [ding2005minimum] and removes the four extraneous features in the task dataset: $W_1$, $W_2$, $L_1$, and $L_2$. Thus, instead of an error-prone model with four features, FAMOSE discovers a perfect predictor with only one feature.
  • Figure S1: Number of functions used in CAAFE and FAMOSE code across all task folds (5$\times$ number of tasks).
  • Figure S2: Frequency with which each function is used in CAAFE code across all task folds (5$\times$ number of tasks).
  • Figure S3: Frequency with which each function is used in FAMOSE code across all task folds (5$\times$ number of tasks).
  • ...and 1 more figure
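
The balance-scale example in the Figure 2 caption can be checked with a small sketch. It rests on assumptions not stated in the caption: that $W_1$, $L_1$ are the left weight and arm length and $W_2$, $L_2$ the right ones, that each takes integer values from 1 to 5, and that the label follows the physical torque rule. The function names are hypothetical, and the code only compares two candidate features; it does not reproduce the FAMOSE agent.

```python
from itertools import product


def true_label(w1, l1, w2, l2):
    """Assumed ground truth: compare left and right torque (weight * arm length)."""
    left, right = w1 * l1, w2 * l2
    if left > right:
        return "L"  # tips left
    if left < right:
        return "R"  # tips right
    return "B"      # balanced


def sign_label(value):
    """Turn a signed feature value into a class prediction."""
    if value > 0:
        return "L"
    if value < 0:
        return "R"
    return "B"


# Enumerate every configuration under the assumed 1..5 value range.
rows = list(product(range(1, 6), repeat=4))  # tuples of (w1, l1, w2, l2)


def accuracy(feature):
    """Share of rows where the sign of a single feature matches the label."""
    hits = sum(sign_label(feature(*row)) == true_label(*row) for row in rows)
    return hits / len(rows)


# First attempt described in the caption: difference in weights only.
weight_diff_acc = accuracy(lambda w1, l1, w2, l2: w1 - w2)

# Refined feature described in the caption: difference in moments.
moment_acc = accuracy(lambda w1, l1, w2, l2: w1 * l1 - w2 * l2)

print(f"weight difference: {weight_diff_acc:.3f}")
print(f"moment difference: {moment_acc:.3f}")
```

Under these assumptions the weight difference misclassifies some configurations because it ignores arm length, while the sign of the moment difference matches the label on every row. This mirrors the caption's point that one engineered feature can replace the four raw ones.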