CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization

Beicheng Xu; Keyao Ding; Wei Liu; Yupeng Lu; Bin Cui

TL;DR

CoFEH tackles the feature-engineering bottleneck in AutoML by interleaving LLM-driven FE with Bayesian HPO. It combines a Tree-of-Thought FE optimizer, a mutual conditioning mechanism, and a PUCB-based dynamic optimizer selector that allocates the budget adaptively. The framework supports truly free-form FE pipelines while using BO for model configuration, and it achieves superior end-to-end performance across 28 public datasets with multiple downstream models. Key contributions include the mutual conditioning between FE and HPO, a memory-driven steerable FE expansion, and a principled budget equilibrium that balances exploration and exploitation. The results suggest that CoFEH offers a scalable, model-agnostic, and cost-efficient pathway to robust AutoML pipelines with strong FE–HPO synergy.
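To make the interleaving concrete, here is a minimal Python sketch of an FE/HPO loop with a shared incumbent and an adaptive selector. Everything in it is an assumption for illustration: the toy objective `evaluate`, the search spaces `FE_OPS` and `HP_GRID`, and the helpers `propose_fe`, `propose_hp`, `ucb_pick`, and `interleaved_search` are hypothetical. Random sampling stands in for both the LLM-driven FE optimizer and BO, and a plain UCB1 bandit stands in for the PUCB-based selector, so the sketch shows only the control flow, not CoFEH's actual components.

```python
import math
import random
from dataclasses import dataclass, field

# HYPOTHETICAL toy search spaces; CoFEH's real FE space is free-form and LLM-generated.
FE_OPS = ["none", "log", "standardize", "poly2"]
HP_GRID = {"depth": [2, 4, 8], "lr": [0.01, 0.1, 0.3]}

def evaluate(fe: str, hp: dict) -> float:
    """Toy validation score with a built-in FE-HPO interaction (made up, not real data)."""
    base = {"none": 0.60, "log": 0.65, "standardize": 0.66, "poly2": 0.62}[fe]
    # poly2 only pays off with a deep model, mimicking an FE-HPO interaction
    bonus = 0.10 if (fe == "poly2" and hp["depth"] == 8) else 0.0
    return base + bonus - abs(hp["lr"] - 0.1)

@dataclass
class SharedContext:
    """Crude stand-in for mutual conditioning: both optimizers read the incumbent pair."""
    best_fe: str = "none"
    best_hp: dict = field(default_factory=lambda: {"depth": 2, "lr": 0.01})
    best_score: float = float("-inf")
    history: list = field(default_factory=list)

def propose_fe(ctx: SharedContext, rng: random.Random):
    # Stand-in for the LLM FE optimizer: sample an op while keeping the incumbent HPs.
    return rng.choice(FE_OPS), dict(ctx.best_hp)

def propose_hp(ctx: SharedContext, rng: random.Random):
    # Stand-in for BO (plain random search here) while keeping the incumbent FE.
    return ctx.best_fe, {k: rng.choice(v) for k, v in HP_GRID.items()}

def ucb_pick(stats: dict, t: int, c: float = 1.0) -> str:
    """UCB1 over {FE, HPO}; an arm's reward is 1 when its step improved the incumbent."""
    for arm, (n, _) in stats.items():
        if n == 0:  # try every arm once before using the UCB score
            return arm
    return max(stats, key=lambda a: stats[a][1] / stats[a][0]
               + c * math.sqrt(math.log(t) / stats[a][0]))

def interleaved_search(budget: int, seed: int = 0) -> SharedContext:
    rng = random.Random(seed)
    ctx = SharedContext()
    ctx.best_score = evaluate(ctx.best_fe, ctx.best_hp)
    stats = {"FE": [0, 0.0], "HPO": [0, 0.0]}  # [pulls, total reward]
    for t in range(1, budget + 1):
        arm = ucb_pick(stats, t)
        fe, hp = (propose_fe if arm == "FE" else propose_hp)(ctx, rng)
        score = evaluate(fe, hp)
        improved = score > ctx.best_score
        if improved:
            ctx.best_fe, ctx.best_hp, ctx.best_score = fe, hp, score
        stats[arm][0] += 1
        stats[arm][1] += float(improved)
        ctx.history.append((arm, fe, hp, score))
    return ctx
```

For example, `interleaved_search(30).best_fe` reports the FE choice of the incumbent after 30 steps. Crediting an optimizer only when it improves the incumbent is one simple way to let the selector shift budget toward whichever side is currently paying off.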

Abstract

Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search that operates within rigid, predefined search spaces and lacks domain awareness. While Large Language Models (LLMs) offer a promising alternative by leveraging semantic reasoning to generate unbounded operators, existing methods fail to construct free-form FE pipelines and remain confined to isolated subtasks such as feature generation. Most importantly, FE is rarely optimized jointly with the hyperparameter optimization (HPO) of the ML model, which leads to greedy "FE-then-HPO" workflows that cannot capture strong FE-HPO interactions. In this paper, we present CoFEH, a collaborative framework that interleaves LLM-based FE and Bayesian HPO for robust end-to-end AutoML. CoFEH uses an LLM-driven FE optimizer powered by Tree of Thought (ToT) to explore flexible FE pipelines, a Bayesian optimization (BO) module to solve HPO, and a dynamic optimizer selector that realizes interleaved optimization by adaptively scheduling FE and HPO steps. Crucially, we introduce a mutual conditioning mechanism that shares context between the LLM and BO, enabling mutually informed decisions. Experiments show that CoFEH not only outperforms traditional and LLM-based FE baselines but also achieves superior end-to-end performance under joint optimization.
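A toy example shows why a greedy "FE-then-HPO" workflow can miss the joint optimum. The values in `SCORES` are made-up illustrative numbers, not results from the paper, and the FE choices and depths are hypothetical: here polynomial features only help a deeper model, so fixing FE under the default depth locks in the wrong pipeline.

```python
from itertools import product

# HYPOTHETICAL validation scores for (FE pipeline, model depth); illustrative only.
SCORES = {
    ("standardize", 2): 0.80, ("standardize", 8): 0.81,
    ("poly_features", 2): 0.78, ("poly_features", 8): 0.86,
}
FE_CHOICES = ["standardize", "poly_features"]
DEPTHS = [2, 8]
DEFAULT_DEPTH = 2

def fe_then_hpo():
    """Greedy workflow: pick FE under the default model, then tune the model."""
    fe = max(FE_CHOICES, key=lambda f: SCORES[(f, DEFAULT_DEPTH)])
    depth = max(DEPTHS, key=lambda d: SCORES[(fe, d)])
    return fe, depth, SCORES[(fe, depth)]

def joint():
    """Joint search over the full FE x HPO product space."""
    fe, depth = max(product(FE_CHOICES, DEPTHS), key=SCORES.get)
    return fe, depth, SCORES[(fe, depth)]

print(fe_then_hpo())  # ('standardize', 8, 0.81)
print(joint())        # ('poly_features', 8, 0.86)
```

Exhaustive joint search is only feasible in a space this small; the point is that the greedy order discards `poly_features` before the depth that makes it worthwhile is ever tried.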

Paper Structure (41 sections, 1 theorem, 10 equations, 12 figures, 8 tables)

Introduction
Background and Motivation
The Formal Machine Learning Pipeline
Feature Engineering
Hyperparameter Optimization
Joint Optimization of FE and HPO
Method
LLM-based Feature Engineering
Selection Down the MCTS Tree
Expansion Through Steerable Reasoning
Global Memory and Operation Retrieval
Collaborative Tuning with HPO
BO-based HPO Conditioned on FE
LLM-based FE Conditioned on HPO
Dynamic Optimizer Selector
...and 26 more sections

Key Result

Theorem 1 (Budget Equilibrium)

Consider the selection rule in Equation (eq:puct_selector) under a neutral reward signal ($Q(a) = \text{const}$). Let $M \in \mathbb{Z}^+$ be the total budget. If the initial bias satisfies $0.5 \le p_1 < \frac{M+1.5}{M+3}$, the linear scheduling of $\omega_{\text{FE}}$ and $\omega_{\text{HPO}}$, which converges …
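The precondition on $p_1$ is easy to check numerically, and a small simulation clarifies what a neutral reward means for a PUCT-style selector. This is a sketch under stated assumptions: the full selector equation (eq:puct_selector) is not reproduced in this section, so the score $Q(a) + c\,P(a)\sqrt{N+1}/(1+N(a))$ is a generic PUCT form rather than CoFEH's exact rule, and `linear_prior`, which moves $\omega_{\text{FE}}$ linearly from $p_1$ to 0.5, is a hypothetical reading of the truncated statement. Only the bound $(M+1.5)/(M+3)$ comes from the theorem; all function names are illustrative.

```python
import math

def p1_upper_bound(M: int) -> float:
    """Upper limit (M + 1.5) / (M + 3) on the initial bias p_1, as stated in Theorem 1."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    return (M + 1.5) / (M + 3)

def satisfies_bias_condition(p1: float, M: int) -> bool:
    """Check the precondition 0.5 <= p_1 < (M + 1.5) / (M + 3)."""
    return 0.5 <= p1 < p1_upper_bound(M)

def linear_prior(t: int, M: int, p1: float) -> float:
    """HYPOTHETICAL schedule: the FE prior moves linearly from p1 to 0.5 over the budget."""
    frac = t / max(M - 1, 1)
    return p1 + (0.5 - p1) * frac

def puct_allocate(M: int, p1: float, c: float = 1.0) -> dict:
    """Generic PUCT selection over two arms under a neutral reward Q(a) = const (assumed form)."""
    q = 0.0  # neutral reward: the same Q for both arms
    visits = {"FE": 0, "HPO": 0}
    for t in range(M):
        w_fe = linear_prior(t, M, p1)
        priors = {"FE": w_fe, "HPO": 1.0 - w_fe}
        total = visits["FE"] + visits["HPO"]
        scores = {a: q + c * priors[a] * math.sqrt(total + 1) / (1 + visits[a])
                  for a in visits}
        # max() returns the first key among exact ties, so FE wins ties
        choice = max(scores, key=scores.get)
        visits[choice] += 1
    return visits
```

For instance, with $M = 10$ the admissible range is $0.5 \le p_1 < 11.5/13 \approx 0.885$, and the upper limit approaches 1 as $M$ grows. Because $Q(a)$ is constant and $c\sqrt{N+1}$ is shared by both arms, the choice depends only on $P(a)/(1+N(a))$, so visit counts track the priors; with $p_1 = 0.5$ the simulation simply alternates between FE and HPO.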

Figures (12)

Figure 1: Comparison of optimization workflows: existing methods vs. CoFEH.
Figure 2: AutoML vs. human expert: The freedom of FE.
Figure 3: FE workflow of CoFEH.
Figure 4: Ablation study of collaborative tuning.
Figure 5: FE prop. driven by the dynamic optimizer selector
...and 7 more figures

Theorems & Definitions (1)

Theorem 1: Budget Equilibrium
