Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation

Xinyuan Wang; Kunpeng Liu; Arun Vignesh Malarkkan; Yanjie Fu

TL;DR

This work proposes a framework that optimizes context data for LLM-driven feature transformation (FT) by evolving trajectory-level experiences in a closed loop. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.

Abstract

Feature Transformation (FT) is a core data-centric AI task that improves feature space quality to advance downstream predictive performance. However, discovering effective transformations remains challenging because the space of feature-operator combinations is large. Existing solutions rely on discrete search or latent generation, but they are frequently limited by sample inefficiency, invalid candidates, and redundant generations with limited coverage. Large Language Models (LLMs) offer strong priors for producing valid transformations, but current LLM-based FT methods typically rely on static demonstrations, resulting in limited diversity, redundant outputs, and weak alignment with downstream objectives. We propose a framework that optimizes context data for LLM-driven FT by evolving trajectory-level experiences in a closed loop. Starting from high-performing feature transformation sequences explored by reinforcement learning, we construct and continuously update an experience library of transformation trajectories verified on the downstream task, and use a diversity-aware selector to form contexts that, together with chain-of-thought reasoning, guide transformed-feature generation toward higher performance. Experiments on diverse tabular benchmarks show that our method outperforms classical and LLM-based baselines and is more stable than one-shot generation. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.
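
The closed loop described in the abstract (select demonstrations from an experience library, generate a candidate sequence, verify it on the downstream task, write it back) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `ExperienceLibrary`, `closed_loop`, `toy_generate`, and `toy_evaluate` are hypothetical names, the tuple encoding of a transformation sequence is an assumption, the LLM and the downstream evaluator are replaced by toy stand-ins, and plain top-k selection stands in for the paper's diversity-aware selector.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed encoding: a transformation sequence is a tuple of feature/operator tokens.
Seq = Tuple[str, ...]
Entry = Tuple[Seq, float]


@dataclass
class ExperienceLibrary:
    """Holds (sequence, downstream score) pairs that passed verification."""
    entries: List[Entry] = field(default_factory=list)

    def write_back(self, seq: Seq, score: float) -> bool:
        # Reject exact duplicates so the library does not fill with redundant entries.
        if any(seq == stored for stored, _ in self.entries):
            return False
        self.entries.append((seq, score))
        return True

    def top_k(self, k: int) -> List[Entry]:
        # Simplification: pick by score only (the paper uses a diversity-aware selector).
        return sorted(self.entries, key=lambda e: e[1], reverse=True)[:k]


def closed_loop(
    library: ExperienceLibrary,
    generate: Callable[[List[Entry]], Seq],
    evaluate: Callable[[Seq], Optional[float]],
    rounds: int,
    k: int = 3,
) -> Entry:
    """Run select -> generate -> verify -> write-back for a fixed number of rounds."""
    for _ in range(rounds):
        context = library.top_k(k)        # demonstrations placed in the prompt
        candidate = generate(context)     # one new sequence per round
        score = evaluate(candidate)       # None marks an invalid candidate
        if score is not None:
            library.write_back(candidate, score)
    return library.top_k(1)[0]


def toy_generate(context: List[Entry], rng: random.Random) -> Seq:
    # Stand-in for the LLM: copy one demonstration and append a random operator.
    base, _ = rng.choice(context)
    return base + (rng.choice(["log", "sqrt", "square"]),)


def toy_evaluate(seq: Seq) -> Optional[float]:
    # Stand-in for downstream evaluation: overly long sequences count as invalid.
    if len(seq) > 6:
        return None
    return len(set(seq)) / 10.0
```

A run would seed the library with at least one verified sequence (in the paper, these come from reinforcement-learning exploration) and then call `closed_loop(library, lambda ctx: toy_generate(ctx, rng), toy_evaluate, rounds=20)`. Because only verified candidates are written back and nothing is deleted, the best stored score in this sketch can never decrease across rounds.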

Paper Structure (28 sections, 8 equations, 11 figures, 3 tables)

Introduction
Preliminaries and Problem Statement
Important Concepts
Problem Statement
Method
Overview of Proposed Method
Stage I: RL Exploration for High-performing Sequences
Context-as-Data: Experience Library and Context Policy
Stage II: Three-level Refinement for Few-shot Context Construction
Sequence Validation Check (Local Reliability)
CoT Trajectory Construction and Enhancement
Entropy-guided Diversity Selection (Coverage vs. Redundancy)
Stage III: Experience-conditioned Sequence Generation and Write-back
Context construction
LLM generation as a single sequence
...and 13 more sections

Figures (11)

Figure 1: Empirical motivations.
Figure 2: Comparison of different solutions.
Figure 3: A feature transformation sequence example.
Figure 4: Different expressions of transformation sequences.
Figure 5: Data-centric closed-loop optimization of context experiences for LLM-driven feature transformation. Stage I explores high-performing sequences with downstream rewards and stores them in an experience library. Stage II refines the library through validation checks, enhancement, and entropy-based diversity control, and constructs selected experiences into few-shot CoT-style examples. Stage III uses these CoT-style examples to guide the LLM to generate a single transformation sequence, which is then verified and written back to the library.
...and 6 more figures
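
The entropy-based diversity control named in Stage II (Figure 5) trades coverage against redundancy when choosing few-shot examples. The paper's exact criterion is not given in this overview, so the following is only one plausible sketch: a greedy selector that scores each candidate by a weighted sum of its downstream score and the Shannon entropy of the token distribution of the selected set. The names `token_entropy` and `select_diverse` and the weight `alpha` are assumptions introduced for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

Seq = Tuple[str, ...]
Entry = Tuple[Seq, float]  # (transformation sequence, downstream score)


def token_entropy(seqs: List[Seq]) -> float:
    """Shannon entropy (in bits) of the token distribution over all sequences."""
    counts = Counter(tok for seq in seqs for tok in seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_diverse(candidates: List[Entry], k: int, alpha: float = 0.5) -> List[Entry]:
    """Greedily pick up to k entries, balancing score against set-level entropy.

    alpha = 1.0 ranks by score only; alpha = 0.0 ranks by entropy only.
    """
    selected: List[Entry] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        chosen_seqs = [seq for seq, _ in selected]

        def gain(item: Entry) -> float:
            seq, score = item
            # Entropy of the selected set if this candidate were added.
            return alpha * score + (1 - alpha) * token_entropy(chosen_seqs + [seq])

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `alpha` below 1, a near-duplicate of an already selected sequence adds little entropy and tends to lose to a lower-scoring sequence that introduces new features or operators, which is the coverage-versus-redundancy trade-off the section title refers to.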
