Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation

Xinyuan Wang, Kunpeng Liu, Arun Vignesh Malarkkan, Yanjie Fu

TL;DR

This work proposes a framework that optimizes context data for LLM-driven feature transformation (FT) by evolving trajectory-level experiences in a closed loop. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.

Abstract

Feature Transformation (FT) is a core data-centric AI task that improves feature space quality to advance downstream predictive performance. However, discovering effective transformations remains challenging due to the large space of feature-operator combinations. Existing solutions rely on discrete search or latent generation, but they are frequently limited by sample inefficiency, invalid candidates, and redundant generations with limited coverage. Large Language Models (LLMs) offer strong priors for producing valid transformations, but current LLM-based FT methods typically rely on static demonstrations, resulting in limited diversity, redundant outputs, and weak alignment with downstream objectives. We propose a framework that optimizes context data for LLM-driven FT by evolving trajectory-level experiences in a closed loop. Starting from high-performing feature transformation sequences explored by reinforcement learning, we construct and continuously update an experience library of transformation trajectories verified on the downstream task, and use a diversity-aware selector to assemble them into chain-of-thought contexts that guide transformed feature generation toward higher performance. Experiments on diverse tabular benchmarks show that our method outperforms classical and LLM-based baselines and is more stable than one-shot generation. The framework generalizes across API-based and open-source LLMs and remains robust across downstream evaluators.
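
The closed loop described in the abstract (an experience library, a diversity-aware selector, and write-back of verified trajectories) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `Experience` record, the greedy rule that trades downstream score against Shannon-entropy gain over operator tokens, the `diversity_weight` parameter, and the capacity-bounded `write_back` are assumptions chosen only to make the idea concrete. The authors' actual selector and library update may differ.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    """Hypothetical library entry: an operator sequence and its verified score."""
    ops: tuple
    score: float


def operator_entropy(experiences):
    """Shannon entropy (bits) of the operator-token distribution in a selection."""
    counts = Counter(op for e in experiences for op in e.ops)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_demonstrations(library, k, diversity_weight=0.5):
    """Greedily pick up to k experiences by score plus weighted entropy gain.

    This scoring rule is an assumption, not the paper's selector.
    """
    selected = []
    candidates = list(library)
    while candidates and len(selected) < k:
        base = operator_entropy(selected)
        best = max(
            candidates,
            key=lambda e: e.score
            + diversity_weight * (operator_entropy(selected + [e]) - base),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


def write_back(library, ops, score, capacity=100):
    """Add a verified trajectory, skip exact duplicates, keep the top `capacity`."""
    if any(e.ops == ops for e in library):
        return list(library)
    updated = list(library) + [Experience(ops, score)]
    return sorted(updated, key=lambda e: e.score, reverse=True)[:capacity]


# Toy library: two near-identical high scorers and one distinct, lower scorer.
library = [
    Experience(("log", "add"), 0.80),
    Experience(("add", "log"), 0.79),
    Experience(("sqrt", "mul"), 0.70),
]
demos = select_demonstrations(library, k=2)
```

In this toy library, a purely score-based choice (`diversity_weight=0`) would select the two near-identical sequences, while the entropy term favors pairing the top sequence with the one that uses different operators. The selected experiences would then be formatted as few-shot chain-of-thought examples for the LLM, and any newly generated sequence that passes downstream verification would be passed to `write_back`.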

Paper Structure

This paper contains 28 sections, 8 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Empirical motivations.
  • Figure 2: Comparison of different solutions.
  • Figure 3: A feature transformation sequence example.
  • Figure 4: Different expressions of transformation sequences.
  • Figure 5: Data-centric closed-loop optimization of context experiences for LLM-driven feature transformation. Stage I explores high-performing sequences with downstream rewards and stores them in an experience library. Stage II refines the library through validation checks, enhancement, and entropy-based diversity control, and constructs selected experiences into few-shot CoT-style examples. Stage III uses these CoT-style examples to guide the LLM to generate a single transformation sequence, which is then verified and written back to the library.
  • ...and 6 more figures