ICE-SEARCH: A Language Model-Driven Feature Selection Approach

Tianze Yang, Tianyi Yang, Fuyuan Lyu, Shaoshan Liu, Xue Liu

TL;DR

ICE-SEARCH introduces a novel framework that fuses large language models with evolutionary search to perform feature selection in Medical Predictive Analytics. By using the LLM as the crossover and mutation operator and coupling it with traditional FS initialization, ICE-SEARCH reduces the effective search space and leverages domain knowledge across roles to identify high-impact feature subsets. Across stroke, cardiovascular disease, and diabetes prediction tasks, the method achieves state-of-the-art or competitive performance, demonstrates robustness to initialization and distribution shifts, and exhibits fast convergence with a lightweight training footprint. The work advances AI-assisted feature selection by enabling dynamic prompt evolution, role-based reasoning, and cross-domain knowledge transfer, with meaningful implications for medical data preprocessing and predictive analytics pipelines.

Abstract

This study unveils the In-Context Evolutionary Search (ICE-SEARCH) method, which is among the first works to meld large language models (LLMs) with evolutionary algorithms for feature selection (FS) tasks, and demonstrates its effectiveness in Medical Predictive Analytics (MPA) applications. ICE-SEARCH harnesses the crossover and mutation capabilities inherent in LLMs within an evolutionary framework, significantly improving FS through the model's comprehensive world knowledge and its adaptability to a variety of roles. Our evaluation of this methodology spans three crucial MPA tasks: stroke, cardiovascular disease, and diabetes prediction, where ICE-SEARCH outperforms traditional FS methods in pinpointing essential features for medical applications. ICE-SEARCH achieves State-of-the-Art (SOTA) performance in stroke prediction and diabetes prediction; the Decision-Randomized ICE-SEARCH ranks as SOTA in cardiovascular disease prediction. The study emphasizes the critical role of incorporating domain-specific insights, illustrating ICE-SEARCH's robustness, generalizability, and convergence. This opens avenues for further research into comprehensive and intricate FS landscapes, marking a significant stride in the application of artificial intelligence in medical predictive analytics.
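
To make the idea of an evolutionary loop with a pluggable crossover-and-mutation operator concrete, here is a minimal sketch in Python. It is not the authors' implementation: `evolve`, `random_operator`, `toy_score`, the feature names, and the `RELEVANT` set are all hypothetical, and a seeded random operator stands in for the LLM call. In a real pipeline the score would presumably be a validation metric of a downstream classifier, which this toy example does not attempt.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

Subset = FrozenSet[str]


def random_operator(parents: Sequence[Subset], features: Sequence[str],
                    rng: random.Random) -> Subset:
    """Hypothetical stand-in for the LLM: uniform crossover, then one flip mutation."""
    a, b = parents[0], parents[-1]
    child = set()
    for f in features:
        in_both = f in a and f in b
        in_one = (f in a) != (f in b)
        # Keep features shared by both parents; keep others with probability 0.5.
        if in_both or (in_one and rng.random() < 0.5):
            child.add(f)
    # Mutation: flip the membership of one randomly chosen feature.
    child ^= {rng.choice(list(features))}
    return frozenset(child)


def evolve(features: Sequence[str], population: List[Subset],
           propose: Callable[[Sequence[Subset], Sequence[str]], Subset],
           score: Callable[[Subset], float],
           generations: int = 20, keep: int = 4) -> Subset:
    """Elitist search: propose a child from the two best subsets, keep the top `keep`."""
    def rank(pop: List[Subset]) -> List[Subset]:
        # Highest score first; ties broken on sorted feature names for determinism.
        return sorted(set(pop), key=lambda s: (-score(s), sorted(s)))

    population = rank(population)[:keep]
    for _ in range(generations):
        parents = population[:2]
        # Discard any proposed feature that is not in the known feature list.
        child = frozenset(propose(parents, features)) & frozenset(features)
        if child:
            population = rank(population + [child])[:keep]
    return population[0]


# Toy data, invented for illustration only.
FEATURES = ["age", "bmi", "glucose", "smoker", "id", "zipcode"]
RELEVANT = frozenset({"age", "bmi", "glucose"})


def toy_score(subset: Subset) -> float:
    """Reward relevant features and penalise irrelevant ones."""
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


rng = random.Random(0)
initial = [frozenset({"age", "id"}), frozenset({"bmi", "zipcode"})]
best = evolve(FEATURES, initial,
              lambda parents, feats: random_operator(parents, feats, rng),
              toy_score)
```

Because the loop always retains the top-ranked subsets, the returned subset never scores below the best initial one. Swapping `random_operator` for a function that prompts a language model with the parent subsets would leave the rest of the loop unchanged, which is the property the sketch is meant to show.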

Paper Structure

This paper contains 23 sections, 5 figures, 13 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Feature Selection 4-step Process (adapted from liu2005toward)
  • Figure 2: An illustration of ICE-SEARCH Architecture
  • Figure 3: An illustration of ICE-SEARCH Promptings
  • Figure 4: Geometric Patterns of DALYs Due to HHD in 2019. (Source of Data: yang2023global)
  • Figure 5: Comparative convergence analysis of XGBoost model accuracies across stroke, cardiovascular, and diabetes datasets over 10-fold cross-validation. The top row displays the convergence trends for the stroke dataset, the middle for the cardiovascular dataset, and the bottom figure for the diabetes dataset. This figure highlights the model's performance and convergence behavior in handling different medical datasets through cross-validation.