
Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

Qian Zhu, Dakuo Wang, Shuai Ma, April Yi Wang, Zixin Chen, Udayan Khurana, Xiaojuan Ma

TL;DR

This work investigates how data science practitioners perceive and use feature suggestions when feature engineering (FE) is supported by both human and AI inputs. A Jupyter Notebook plugin prototype combines human-generated features and AI-generated features (via external knowledge augmentation and Deep Feature Synthesis) and presents them in an integrated, interactive UI. In a user study with 14 practitioners, participants largely used features from both sources, valued semantic explainability, and reported complementary benefits of human and AI inputs, though AI features were seen as less explainable and trustworthy. The study yields design recommendations for transparency, contextual information, and adaptive, creator-aware feature recommendations to enhance collaborative FE in real-world data science (DS) workflows.

Abstract

As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily-usable prototype, human&AI-assisted FE in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, seamlessly integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain valuable insights into data science practitioners' perceptions, usage patterns, and their potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the Creator of the feature (i.e., AI or human) significantly influences users' feature selection, and the semantic clarity of the suggested feature greatly impacts its adoption rate. Furthermore, our findings indicate that users perceive both differences and complementarity between features generated by humans and those generated by AI. Lastly, based on our study results, we derived a set of design recommendations for future human&AI FE design. Our findings show the collaborative potential between humans and AI in the field of FE.

Paper Structure (49 sections, 6 figures, 1 table)


Figures (6)

  • Figure 1: A 10-stage DS/ML lifecycle, starting at the top -- Requirement Gathering [wang2021autods]
  • Figure 2: An overview of the human&AI FE design that synthesizes human- and AI-generated features collectively in Jupyter notebooks. The whole pipeline runs from left to right. Taking the given data table as input, we first process it (i.e., data cleaning), and then input the table into the human-assisted module and AI-assisted module separately. Each module generates a list of features, which can be organized in online notebooks using a plugin and displayed with an interactive tabular view for feature recommendation.
  • Figure 3: The interactive tabular interface embedded in the plugin of Jupyter Notebook. Users could interact with the view to check, filter, and select features suggested by other humans and/or AI.
  • Figure 4: The experimental notebook interface with certain helper code snippets (e.g., data loader and model training) that each participant used in our study. They followed the workflow and experienced the human&AI FE to choose, analyze or compare the suitable feature set for the given task.
  • Figure 5: Participants' final selection of features generated from humans and AI. The stacked chart shows the percentage of features selected by each participant. The table presents details of the selection results, including the number of selected features, the number of containing data attributes in the input data tables, and the percentage of human-generated features.
  • ...and 1 more figures
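
The pipeline described in the Figure 2 caption (clean the table, pass it to a human-assisted and an AI-assisted module, then show the merged suggestions in a filterable view) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the house-price columns, the function names, and the FeatureSuggestion record are all hypothetical, and the AI module here simply enumerates pairwise products as a stand-in rather than running Deep Feature Synthesis or external knowledge augmentation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


@dataclass(frozen=True)
class FeatureSuggestion:
    """One row of the (hypothetical) suggestion table."""
    name: str
    creator: str          # "human" or "ai"
    description: str      # semantic explanation shown to the user
    compute: Callable[[Row], float]


def clean(rows: List[Row]) -> List[Row]:
    # Stand-in for the data-cleaning step: drop rows with missing values.
    return [r for r in rows if all(v is not None for v in r.values())]


def human_suggestions() -> List[FeatureSuggestion]:
    # Assumption: features contributed by other practitioners, hard-coded here.
    return [FeatureSuggestion(
        name="price_per_sqft",
        creator="human",
        description="Sale price divided by floor area",
        compute=lambda r: r["price"] / r["sqft"],
    )]


def ai_suggestions(columns: List[str]) -> List[FeatureSuggestion]:
    # Stand-in for the AI module: mechanically enumerate pairwise products.
    out = []
    for a, b in combinations(sorted(columns), 2):
        out.append(FeatureSuggestion(
            name=f"{a}_x_{b}",
            creator="ai",
            description=f"Product of {a} and {b}",
            compute=lambda r, a=a, b=b: r[a] * r[b],
        ))
    return out


def filter_by_creator(suggestions: List[FeatureSuggestion],
                      creator: Optional[str] = None) -> List[FeatureSuggestion]:
    # Mimics filtering the tabular view by who created the feature.
    return [s for s in suggestions if creator is None or s.creator == creator]


def apply_features(rows: List[Row],
                   selected: List[FeatureSuggestion]) -> List[Row]:
    # Append the selected features to every row of the cleaned table.
    return [{**r, **{s.name: s.compute(r) for s in selected}} for r in rows]


# Hypothetical input table; the second row has a missing value.
raw = [
    {"price": 300000.0, "sqft": 1500.0, "rooms": 3.0},
    {"price": 450000.0, "sqft": None, "rooms": 4.0},
    {"price": 200000.0, "sqft": 1000.0, "rooms": 2.0},
]
table = clean(raw)
suggestions = human_suggestions() + ai_suggestions(list(table[0].keys()))

# A user picks every human feature plus the first AI feature.
selected = (filter_by_creator(suggestions, "human")
            + filter_by_creator(suggestions, "ai")[:1])
augmented = apply_features(table, selected)
```

Keeping the creator and a plain-language description on every suggestion reflects the study's observation that the feature's creator and its semantic clarity both shape which features users adopt; a real plugin would render this list as an interactive table rather than a Python list.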