Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

Qian Zhu; Dakuo Wang; Shuai Ma; April Yi Wang; Zixin Chen; Udayan Khurana; Xiaojuan Ma

TL;DR

This work investigates how data science practitioners perceive and use feature suggestions when feature engineering (FE) is supported by both human and AI inputs. A Jupyter Notebook plugin prototype combines human-generated features with AI-generated features (produced via external knowledge augmentation and Deep Feature Synthesis) and presents them in an integrated, interactive UI. In a user study with 14 practitioners, participants largely used features from both sources, valued semantic explainability, and reported complementary benefits of human and AI inputs, though they saw AI features as less explainable and trustworthy. The study yields design recommendations for transparency, contextual information, and adaptive, creator-aware feature recommendations to enhance collaborative FE in real-world data science workflows.

Abstract

As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Although AI-generated recommendations for FE exist, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily usable prototype, human&AI-assisted FE, in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain insights into data science practitioners' perceptions, usage patterns, and potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the creator of the feature (i.e., AI or human) significantly influences users' feature selection, and that the semantic clarity of the suggested feature greatly impacts its adoption rate. Furthermore, our findings indicate that users perceive both differences and complementarity between features generated by humans and those generated by AI. Lastly, based on our study results, we derived a set of design recommendations for future human&AI FE design. Our findings show the collaborative potential between humans and AI in the field of FE.

Paper Structure (49 sections, 6 figures, 1 table)

Introduction
Related Work
Data Science Project Lifecycle and Feature Engineering
Automated Machine Learning (AutoML) and AI-Assisted Feature Engineering
Interactive Feature Engineering Systems
Human&AI Feature Engineering Design
AI-assisted Feature Engineering Module
Automated Data Augmentation with External Knowledge
Automated Feature Generation with Deep Feature Synthesis
Human-Assisted Feature Engineering Module
Two Methods for Collecting Human Created Features
Feature Format and Information Collection
Feature Recommendation User Interface
User Study
Participants
...and 34 more sections

Figures (6)

Figure 1: A 10-stage DS/ML lifecycle, starting at the top with Requirement Gathering [wang2021autods].
Figure 2: An overview of the human&AI FE design that synthesizes human- and AI-generated features collectively in Jupyter notebooks. The whole pipeline runs from left to right. Taking the given data table as input, we first process it (i.e., data cleaning), and then input the table into the human-assisted module and AI-assisted module separately. Each module generates a list of features, which can be organized in online notebooks using a plugin and displayed with an interactive tabular view for feature recommendation.
Figure 3: The interactive tabular interface embedded in the plugin of Jupyter Notebook. Users could interact with the view to check, filter, and select features suggested by other humans and/or AI.
Figure 4: The experimental notebook interface with certain helper code snippets (e.g., data loader and model training) that each participant used in our study. They followed the workflow and experienced the human&AI FE to choose, analyze or compare the suitable feature set for the given task.
Figure 5: Participants' final selection of features generated from humans and AI. The stacked chart shows the percentage of features selected by each participant. The table presents details of the selection results, including the number of selected features, the number of containing data attributes in the input data tables, and the percentage of human-generated features.
...and 1 more figures
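
The pipeline described in the Figure 2 caption (two modules that each produce a feature list, merged into one filterable recommendation view) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `FeatureSuggestion`, `human_module`, `ai_module`, and `recommend` are hypothetical names, the human-created feature is invented for the example, and the AI module only mimics the naming style of aggregation features rather than running Deep Feature Synthesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSuggestion:
    """One row of a hypothetical recommendation table."""
    name: str         # name of the suggested feature
    creator: str      # "human" or "AI"
    description: str  # short semantic explanation shown to the user


def human_module(columns):
    # Hypothetical stand-in for features collected from other practitioners.
    return [
        FeatureSuggestion("price_per_sqft", "human",
                          "price divided by living area"),
    ]


def ai_module(columns):
    # Hypothetical stand-in for automated generation; it does NOT run
    # Deep Feature Synthesis, it only produces placeholder aggregations.
    return [
        FeatureSuggestion(f"MEAN({col})", "AI", f"mean of {col} per group")
        for col in columns
    ]


def recommend(columns, creator=None):
    """Merge both feature lists and optionally filter by creator."""
    suggestions = human_module(columns) + ai_module(columns)
    if creator is not None:
        suggestions = [s for s in suggestions if s.creator == creator]
    return suggestions


# Assumed input columns of an already cleaned data table.
table = recommend(["price", "area"])
human_only = recommend(["price", "area"], creator="human")
```

Keeping the creator as an explicit field on every suggestion is the design choice that matters here: it lets a tabular view filter or group by source, which fits the paper's finding that the creator of a feature influences selection.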
