Decoding AI: The inside story of data analysis in ChatGPT

Ozan Evkaya; Miguel de Carvalho

TL;DR

This paper evaluates ChatGPT's Data Analysis (DA) extension as a tool for data exploration and modeling. It documents a workflow in which prompts are translated into Python code that runs in a sandbox, enabling descriptive analytics, visualization, and demonstrations of supervised and unsupervised learning. The key findings show strong utility for exploratory tasks and stepwise modeling guidance, but also limitations in model diagnostics, questionable metric choices for nonlinear models, and occasional misinterpretations by the DA system. The authors advocate human oversight, careful prompting, and domain expertise to avoid misleading conclusions, while noting the reproducibility of the underlying Python code and the potential for LLM-driven augmentation of traditional statistical tools.
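To make the workflow concrete, the sketch below shows the kind of Python a prompt such as "summarize the data, fit a linear model, and check the fit" might be translated into. It is an illustration under stated assumptions, not code taken from the paper or produced by ChatGPT: the toy `area` and `price` values are invented, and the helper names `describe`, `fit_line`, and `diagnose` are hypothetical. It uses only the Python standard library.

```python
from statistics import mean, median, stdev

# Hypothetical toy data (NOT from the paper): house area and price.
area = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
price = [200.0, 290.0, 410.0, 500.0, 600.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def diagnose(x, y, intercept, slope):
    """Return the residuals and the coefficient of determination (R^2)."""
    y_bar = mean(y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return residuals, 1.0 - ss_res / ss_tot


summary = describe(price)
intercept, slope = fit_line(area, price)
residuals, r2 = diagnose(area, price, intercept, slope)

print("price summary:", summary)
print("intercept:", intercept, "slope:", slope)
print("residuals:", residuals)
print("R^2:", r2)
```

Returning the residuals alongside R^2 reflects the point about diagnostics: a single summary metric can look reassuring even when the residual pattern suggests the model is misspecified, so a human reviewer would want to inspect both.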

Abstract

As a result of recent advances in generative AI, the field of Data Science is undergoing a variety of changes. This review critically examines the Data Analysis (DA) capabilities of ChatGPT, assessing its performance across a wide range of tasks. While DA provides researchers and practitioners with unprecedented analytical capabilities, it is far from perfect, and it is important to recognize and address its limitations.

Paper Structure (11 sections, 6 figures)

Introduction
Seeing Through Data
Getting Started
Loading and Preprocessing Data
Data Visualization
Learning from Supervised Data
Warm-up
Regression, I: From Linear to Nonlinear Models
Regression, II: Deep Neural Network
Learning from Unsupervised Data
Closing Remarks

Figures (6)

Figure 1: ChatGPT window to turn on the Data Analysis feature.
Figure 2: laptop data: Frequency per company, type, and CPU--GPU brands.
Figure 3: Histogram of laptop prices.
Figure 4: Side-by-side boxplots for laptop data.
Figure 5: Diagnostic plots for linear model for duke_forest data (response: price; predictor: area).
...and 1 more figure
