Towards Automatic Translation of Machine Learning Visual Insights to Analytical Assertions

Arumoy Shome; Luis Cruz; Arie van Deursen

TL;DR

This paper proposes an automated tool that translates ML visualisations into Python assertions, addressing the need for verification as data and assumptions evolve after deployment. It builds on a catalogue of $269$ visualisation-assertion (VA) pairs mined from $54{,}070$ Jupyter notebooks and proposes a taxonomy that organises VA pairs by ML verification task. An empirical evaluation will compare NLP4Code models and open-source LLMs using machine translation metrics such as BLEU, METEOR, and ROUGE-L, supplemented by a qualitative study with human participants. The authors also plan to extend the dataset with VA pairs from Kaggle and to benchmark against commercial models such as ChatGPT. If successful, the tool would reduce manual verification effort, make ML validation more scalable and consistent, and could extend to other domains such as scientific computing, supporting interpretability and human-AI collaboration in software engineering for ML.
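
ROUGE-L, one of the metrics named above, scores a candidate against a reference by the length of their longest common subsequence (LCS), combining LCS-based precision and recall. The sketch below shows how it could score a generated assertion against a reference assertion. The regex tokenizer and the use of F1 (equal weight on precision and recall) are assumptions of this sketch, not the paper's evaluation setup; a real study would more likely use an established metric implementation.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split source text into identifiers, numbers, and single-character symbols.

    A deliberately simple tokenizer (an assumption of this sketch).
    """
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S", code)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """Token-level ROUGE-L F1 between a generated and a reference assertion."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "assert df['age'].min() >= 0"
    candidate = "assert df['age'].min() > 0"
    print(round(rouge_l(candidate, reference), 3))  # 0.963
```

In this example the candidate misses only the `=` token, so precision is 1 and recall is 13/14. Surface overlap stays high even though `> 0` and `>= 0` check different properties, which is one reason the evaluation pairs such metrics with a human study.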

Abstract

We present our vision for an automated tool that translates visual properties observed in Machine Learning (ML) visualisations into Python assertions. The tool aims to streamline the manual verification of these visualisations in the ML development cycle, which is critical because real-world data and assumptions often change after deployment. In a prior study, we mined $54{,}070$ Jupyter notebooks from GitHub and created a catalogue of $269$ semantically related visualisation-assertion (VA) pairs. Building on this catalogue, we propose a taxonomy that organises the VA pairs by ML verification task. The input feature space comprises a rich source of information mined from the Jupyter notebooks: visualisations, Python source code, and associated markdown text. The effectiveness of various AI models, including traditional NLP4Code models and modern Large Language Models, will be compared using established machine translation metrics and evaluated through a qualitative study with human participants. We also plan to extend the existing VA pair dataset with additional pairs from Kaggle and to compare the tool's effectiveness with commercial generative AI models such as ChatGPT. This research contributes to the field of ML system validation and explores new ways to leverage AI for automating and enhancing software engineering practices in ML.
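
The abstract describes an input feature space made of visualisations, Python source code, and markdown text mined from notebooks. One plausible way to assemble these three modalities, sketched below under our own assumptions, is to read the raw `.ipynb` JSON (nbformat v4 layout: a `cells` list whose code cells carry `outputs` with MIME-keyed `data`) and pair every PNG plot output with the code cell that produced it and the markdown cells just before it. The `VisualisationContext` record, the function names, and the "preceding markdown" heuristic are hypothetical illustrations, not the authors' mining pipeline.

```python
import json
from dataclasses import dataclass


@dataclass
class VisualisationContext:
    """Hypothetical record bundling the three input modalities for one plot."""

    markdown: str   # text of the markdown cells preceding the plotting cell
    code: str       # source of the code cell that produced the plot
    image_png: str  # base64-encoded PNG payload, as stored in the notebook


def _as_text(value) -> str:
    """nbformat v4 stores multi-line strings either whole or as a list of lines."""
    return value if isinstance(value, str) else "".join(value)


def extract_visualisations(notebook_json: str) -> list[VisualisationContext]:
    """Pair every PNG output with its code cell and the markdown before it."""
    nb = json.loads(notebook_json)
    records: list[VisualisationContext] = []
    markdown_buffer: list[str] = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            markdown_buffer.append(_as_text(cell.get("source", "")))
        elif cell.get("cell_type") == "code":
            for output in cell.get("outputs", []):
                # Stream outputs have no "data" key, hence the default.
                png = output.get("data", {}).get("image/png")
                if png is not None:
                    records.append(
                        VisualisationContext(
                            markdown="\n".join(markdown_buffer),
                            code=_as_text(cell.get("source", "")),
                            image_png=_as_text(png),
                        )
                    )
            # Assumed heuristic: markdown only gives context to the next code cell.
            markdown_buffer = []
    return records
```

Resetting the markdown buffer after every code cell keeps the context tight but may drop explanations written further up the notebook; a wider context window is an equally reasonable choice.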

Paper Structure (13 sections, 1 figure)

This paper contains 13 sections and 1 figure.

Introduction
Our Vision
RQ1: How are VA pairs used to perform ML verification tasks?
RQ2: What kind of input features enables AI models to generate assertions from visualisations?
RQ3: What kind of AI models generate the best assertions from visualisations?
RQ4: How does our solution compare to commercial generative AI models?
Challenges
Dataset Limitations and Bias
Complexity of Visual Analytics
Integration into Existing Workflows
Expected Outcomes
Expected Outcomes Beyond ML Testing
Conclusion

Figures (1)

Figure 1: Vision for the automated tool proposed in this paper to generate analytical assertions from ML visualisations.
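
To make the translation target concrete, the sketch below shows what a VA pair might look like. It is a hypothetical illustration, not an entry from the authors' catalogue: the visual insight "no class dominates the bar chart of label counts" is restated as an analytical assertion that can run automatically whenever the training data changes. The function name and the `min_ratio` threshold of 0.8 are assumptions chosen for illustration.

```python
from collections import Counter

# Visualisation side (hypothetical): a bar chart of label counts shows that
# the classes in the training labels are roughly balanced.


def assert_classes_balanced(labels, min_ratio=0.8):
    """Assertion side: the same insight as a check that needs no visual inspection."""
    counts = Counter(labels)
    assert len(counts) > 1, "expected at least two classes"
    ratio = min(counts.values()) / max(counts.values())
    assert ratio >= min_ratio, (
        f"class imbalance: minority/majority ratio {ratio:.2f} < {min_ratio}"
    )


if __name__ == "__main__":
    assert_classes_balanced(["cat"] * 48 + ["dog"] * 52)  # passes: ratio is about 0.92
    try:
        assert_classes_balanced(["cat"] * 10 + ["dog"] * 90)
    except AssertionError as err:
        print(err)
```

Plain `assert` statements are skipped when Python runs with `-O`, so a check meant for production monitoring might raise an explicit exception instead.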
