
Towards Automatic Translation of Machine Learning Visual Insights to Analytical Assertions

Arumoy Shome, Luis Cruz, Arie van Deursen

TL;DR

This paper proposes an automated tool that translates ML visualisations into Python assertions, addressing the need for verification as data and assumptions evolve after deployment. It builds on a catalogue of $269$ visualisation-assertion (VA) pairs mined from $54{,}070$ Jupyter notebooks and proposes a taxonomy that organises the VA pairs by ML verification task. A planned empirical evaluation will compare NLP4Code models and open-source LLMs using machine translation metrics such as BLEU, METEOR, and ROUGE-L, supplemented by a qualitative study with human participants. The authors also plan to extend the dataset with VA pairs from Kaggle and to benchmark against commercial models such as ChatGPT. If successful, the tool would reduce manual verification effort, improve the scalability and consistency of ML validation, and could extend to other domains such as scientific computing, enhancing interpretability and human-AI collaboration in software engineering for ML.
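As a rough illustration of one of the metrics named above (not the authors' evaluation pipeline), the sketch below computes a sentence-level ROUGE-L F1 score between a generated assertion and a reference assertion. ROUGE-L is based on the longest common subsequence (LCS) of tokens. The regex tokenizer, the function names, and the use of F1 (equal weight on precision and recall) are assumptions made for this example.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split source code into word tokens and single punctuation characters.

    A deliberately simple, hypothetical tokenizer; a real evaluation might
    use a language-aware tokenizer instead.
    """
    return re.findall(r"\w+|[^\w\s]", code)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated (candidate) and a reference assertion."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference assertion and model output
reference = "assert df['age'].isnull().sum() == 0"
candidate = "assert df['age'].isna().sum() == 0"
print(round(rouge_l_f1(candidate, reference), 3))
```

Here the two assertions share 17 of their 18 tokens in order, so precision and recall are both 17/18 and the score is 17/18. Although pandas' `isnull` is an alias of `isna`, the surface metric still penalises the difference, which is one reason to complement such scores with a human study. BLEU (clipped n-gram precision with a brevity penalty) and METEOR (unigram alignment with stemming and synonyms) are computed differently, so their scores are not interchangeable with ROUGE-L.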

Abstract

We present our vision for developing an automated tool capable of translating visual properties observed in Machine Learning (ML) visualisations into Python assertions. The tool aims to streamline the manual verification of these visualisations in the ML development cycle, which is critical because real-world data and assumptions often change post-deployment. In a prior study, we mined $54{,}070$ Jupyter notebooks from GitHub and created a catalogue of $269$ semantically related visualisation-assertion (VA) pairs. Building on this catalogue, we propose to build a taxonomy that organises the VA pairs based on ML verification tasks. The input feature space comprises a rich source of information mined from the Jupyter notebooks: visualisations, Python source code, and associated markdown text. The effectiveness of various AI models, including traditional NLP4Code models and modern Large Language Models, will be compared using established machine translation metrics and evaluated through a qualitative study with human participants. We also plan to address the challenge of extending the existing VA pair dataset with additional pairs from Kaggle, and to compare the tool's effectiveness with commercial generative AI models such as ChatGPT. This research not only contributes to the field of ML system validation but also explores novel ways to leverage AI for automating and enhancing software engineering practices in ML.
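To make the idea of a VA pair concrete, here is a hypothetical example that is not drawn from the authors' catalogue. An analyst plots a bar chart of label counts and concludes that the classes are roughly balanced; the assertion encodes the same insight so it can be rechecked automatically. The function name, the sample data, and the 1.5 imbalance threshold are assumptions for illustration.

```python
from collections import Counter


def check_class_balance(labels: list[str], max_ratio: float = 1.5) -> float:
    """Assertion side of a hypothetical VA pair.

    Visual insight (from a bar chart of label counts): "the classes are
    roughly balanced". Analytical translation: the most frequent class is
    at most `max_ratio` times as common as the least frequent one.
    The default threshold of 1.5 is an illustrative assumption.
    Returns the observed imbalance ratio when both checks pass.
    """
    counts = Counter(labels)
    # A balance claim only makes sense with at least two classes.
    assert len(counts) >= 2, f"expected at least two classes, got {dict(counts)}"
    ratio = max(counts.values()) / min(counts.values())
    assert ratio <= max_ratio, f"class imbalance ratio {ratio:.2f} exceeds {max_ratio}"
    return ratio


# Hypothetical label column from a notebook. The visualisation side of the
# pair would be something like:
#     counts = Counter(labels)
#     plt.bar(list(counts.keys()), list(counts.values()))
# (matplotlib is omitted so the sketch runs with the standard library only).
labels = ["spam", "ham", "ham", "spam", "ham", "spam", "ham"]
print(check_class_balance(labels))
```

Unlike the chart, the assertion keeps working after deployment: if incoming data drifts towards one class, rerunning the check raises an `AssertionError` instead of relying on someone to re-inspect the plot.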

Paper Structure (13 sections, 1 figure)