Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines

Ekaterina Trofimova, Emil Sataev, Andrey E. Ustyuzhanin

TL;DR

The paper details the fine-tuning process and shows how natural language descriptions can be translated into functional code, bridging the gap between task descriptions and executable code.

Abstract

In the ever-evolving landscape of machine learning, seamless translation of natural language descriptions into executable code remains a formidable challenge. This paper introduces Linguacodus, a framework that tackles this challenge with a dynamic pipeline that iteratively transforms natural language task descriptions into code through high-level data-shaping instructions. The core of Linguacodus is a fine-tuned large language model (LLM) that evaluates diverse solutions to various problems and selects the most fitting one for a given task. This paper details the fine-tuning process and sheds light on how natural language descriptions can be translated into functional code. Linguacodus represents a substantial step towards automated code generation, bridging the gap between task descriptions and executable code, and holds promise for advancing machine learning applications. Additionally, we propose an algorithm that transforms a natural language description of an ML task into code with minimal human interaction. We demonstrate the effectiveness of Linguacodus through extensive experiments on a large machine learning code dataset drawn from Kaggle. The results highlight its potential impact on applied machine learning in various scientific fields.
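The pipeline described above (and in the Figure 1 caption) can be sketched as a chain of model calls: propose candidate instructions, keep the one rated most fitting, optionally refine it, then ask a second model for code. This is a minimal illustrative sketch, not the authors' implementation. Every name here (`run_pipeline`, `toy_propose`, `toy_score`, `toy_code_llm`, the prompt template) is hypothetical, and each "model" is assumed to be a plain Python function; in Linguacodus the proposing and ranking role is played by the fine-tuned Llama 2, refinement by a multi-role LLM, and code inference by another LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: each model is modelled as a plain function over strings.
Proposer = Callable[[str, int], str]   # (task, k) -> k-th candidate instruction
Scorer = Callable[[str, str], float]   # (task, instruction) -> fitness score
Refiner = Callable[[str, str], str]    # (task, instruction) -> refined instruction
CodeLLM = Callable[[str], str]         # prompt -> generated code


@dataclass
class PipelineResult:
    instruction: str
    code: str


def run_pipeline(task: str, propose: Proposer, score: Scorer, code_llm: CodeLLM,
                 refine: Optional[Refiner] = None, n_candidates: int = 3) -> PipelineResult:
    """Description -> candidate instructions -> best instruction -> (refine) -> code."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    candidates = [propose(task, k) for k in range(n_candidates)]
    # Keep the candidate the scorer rates highest (on ties, max returns the first).
    instruction = max(candidates, key=lambda c: score(task, c))
    if refine is not None:  # optional refinement step
        instruction = refine(task, instruction)
    prompt = f"Task: {task}\nInstruction: {instruction}\nWrite the Python code."
    return PipelineResult(instruction=instruction, code=code_llm(prompt))


# --- Toy stubs for illustration only (hypothetical; no real model is called) ---
CANDIDATES = ["use logistic regression",
              "use gradient boosting with target encoding",
              "use a CNN"]


def toy_propose(task: str, k: int) -> str:
    return CANDIDATES[k % len(CANDIDATES)]


def toy_score(task: str, instruction: str) -> float:
    # Word overlap between task and instruction; a real system would use a trained model.
    return float(len(set(task.lower().split()) & set(instruction.lower().split())))


def toy_code_llm(prompt: str) -> str:
    # Echo the chosen instruction as a comment instead of generating real code.
    return "# " + prompt.splitlines()[1]


result = run_pipeline("predict churn from tabular data using gradient boosting",
                      toy_propose, toy_score, toy_code_llm)
print(result.instruction)
print(result.code)
```

Passing the models in as callables keeps the control flow independent of any particular LLM backend; swapping the toy stubs for real model calls would not change `run_pipeline` itself.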

Paper Structure

This paper contains 4 sections, 11 figures, and 19 tables.

Llama 2 fine-tuning details
Sample instructions inferred by Code Llama - Instruct and fine-tuned Llama 2
Sample code generated by GPT-3.5 using task descriptions and our refined instructions
List of the competitions used for validation

Figures (11)

Figure 1: Linguacodus takes the user-provided description of a machine learning task and generates an optimal solution instruction, which can optionally be refined using a multi-role LLM. A second LLM then infers executable ML code from the enhanced instruction. The resulting code represents the most effective solution for the specified task.
Figure 2: Code4ML taxonomy tree. Reproduced from code4ml, with permission of the authors.
Figure 3: Overall Linguacodus training framework.
Figure 4: Prompt for retrieving ML instructions.
Figure 5: Llama 2 fine-tuning input.