A Machine Learning Approach Towards SKILL Code Autocompletion

Enrique Dehaerne; Bappaditya Dey; Wannes Meert

TL;DR

This study is the first to apply transformers to SKILL code autocompletion towards improving the productivity of hardware design engineers and shows that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score.

Abstract

As Moore's Law continues to increase the complexity of electronic systems, Electronic Design Automation (EDA) must advance to meet global demand. An important example of an EDA technology is SKILL, a scripting language used to customize and extend EDA software. Recently, code generation models using the transformer architecture have achieved impressive results in academic settings and have even been used in commercial developer tools to improve developer productivity. To the best of our knowledge, this study is the first to apply transformers to SKILL code autocompletion towards improving the productivity of hardware design engineers. In this study, a novel, data-efficient methodology for generating SKILL code is proposed and experimentally validated. More specifically, we propose (i) a procedure for creating a high-quality SKILL dataset with both unlabeled and labeled data, (ii) a training strategy in which T5 models pre-trained on general programming language code are fine-tuned on our custom SKILL dataset using unsupervised and supervised learning, and (iii) a method for evaluating synthesized SKILL code. We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score. A major challenge was the extremely small amount of SKILL code data available for training a transformer model to generate SKILL code. Despite our validated improvements, this dataset was still not large enough to train a model that can reliably autocomplete SKILL code. We discuss this and other limitations, as well as future work that could address them.

Paper Structure (23 sections, 11 figures, 6 tables)

Introduction
Related Work
Methodology
Custom SKILL Dataset
Proprietary SKILL Data
Open-source SKILL Data
SKILL Input-Output Pairs
Dataset Filtering, Deduplication, & Split
Models & Training
Evaluation of Synthesized SKILL Code
Static-Analysis Metric
BLEU Implementation
Correlation of Metrics with Human Judgement
Experimental Setup & Pre-processing
Results & Discussion
...and 8 more sections

Figures (11)

Figure 1: On the left, an example of a parameterized cell (PCell) in the SKILL integrated design environment. The code in red consists of comments that describe the code beneath them. On the right, instantiations of the PCell with different parameter values are shown as they appear in the Virtuoso layout editor. The top instantiation uses all default parameter values, the middle instantiation has the height and width values set to 3, and the bottom instantiation has the layer type set to "metal2".
Figure 2: Flowchart showing the sources used and steps taken to create self-supervised and supervised SKILL data. Note that a small amount of data from primary proprietary sources was added to the training split (see Section \ref{filtering}) which is not depicted in this flowchart.
Figure 3: Three example SKILL programs with annotations showing which parts of the program would belong to the input (green) and output (purple) of a pair. The example on the left is equivalent to the program shown in Figure \ref{fig:skill_run_example} and can be split into a comment-function pair (see Section \ref{sect:supervised_dataset}). The top-right and bottom-right pairs are function-completion and comment-code pairs, respectively. Note that these SKILL programs were manually written and were not included in the SKILL dataset.
Figure 4: Flowchart showing the high-level training and evaluation steps taken. The "Best BLEU" condition decides, for each model type, which training strategy produced the best-trained model. The model that achieved the best BLEU score was chosen for final evaluation. Note that certain models that did not achieve the best BLEU score for a given model type were also selected for final evaluation (see Section \ref{results_training}).
Figure 5: Graphical depiction of MLM and autoregressive modeling for an example SKILL statement.
...and 6 more figures
