Data-Driven Evidence-Based Syntactic Sugar Design

David OBrien, Robert Dyer, Tien N. Nguyen, Hridesh Rajan

TL;DR

This paper argues that data-driven programming language design can align language evolution with real-world developer idioms better than intuition alone. It introduces a generalized control-flow graph framework and applies frequent subgraph mining to a corpus of 166,827,154 open-source Java methods to identify sugarable patterns that could motivate new syntactic sugars. Through a large-scale empirical evaluation, the authors uncover 241 sugarable subgraphs and catalog 32 named sugars, including seven highlighted candidates, and they validate their proposals through a human-subject survey. The results demonstrate the viability and relevance of data-driven design for guiding language evolution and provide a replicable blueprint for mining code to inform design decisions.

Abstract

Programming languages are essential tools for developers, and their evolution plays a crucial role in supporting developers' activities. One instance of programming language evolution is the introduction of syntactic sugars: additional syntax elements that provide alternative, more readable code constructs. However, the process of designing and evolving a programming language has traditionally been guided by anecdotal experience and intuition. Recent advances in tools and methodologies for mining open-source repositories have enabled developers to make data-driven software engineering decisions. In light of this, this paper proposes an approach to data-driven programming language evolution that applies frequent subgraph mining techniques to a dataset of 166,827,154 open-source Java methods. To mine this dataset, Java control-flow graphs are generalized so that they capture broad programming language usage and instances of duplication. Frequent subgraphs are then extracted to identify potentially impactful opportunities for new syntactic sugars. Our results demonstrate the benefits of the proposed technique: it identifies new syntactic sugars, spanning a variety of programming constructs, that could be implemented in Java to simplify frequent code idioms. This approach can provide valuable insights for Java language designers and serves as a proof of concept for data-driven programming language design and evolution.
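The abstract describes generalizing control-flow graphs and then extracting frequent subgraphs. The sketch below is a deliberately simplified illustration of the counting step only, not the authors' implementation. It assumes each method's generalized CFG has already been reduced to a list of directed edges between abstract node kinds (the labels `IF`, `ASSIGN`, `RETURN`, `FOR` and the `Edge` record are hypothetical), and it finds frequent single-edge subgraphs: those contained in at least `minSupport` methods. Full miners such as gSpan start from frequent edges like these and grow them into larger patterns. Records require Java 16 or later.

```java
import java.util.*;

/**
 * Simplified sketch (hypothetical representation, not the paper's code):
 * counts single-edge subgraphs across generalized CFGs and keeps those
 * whose support (number of graphs containing them) meets a threshold.
 */
public class FrequentEdgeMiner {

    /** A directed edge between two generalized node labels, e.g. IF -> ASSIGN. */
    record Edge(String src, String dst) {}

    /** Returns every edge contained in at least minSupport graphs, mapped to its support. */
    static Map<Edge, Integer> frequentEdges(List<List<Edge>> graphs, int minSupport) {
        Map<Edge, Integer> support = new HashMap<>();
        for (List<Edge> graph : graphs) {
            // Deduplicate within a graph: support counts graphs, not occurrences.
            for (Edge e : new HashSet<>(graph)) {
                support.merge(e, 1, Integer::sum);
            }
        }
        // Drop edges below the support threshold (removal through the values view).
        support.values().removeIf(count -> count < minSupport);
        return support;
    }

    public static void main(String[] args) {
        List<List<Edge>> methods = List.of(
            // Both branches of an if/else assign, so IF -> ASSIGN appears twice here.
            List.of(new Edge("IF", "ASSIGN"), new Edge("IF", "ASSIGN"), new Edge("ASSIGN", "RETURN")),
            List.of(new Edge("IF", "ASSIGN"), new Edge("ASSIGN", "RETURN")),
            List.of(new Edge("FOR", "IF"), new Edge("IF", "RETURN"))
        );
        Map<Edge, Integer> frequent = frequentEdges(methods, 2);
        // Expected entries: IF->ASSIGN with support 2 and ASSIGN->RETURN with support 2.
        System.out.println(frequent);
    }
}
```

Counting support per graph rather than per occurrence keeps one heavily duplicated method from dominating the results; whether the paper uses exactly this support definition is not stated in this section.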

Paper Structure

This paper contains 20 sections, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: Example syntactic sugar: ternary operator
  • Figure 2: Potential syntax for Java multiple assignment
  • Figure 3: Potential syntax for Java multiple ++
  • Figure 4: Potential syntax for Java unless
  • Figure 5: Potential syntax for Java any/all
  • ...and 7 more figures
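Figure 1 uses the ternary operator as its example of an existing syntactic sugar. The figure's code is not reproduced here, so the pair below is only an illustrative sketch with hypothetical method names: an if/else statement whose branches assign to the same variable, and the equivalent conditional expression. The check in `main` confirms that both forms produce the same result for a range of inputs, since a sugar changes syntax but not meaning.

```java
public class TernarySugar {

    // Long form: an if/else statement whose branches each assign the same variable.
    static String parityVerbose(int n) {
        String label;
        if (n % 2 == 0) {
            label = "even";
        } else {
            label = "odd";
        }
        return label;
    }

    // Sugared form: the conditional (ternary) operator expresses the same choice as one expression.
    static String paritySugared(int n) {
        return n % 2 == 0 ? "even" : "odd";
    }

    public static void main(String[] args) {
        for (int n = -3; n <= 3; n++) {
            // Both forms must agree for every input.
            if (!parityVerbose(n).equals(paritySugared(n))) {
                throw new AssertionError("mismatch at " + n);
            }
        }
        System.out.println(paritySugared(7)); // prints "odd"
    }
}
```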