Table of Contents
Fetching ...

Language-Parametric Reference Synthesis (Extended)

Daniel A. A. Pelsmaeker, Aron Zwaan, Casper Bach, Arjan J. Mooij

TL;DR

This paper tackles preserving name bindings during refactorings in languages with sophisticated binding rules by introducing language-parametric reference synthesis based on locked references. It derives a synthesizer for concrete references from declarative type specifications written in Statix, using scope graphs to guide the generation of valid, target-resolving references. The approach is evaluated on 2773 test programs across Java, ChocoPy, and FGJ, demonstrating soundness (synthetic references type-check and resolve to intended declarations) and substantial practical completeness, with most references synthesized within milliseconds to seconds. The work enables principled, language-parameterizable editor services and large-scale transformations by automating reference reconstruction, at the cost of higher computational effort in some complex cases. Data and artifacts are published to enable replication and further exploration of language-parametric reference synthesis.

Abstract

Modern Integrated Development Environments (IDEs) offer automated refactorings to aid programmers in developing and maintaining software. However, implementing sound automated refactorings is challenging, as refactorings may inadvertently introduce name-binding errors or cause references to resolve to incorrect declarations. To address these issues, previous work by Schäfer et al. proposed replacing concrete references with locked references to separate binding preservation from transformation. Locked references vacuously resolve to a specific declaration, and after transformation must be replaced with concrete references that also resolve to that declaration. Synthesizing these references requires a faithful inverse of the name lookup functions of the underlying language. Manually implementing such inverse lookup functions is challenging due to the complex name-binding features in modern programming languages. Instead, we propose to automatically derive this function from type system specifications written in the Statix meta-DSL. To guide the synthesis of qualified references we use scope graphs, which represent the binding structure of a program, to infer their names and discover their syntactic structure. We evaluate our approach by synthesizing concrete references for locked references in 2528 Java, 196 ChocoPy, and 49 Featherweight Generic Java test programs. Our approach yields a principled language-parametric method for synthesizing references.

Language-Parametric Reference Synthesis (Extended)

TL;DR

This paper tackles preserving name bindings during refactorings in languages with sophisticated binding rules by introducing language-parametric reference synthesis based on locked references. It derives a synthesizer for concrete references from declarative type specifications written in Statix, using scope graphs to guide the generation of valid, target-resolving references. The approach is evaluated on 2773 test programs across Java, ChocoPy, and FGJ, demonstrating soundness (synthetic references type-check and resolve to intended declarations) and substantial practical completeness, with most references synthesized within milliseconds to seconds. The work enables principled, language-parameterizable editor services and large-scale transformations by automating reference reconstruction, at the cost of higher computational effort in some complex cases. Data and artifacts are published to enable replication and further exploration of language-parametric reference synthesis.

Abstract

Modern Integrated Development Environments (IDEs) offer automated refactorings to aid programmers in developing and maintaining software. However, implementing sound automated refactorings is challenging, as refactorings may inadvertently introduce name-binding errors or cause references to resolve to incorrect declarations. To address these issues, previous work by Schäfer et al. proposed replacing concrete references with locked references to separate binding preservation from transformation. Locked references vacuously resolve to a specific declaration, and after transformation must be replaced with concrete references that also resolve to that declaration. Synthesizing these references requires a faithful inverse of the name lookup functions of the underlying language. Manually implementing such inverse lookup functions is challenging due to the complex name-binding features in modern programming languages. Instead, we propose to automatically derive this function from type system specifications written in the Statix meta-DSL. To guide the synthesis of qualified references we use scope graphs, which represent the binding structure of a program, to infer their names and discover their syntactic structure. We evaluate our approach by synthesizing concrete references for locked references in 2528 Java, 196 ChocoPy, and 49 Featherweight Generic Java test programs. Our approach yields a principled language-parametric method for synthesizing references.

Paper Structure

This paper contains 50 sections, 10 theorems, 38 equations, 14 figures.

Key Result

theorem 1

Programs with synthesized references are well-typed.

Figures (14)

  • Figure 1: Rename refactoring of a small Java program, where renaming the field to on line 2 requires the reference on line 7 to be appropriately qualified.
  • Figure 2: Intermediate steps for performing the Rename refactoring from \ref{['fig:java-rename-example']} using locked references. After locking the relevant reference $\texttt{y}$ to declaration $\texttt{y}_{2}$ (\ref{['fig:java-rename-example-intermediate-steps-before']}) and performing the transformation (\ref{['fig:java-rename-example-intermediate-steps-after']}), our approach would synthesize a solution for the locked reference and obtain \ref{['fig:java-rename-example-after']}.
  • Figure 3: An example LM NeronTVW15 program and its scope graph, where boxes and arrows represent scopes and reachability relations between scopes, respectively. The dashed box represents a query and the dashed arrows its resolution path to $\texttt{x}_{2}$, the second occurrence of a declaration named $\texttt{x}$.
  • Figure 4: A subset of the typing rules of LM, a toy language from NeronTVW15 used for the examples in this paper.
  • Figure 5: Small example program and its scope graph
  • ...and 9 more figures

Theorems & Definitions (19)

  • definition 1: Query-Connected Scopes
  • definition 2: Composite Path
  • theorem 1: Soundness 1
  • theorem 2: Soundness 2
  • theorem 3: Confluence
  • definition 3: Exhaustive Application
  • definition 4: Well-formed Initial Constraint
  • definition 5: Embedding of Subtitution
  • lemma 1: Equivalence of Embedding and Applying Substitutions
  • lemma 2: Equality Weakening
  • ...and 9 more