Table of Contents
Fetching ...

AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph

Guanxuan Wu, Allison Sullivan

TL;DR

This work introduces CSBASG, a Complex-weighted Structurally Balanced Abstract Semantic Graph, to compactly represent Alloy code and enable direct comparison of predicates. By merging semantically identical AST nodes and encoding edges with a polynomial magnitude mapping, CSBASG achieves significant size reductions while preserving full expressivity for finite Alloy expressions. The framework is evaluated on 6,307 Alloy4Fun predicates, showing substantial compactness gains (avg. ~27.25% node reduction) and meaningful edge-level comparability between faulty and ground-truth solutions. The paper also outlines future directions in automated repair, code generation, and generalization to other languages, leveraging the CSBASG formalism and its mathematical properties, such as structural balance and the associated Laplacian structure.

Abstract

Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built to automated testing and debugging of their implementations after they are built. Unfortunately, the model itself needs to be correct to gain these benefits. Alloy is a commonly used modeling language that has several existing efforts to repair faulty models automatically. Currently, these efforts are search-based methods that use an Abstract Syntax Tree (AST) representation of the model and do not scale. One issue is that ASTs themselves suffer from exponential growth in their data size due to the limitation that ASTs will often have identical nodes separately listed in the tree. To address this issue, we introduce a novel code representation schema, Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists a semantic element as a node in the graph and ensures its structural balance for almost finitely enumerable code segments. We evaluate the efficiency of our CSBASG representation for Alloy models in terms of it's compactness compared to ASTs, and we explore if a CSBASG can ease the process of comparing two Alloy predicates. Moreover, with this representation in place, we identify several future applications of CSBASG, including Alloy code generation and automated repair.

AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph

TL;DR

This work introduces CSBASG, a Complex-weighted Structurally Balanced Abstract Semantic Graph, to compactly represent Alloy code and enable direct comparison of predicates. By merging semantically identical AST nodes and encoding edges with a polynomial magnitude mapping, CSBASG achieves significant size reductions while preserving full expressivity for finite Alloy expressions. The framework is evaluated on 6,307 Alloy4Fun predicates, showing substantial compactness gains (avg. ~27.25% node reduction) and meaningful edge-level comparability between faulty and ground-truth solutions. The paper also outlines future directions in automated repair, code generation, and generalization to other languages, leveraging the CSBASG formalism and its mathematical properties, such as structural balance and the associated Laplacian structure.

Abstract

Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built to automated testing and debugging of their implementations after they are built. Unfortunately, the model itself needs to be correct to gain these benefits. Alloy is a commonly used modeling language that has several existing efforts to repair faulty models automatically. Currently, these efforts are search-based methods that use an Abstract Syntax Tree (AST) representation of the model and do not scale. One issue is that ASTs themselves suffer from exponential growth in their data size due to the limitation that ASTs will often have identical nodes separately listed in the tree. To address this issue, we introduce a novel code representation schema, Complex Structurally Balanced Abstract Semantic Graph (CSBASG), which represents code as a complex-weighted directed graph that lists a semantic element as a node in the graph and ensures its structural balance for almost finitely enumerable code segments. We evaluate the efficiency of our CSBASG representation for Alloy models in terms of it's compactness compared to ASTs, and we explore if a CSBASG can ease the process of comparing two Alloy predicates. Moreover, with this representation in place, we identify several future applications of CSBASG, including Alloy code generation and automated repair.
Paper Structure (20 sections, 2 theorems, 2 equations, 5 figures, 3 tables, 1 algorithm)

This paper contains 20 sections, 2 theorems, 2 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

sbgsbgcomplete The following are equivalent:

Figures (5)

  • Figure 1: Alloy Model of a Classroom Management System with an oracle and student submission for inv15
  • Figure 2: Counterexample highlighting difference between inv15 and inv15oracle
  • Figure 3: An instance of an Alloy predicate in the dataset parsed with pretty string and its raw AST representation.
  • Figure 4: An example illustrating the compactness and simplicity of CSBASG; the original code (upper) could be expressed as an equivalent multigraph (middle) with the AST in Figure 1, and turns into the adjacency relationship (lower) with the polynomial-based encoding. With the same predicate in Figure 1, the number of nodes reduced significantly from 31 to 12. All edges are now assigned an integer weight value, which forms an adjacency matrix.
  • Figure 5: An example of a pair of student-written mutant predicate (Inv2) and the ground truth (Inv2C). The ASG edges in form $(x,y)$ indicate the $x$-th visit of the sourcing node, and the positional relationship between the nodes is $y$. Black edges are present in both predicates, red edges are present in the mutant only, and blue edges are present in the ground truth only.

Theorems & Definitions (10)

  • Definition 1: Context-Free Grammar
  • Definition 2: AST
  • Definition 3: ASG
  • Definition 4: Complex-Weighted Structural Balanced Graph
  • Lemma 1
  • proof
  • Definition 5: Complex-valued Structurally Balanced ASG (CSBASG)
  • Definition 6: Completeness of CSBASG
  • Lemma 2
  • proof