
Inferring Pluggable Types with Machine Learning

Kazi Amanul Islam Siddiqui, Martin Kellogg

TL;DR

This work tackles the high cost of deploying pluggable type systems in legacy code by automatically inferring type qualifiers via machine learning. It introduces NaP-AST, a novel graph encoding that focuses on information relevant to a specific type system, and benchmarks graph-based models (GCN, GTN) against text-based LLMs for inferring nullability qualifiers in Java. Across 32,370 classes and 217,922 @Nullable annotations, the NaP-AST + GTN approach (NullGTN) achieves the strongest downstream impact, recovering about 69% of human-written qualifiers and reducing NullAway warnings by 69%, with precision around 0.39. The study also identifies data requirements, finding that roughly 16k annotated classes are needed for good performance, and discusses data scarcity and a chicken-and-egg barrier to extending the approach to other pluggable type systems. Overall, the results demonstrate the feasibility of ML-assisted qualifier inference and provide a concrete pipeline and dataset to enable broader adoption and future work in this area.

Abstract

Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including a Graph Transformer Network (GTN), a Graph Convolutional Network (GCN), and a Large Language Model (LLM). We further validate these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We find that the GTN shows the best performance, with a recall of 0.89 and a precision of 0.6. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. In our feasibility study, performance improved at around 16k classes and deteriorated, due to overfitting, at around 22k classes.
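To make the terminology concrete, the following is a minimal sketch of what a nullability type qualifier looks like in Java source. It is an illustration only, not code from the paper: the class NullableDemo, its methods, and the locally declared @Nullable annotation are hypothetical stand-ins (real projects would import a qualifier from an annotation library), and the comments describe the kind of warning a nullability checker such as NullAway would be expected to report.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

class NullableDemo {
    // Hypothetical stand-in for a library-provided nullability qualifier,
    // declared locally so that this sketch compiles on its own.
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.TYPE_USE)
    @interface Nullable {}

    private final Map<String, String> emails = new HashMap<>();

    void register(String user, String email) {
        emails.put(user, email);
    }

    // The qualifier documents that the result may be null (unknown user).
    // Writing qualifiers like this one by hand is the manual effort that
    // qualifier inference aims to reduce.
    @Nullable String findEmail(String user) {
        return emails.get(user);
    }

    static int emailLength(NullableDemo directory, String user) {
        String email = directory.findEmail(user);
        // Dereferencing 'email' without this check is the kind of code a
        // nullability checker would be expected to flag, because the
        // method's return type is qualified with @Nullable.
        return email == null ? 0 : email.length();
    }
}
```

An inference tool in the sense of the abstract would propose the @Nullable on findEmail automatically; the checker then verifies every use site, such as the null check in emailLength.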

Paper Structure (55 sections, 8 figures, 2 tables)

This paper contains 55 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: In heterogeneous graphs (i.e. having multiple edge types), a meta-path is a sequence of connected nodes and edges, where each edge has a distinct type.
  • Figure 2: An example TDG snippet: a function $serialize()$ with a parameter called $bytes$ of type $byte[]$. $serialize()$ is an expression node, $bytes$ is a symbol node, and $Has\_parameter$ is a merge node.
  • Figure 3: Order of steps in NaP-AST construction.
  • Figure 4: Name Augmentation. This diagram shows the pruned AST as yellow nodes, and a node ("x") in the name layer in green. All the nodes that use the same name are connected by a second edge and node type.
  • Figure 5: Ablation study to decide which node types can be dropped without affecting the F1 score. Only node types whose removal hurt the F1 score more than dropping random nodes (in red) were retained in later models.
  • ...and 3 more figures
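
The name augmentation step described in the Figure 4 caption can be sketched roughly as follows. This is a simplified illustration based only on the caption, not the paper's implementation: the class NameAugmentationSketch, the method nameEdges, and the representation of the pruned AST as a list of identifier names indexed by node id (null for nodes without a name) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NameAugmentationSketch {
    /**
     * Assumed input: nodeNames.get(i) is the identifier used by pruned-AST
     * node i, or null if that node carries no name.
     * Output: edges {astNodeId, nameNodeId} linking every named AST node to
     * a single name-layer node shared by all nodes with the same name.
     * Name-layer nodes receive ids starting at nodeNames.size().
     */
    static List<int[]> nameEdges(List<String> nodeNames) {
        Map<String, Integer> nameNodeIds = new LinkedHashMap<>();
        List<int[]> edges = new ArrayList<>();
        for (int astId = 0; astId < nodeNames.size(); astId++) {
            String name = nodeNames.get(astId);
            if (name == null) {
                continue; // unnamed nodes get no name-layer edge
            }
            Integer nameId = nameNodeIds.get(name);
            if (nameId == null) {
                // First use of this name: create its name-layer node.
                nameId = nodeNames.size() + nameNodeIds.size();
                nameNodeIds.put(name, nameId);
            }
            edges.add(new int[] {astId, nameId});
        }
        return edges;
    }
}
```

In this sketch, two AST nodes that both use the name "x" end up attached to the same name-layer node, which gives a graph model a short path between all uses of that name.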