
Getting Python Types Right with RightTyper

Juan Altmayer Pizzorno, Emery D. Berger

TL;DR

RightTyper tackles the challenge of missing and error-prone Python type annotations by grounding inference in actual runtime behavior and static analysis. It combines adaptive Poisson-based sampling, Good–Turing-inspired container analysis, and name-resolution-informed typing to generate precise annotations. The authors provide an open-source prototype, enhance the TypeSim metric, and demonstrate superior semantic similarity to ground-truth and developer annotations with modest overhead. The work offers a practical path to scalable, reliable typing in real-world Python codebases.
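The TL;DR mentions Good–Turing-inspired container analysis. The paper's exact rule is not reproduced here. The following is a minimal sketch under stated assumptions. It draws random elements from a container and stops once the Good–Turing estimate of the probability that the next draw has a not-yet-seen element type falls below a threshold. That estimate is N1/n: the number of types seen exactly once, divided by the number of draws. The function name `sample_element_types` and the `threshold`, `min_samples`, and `max_samples` parameters are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter
from collections.abc import Sequence
from typing import Any


def sample_element_types(
    container: Sequence[Any],
    threshold: float = 0.05,
    min_samples: int = 8,
    max_samples: int = 256,
    rng: random.Random | None = None,
) -> set[type]:
    """Return the element types seen while sampling `container` at random.

    Hypothetical sketch, not RightTyper's code: sampling stops once the
    Good-Turing estimate of the chance that the next draw has an unseen
    type drops below `threshold`, or after `max_samples` draws.
    """
    if not container:
        return set()
    rng = rng or random.Random()
    counts: Counter[type] = Counter()
    for n in range(1, max_samples + 1):
        counts[type(rng.choice(container))] += 1
        if n < min_samples:
            continue
        # N1 = number of distinct types observed exactly once so far.
        singletons = sum(1 for c in counts.values() if c == 1)
        # Good-Turing "missing mass": P(next draw is a new type) ~ N1 / n.
        if singletons / n < threshold:
            break
    return set(counts)
```

For a homogeneous list of a million integers, the loop stops after `min_samples` draws instead of inspecting every element. The trade-off inherent to any sampling scheme is that a type occurring only a handful of times may still be missed.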

Abstract

Python type annotations enable static type checking, but most code remains untyped because manual annotation is time-consuming and tedious. Past approaches to automatic type inference fall short: static methods struggle with dynamic features and infer overly broad types; AI-based methods are unsound and miss rare types; and dynamic methods impose extreme overheads (up to 270×), lack important language support such as inferring variable types, or produce annotations that cause runtime errors. This paper presents RightTyper, a novel hybrid approach for Python that produces accurate and precise type annotations grounded in actual program behavior. RightTyper bases inference on the types observed during program execution and combines these observations with static analysis and name resolution to produce substantially higher-quality type annotations than prior approaches. Through principled, statistically guided adaptive sampling, RightTyper balances runtime overhead against the need to observe enough execution behavior to infer high-quality type annotations. We evaluate RightTyper against static, dynamic, and AI-based systems on synthetic benchmarks and on real-world code. It consistently achieves higher semantic similarity to ground-truth annotations (on the benchmarks) and to developer-written annotations (on real-world code), while incurring only about 25% runtime overhead.
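RightTyper samples execution through Poisson-distributed observation windows (Figure 1). One standard way to produce such a schedule is to draw the pause before each window from an exponential distribution (`random.expovariate`). This is an assumption here, not the paper's stated mechanism. The exponential distribution is memoryless, so a program cannot systematically line up its behavior with the sampling schedule. The sketch below only generates window intervals. The function name and the `rate` and `window` parameters are hypothetical, and the adaptive part (tuning the rate to an overhead budget) is omitted.

```python
from __future__ import annotations

import random


def poisson_windows(
    horizon: float,
    rate: float,
    window: float,
    rng: random.Random | None = None,
) -> list[tuple[float, float]]:
    """Return non-overlapping (start, end) observation windows in [0, horizon].

    Hypothetical sketch: the pause before each window is exponentially
    distributed with mean 1 / rate, so window openings arrive as a Poisson
    process over the unobserved time. Each window then stays open for
    `window` seconds (clipped at `horizon`).
    """
    rng = rng or random.Random()
    windows: list[tuple[float, float]] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gap => Poisson arrivals
        if t >= horizon:
            break
        end = min(t + window, horizon)
        windows.append((t, end))
        t = end  # the next gap starts when this window closes
    return windows
```

With `rate=1.0` and `window=0.1`, instrumentation is active for roughly 0.1 / (0.1 + 1.0) ≈ 9% of the time. Lowering `rate` is the natural knob for trading observation coverage against overhead.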

Paper Structure

This paper contains 20 sections, 2 equations, 14 figures, 1 table, 1 algorithm.

Figures (14)

  • Figure 1: RightTyper overview: RightTyper executes the target program under instrumentation, dynamically managing the instrumentation to create Poisson-distributed observation windows (§approach-instrumentation). As the instrumentation delivers execution events, RightTyper observes and collects the types used at runtime (§object-typing, §object-sampling). After execution, RightTyper combines these runtime observations with types extracted via static analysis to generate type annotations (§typing-functions through §annotating-code).
  • Figure 2: Capturing variable initializations: Although RightTyper primarily captures variable types at function exit, its loader also records constant-valued initializations. This technique enables correct typing of variables with optional types (those that may also hold None), such as prev_type in this excerpt from the black source code.
  • Figure 3: Annotation from shape pattern: As it does for type patterns, when generating jaxtyping-style annotations RightTyper identifies recurring patterns in the observed array shapes across traces (top section) and replaces them with variables (bottom section), in a manner inspired by Hindley–Milner type generalization. To the best of our knowledge, no other existing typing tools support inferring and annotating array dimensions in this way.
  • Figure 4: Naïve typing with unions: This example illustrates a function with interdependent argument and return types. The union of observed types (bottom), emitted by MonkeyType, is overly permissive and incorrectly allows mixed-type inputs that result in a type error.
  • Figure 5: RightTyper recognizes type patterns: Recognizing that add (Figure 4) is consistently called with either strings or numbers, RightTyper annotates the function using either a type parameter (if supported by the target Python version; top portion) or a generated type variable (rt_T1; bottom portion).
  • ...and 9 more figures
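Figures 3 and 5 describe the same generalization for types and for array shapes. Across the observed traces, positions that always change together are replaced with a shared variable. A minimal sketch of that step is a column comparison. It assumes each trace is a fixed-length tuple, either argument types followed by the return type, or flattened array dimensions. It is not RightTyper's implementation, and the function name `generalize` and the `T1, T2, ...` naming are hypothetical.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def _show(value: Hashable) -> str:
    # Render concrete types by name (int -> "int") and other values via str().
    return value.__name__ if isinstance(value, type) else str(value)


def generalize(traces: Sequence[Sequence[Hashable]]) -> list[str]:
    """Collapse equal-length traces into one pattern (hypothetical sketch).

    A position whose value never changes across traces stays concrete.
    Positions that vary, but whose columns of values are identical (they
    always change together), share one fresh variable.
    """
    if not traces:
        return []
    width = len(traces[0])
    if any(len(t) != width for t in traces):
        raise ValueError("all traces must have the same length")
    names: dict[tuple[Hashable, ...], str] = {}
    pattern: list[str] = []
    for i in range(width):
        column = tuple(t[i] for t in traces)  # value at position i, per trace
        if len(set(column)) == 1:
            pattern.append(_show(column[0]))  # never varies: keep it concrete
        else:
            if column not in names:  # identical columns share a variable
                names[column] = f"T{len(names) + 1}"
            pattern.append(names[column])
    return pattern


# add(a, b) -> result, observed once with ints and once with strings:
print(generalize([(int, int, int), (str, str, str)]))  # ['T1', 'T1', 'T1']
# Matrix product shapes, flattened as (a0, a1, b0, b1, out0, out1):
print(generalize([(3, 4, 4, 5, 3, 5), (2, 7, 7, 6, 2, 6)]))
```

For add, the pattern corresponds to `def add[T1](a: T1, b: T1) -> T1` on Python 3.12+, or to a `TypeVar` on older versions. The per-position union from Figure 4 would instead give `int | str` everywhere. For the shapes, the result `['T1', 'T2', 'T2', 'T3', 'T1', 'T3']` reads as a: T1×T2, b: T2×T3, result: T1×T3. A real tool would need further rules, for example falling back to a plain union when a variable occurs only once.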