PoTo: A Hybrid Andersen's Points-to Analysis for Python
Ingkarat Rak-amnouykit, Ana Milanova, Guillaume Baudart, Martin Hirzel, Julian Dolby
TL;DR
PoTo is an Andersen-style, flow- and context-insensitive points-to analysis tailored for Python. It addresses dynamic language features and external libraries through a novel hybrid approach that integrates concrete evaluation. A two-phase pipeline (Python source to 3-address code, then 3-address code to a points-to graph) feeds a client analysis, PoTo+, which derives concrete-like types from the points-to graph. Evaluated against Pytype and neural baselines on ten real-world packages, PoTo+ achieves strong coverage and generally matches or exceeds static baselines in accuracy while scaling better than Pytype. The work demonstrates that static points-to analysis augmented with concrete evaluation can effectively support scalable type inference and program understanding for large Python codebases.
Abstract
As Python is increasingly being adopted for large and complex programs, the importance of static analysis for Python (such as type inference) grows. Unfortunately, static analysis for Python remains a challenging task due to its dynamic language features and its abundant external libraries. To help fill this gap, this paper presents PoTo, an Andersen-style context-insensitive and flow-insensitive points-to analysis for Python. PoTo addresses Python-specific challenges and works for large programs via a novel hybrid evaluation, integrating traditional static points-to analysis with concrete evaluation in the Python interpreter for external library calls. Next, this paper presents PoTo+, a static type inference technique for Python built on the points-to analysis. We evaluate PoTo+ and compare it to two state-of-the-art Python type inference techniques: (1) the static rule-based Pytype and (2) the deep-learning-based DLInfer. Our results show that PoTo+ outperforms both Pytype and DLInfer on existing Python packages.
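To make the hybrid idea concrete, the following is a minimal sketch of an Andersen-style (flow- and context-insensitive) fixpoint over a toy 3-address form, in which calls to external library functions are evaluated in the interpreter whenever all arguments are concrete. It is an illustration under stated assumptions, not PoTo's implementation: the tuple-based statement encoding, the `Concrete` wrapper, and the `analyze` function are all hypothetical names introduced here. The sketch also assumes that no external call result flows back into its own arguments, which a real analysis would need to bound.

```python
from collections import defaultdict
from itertools import product
import os.path


class Concrete:
    """Hypothetical abstract object wrapping a value computed by the interpreter."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Concrete({self.value!r})"


def analyze(stmts):
    """Iterate inclusion constraints to a fixpoint, ignoring statement order."""
    pts = defaultdict(set)    # variable -> abstract objects it may point to
    heap = defaultdict(set)   # (abstract object, field) -> abstract objects
    cache = {}                # (statement index, argument objects) -> Concrete

    changed = True
    while changed:
        changed = False
        for i, s in enumerate(stmts):
            op = s[0]
            if op == "const":        # x = <literal>
                _, x, value = s
                updates = [(pts[x], {cache.setdefault((i, ()), Concrete(value))})]
            elif op == "new":        # x = C(), labeled by its allocation site
                _, x, site = s
                updates = [(pts[x], {site})]
            elif op == "copy":       # x = y
                _, x, y = s
                updates = [(pts[x], pts[y])]
            elif op == "store":      # x.f = y
                _, x, f, y = s
                updates = [(heap[(o, f)], pts[y]) for o in list(pts[x])]
            elif op == "load":       # x = y.f
                _, x, y, f = s
                updates = [(pts[x], heap[(o, f)]) for o in list(pts[y])]
            elif op == "call_ext":   # x = fn(*args), fn is an external function
                _, x, fn, args = s
                results = set()
                for combo in product(*(list(pts[a]) for a in args)):
                    # Concrete evaluation: run the real function only when
                    # every argument is a concrete value; otherwise skip.
                    if all(isinstance(o, Concrete) for o in combo):
                        if (i, combo) not in cache:
                            value = fn(*(o.value for o in combo))
                            cache[(i, combo)] = Concrete(value)
                        results.add(cache[(i, combo)])
                updates = [(pts[x], results)]
            else:
                raise ValueError(f"unknown statement kind: {op}")
            for target, source in updates:
                if not source <= target:
                    target |= source
                    changed = True
    return pts


# Toy program: p = os.path.join("data", "train.csv"); cfg.path = p; q = cfg.path
stmts = [
    ("const", "a", "data"),
    ("const", "b", "train.csv"),
    ("call_ext", "p", os.path.join, ("a", "b")),
    ("new", "cfg", "Config@1"),
    ("store", "cfg", "path", "p"),
    ("copy", "c2", "cfg"),
    ("load", "q", "c2", "path"),
]
pts = analyze(stmts)

# A type-inference client in the spirit of PoTo+: read types off concrete objects.
types = {type(o.value).__name__ for o in pts["q"] if isinstance(o, Concrete)}
```

In this sketch, the type of `q` is read from the concrete object produced by actually running `os.path.join`, so the analysis needs no hand-written model of the library function. That is the general appeal of combining static propagation with concrete evaluation for external calls.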
