TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference
Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng
TL;DR
TIGER addresses the practical challenge of inferring Python types in dynamically typed code by introducing a two-stage generating-then-ranking framework. It combines a fine-tuned generation model (via span masking) to propose diverse candidate types with a similarity model (via contrastive learning) to rank these candidates alongside visible user-defined types, enabling effective handling of parameterized and unseen types. Evaluated on ManyTypes4Py, TIGER achieves state-of-the-art accuracy, especially for unseen user-defined types, while maintaining efficiency suitable for large-scale inference. The results demonstrate the value of integrating generation and contextual similarity to cover a wide spectrum of types beyond fixed vocabularies, with strong practical implications for automated type hinting in real-world Python projects.
Abstract
Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework designed to effectively handle Python's diverse type categories. TIGER fine-tunes pre-trained code models to obtain a generative model, trained with a span masking objective, and a similarity model, trained with a contrastive objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics, in the generating stage and to rank them accurately alongside user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods across type categories, notably improving Top-5 Exact Match accuracy on user-defined and unseen types by 11.2% and 20.1%, respectively. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency but also underscore the significance of its generating and ranking stages in enhancing automated type inference.
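The control flow of the two stages can be illustrated with a toy sketch. It is not TIGER's implementation: `generate_candidates` is a hypothetical stub standing in for the fine-tuned generation model, and character-trigram cosine similarity is swapped in for the contrastively trained similarity model. All function names and the example inputs are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy stand-in for a learned encoder: a bag of character trigrams."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_candidates(context):
    """Hypothetical stub for the generating stage.

    A real system would decode candidate types (including parameterized
    generics) from a generation model; here the list is hard-coded.
    """
    return ["str", "List[str]", "Optional[int]"]


def rank_types(context, generated, user_defined, top_k=5):
    """Ranking stage: score generated candidates together with the
    user-defined types visible in the project, then keep the top_k."""
    pool = list(dict.fromkeys(generated + user_defined))  # dedupe, keep order
    ctx = embed(context)
    ranked = sorted(pool, key=lambda t: cosine(ctx, embed(t)), reverse=True)
    return ranked[:top_k]


context = "def save(user_profile): ..."
visible_types = ["UserProfile", "HttpClient"]  # hypothetical project types
top = rank_types(context, generate_candidates(context), visible_types)
print(top)
```

The point of the sketch is the pooling step: because user-defined types enter at ranking time rather than through a fixed output vocabulary, a type that never appeared in training data can still be ranked first when it matches the usage context.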
