TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference

Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng

TL;DR

TIGER addresses the practical challenge of inferring types for dynamically typed Python code by introducing a two-stage generating-then-ranking framework. It combines a fine-tuned generation model (trained with span masking) that proposes diverse candidate types with a similarity model (trained with contrastive learning) that ranks these candidates alongside user-defined types visible in the code. This lets it handle both parameterized and unseen types. Evaluated on ManyTypes4Py, TIGER achieves state-of-the-art accuracy, especially for unseen user-defined types, while remaining efficient enough for large-scale inference. The results show that combining generation with contextual similarity covers a wider range of types than approaches limited to a fixed type vocabulary, with practical value for automated type hinting in real-world Python projects.
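The two stages described above can be sketched as follows. This is a toy illustration under stated assumptions, not TIGER's implementation: `fake_generator` is a hypothetical stand-in for the fine-tuned generation model, and a bag-of-word-pieces cosine similarity (`embed`, `cosine`) stands in for the contrastively trained similarity model. How TIGER actually combines generation and similarity scores is not described in this summary, so the sketch ranks by similarity alone. The point it shows is structural: the generator can propose generics such as `Dict[str, Any]`, while merging in visible user-defined types lets the ranking stage pick a class the generator never produced.

```python
import re
from collections import Counter
from math import sqrt
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: lowercase word pieces, CamelCase split."""
    return Counter(piece.lower() for piece in re.findall(r"[A-Z]?[a-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so unmatched tokens add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_then_rank(
    context: str,
    generate: Callable[[str], List[str]],
    visible_user_types: List[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1 (generating): open-ended candidates, possibly parameterized generics.
    # Visible user-defined types are merged in (deduplicated, order kept).
    candidates = list(dict.fromkeys(generate(context) + visible_user_types))
    # Stage 2 (ranking): score each candidate against the context, keep the top k.
    ctx = embed(context)
    scored = [(cand, cosine(ctx, embed(cand))) for cand in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:k]


def fake_generator(context: str) -> List[str]:
    # Hypothetical generator that never saw the project's own classes.
    return ["str", "Dict[str, Any]"]


ranked = generate_then_rank(
    "repo = UserRepository(session)", fake_generator, ["UserRepository", "Session"]
)
print(ranked[0][0])  # UserRepository
```

Keeping generation and ranking separate means the candidate pool is not limited to a fixed vocabulary, and the ranker only has to compare candidates, which is what allows project-specific classes to surface.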

Abstract

Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, motivating automated type inference to support type hinting. While existing learning-based approaches show promising inference accuracy, they struggle in practice to handle the full range of types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework designed to handle Python's diverse type categories effectively. TIGER fine-tunes pre-trained code models into two components: a generative model trained with a span-masking objective and a similarity model trained with a contrastive objective. This allows TIGER to generate a wide range of type candidates, including complex generics, in the generating stage, and to rank them accurately together with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods across type categories, notably improving Top-5 Exact Match accuracy for user-defined and unseen types by 11.2% and 20.1%, respectively. Moreover, the experimental results demonstrate TIGER's superior effectiveness and efficiency, and underscore the contribution of both its generating and ranking stages to automated type inference.
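The abstract reports results in Top-5 Exact Match. A minimal sketch of how a Top-k exact-match score is commonly computed: a slot counts as a hit when its ground-truth type string exactly matches one of the first k ranked predictions, and the score is the fraction of hits. The whitespace-only `normalize` step and the function names are assumptions for illustration; the paper's precise matching rules are not given in this summary.

```python
from typing import Sequence


def normalize(type_str: str) -> str:
    # Assumed normalization: ignore whitespace only, so "Dict[str, int]" == "Dict[str,int]".
    return "".join(type_str.split())


def top_k_exact_match(
    predictions: Sequence[Sequence[str]], gold: Sequence[str], k: int = 5
) -> float:
    """Fraction of slots whose gold type appears exactly among the first k predictions."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        normalize(g) in {normalize(p) for p in preds[:k]}
        for preds, g in zip(predictions, gold)
    )
    return hits / len(gold)


preds = [["int", "str"], ["List[str]", "List[int]"], ["Foo"]]
gold = ["str", "List[ int ]", "Bar"]
print(top_k_exact_match(preds, gold, k=5))  # 2 of 3 slots hit
print(top_k_exact_match(preds, gold, k=1))  # 0.0
```

Raising k rewards a ranker that places the correct type anywhere in its shortlist, which is why Top-5 is a natural metric for a framework whose second stage produces a ranked list.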

Paper Structure (38 sections, 1 equation, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The Generating-Then-Ranking Framework
  • Figure 2: Type Placeholders for Three Categories of Variables
  • Figure 3: Overview of TIGER
  • Figure 4: Type Annotation Masking with Type Placeholder
  • Figure 5: Average inference time of CodeT5-ft, TypeGen, and TIGER.
  • ...and 1 more figure
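Figures 2 and 4 refer to type placeholders and type-annotation masking. The following is a minimal sketch of the general idea, assuming T5-style sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) as placeholders: annotations on parameters, return values, and annotated variables are replaced by numbered placeholders in the model input, and the removed annotations form the span-masking target. `AnnotationMasker` and `mask_annotations` are hypothetical names; TIGER's actual placeholder format and its exact three variable categories may differ. `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import List, Tuple


class AnnotationMasker(ast.NodeTransformer):
    """Replaces each type annotation with a numbered sentinel and records the original."""

    def __init__(self) -> None:
        self.targets: List[str] = []

    def _mask(self, annotation: ast.expr) -> ast.Name:
        placeholder = f"<extra_id_{len(self.targets)}>"
        self.targets.append(ast.unparse(annotation))
        return ast.Name(id=placeholder, ctx=ast.Load())

    def visit_FunctionDef(self, node):
        args = node.args
        # Parameters in source order: positional-only, regular, *args, keyword-only, **kwargs.
        params = [*args.posonlyargs, *args.args, args.vararg, *args.kwonlyargs, args.kwarg]
        for param in params:
            if param is not None and param.annotation is not None:
                param.annotation = self._mask(param.annotation)
        if node.returns is not None:  # then the return annotation
            node.returns = self._mask(node.returns)
        return self.generic_visit(node)  # then annotated variables in the body

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node):
        node.annotation = self._mask(node.annotation)
        return self.generic_visit(node)


def mask_annotations(source: str) -> Tuple[str, str]:
    """Returns (masked model input, span-masking target)."""
    masker = AnnotationMasker()
    tree = masker.visit(ast.parse(source))
    masked_input = ast.unparse(tree)
    target = " ".join(f"<extra_id_{i}> {t}" for i, t in enumerate(masker.targets))
    return masked_input, target


SOURCE = '''
def count_words(lines: List[str]) -> int:
    total: int = 0
    for line in lines:
        total += len(line.split())
    return total
'''

masked_input, target = mask_annotations(SOURCE)
print(masked_input)
print(target)  # <extra_id_0> List[str] <extra_id_1> int <extra_id_2> int
```

The masked text is no longer valid Python; it is intended only as model input. Because the target is a free-form string, the generator can learn to emit parameterized types such as `List[str]` token by token rather than picking from a closed label set.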