Recent Developments in Deep Learning-based Author Name Disambiguation

Francesca Cappelli, Giovanni Colavizza, Silvio Peroni

TL;DR

This paper surveys deep learning (DL)-based author name disambiguation (AND) methods developed from 2016 to 2024, highlighting how DL models integrate structured metadata with unstructured text to improve both author assignment and author grouping. It categorizes approaches into supervised, unsupervised, and hybrid systems, and provides a cross-dataset comparison based primarily on AMiner-derived benchmarks; the results show that hybrid methods often achieve state-of-the-art performance. The study also discusses data-related challenges, such as limited annotated resources, dataset bias toward East Asian names, and the need for diverse benchmarks to assess generalizability. Overall, the work underscores the significant impact of DL on AND while calling for standardized benchmarks and richer, multilingual datasets to support robust, scalable disambiguation in digital libraries.

Abstract

Author Name Disambiguation (AND) is a critical task for digital libraries that aim to link authors with their respective publications. Because researchers do not consistently use persistent identifiers, and because of intrinsic linguistic challenges such as homonymy, deep learning (DL) algorithms have been widely developed to address this issue. Many DL-based AND methods have been proposed, and existing surveys compare approaches in terms of techniques, complexity, and performance. However, none explicitly addresses DL-based AND methods from recent years (i.e., 2016-2024). In this paper, we provide a systematic review of state-of-the-art AND techniques based on deep learning, highlighting recent improvements, challenges, and open issues in the field. We find that DL methods have significantly impacted AND by enabling the integration of structured and unstructured data, and that hybrid approaches effectively balance supervised and unsupervised learning.

Paper Structure

This paper contains 13 sections and 1 table.