BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting
Yuqing Cheng, Bo Chen, Fanjin Zhang, Jie Tang
TL;DR
BOND tackles from-scratch name disambiguation by jointly learning local pairwise paper similarities and global clustering in a single end-to-end framework. For each name, it constructs a multi-relational graph, uses a graph attention encoder–decoder to reconstruct local edges, and incorporates DBSCAN-derived pseudo-labels to steer cluster-aware learning, so that the local and global signals reinforce each other. The approach yields state-of-the-art results on WhoIsWho-v3, and its enhanced variant, BOND+, which adds ensemble and post-match techniques, achieves top performance on the WhoIsWho leaderboard. The work demonstrates the value of end-to-end multi-task promoting for name disambiguation and offers insights into multi-relational graph design, clustering integration, and practical robustness.
Abstract
From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally grouping these documents into appropriate clusters. However, such a decoupled approach often inhibits optimal information exchange between these intertwined tasks. Therefore, we present BOND, which bootstraps the local and global informative signals to promote each other in an end-to-end regime. Specifically, BOND harnesses local pairwise similarities to drive global clustering, subsequently generating pseudo-clustering labels. These global signals further refine local pairwise characterizations. The experimental results establish BOND's superiority, outperforming other advanced baselines by a substantial margin. Moreover, an enhanced version, BOND+, incorporating ensemble and post-match techniques, rivals the top methods in the WhoIsWho competition.
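The local-to-global step described above (pairwise similarities driving a clustering that yields pseudo-labels) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes paper embeddings are already available (here, toy 2-D vectors standing in for the output of a graph encoder), uses cosine distance as the local similarity signal, and runs a small textbook DBSCAN written out in plain Python. The function names, the toy data, and the `eps` and `min_samples` values are all hypothetical choices for illustration, and the sketch covers a single clustering pass only, not the training loop in which the pseudo-labels refine the local representations.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Local signal: full matrix of pairwise cosine distances."""
    n = len(embeddings)
    return [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]


def dbscan(dist, eps, min_samples):
    """Textbook DBSCAN over a precomputed distance matrix.

    Returns one label per point; -1 marks noise. A point counts itself
    among its neighbors.
    """
    n = len(dist)
    labels = [None] * n  # None means "not yet visited"
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: claim it, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster_id += 1
    return labels


def pseudo_labels_from_embeddings(embeddings, eps, min_samples):
    """Global signal: cluster the papers and return pseudo-labels."""
    return dbscan(pairwise_distances(embeddings), eps, min_samples)


# Toy embeddings for six papers that share one author name (made-up values).
toy_embeddings = [
    [1.0, 0.1], [0.9, 0.2], [0.95, 0.15],  # papers pointing one way
    [0.1, 1.0], [0.2, 0.9],                # papers pointing another way
    [-1.0, -1.0],                          # an isolated paper
]

if __name__ == "__main__":
    print(pseudo_labels_from_embeddings(toy_embeddings, eps=0.05, min_samples=2))
```

On the toy data, the first three papers and the next two form separate clusters, and the isolated paper is marked as noise (label -1). In an end-to-end setting such as the one the abstract describes, labels of this kind would serve as supervision for a cluster-aware objective rather than as the final output.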
