Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer
V. S. D. S. Mahesh Akavarapu, Arnab Bhattacharya
TL;DR
The paper reframes automated cognate detection as a supervised link-prediction problem and introduces a Cognate Transformer that takes multiple sequence alignments as input and directly predicts pairwise cognacy probabilities. By integrating outer product mean representations and triangle-based transitivity updates, the model performs better as supervision increases and runs substantially faster than pairwise methods. Across diverse language-family datasets, CogTran2 outperforms traditional LexStat-based and other supervised baselines, particularly when labeled data are available, and also supports transfer learning to unseen data. Limitations include weaker performance on certain datasets and difficulty with partial cognacy and borrowings; future work includes refining subword-level cognate detection and extending the approach to phylogenetic reconstruction.
Abstract
Identification of cognates across related languages is one of the primary problems in historical linguistics. Automated cognate identification is helpful for several downstream tasks, including identifying sound correspondences, proto-language reconstruction, and phylogenetic classification. Previous state-of-the-art methods for cognate identification are mostly based on distributions of phonemes computed across multilingual wordlists and make little use of the cognacy labels that define links among cognate clusters. In this paper, we present a transformer-based architecture inspired by computational biology for the task of automated cognate detection. Beyond a certain amount of supervision, this method performs better than the existing methods and shows steady improvement with further increase in supervision, thereby demonstrating the efficacy of utilizing the labeled information. We also demonstrate that accepting multiple sequence alignments as input and having an end-to-end architecture with a link prediction head saves much computation time while simultaneously yielding superior performance.
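
To make the link-prediction framing concrete, the following is a minimal sketch (not the authors' implementation) of how cognacy labels for the words of one concept can be turned into pairwise link targets, and how predicted link probabilities could be turned back into clusters. The function names, the toy labels, the 0.5 threshold, and the connected-components decoding are all assumptions made for illustration; the paper may use a different clustering step.

```python
from itertools import combinations


def cognacy_links(labels):
    """Map each unordered pair of word indices to 1 (same cognate class) or 0."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


def clusters_from_links(n, probs, threshold=0.5):
    """Group n words into connected components over pairs with prob >= threshold.

    Assumption: a simple union-find decoding, chosen only for illustration.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), p in probs.items():
        if p >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical cognate-class labels for four words expressing one concept.
labels = ["A", "A", "B", "A"]
links = cognacy_links(labels)

# Using the gold links as stand-in "probabilities" recovers the gold clusters.
clusters = clusters_from_links(len(labels), links)
```

In this toy case, words 0, 1, and 3 share a class, so every pair among them is a positive link and word 2 forms its own cluster. A supervised model would be trained to output probabilities for these same pairs.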
