FinchGPT: a Transformer based language model for birdsong analysis

Kosei Kobayashi; Kosuke Matsuzaki; Masaya Taniguchi; Keisuke Sakaguchi; Kentaro Inui; Kentaro Abe

TL;DR

The paper investigates whether Bengalese finch songs exhibit long-range dependencies similar to those of human language, and tests Transformer-based language models on a textualized birdsong corpus. FinchGPT, a Transformer model trained from scratch on species-specific syllable sequences, outperforms Markov, RNN, and LSTM baselines in next-syllable prediction, and its attention weights reveal long-range dependencies. Reverse engineering through attention-span restriction and HVC ablation shows that the model relies on non-adjacent dependencies, which are perturbed by brain manipulations, linking artificial processing to neural mechanisms. The findings suggest that large language models can reveal structure in animal vocalizations and provide a framework for comparing computational and neural processing of sequential vocalizations across species.
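To make the next-syllable prediction task concrete, the following is a minimal sketch of a first-order Markov (bigram) predictor of the kind that could serve as a baseline. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_bigram` and `predict_next`, the toy corpus, and the one-letter-per-syllable encoding are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_bigram(songs):
    """Count syllable -> next-syllable transitions over a list of songs."""
    counts = defaultdict(Counter)
    for song in songs:
        for prev, nxt in zip(song, song[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if `prev` was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Hypothetical toy corpus: each string is one song, each letter one syllable label.
toy_songs = ["abcabcd", "abcd", "abd"]
model = fit_bigram(toy_songs)
print(predict_next(model, "a"))
```

A predictor like this conditions only on the immediately preceding syllable, so it cannot use non-adjacent context by construction. That limitation is what makes it a useful point of comparison for models that can attend to earlier syllables.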

Abstract

Long-range dependencies among tokens, which originate from hierarchical structures, are a defining hallmark of human language. However, whether similar dependencies exist within the sequential vocalizations of non-human animals remains a topic of investigation. Transformer architectures, known for their ability to model long-range dependencies among tokens, provide a powerful tool for investigating this phenomenon. In this study, we employed the Transformer architecture to analyze the songs of the Bengalese finch (Lonchura striata domestica), which are characterized by their highly variable and complex syllable sequences. To this end, we developed FinchGPT, a Transformer-based model trained on a textualized corpus of birdsongs, which outperformed models based on other architectures in this domain. Attention weight analysis revealed that FinchGPT effectively captures long-range dependencies within syllable sequences. Furthermore, reverse engineering approaches demonstrated the impact of computational and biological manipulations on its performance: restricting FinchGPT's attention span and disrupting birdsong syntax through the ablation of specific brain nuclei markedly influenced the model's outputs. Our study highlights the transformative potential of large language models (LLMs) in deciphering the complexities of animal vocalizations, offering a novel framework for exploring the structural properties of non-human communication systems while shedding light on the computational distinctions between biological brains and artificial neural networks.
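One common way to restrict a Transformer's attention span is to narrow the causal attention mask so that each position can see only a fixed number of recent tokens. The sketch below illustrates that idea in plain Python. The abstract does not specify how the restriction was implemented in FinchGPT, so the banded-mask approach, the function name `causal_mask`, and the `span` parameter are assumptions made for illustration.

```python
def causal_mask(seq_len, span=None):
    """Build a boolean attention mask for a sequence of `seq_len` syllables.

    mask[i][j] is True when position i may attend to position j.
    span=None gives the usual full causal mask (all positions j <= i).
    span=k limits attention to the k most recent positions, including i itself.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i and (span is None or i - j < span)
            row.append(visible)
        mask.append(row)
    return mask


# With span=2, the last of four syllables can see only itself and its predecessor.
for row in causal_mask(4, span=2):
    print("".join("x" if v else "." for v in row))
```

Comparing prediction accuracy under a full mask and under progressively smaller `span` values is one way to estimate how much a model depends on non-adjacent context.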
