Phonetic and Lexical Discovery of a Canine Language using HuBERT
Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
TL;DR
This work investigates whether canine vocalizations encode phoneme-like units and a basic vocabulary, probing whether dog sounds form a system resembling human language. It introduces a self-supervised, HuBERT-based pipeline that takes raw dog recordings through AudioSep cleaning, sentence extraction, phoneme recognition, phoneme combination, and word discovery via an n-gram, scoring, and filtering framework. Its key contributions are a non-redundant dog phoneme vocabulary shared across multiple dogs and a popularity-score-based lexical discovery method. Both are implemented in a web-based labeling system for visualizing and testing the vocabulary. The results reveal acoustic consistency of the identified phoneme n-grams across dogs and lay groundwork for future interpretation of dog communication and cross-dog meaning mapping.
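The phoneme-combination and word-discovery stages can be illustrated with a small sketch. It assumes that each sentence has already been turned into a sequence of integer frame labels (for example, cluster IDs over HuBERT features). The popularity score used here, total occurrences weighted by the fraction of dogs that produce the n-gram, is a hypothetical stand-in because the exact formula is not given here. The names `collapse_repeats`, `discover_vocabulary`, `min_dogs`, and `min_score` are likewise illustrative, not the authors' code.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[int, ...]


def collapse_repeats(frame_labels: Sequence[int]) -> List[int]:
    """Merge runs of identical frame-level labels into one phoneme token."""
    tokens: List[int] = []
    for label in frame_labels:
        if not tokens or tokens[-1] != label:
            tokens.append(label)
    return tokens


def extract_ngrams(tokens: Sequence[int], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def discover_vocabulary(
    sentences_by_dog: Dict[str, List[List[int]]],
    max_n: int = 3,
    min_dogs: int = 2,
    min_score: float = 1.0,
) -> Dict[NGram, float]:
    """Count, score, and filter phoneme n-grams across dogs.

    Hypothetical popularity score:
        total occurrences * (dogs producing the n-gram / total dogs)
    An n-gram is kept only if at least `min_dogs` dogs produce it
    and its score reaches `min_score`.
    """
    counts: Counter = Counter()
    dogs_using: Dict[NGram, Set[str]] = defaultdict(set)
    for dog, sentences in sentences_by_dog.items():
        for frames in sentences:
            tokens = collapse_repeats(frames)
            for n in range(1, max_n + 1):
                for gram in extract_ngrams(tokens, n):
                    counts[gram] += 1
                    dogs_using[gram].add(dog)

    total_dogs = len(sentences_by_dog)
    vocabulary: Dict[NGram, float] = {}
    for gram, count in counts.items():
        n_dogs = len(dogs_using[gram])
        score = count * n_dogs / total_dogs
        if n_dogs >= min_dogs and score >= min_score:
            vocabulary[gram] = score
    return vocabulary


if __name__ == "__main__":
    # Toy frame-label sequences for three hypothetical dogs.
    data = {
        "dog_a": [[0, 0, 1, 1, 2], [0, 1, 1]],
        "dog_b": [[0, 1, 2, 2]],
        "dog_c": [[3, 3, 3]],
    }
    print(discover_vocabulary(data, max_n=3, min_dogs=2, min_score=1.5))
```

Requiring an n-gram to appear in several dogs mirrors the goal of a vocabulary shared across dogs. Weighting by dog coverage stops a single very vocal dog from dominating the vocabulary.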
Abstract
This paper undertakes a pioneering exploration of potential communication patterns in dog vocalizations. It moves beyond the limits of traditional linguistic analysis, which relies heavily on human a priori knowledge and limited datasets to find sound units in dog vocalizations. We propose a self-supervised approach based on HuBERT that enables accurate classification of phoneme labels and identification of vocal patterns suggesting a rudimentary vocabulary in dog vocalizations. Our findings indicate significant acoustic consistency in this identified canine vocabulary, which covers the entirety of the observed dog vocalization sequences. We further develop a web-based dog vocalization labeling system that highlights phoneme n-grams from the vocabulary in dog audio uploaded by users.
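The highlighting step of the labeling system can be sketched under one assumption: the uploaded audio has already been converted into a phoneme token sequence. The sketch below uses greedy longest-match to find vocabulary n-grams. That matching strategy and the function name `highlight_spans` are hypothetical illustrations, not the system's documented behavior. Mapping the returned token spans back to time offsets in the audio would additionally require frame alignment, which is not shown.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple

NGram = Tuple[int, ...]
Span = Tuple[int, int, NGram]  # (start token index, end token index, matched n-gram)


def highlight_spans(tokens: Sequence[int], vocabulary: Iterable[NGram]) -> List[Span]:
    """Greedily mark the longest vocabulary n-gram starting at each position.

    Matches do not overlap; tokens not covered by any vocabulary entry are skipped.
    """
    vocab: Set[NGram] = set(vocabulary)
    max_len = max((len(gram) for gram in vocab), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        match: Optional[NGram] = None
        # Try the longest candidate first so (0, 1, 2) wins over (0, 1).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in vocab:
                match = candidate
                break
        if match is not None:
            spans.append((i, i + len(match), match))
            i += len(match)
        else:
            i += 1
    return spans


if __name__ == "__main__":
    vocabulary = {(0, 1), (0, 1, 2), (2,)}
    print(highlight_spans([5, 0, 1, 2, 0, 1], vocabulary))
```

Preferring the longest match keeps a multi-phoneme "word" intact rather than splitting it into shorter entries. A real interface might instead show overlapping matches.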
