Phonetic and Lexical Discovery of a Canine Language using HuBERT

Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

TL;DR

This work investigates whether canine vocalizations encode phoneme-like units and a basic vocabulary, testing the notion that dog sounds form a human-language-like system. It introduces a self-supervised HuBERT-based pipeline that takes dog audio from raw recordings through AudioSep cleaning, sentence extraction, phoneme recognition, phoneme combination, and word discovery via a framework of n-gram extraction, scoring, and filtering. Key contributions are a non-redundant dog phoneme vocabulary shared across multiple dogs and a lexical discovery method based on a popularity score, both implemented in a web-based labeling system for visualizing and testing the vocabulary. The results show that the identified phoneme n-grams are acoustically consistent across dogs, and they lay groundwork for future interpretation of dog communication and cross-dog meaning mapping.
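To make the word-discovery step concrete, here is a minimal sketch of counting phoneme n-grams and keeping those shared across dogs. It is not the paper's method: the summary does not define the popularity score, so the score used here (number of distinct dogs producing an n-gram, then total occurrences) is an assumption, as are the names `ngrams`, `candidate_words`, and `min_dogs` and the integer phoneme labels.

```python
from collections import defaultdict


def ngrams(seq, n):
    """Return every contiguous n-gram of seq as a tuple."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def candidate_words(sequences_by_dog, n_values=(1, 2, 3), min_dogs=2):
    """Keep n-grams produced by at least min_dogs distinct dogs.

    sequences_by_dog maps a dog id to a list of phoneme-label sequences.
    Returns {ngram: (dog_count, total_count)}, most widely shared first.
    The ranking is a stand-in for the paper's popularity score (assumption).
    """
    dogs_seen = defaultdict(set)  # n-gram -> ids of dogs that produced it
    total = defaultdict(int)      # n-gram -> total number of occurrences
    for dog, sequences in sequences_by_dog.items():
        for seq in sequences:
            for n in n_values:
                for gram in ngrams(seq, n):
                    dogs_seen[gram].add(dog)
                    total[gram] += 1
    kept = {
        gram: (len(dogs), total[gram])
        for gram, dogs in dogs_seen.items()
        if len(dogs) >= min_dogs
    }
    # Sort by dog count, then total count (both descending), then by n-gram.
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0])))
```

Filtering on the number of distinct dogs, rather than on raw frequency alone, is one simple way to favor units that are shared across dogs, which is the property the summary emphasizes.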

Abstract

This paper explores potential communication patterns within dog vocalizations and moves beyond traditional linguistic analysis, which relies heavily on human a priori knowledge and limited datasets to find sound units in dog vocalizations. We present a self-supervised approach with HuBERT that enables accurate classification of phoneme labels and identification of vocal patterns suggesting a rudimentary vocabulary within dog vocalizations. Our findings indicate significant acoustic consistency in this identified canine vocabulary, which covers the entirety of the observed dog vocalization sequences. We further develop a web-based dog vocalization labeling system that can highlight phoneme n-grams present in the vocabulary in dog audio uploaded by users.
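The highlighting behavior of the labeling system can be illustrated with a short sketch. It assumes the uploaded audio has already been converted to a sequence of phoneme labels and that the vocabulary is a set of label tuples. The function name `find_vocab_spans` and the greedy longest-match policy are illustrative assumptions, not details taken from the paper.

```python
def find_vocab_spans(labels, vocabulary):
    """Locate vocabulary n-grams in a phoneme-label sequence.

    Scans left to right and, at each position, takes the longest
    vocabulary n-gram that starts there (greedy, non-overlapping).
    Returns a list of (start, end, ngram) with end exclusive.
    """
    # Try longer n-grams first so that longer matches win.
    lengths = sorted({len(gram) for gram in vocabulary}, reverse=True)
    spans = []
    i = 0
    while i < len(labels):
        for n in lengths:
            gram = tuple(labels[i:i + n])
            if len(gram) == n and gram in vocabulary:
                spans.append((i, i + n, gram))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no vocabulary entry starts at this position
    return spans
```

With frame-level timestamps for each label, the returned index spans could be mapped back to time ranges in the audio for display.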

Paper Structure

This paper contains 24 sections, 3 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Six different dog barking sounds from AudioSet (Gemmeke et al., 2017).
  • Figure 2: Full pipeline from data processing to word discovery.
  • Figure 3: Inertia for different numbers of clusters. 50 is a suitable number of clusters, which is the basis for our choice of 50 clusters in the third K-Means model. The same method was used for the others.
  • Figure 4: 2-D Visualization of 50 phonemes from HuBERT.
  • Figure 5: Segmentation and Phoneme Labelling Result of A Growl Sound.
  • ...and 3 more figures