What Needs to be Known in Order to Perform a Meaningful Scientific Comparison Between Animal Communications and Human Spoken Language

Roger K. Moore

TL;DR

This paper addresses how to meaningfully compare human speech with animal vocal communications. It introduces a minimal seven-phenomena checklist covering production, transmission, perception, and social context. The framework draws on established models and theories (source-filter modeling, PCA dimensionality estimation, control theory, perceptual mechanisms, and information-theoretic measures) and discusses their applicability to non-human species. Overall, it provides a unified cross-species and multimodal research framework while highlighting the practical challenges of operationalizing these criteria in animal studies.

Abstract

Human spoken language has long been the subject of scientific investigation, particularly with regard to the mechanisms underpinning speech production. Likewise, the study of animal communications has a substantial literature, with many studies focusing on vocalisation. More recently, there has been growing interest in comparing animal communications and human speech. However, it is proposed here that such a comparison necessitates the appraisal of a minimum set of critical phenomena: i) the number of degrees-of-freedom of the vocal apparatus, ii) the ability to control those degrees-of-freedom independently, iii) the properties of the acoustic environment in which communication takes place, iv) the perceptual salience of the generated sounds, v) the degree to which sounds are contrastive, vi) the presence/absence of compositionality, and vii) the information rate(s) of the resulting communications.
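
As a rough illustration of item vii, the sketch below estimates an information rate from a sequence of discrete call-type labels. It is not the paper's method: it assumes the vocalisations have already been segmented and labelled, and it uses a zeroth-order Shannon entropy that ignores sequential dependencies and small-sample bias. The function names, the labels, and the 4-second duration are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Zeroth-order Shannon entropy (bits/symbol) of a sequence of discrete labels."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum -p * log2(p) over the observed label frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_rate_bits_per_second(symbols, duration_s):
    """Entropy per symbol multiplied by the number of symbols emitted per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return entropy_bits_per_symbol(symbols) * len(symbols) / duration_s


# Hypothetical call-type labels from a hypothetical 4-second recording.
calls = ["A", "B", "A", "C", "A", "B", "A", "D"]
rate = information_rate_bits_per_second(calls, duration_s=4.0)
print(f"{entropy_bits_per_symbol(calls):.2f} bits/symbol, {rate:.2f} bits/s")
```

The same calculation applies to any species once a discrete inventory of sounds is defined. Choosing that inventory is the hard part, and it depends on the contrastiveness question in item v.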

Paper Structure

This paper contains 11 sections and 2 figures.

Figures (2)

  • Figure 1: Illustration (using an extension of Maturana & Varela's pictographs [Maturana1987, Moore2016s]) of communication between sender and receiver 'cognitive unities' (human beings or animals) via a conditioning environmental context.
  • Figure 2: An illustration of contrastive behaviour in everyday human conversation. On hearing a verbal enquiry from a family member as to the whereabouts of some mislaid object, the listener might reply with any of the utterances shown, all of which would be perceived as "I do not know" [Hawkins2003a]. The particular utterance emitted would depend on the communicative context: the shouts would be necessary in a noisy environment, whereas the nasal grunts would be sufficient in a quiet one.