Symbolic Audio Classification via Modal Decision Tree Learning

Enrico Marzano, Giovanni Pagliarini, Riccardo Pasini, Guido Sciavicco, Ionel Eduard Stan

TL;DR

This work investigates interpretable acoustic classification by applying a symbolic learning pipeline built with the SOLE framework. By converting audio into a 77-feature multivariate time series and leveraging both propositional and modal (interval temporal) decision trees, the approach achieves competitive accuracy across 11 tasks (gender, age, emotion, and respiratory disorders) while yielding human-readable rules. Modal reasoning often provides small accuracy gains and can reduce rule counts, supporting transparent decision-making in domains like automated call centers and healthcare. The results demonstrate that symbolic, rule-based models can match sub-symbolic performance with the added benefit of interpretability and potential for integration into conversational systems.

Abstract

The range of potential applications of acoustic analysis is wide. Classification of sounds, in particular, is a typical machine learning task that has received a lot of attention in recent years. The most common approaches to sound classification are sub-symbolic, typically based on neural networks, and result in black-box models with high performance but very low transparency. In this work, we consider several audio tasks, namely age and gender recognition, emotion classification, and respiratory disease diagnosis, and we approach them with a symbolic technique, namely (modal) decision tree learning. We show that such tasks can be solved using the same symbolic pipeline, which allows simple rules with very high accuracy and low complexity to be extracted. In principle, all such tasks could be associated with an autonomous conversation system, which could be useful in different contexts, such as an automatic reservation agent for a hospital or a clinic.
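
To make the distinction between propositional and modal decisions more concrete, the following is a minimal Python sketch, not the Sole.jl API and not the authors' implementation. It assumes a single hypothetical feature track (`energy`) from a multivariate time series and compares a propositional test, which looks at the whole series at once, with a simplified existential interval test, which asks whether some sub-interval satisfies the condition. The function names, the choice of the mean as the aggregate, and the threshold are all illustrative assumptions.

```python
from statistics import mean


def propositional_test(series, threshold):
    """Propositional-style split: aggregate the whole series, then compare."""
    return mean(series) > threshold


def modal_exists_test(series, threshold, min_len=2):
    """Simplified modal-style split: true if SOME sub-interval [i, j) of
    length >= min_len has a mean above the threshold."""
    n = len(series)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            if mean(series[i:j]) > threshold:
                return True
    return False


# Hypothetical feature track: a short burst of high values in a quiet signal.
energy = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1]

whole_series = propositional_test(energy, 0.7)   # burst is averaged away
some_interval = modal_exists_test(energy, 0.7)   # burst is found on [2, 4)
print(whole_series, some_interval)
```

In this toy case the whole-series average hides the short burst, while the interval test detects it. This is one intuition for why interval-based conditions can yield compact rules on temporal data; actual modal decision trees use richer interval relations than this single existential check.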

Paper Structure

This paper contains 6 sections, 3 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Structure of the Sole.jl framework for symbolic AI. Packages in green provide tools for manipulating logical formulas; packages in red provide tools for (symbolic) data processing; those in blue provide tools for learning symbolic models; those in purple provide tools for manipulating (symbolic) models.