Tessellated Linear Model for Age Prediction from Voice

Dareen Alharthi, Mahsa Zamani, Bhiksha Raj, Rita Singh

TL;DR

The paper tackles precise age estimation from voice under limited labeled data by introducing the Tessellated Linear Model (TLM), a piecewise-linear approach that partitions the feature space into convex regions and fits a local linear predictor per region. By optimizing both the tessellation and the region-wise models through a hierarchical binary-partition strategy and optionally fine-tuning a deep feature extractor, TLM achieves strong regression performance while maintaining interpretability. On the TIMIT dataset, TLM with feature optimization attains a state-of-the-art MAE of 3.97, with hard and soft routing also outperforming several deep-learning baselines, highlighting the method’s data efficiency and modeling flexibility. The work demonstrates that combining convex tessellations with local linear models can yield substantial gains in structured regression tasks and may generalize to other domains beyond voice age estimation.

Abstract

Voice biometric tasks, such as age estimation, require modeling the often complex relationship between voice features and the biometric variable. While deep learning models can handle such complexity, they typically require large amounts of accurately labeled data to perform well. Such data are often scarce for biometric tasks such as voice-based age prediction. On the other hand, simpler models like linear regression can work with smaller datasets but often fail to capture the underlying non-linear patterns present in the data. In this paper we propose the Tessellated Linear Model (TLM), a piecewise linear approach that combines the simplicity of linear models with the capacity of non-linear functions. TLM tessellates the feature space into convex regions and fits a linear model within each region. We optimize the tessellation and the linear models using hierarchical greedy partitioning. We evaluated TLM on the TIMIT dataset on the task of age prediction from voice, where it outperformed state-of-the-art deep learning models.
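
The idea of tessellating the feature space into convex regions, each with its own linear model, can be illustrated with a minimal sketch. This is not the authors' implementation. It makes the following assumptions: each node is split by thresholding the node's own linear prediction (a half-space test, so every leaf region is an intersection of half-spaces and therefore convex), candidate thresholds are the quartiles of those predictions, and the greedy criterion is the summed absolute error of the two refitted child models. The names fit_linear, predict_linear, build_tlm, predict_tlm and the parameter min_samples are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with a bias term; returns a weight vector of length d + 1."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply a linear model (weights plus bias) to the rows of X."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def build_tlm(X, y, depth, min_samples=20):
    """Greedily build a binary tree of half-space splits, one linear model per node."""
    node = {"w": fit_linear(X, y), "split": None}
    if depth == 0 or len(X) < 2 * min_samples:
        return node
    scores = predict_linear(node["w"], X)
    best = None
    # Assumption: candidate thresholds are the quartiles of the node's predictions.
    for t in np.quantile(scores, [0.25, 0.5, 0.75]):
        left = scores <= t
        if left.sum() < min_samples or (~left).sum() < min_samples:
            continue
        err = 0.0
        for mask in (left, ~left):
            w = fit_linear(X[mask], y[mask])
            err += np.abs(predict_linear(w, X[mask]) - y[mask]).sum()
        if best is None or err < best[0]:
            best = (err, t, left)
    if best is None:
        return node
    _, t, left = best
    node["split"] = t
    node["left"] = build_tlm(X[left], y[left], depth - 1, min_samples)
    node["right"] = build_tlm(X[~left], y[~left], depth - 1, min_samples)
    return node

def predict_tlm(node, x):
    """Hard routing: follow the half-space tests to a leaf, then apply its linear model."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    while node["split"] is not None:
        score = predict_linear(node["w"], x)[0]
        node = node["left"] if score <= node["split"] else node["right"]
    return float(predict_linear(node["w"], x)[0])
```

Because each split only sees the samples routed to its node, the procedure is greedy and locally optimal rather than a joint optimization over the whole tessellation.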

Paper Structure (15 sections, 8 equations, 3 figures, 1 table, 1 algorithm)

Figures (3)

  • Figure 1: (a) A convex tessellation of the input space. Ideally, both the tessellation and the linear estimator parameters within each cell must be optimized for prediction. (b) Our hierarchical solution. The space is recursively partitioned in a binary manner for locally optimal prediction. Here, the red line shows the first level partition, blue lines show the second level, and green lines show the third level.
  • Figure 2: The tree shows the training and test MAE at each node of the TLM model. It illustrates the stepwise reduction in MAE with distinct thresholds defining the regions. Each parent node represents a binary decision based on a chosen threshold, and the MAE is evaluated only on the data samples routed to the region based on the threshold. It is evident that the error is lower for younger speakers, which corresponds to a higher number of training samples in these regions.
  • Figure 3: The tessellation of the feature space is shown across the decision tree depths. Each region represents an age group, with colors indicating the predicted age. This segmentation shows how the model captures non-linear relationships between age and voice features in a piecewise-linear manner.
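
The TL;DR distinguishes hard and soft routing, and Figure 2 describes threshold-based binary decisions at each parent node. The sketch below contrasts the two at a single split. It is an illustration under assumptions, not the paper's formulation: soft routing is modeled here as a sigmoid gate on the distance to the threshold, and the temperature parameter and the function names soft_route and hard_route are hypothetical.

```python
import numpy as np

def hard_route(score, threshold, pred_left, pred_right):
    """Hard routing: use exactly one child's prediction, chosen by the threshold test."""
    return np.where(np.asarray(score, dtype=float) <= threshold, pred_left, pred_right)

def soft_route(score, threshold, pred_left, pred_right, temperature=1.0):
    """Soft routing (assumed form): blend both children with a sigmoid gate."""
    z = (np.asarray(score, dtype=float) - threshold) / temperature
    # Numerically stable sigmoid written with tanh; this is the weight of the right child.
    gate_right = 0.5 * (1.0 + np.tanh(z / 2.0))
    return (1.0 - gate_right) * pred_left + gate_right * pred_right
```

With this gate, a sample lying exactly on the threshold receives the average of the two child predictions, and as the temperature shrinks toward zero the blend approaches the hard decision.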