Tessellated Linear Model for Age Prediction from Voice
Dareen Alharthi, Mahsa Zamani, Bhiksha Raj, Rita Singh
TL;DR
The paper tackles precise age estimation from voice under limited labeled data by introducing the Tessellated Linear Model (TLM), a piecewise-linear approach that partitions the feature space into convex regions and fits a local linear predictor per region. By optimizing both the tessellation and the region-wise models through a hierarchical binary-partition strategy and optionally fine-tuning a deep feature extractor, TLM achieves strong regression performance while maintaining interpretability. On the TIMIT dataset, TLM with feature optimization attains a state-of-the-art MAE of 3.97, with hard and soft routing also outperforming several deep-learning baselines, highlighting the method’s data efficiency and modeling flexibility. The work demonstrates that combining convex tessellations with local linear models can yield substantial gains in structured regression tasks and may generalize to other domains beyond voice age estimation.
Abstract
Voice biometric tasks, such as age estimation, require modeling the often complex relationship between voice features and the biometric variable. While deep learning models can handle such complexity, they typically require large amounts of accurately labeled data to perform well. Such data are often scarce for biometric tasks such as voice-based age prediction. On the other hand, simpler models like linear regression can work with smaller datasets but often fail to capture the underlying non-linear patterns present in the data. In this paper, we propose the Tessellated Linear Model (TLM), a piecewise linear approach that combines the simplicity of linear models with the capacity of non-linear functions. TLM tessellates the feature space into convex regions and fits a linear model within each region. We optimize the tessellation and the linear models using hierarchical greedy partitioning. We evaluate TLM on the TIMIT dataset on the task of age prediction from voice, where it outperforms state-of-the-art deep learning models.
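To make the idea of a tessellated linear model concrete, the following is a minimal sketch of one possible hierarchical greedy partitioning with hard routing. It is an illustration under stated assumptions, not the procedure used in the paper: the split rule (a hyperplane orthogonal to the top principal direction, placed at the median projection), the acceptance threshold `min_gain`, and the names `build_tlm` and `predict_tlm` are all hypothetical. Because each region is an intersection of half-spaces, every leaf region is convex.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear fit with a bias term; returns weights of length d + 1."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    return X @ w[:-1] + w[-1]

def build_tlm(X, y, depth=0, max_depth=3, min_leaf=20, min_gain=0.01):
    """Recursively split the feature space with hyperplanes; fit one linear model per node."""
    w = fit_linear(X, y)
    node = {"w": w}
    if depth >= max_depth or len(X) < 2 * min_leaf:
        return node
    # Assumed split rule (not from the paper): hyperplane a.x = b, with a the top
    # principal direction of the region's data and b the median projection.
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    a = vt[0]
    proj = X @ a
    b = np.median(proj)
    left = proj <= b
    if left.sum() < min_leaf or (~left).sum() < min_leaf:
        return node
    # Greedy acceptance: keep the split only if the two child models reduce the
    # sum of squared errors by more than a relative margin.
    sse_parent = np.sum((predict_linear(w, X) - y) ** 2)
    sse_split = sum(
        np.sum((predict_linear(fit_linear(X[m], y[m]), X[m]) - y[m]) ** 2)
        for m in (left, ~left)
    )
    if sse_parent - sse_split <= min_gain * sse_parent:
        return node
    node["a"], node["b"] = a, b
    node["left"] = build_tlm(X[left], y[left], depth + 1, max_depth, min_leaf, min_gain)
    node["right"] = build_tlm(X[~left], y[~left], depth + 1, max_depth, min_leaf, min_gain)
    return node

def predict_tlm(node, X):
    """Hard routing: send each point to exactly one leaf and apply that leaf's linear model."""
    X = np.atleast_2d(X)
    if "a" not in node:
        return predict_linear(node["w"], X)
    out = np.empty(len(X))
    left = X @ node["a"] <= node["b"]
    if left.any():
        out[left] = predict_tlm(node["left"], X[left])
    if (~left).any():
        out[~left] = predict_tlm(node["right"], X[~left])
    return out

# Synthetic demo (not voice data): a target with a kink that one linear model cannot fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1))
y = np.abs(X[:, 0])
tree = build_tlm(X, y)
mae_linear = np.mean(np.abs(predict_linear(fit_linear(X, y), X) - y))
mae_tlm = np.mean(np.abs(predict_tlm(tree, X) - y))
print(f"single linear model MAE: {mae_linear:.3f}, tessellated MAE: {mae_tlm:.3f}")
```

The sketch covers only hard routing on fixed features. Soft routing and fine-tuning of a deep feature extractor, both mentioned in the TL;DR, are outside its scope.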
