Decision Tree Based Wrappers for Hearing Loss
Miguel Rabuge, Nuno Lourenço
TL;DR
This paper presents FEDORA, an evolutionary feature-engineering wrapper that uses grammar-based feature construction and decision-tree proxies to guide feature selection for hearing-loss detection. The data are split into training, validation, and test sets, and the search minimizes $1 - \text{Balanced Accuracy}$. FEDORA consistently reduces feature dimensionality while preserving or improving performance across Decision Tree (DT), Random Forest (RF), and XGBoost (XGB) proxies, achieving up to $76.2\%$ balanced accuracy with $57$ features and $72.8\%$ with a single feature. Statistical analyses confirm significant advantages over baseline and common methods, with large effect sizes, supporting FEDORA as a viable, interpretable approach to improve audiology screening with fewer features. The study highlights trade-offs between proxy complexity and feature count and points to future work on grammar biasing and explainability to enhance generalization and interpretability in clinical settings.
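To make the $1 -$ Balanced Accuracy objective concrete, the following is a minimal sketch of how such a fitness value could be computed from validation-set predictions. It is an illustration under stated assumptions, not the FEDORA implementation: the function names `balanced_accuracy` and `fitness` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class weighs the same regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def fitness(y_true, y_pred):
    """Hypothetical minimization objective: 1 - balanced accuracy (lower is better)."""
    return 1.0 - balanced_accuracy(y_true, y_pred)


# Toy, made-up validation labels (not data from the paper): 8 negatives, 2 positives.
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_val, y_hat)) / len(y_val)
print(plain_accuracy)                    # 9 of 10 predictions are correct
print(balanced_accuracy(y_val, y_hat))   # recall 1.0 on class 0, 0.5 on class 1
print(fitness(y_val, y_hat))
```

In this toy case plain accuracy is high because the majority class dominates, while balanced accuracy penalizes the missed positive case. That sensitivity to the minority class is the usual reason to prefer the metric in screening settings with imbalanced classes.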
Abstract
Audiology entities are using Machine Learning (ML) models to guide their screening towards people at risk. Feature Engineering (FE) focuses on optimizing data for ML models, and evolutionary methods have proven effective in feature selection and construction tasks. This work benchmarks an evolutionary FE wrapper that uses decision-tree-based models as proxies. The FEDORA framework is applied to a Hearing Loss (HL) dataset, where it reduces data dimensionality while maintaining performance statistically comparable to the baseline. Compared to traditional methods, FEDORA demonstrates superior performance, reaching a maximum balanced accuracy of 76.2% with 57 features. The framework also generated an individual that achieved 72.8% balanced accuracy using a single feature.
