MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels
Ilya Ploshchik, Angelos Chatzimparmpas, Andreas Kerren
TL;DR
The paper tackles the challenge of selecting effective metamodels for stacking ensembles by introducing MetaStackVis, an interactive visualization tool that extends StackGenVis to evaluate single and paired metamodels using predictive probabilities and multiple validation metrics. It uses HDBSCAN-based clustering to group base models and offers three coordinated views (stacked bar chart, UMAP projection, zone-based matrix) for comparing performance and identifying misclassified instances, demonstrated on a real healthcare data set. Contributions include the system design, multi-view visualization for metamodel assessment, and qualitative feedback from healthcare and visualization experts, which points to the potential of a third stacking layer and outlines usability and scalability considerations. Overall, MetaStackVis gives practitioners a concrete, visual workflow to design, compare, and potentially improve stacking ensembles in applied domains such as medicine.
Abstract
Stacking (or stacked generalization) is an ensemble learning method that differs from other ensemble methods in one main respect: although several base models are trained on the original data set, their predictions are then used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Our previously developed visual analytics system, StackGenVis, was therefore designed mainly to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it employs only a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool called MetaStackVis. Our interactive tool helps users visually explore single metamodels and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
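The two-layer arrangement described in the abstract can be illustrated with a minimal sketch. It is not the StackGenVis or MetaStackVis implementation: the synthetic data, the two single-feature base models, the hand-written gradient-descent logistic regression, and all function names (`fit_logistic`, `base_probabilities`, and so on) are assumptions made for illustration. For brevity it trains the metamodel on a held-out split rather than on cross-validated base-model predictions.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow for extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def accuracy(probs, y):
    return sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)

# Synthetic data (an assumption for illustration): label is 1 when x1 + x2 > 0.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(600)]
labels = [1 if x1 + x2 > 0 else 0 for x1, x2 in points]

# Three disjoint splits: base-model training, metamodel training, final test.
base_X, meta_X, test_X = points[:200], points[200:400], points[400:]
base_y, meta_y, test_y = labels[:200], labels[200:400], labels[400:]

# Layer 1: two deliberately weak base models, each seeing a single feature.
model_a = fit_logistic([[x1] for x1, _ in base_X], base_y)
model_b = fit_logistic([[x2] for _, x2 in base_X], base_y)

def base_probabilities(X):
    """Return one row per instance: [P_a(y=1), P_b(y=1)]."""
    pa = predict_proba(model_a, [[x1] for x1, _ in X])
    pb = predict_proba(model_b, [[x2] for _, x2 in X])
    return [list(row) for row in zip(pa, pb)]

# Layer 2: the metamodel is trained on base-model probabilities, not raw features.
metamodel = fit_logistic(base_probabilities(meta_X), meta_y)

test_meta_features = base_probabilities(test_X)
acc_a = accuracy([row[0] for row in test_meta_features], test_y)
acc_b = accuracy([row[1] for row in test_meta_features], test_y)
acc_stack = accuracy(predict_proba(metamodel, test_meta_features), test_y)
print(f"base A: {acc_a:.2f}  base B: {acc_b:.2f}  stack: {acc_stack:.2f}")
```

In this sketch, trying an alternative metamodel amounts to replacing the `fit_logistic` call in layer 2 with a different learner trained on the same probability rows, which is the kind of comparison the abstract describes at a much larger scale.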
