TRESTLE: A Model of Concept Formation in Structured Domains
Christopher J. MacLellan, Erik Harpstead, Vincent Aleven, Kenneth R. Koedinger
TL;DR
TRESTLE is an incremental probabilistic model of concept formation for structured domains that unifies prior approaches. It learns a hierarchical categorization tree from mixed data types (nominal, numeric, relational, and component) through partial matching, flattening, and COBWEB-based categorization, and it supports both supervised prediction and unsupervised clustering. Compared with a nonincremental baseline (CFE), TRESTLE often reaches lower asymptotic predictive accuracy but aligns more closely with human behavior, owing to its emphasis on incremental learning and generalization across attributes. The model performs competitively on the RumbleBlocks domain and yields human-like cluster structures, illustrating its potential as a computational account of human categorization and as a tool for educational design and analysis. The work highlights the value of integrating incremental representation learning with probabilistic categorization to capture structure-aware learning in realistic, noisy environments.
Abstract
The literature on concept formation has demonstrated that humans are capable of learning concepts incrementally, with a variety of attribute types, and in both supervised and unsupervised settings. Many models of concept formation focus on a subset of these characteristics, but none account for all of them. In this paper, we present TRESTLE, an incremental account of probabilistic concept formation in structured domains that unifies prior concept learning models. TRESTLE works by creating a hierarchical categorization tree that can be used to predict missing attribute values and cluster sets of examples into conceptually meaningful groups. It updates its knowledge by partially matching novel structures and sorting them into its categorization tree. Finally, the system supports mixed-data representations, including nominal, numeric, relational, and component attributes. We evaluate TRESTLE's performance on a supervised learning task and an unsupervised clustering task. For both tasks, we compare it to a nonincremental model and to human participants. We find that this new categorization model is competitive with the nonincremental approach and more closely approximates human behavior on both tasks. These results serve as an initial demonstration of TRESTLE's capabilities and show that, by taking key characteristics of human learning into account, it can better model behavior than approaches that ignore them.
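To make the mixed-data representation concrete, the following is a minimal Python sketch of what a structured instance and its flattened form might look like. The dictionary encoding, the dotted attribute names, the `flatten` helper, and the example tower are all illustrative assumptions, not the authors' implementation. The sketch covers only the flattening idea; it omits partial matching (aligning component names with existing concepts) and the COBWEB-based sorting into the categorization tree.

```python
def flatten(instance, prefix=""):
    """Flatten nested component attributes into a single-level dictionary.

    Assumption: components are nested dicts, relations are tuple keys, and
    flattened names use dotted notation such as "block1.type".
    """
    flat = {}
    for name, value in instance.items():
        if isinstance(name, tuple):
            # Relational attribute: keep the tuple key unchanged.
            flat[name] = value
        elif isinstance(value, dict):
            # Component attribute: recurse and prefix its sub-attributes.
            flat.update(flatten(value, prefix=f"{prefix}{name}."))
        else:
            # Nominal or numeric attribute: copy under its (prefixed) name.
            flat[f"{prefix}{name}"] = value
    return flat


# Hypothetical instance with all four attribute types.
tower = {
    "success": "yes",                          # nominal
    "block1": {"type": "cube", "x": 1.0},      # component (nominal + numeric)
    "block2": {"type": "rect", "x": 2.5},      # component (nominal + numeric)
    ("on", "block2", "block1"): True,          # relational
}

flat = flatten(tower)
for key, value in flat.items():
    print(key, "=", value)
```

Once an instance is reduced to a flat set of attribute-value pairs like this, it is in a form that an attribute-value categorizer in the COBWEB family could in principle consume.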
