TRESTLE: A Model of Concept Formation in Structured Domains

Christopher J. MacLellan, Erik Harpstead, Vincent Aleven, Kenneth R. Koedinger

TL;DR

TRESTLE is an incremental probabilistic model of concept formation for structured domains. It unifies prior approaches by learning a hierarchical categorization tree from mixed data types (nominal, numeric, relational, and component) through partial matching, flattening, and COBWEB-based categorization. It supports both supervised prediction and unsupervised clustering. Compared with a nonincremental baseline (CFE), it often reaches lower asymptotic predictive accuracy but aligns more closely with human behavior, a difference attributed to its emphasis on incremental learning and generalization across attributes. The model performs competitively on the RumbleBlocks domain and yields human-like cluster structures, illustrating its potential both as a computational account of human categorization and as a tool for educational design and analysis. The work highlights the value of integrating incremental representation learning with probabilistic categorization to capture structure-aware learning in realistic, noisy environments.
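Flattening turns a structured instance (components with their own attributes, plus relations between components) into a single attribute-value mapping that a COBWEB-style learner can count over. The sketch below illustrates the idea under an assumed instance format: a dict value is a component, a tuple key is a relation, and the function name `flatten` and its key layout are hypothetical rather than TRESTLE's documented representation.

```python
from typing import Any, Dict, Hashable, Tuple

# Hypothetical instance format (an assumption, not TRESTLE's documented schema):
#   - a plain value is a nominal or numeric attribute,
#   - a dict value is a component with its own attributes,
#   - a tuple key such as ("left-of", "block1", "block2") is a relation.
Instance = Dict[Hashable, Any]


def flatten(instance: Instance, path: Tuple[Hashable, ...] = ()) -> Dict[Hashable, Any]:
    """Flatten nested components into one attribute-value mapping.

    instance["block1"]["type"] becomes the key ("type", "block1"); relation keys
    and top-level attributes are copied through unchanged.
    """
    flat: Dict[Hashable, Any] = {}
    for attr, value in instance.items():
        if isinstance(value, dict):
            # Component: recurse and remember which component we are inside.
            flat.update(flatten(value, path + (attr,)))
        elif path:
            # Attribute of a (possibly nested) component: key by name and path.
            flat[(attr,) + path] = value
        else:
            # Top-level attribute or relation: keep the key as-is.
            flat[attr] = value
    return flat


# A toy two-block tower in the assumed format.
tower = {
    "success": "yes",
    "block1": {"type": "cube", "x": 0.0, "y": 1.0},
    "block2": {"type": "ramp", "x": 1.5, "y": 1.0},
    ("left-of", "block1", "block2"): True,
}
flat = flatten(tower)
for key, value in flat.items():
    print(key, value)
```

Keeping the component name inside each flat key is what later lets a matcher rename components consistently across attributes and relations.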

Abstract

The literature on concept formation has demonstrated that humans are capable of learning concepts incrementally, with a variety of attribute types, and in both supervised and unsupervised settings. Many models of concept formation focus on a subset of these characteristics, but none account for all of them. In this paper, we present TRESTLE, an incremental account of probabilistic concept formation in structured domains that unifies prior concept learning models. TRESTLE works by creating a hierarchical categorization tree that can be used to predict missing attribute values and cluster sets of examples into conceptually meaningful groups. It updates its knowledge by partially matching novel structures and sorting them into its categorization tree. Finally, the system supports mixed-data representations, including nominal, numeric, relational, and component attributes. We evaluate TRESTLE's performance on a supervised learning task and an unsupervised clustering task. For both tasks, we compare it to a nonincremental model and to human participants. We find that this new categorization model is competitive with the nonincremental approach and more closely approximates human behavior on both tasks. These results serve as an initial demonstration of TRESTLE's capabilities and show that, by taking key characteristics of human learning into account, it can better model behavior than approaches that ignore them.
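Partial matching decides which component of a new instance corresponds to which component of an existing concept before the instance is sorted. The paper's own matching procedure and scoring function are not reproduced here; the sketch below substitutes an exhaustive search over component mappings, scored by a simple sum of the concept's value probabilities. The flat key format, the concept summary format, and all function names are assumptions, and the search is only feasible for a handful of components.

```python
from itertools import permutations
from typing import Dict, Hashable, List

# Hypothetical flat format: component attributes are keys like ("type", "block1");
# relations are keys like ("left-of", "block1", "block2").
FlatInstance = Dict[Hashable, object]
# Hypothetical concept summary: flat key -> {value: probability}.
Concept = Dict[Hashable, Dict[object, float]]


def rename(instance: FlatInstance, mapping: Dict[str, str]) -> FlatInstance:
    """Apply a component-name mapping to every tuple key in a flat instance."""
    renamed: FlatInstance = {}
    for key, value in instance.items():
        if isinstance(key, tuple):
            # Keep the attribute/relation name (element 0); rename component names.
            key = (key[0],) + tuple(mapping.get(part, part) for part in key[1:])
        renamed[key] = value
    return renamed


def match_score(instance: FlatInstance, concept: Concept) -> float:
    """Sum of the concept's probabilities for the instance's attribute values."""
    return sum(concept.get(k, {}).get(v, 0.0) for k, v in instance.items())


def partial_match(instance: FlatInstance, concept: Concept,
                  instance_comps: List[str], concept_comps: List[str]) -> FlatInstance:
    """Try every injective mapping of instance components onto concept components
    and return the renamed instance with the highest score (exponential cost)."""
    best, best_score = instance, match_score(instance, concept)
    for target in permutations(concept_comps, len(instance_comps)):
        candidate = rename(instance, dict(zip(instance_comps, target)))
        score = match_score(candidate, concept)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Toy concept: component c1 is usually a cube, c2 usually a ramp, c1 left of c2.
concept = {
    ("type", "c1"): {"cube": 0.9, "ramp": 0.1},
    ("type", "c2"): {"ramp": 0.8, "cube": 0.2},
    ("left-of", "c1", "c2"): {True: 1.0},
}
instance = {("type", "b7"): "ramp", ("type", "b3"): "cube", ("left-of", "b3", "b7"): True}
matched = partial_match(instance, concept, ["b3", "b7"], ["c1", "c2"])
print(matched)  # maps b3 -> c1 and b7 -> c2
```

Because relations are renamed with the same mapping as attributes, a mapping that aligns block types but reverses the spatial relation scores lower than one that aligns both.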

Paper Structure

This paper contains 14 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: A screenshot of a player building a tower in RumbleBlocks. The final tower must cover the light blue energy balls and survive a simulated earthquake to be successful.
  • Figure 2: A tower in RumbleBlocks, its representation as an instance in TRESTLE using the four attribute-value types (nominal, numeric, component, and relational), and the representation of a TRESTLE concept that might describe the instance. The concept stores the number of instances categorized as this concept, the probability of each nominal attribute value computed from its occurrence count, and the normal density function for each numeric attribute given the mean and standard deviation of its values. The arrows denote the mapping between blocks, instance components, and components in the concept.
  • Figure 3: The four operations COBWEB (Fisher, 1987) uses to incorporate matched instances into its categorization tree. Each shaded node marks the location of the instance being sorted into the tree before and after an operation. Blue dotted lines represent nodes and links being added to the tree, and red dashed lines represent nodes and links being removed from it.
  • Figure 4: A simple example of how TRESTLE's categorization tree is updated in response to two new instances. The original tree (a) is modified in (b) and (c) to incorporate the instances shown at the top. In each case, the path of the instance through the categorization tree is shown in bold. The concepts are depicted as overlapping images of the instances that they contain to represent their probabilistic nature.
  • Figure 5: The sequential prediction task as presented to Amazon Mechanical Turk workers, who were asked to categorize 30 RumbleBlocks solutions into one of two categories: Category 1 (successful) or Category 2 (unsuccessful). They were not given any information about the meaning of the category labels, but they were given correctness feedback after each attempt.
  • ...and 1 more figure
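The concept summary described for Figure 2 (an instance count, value counts for nominal attributes, and a mean and standard deviation for numeric attributes) is what COBWEB-style operations such as those in Figure 3 compare when choosing where an instance goes. The sketch below is an illustration under stated assumptions, not TRESTLE's implementation: the `Concept` class and `category_utility` function are hypothetical, running statistics use Welford's method, the nominal term follows Fisher's (1987) category utility, and the numeric term 1/(2·sqrt(pi)·sd) with a minimum standard deviation (`acuity`, a parameter name borrowed from COBWEB variants) is an assumed extension.

```python
import math
from typing import Dict, List, Union

Value = Union[str, float]


class Concept:
    """Hypothetical concept node: instance count plus per-attribute statistics."""

    def __init__(self) -> None:
        self.count = 0
        # nominal attribute -> {value: number of instances with that value}
        self.nominal: Dict[str, Dict[str, int]] = {}
        # numeric attribute -> [n, mean, M2] (Welford's running-variance state)
        self.numeric: Dict[str, List[float]] = {}

    def increment(self, instance: Dict[str, Value]) -> None:
        """Update the statistics incrementally with one flat instance."""
        self.count += 1
        for attr, val in instance.items():
            if isinstance(val, (int, float)):
                n, mean, m2 = self.numeric.get(attr, [0, 0.0, 0.0])
                n += 1
                delta = val - mean
                mean += delta / n
                m2 += delta * (val - mean)
                self.numeric[attr] = [n, mean, m2]
            else:
                counts = self.nominal.setdefault(attr, {})
                counts[val] = counts.get(val, 0) + 1

    def std(self, attr: str, acuity: float) -> float:
        """Population standard deviation, floored at the assumed `acuity`."""
        n, _, m2 = self.numeric[attr]
        return max(math.sqrt(m2 / n), acuity)

    def density(self, attr: str, x: float, acuity: float = 1.0) -> float:
        """Normal density of x given the attribute's mean and standard deviation."""
        mean = self.numeric[attr][1]
        sd = self.std(attr, acuity)
        return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

    def expected_correct(self, acuity: float = 1.0) -> float:
        """Sum of P(A_i = V_ij | C)^2 over nominal values, plus an assumed
        1 / (2 * sqrt(pi) * sd) term for each numeric attribute."""
        total = 0.0
        for counts in self.nominal.values():
            total += sum((c / self.count) ** 2 for c in counts.values())
        for attr in self.numeric:
            total += 1.0 / (2.0 * math.sqrt(math.pi) * self.std(attr, acuity))
        return total


def category_utility(parent: Concept, children: List[Concept], acuity: float = 1.0) -> float:
    """Average, over children, of P(C_k) times the gain in expected correct guesses."""
    base = parent.expected_correct(acuity)
    gain = sum((c.count / parent.count) * (c.expected_correct(acuity) - base)
               for c in children)
    return gain / len(children)


def make(instances: List[Dict[str, Value]]) -> Concept:
    """Build a concept by adding instances one at a time."""
    concept = Concept()
    for inst in instances:
        concept.increment(inst)
    return concept


towers = [
    {"shape": "cube", "stable": "yes"},
    {"shape": "cube", "stable": "yes"},
    {"shape": "ramp", "stable": "no"},
    {"shape": "ramp", "stable": "no"},
]
root = make(towers)
good = category_utility(root, [make(towers[:2]), make(towers[2:])])
mixed = category_utility(root, [make(towers[0::2]), make(towers[1::2])])
print(good, mixed)  # 0.5 0.0
```

Grouping the toy towers by shape makes stability perfectly predictable within each child, so that partition scores higher than a split that mixes shapes; a COBWEB-style learner uses comparisons of this kind to choose among its tree operations.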