Compositional learning of functions in humans and machines

Yanli Zhou, Brenden M. Lake, Adina Williams

TL;DR

Humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes.

Abstract

The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.

Paper Structure (14 sections, 6 figures, 1 algorithm)

Figures (6)

  • Figure 1: Function interactions influence function compositions. The chopping operation provides context for frying while the puréeing operation renders frying inapplicable.
  • Figure 2: Stimuli and experimental procedure. (A) Car stimuli are structured as trees: a car part like the window is a child node of the car body, and it has two nodes denoting its type and color. All functions are defined as tree edits that add, remove or modify nodes of the car tree. (B) Participants learn 3 functions $A$, $B$ and $C$ satisfying a set of 4 interaction relations. Arrows denote the order of function application (see concrete examples in Fig. \ref{fig:examples}). (C) During the training period, participants saw 9 cars moving through each factory unit representing a different function. If a car is a valid input to the function, it will come out of the unit with changes reflecting the underlying function. Cars remain unchanged if they are invalid inputs. (D) Participants were first asked to generate the correct output based on each prompted input car and factory unit. (E) For each of the interaction types, represented by the relevant factory units and the order of their application, participants were asked to generate the correct output car for 8 different input cars.
  • Figure 3: Behavioral results. (A) Generation accuracy by interaction type. Performance did not differ significantly across different conditions. We also did not observe higher performance in F/CBL trials over CF/BL trials (maximum utilization bias), nor F/BL over CF/CBL trials (transparency bias). (B) Proportion of each error type over all incorrect generations by interaction type.
  • Figure 4: Examples of human and model output for each interaction type. Individual functions $A$, $B$ and $C$ are shown on the left. Prompted input cars (top rows) are shown alongside different output generations by humans, the base MLC model, and the fine-tuned MLC. The number in the top-left corner reflects the count (or percentage for model samples) of generation for each car. Correct outputs are marked by $*$; erroneous generations are underlined with colors corresponding to the error types (green: function mismatch; purple: input copying; grey: feature mismatch).
  • Figure 5: Model schematics and training details.
  • ...and 1 more figures
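
The captions for Figures 1 and 2 describe functions as tree edits that leave invalid inputs unchanged, so that the order of application can create or remove the context for a second function. The following is a minimal Python sketch of that idea, not the paper's actual stimuli or code: the car representation (a dictionary of parts, each with a type and a color) and the functions `add_window`, `remove_window` and `paint_window_red` are hypothetical, chosen only for illustration.

```python
import copy
from typing import Callable, Dict

# Hypothetical car tree: part name -> {"type": ..., "color": ...}
Car = Dict[str, Dict[str, str]]
Func = Callable[[Car], Car]


def add_window(car: Car) -> Car:
    """Add a blue window; a car that already has a window is an invalid input."""
    if "window" in car:
        return car  # invalid input: returned unchanged
    out = copy.deepcopy(car)
    out["window"] = {"type": "square", "color": "blue"}
    return out


def remove_window(car: Car) -> Car:
    """Remove the window; a car without a window is an invalid input."""
    if "window" not in car:
        return car
    out = copy.deepcopy(car)
    del out["window"]
    return out


def paint_window_red(car: Car) -> Car:
    """Recolor the window; a car without a window is an invalid input."""
    if "window" not in car:
        return car
    out = copy.deepcopy(car)
    out["window"]["color"] = "red"
    return out


def compose(first: Func, second: Func) -> Func:
    """Apply `first`, then `second`, to a car."""
    return lambda car: second(first(car))


bare = {"wheel": {"type": "small", "color": "black"}}
windowed = {**bare, "window": {"type": "square", "color": "blue"}}

# The first function creates the context for the second: the new window is painted.
created = compose(add_window, paint_window_red)(bare)

# Reversed order: painting has nothing to act on, so the added window stays blue.
reversed_order = compose(paint_window_red, add_window)(bare)

# The first function removes the context for the second: painting no longer applies.
removed = compose(remove_window, paint_window_red)(windowed)
```

Here `created` ends with a red window, `reversed_order` with a blue one, and `removed` with no window at all, which illustrates why the same pair of functions can yield different outputs depending on the order in which they are composed.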