When Redundancy Matters: Machine Teaching of Representations

Cèsar Ferri, Dario Garigliotti, Brigt Arve Toppe Håvardstun, José Hernández-Orallo, Jan Arne Telle

TL;DR

The paper tackles redundancy in representational languages and its impact on machine teaching by extending the traditional concept-teaching framework to the teaching of representations. It introduces three protocols (Eager, Greedy, and Optimal), analyzes them formally via an ordered consistency graph, and evaluates them experimentally on DNFs and on the Turing-complete language P3. The key findings are that Greedy, which teaches all representations, often covers more concepts than Eager and can use smaller witnesses, with its advantage strongly influenced by the Redundancy Spread; the two Optimal variants, Optimal-1 and Optimal-2, provide theoretical bounds and best-case coverage, although they are not always feasible. These results illuminate the role of language bias and representational redundancy in inductive search and give a principled basis for choosing a teaching strategy in practical settings where multiple representations map to the same concept.
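A minimal sketch of how the Eager and Greedy learners might read an ordered consistency graph. The encoding is an assumption, not taken from the paper: witnesses and representations are identified by their positions in the witness order and the representation order, and `consistent[w]` is the set of representations consistent with witness `w`. Eager is modeled as the learner that always returns the first consistent representation; Greedy as the learner that returns the first consistent representation not yet taught by an earlier witness. The names `eager_teacher` and `greedy_teacher` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical encoding: consistent[w] holds the indices of the representations
# consistent with witness w. Smaller index = earlier in the corresponding order.
Graph = List[Set[int]]


def eager_teacher(consistent: Graph) -> Dict[int, int]:
    """Return {representation: witness} under the assumed Eager learner.

    The learner maps a witness to its first consistent representation, so a
    representation is taught by the first witness for which it is that choice.
    """
    teacher: Dict[int, int] = {}
    for w, reps in enumerate(consistent):
        if reps:
            teacher.setdefault(min(reps), w)
    return teacher


def greedy_teacher(consistent: Graph) -> Dict[int, int]:
    """Return {representation: witness} under the assumed Greedy learner.

    Each witness, in order, teaches the first consistent representation that
    no earlier witness has taught yet.
    """
    teacher: Dict[int, int] = {}
    for w, reps in enumerate(consistent):
        untaught = [r for r in reps if r not in teacher]
        if untaught:
            teacher[min(untaught)] = w
    return teacher


# Toy graph (illustrative only): r0 and r1 are both consistent with w0 and w1.
toy: Graph = [{0, 1}, {0, 1}, {1, 2}]
print(eager_teacher(toy))   # {0: 0, 1: 2}
print(greedy_teacher(toy))  # {0: 0, 1: 1, 2: 2}
```

On this toy graph the assumed Greedy learner teaches all three representations and teaches r1 with the earlier witness w1 instead of w2, a small instance of the pattern described above.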

Abstract

In traditional machine teaching, a teacher wants to teach a concept to a learner by means of a finite set of examples, the witness set. But concepts can have many equivalent representations. This redundancy strongly affects the search space, to the extent that teacher and learner may not be able to easily determine the equivalence class of each representation. In this common situation, instead of teaching concepts, we explore the idea of teaching representations. We work with several teaching schemas (Eager, Greedy and Optimal) that exploit representation and witness size, and analyze the gains in teaching effectiveness for some representational languages (DNF expressions and Turing-complete P3 programs). Our theoretical and experimental results indicate that there are various types of redundancy, handled better by the Greedy schema introduced here than by the Eager schema, although both can be arbitrarily far from the Optimal. For P3 programs we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.
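To make the redundancy concrete, the sketch below counts how many DNF expressions denote each Boolean function (concept) over two variables. The enumeration is hypothetical and chosen only for illustration: a DNF is a set of at most two distinct terms, and each term is a non-empty conjunction of literals over distinct variables. The paper's own DNF language and size measure may differ.

```python
from collections import Counter
from itertools import combinations, product
from typing import Optional, Tuple

N_VARS = 2
Term = Tuple[Optional[int], ...]  # per variable: 1 = x_i, 0 = not x_i, None = absent

# All non-empty terms: 3**N_VARS - 1 = 8 of them for two variables.
TERMS = [t for t in product((1, 0, None), repeat=N_VARS)
         if any(v is not None for v in t)]


def term_holds(term: Term, assignment: Tuple[int, ...]) -> bool:
    """A term holds when every literal it contains agrees with the assignment."""
    return all(v is None or assignment[i] == v for i, v in enumerate(term))


def concept(dnf: Tuple[Term, ...]) -> Tuple[bool, ...]:
    """The concept denoted by a DNF: its truth table over all assignments."""
    return tuple(any(term_holds(t, a) for t in dnf)
                 for a in product((0, 1), repeat=N_VARS))


def representations_per_concept(max_terms: int) -> Counter:
    """Count the DNFs (sets of distinct terms) that denote each concept."""
    counts: Counter = Counter()
    for k in range(1, max_terms + 1):
        for dnf in combinations(TERMS, k):
            counts[concept(dnf)] += 1
    return counts


counts = representations_per_concept(2)
print(len(counts))                         # 15 distinct concepts
print(sum(counts.values()))                # 36 representations
print(counts[(False, False, True, True)])  # 4 DNFs denote "x0"
```

Even in this tiny language, 36 representations collapse onto 15 concepts, and the concept "x0" has four equivalent DNFs (for example x0 alone, or (x0 and x1) or (x0 and not x1)).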

Paper Structure (8 sections, 4 theorems, 2 equations, 3 figures, 1 table)

Key Result

Theorem 1

The teacher mapping returned by Greedy when it follows the witness order $\prec\mathrel{\mkern-5mu}\mathrel{\cdot}_W$ is the same as the one returned by the alternative that follows the representation order $\prec\mathrel{\mkern-5mu}\mathrel{\cdot}_R$.
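The statement can be checked empirically on random graphs. This sketch assumes that witnesses and representations are identified by their positions in the two orders, that Greedy walks the witnesses in order and gives each one to the first untaught consistent representation, and that the alternative walks the representations in order and gives each one the first unused consistent witness. Both readings are assumptions rather than the paper's pseudocode, the function names are hypothetical, and a randomized check is evidence, not a proof.

```python
import random
from typing import Dict, List, Set

Graph = List[Set[int]]  # consistent[w] = representations consistent with witness w


def greedy_witness_order(consistent: Graph, n_reps: int) -> Dict[int, int]:
    """Each witness, in order, teaches the first untaught consistent representation."""
    teacher: Dict[int, int] = {}
    for w, reps in enumerate(consistent):
        for r in range(n_reps):
            if r in reps and r not in teacher:
                teacher[r] = w
                break
    return teacher


def greedy_representation_order(consistent: Graph, n_reps: int) -> Dict[int, int]:
    """Each representation, in order, takes the first unused consistent witness."""
    teacher: Dict[int, int] = {}
    used: Set[int] = set()
    for r in range(n_reps):
        for w, reps in enumerate(consistent):
            if r in reps and w not in used:
                teacher[r] = w
                used.add(w)
                break
    return teacher


def random_graph(n_witnesses: int, n_reps: int, p: float, rng: random.Random) -> Graph:
    """Random consistency graph: each (witness, representation) edge with probability p."""
    return [{r for r in range(n_reps) if rng.random() < p} for _ in range(n_witnesses)]


rng = random.Random(0)
for _ in range(1000):
    g = random_graph(8, 6, 0.4, rng)
    assert greedy_witness_order(g, 6) == greedy_representation_order(g, 6)
print("both orders agree on 1000 random graphs")
```

Under these assumptions the agreement is not a coincidence: the first representation and its first consistent witness are paired by both procedures, and removing that pair leaves the same problem for the rest of the graph.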

Figures (3)

  • Figure 1: Simplest possible ordered consistency graph ($\prec\mathrel{\mkern-5mu}\mathrel{\cdot}_R$ and $\prec\mathrel{\mkern-5mu}\mathrel{\cdot}_W$ given by the indices) showing that Eager, Greedy and Optimal-1 can require different maximum teaching sizes (witnesses of size 6, 5 and 4 respectively), assuming witness $w_i$ has size $i$.
  • Figure 2: Relation between Redundancy Spread and the percentage of cases in which Greedy is better than Eager. Data from the tables labeled table-small-results-overview_graph and table-small-results-greedy_vs_eager. The red line is the best-fit linear approximation.
  • Figure 3: Greedy: program length versus witness size, using Elias coding (Elias, 1975). Circles above the unit diagonal denote witnesses that are smaller than the program they identify; circle size indicates the number of programs.
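Figure 3 measures sizes with Elias coding, but the caption does not say which variant. As a sketch, the two most common Elias codes for positive integers are shown below. `witness_bits` is a hypothetical helper that treats a witness as a sequence of positive integers; that encoding is an assumption for illustration, not the paper's scheme.

```python
from typing import Callable, Iterable


def elias_gamma(n: int) -> str:
    """Elias gamma code: floor(log2 n) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias codes are defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_delta(n: int) -> str:
    """Elias delta code: gamma code of n's bit length, then n's bits after the leading 1."""
    if n < 1:
        raise ValueError("Elias codes are defined for n >= 1")
    binary = bin(n)[2:]
    return elias_gamma(len(binary)) + binary[1:]


def witness_bits(values: Iterable[int],
                 code: Callable[[int], str] = elias_delta) -> int:
    """Hypothetical size measure: total bits of a witness encoded as positive integers.

    Elias codes are prefix-free, so concatenated codewords can be decoded
    without separators, which makes summing their lengths meaningful.
    """
    return sum(len(code(v)) for v in values)


print(elias_gamma(5))               # 00101
print(elias_delta(17))              # 001010001
print(witness_bits([3, 17, 1000]))  # 29
```

The gamma code of n takes 2⌊log2 n⌋ + 1 bits, while the delta code grows as log2 n plus a term of order log log n, so the choice of variant mainly matters for large values.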

Theorems & Definitions (8)

  • Theorem 1
  • Proof
  • Theorem 2
  • Proof
  • Theorem 3
  • Proof
  • Theorem 4
  • Proof