Heuristic-Free Multi-Teacher Learning

Huy Thong Nguyen, En-Hung Chu, Lenord Melvix, Jazon Jiao, Chunglin Wen, Benjamin Louie

Abstract

We introduce Teacher2Task, a novel framework for multi-teacher learning that eliminates the need for manual aggregation heuristics. Existing multi-teacher methods typically rely on such heuristics to combine predictions from multiple teachers, often resulting in sub-optimal aggregated labels and the propagation of aggregation errors. Teacher2Task addresses these limitations by introducing teacher-specific input tokens and reformulating the training process. Instead of relying on aggregated labels, the framework transforms the training data, consisting of ground truth labels and annotations from N teachers, into N+1 distinct tasks: N auxiliary tasks that predict the labeling styles of the N individual teachers, and one primary task that focuses on the ground truth labels. This approach, drawing upon principles from multiple learning paradigms, demonstrates strong empirical results across a range of architectures, modalities, and tasks.
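
The data transformation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each training example carries one ground truth label and one score per teacher, and that a task is selected by prepending a token to the input text. The token strings (`[teacher:<name>]`, `[ground_truth]`), the `TaskSample` type, the `to_task_samples` helper, and the teacher names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical token for the primary task; the paper's actual token format is not given here.
GROUND_TRUTH_TOKEN = "[ground_truth]"


@dataclass(frozen=True)
class TaskSample:
    """One training sample for one of the N+1 tasks."""
    task_token: str   # identifies whose labels the model should predict
    model_input: str  # task token prepended to the original input
    target: float     # teacher score or ground truth label


def teacher_token(teacher_name: str) -> str:
    """Build a teacher-specific input token (assumed format)."""
    return f"[teacher:{teacher_name}]"


def to_task_samples(
    text: str,
    teacher_scores: Dict[str, float],
    ground_truth: float,
) -> List[TaskSample]:
    """Expand one annotated example into N auxiliary samples plus one primary sample.

    No aggregation of teacher scores takes place: each teacher's score
    becomes the target of its own task.
    """
    samples = []
    # N auxiliary tasks: predict each teacher's own label for this input.
    for name, score in teacher_scores.items():
        token = teacher_token(name)
        samples.append(TaskSample(token, f"{token} {text}", score))
    # One primary task: predict the ground truth label.
    samples.append(
        TaskSample(GROUND_TRUTH_TOKEN, f"{GROUND_TRUTH_TOKEN} {text}", ground_truth)
    )
    return samples


# Example with N = 2 hypothetical teachers, yielding N + 1 = 3 samples.
samples = to_task_samples(
    "a photo of a cat",
    {"teacher_a": 0.9, "teacher_b": 0.4},
    ground_truth=1.0,
)
for sample in samples:
    print(sample.model_input, "->", sample.target)
```

Under these assumptions, a single model trained on all such samples could be queried at inference time with the ground truth token to run the primary task, or with a teacher token to imitate that teacher's labeling style.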

Paper Structure

This paper contains 20 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: (a) Conventional methods, which use a heuristic to aggregate multiple predictions; (b) our proposed Teacher2Task method; (c) examples of our Teacher2Task transformation.
  • Figure 2: Conceptual illustration of our proposed Multi-Teacher Learning. Our algorithm defines N + 1 learning tasks: N auxiliary tasks focused on predicting each teacher's confidence scores, and one primary task focused on learning the ground truth.
  • Figure 3: Examples of extracting Teacher2Task samples from (a) LLMs and (b) classification models.
  • Figure 4: Various model architectures that the proposed algorithm supports: (a) Encoder-only, (b) Dual-Encoders, (c) (Multi-head) Classification.
  • Figure 5: Comparison of precision-recall (PR) curves for PaLI, Gemini, and our Multi-Teacher Learning model. At higher precision levels, our model outperforms Gemini due to its access to human annotations. At lower precision levels, it leverages the strengths of both PaLI and Gemini, achieving an outer bound on their respective PR curves.