Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

Haonan Lin; Wenbin An; Jiahao Wang; Yan Chen; Feng Tian; Mengmeng Wang; Guang Dai; Qianying Wang; Jingdong Wang

TL;DR

This work dives into why traditional teacher-student designs falter in open-world generalized category discovery as compared to their success in closed-world semi-supervised learning, and introduces FlipClass, a method that dynamically updates the teacher to align with the student's attention, instead of maintaining a static teacher reference.

Abstract

Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD). Typically, this involves a teacher-student framework in which the teacher imparts knowledge to the student to classify categories, even in the absence of explicit labels. Nevertheless, GCD presents unique challenges, particularly the absence of priors for new classes, which can lead to the teacher's misguidance and unsynchronized learning with the student, culminating in suboptimal outcomes. In our work, we delve into why traditional teacher-student designs falter in open-world generalized category discovery as compared to their success in closed-world semi-supervised learning. We identify inconsistent pattern learning across attention layers as the crux of this issue and introduce FlipClass, a method that dynamically updates the teacher to align with the student's attention, instead of maintaining a static teacher reference. Our teacher-student attention alignment strategy refines the teacher's focus based on student feedback from an energy perspective, promoting consistent pattern recognition and synchronized learning across old and new classes. Extensive experiments on a spectrum of benchmarks affirm that FlipClass significantly surpasses contemporary GCD methods, establishing new standards for the field.

Paper Structure (45 sections, 6 theorems, 45 equations, 17 figures, 8 tables)

Introduction
Background
Integrating SSL Techniques into a Consistency Loss Framework
Class Prior Gap between SSL and GCD
How Consistency Loss Goes Awry: Unraveling the Pitfalls
What to Bridge the Class Prior
Inconsistent Patterns Spoil the Whole Barrel
FlipClass: Teacher-Student Attention Alignment
Teacher Attention Update Rule
Representation Learning and Parametric Classification
Experiments
Experimental Settings
Experimental Results
Analysis and Discussion
Conclusion
...and 30 more sections

Key Result

Theorem 4.1

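The minimization can be formulated as obtaining a maximum a posteriori probability (MAP) estimate of teacher keys $\mathbf{K}_t$ given a set of observed student queries $\mathbf{Q}_s$: $$\mathbf{K}_t^{\ast} = \arg\max_{\mathbf{K}_t} \log p(\mathbf{K}_t \mid \mathbf{Q}_s) = \arg\max_{\mathbf{K}_t} \left[ \log p(\mathbf{Q}_s \mid \mathbf{K}_t) + \log p(\mathbf{K}_t) \right],$$ where $p(\mathbf{Q}_s \mid \mathbf{K}_t)$ and $p(\mathbf{K}_t)$ are modeled by the energy functions in Eq. (eq:energy_qk) and Eq. (eq:energy_k), respectively. We approximate the posterior inference by the gradient of the log posteri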
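As a rough illustration of what a gradient step on the log posterior might look like, the sketch below updates teacher keys toward the student queries. The two energy functions are not reproduced in this summary, so the forms used here are assumptions: a log-sum-exp energy for $p(\mathbf{Q}_s \mid \mathbf{K}_t)$ and a squared-norm energy for $p(\mathbf{K}_t)$. The names `energy`, `update_teacher_keys`, `step`, and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along the given axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def energy(Q_s, K_t, lam=0.1):
    # ASSUMED energies (not from the source):
    #   E(Q_s; K_t) = -sum_i logsumexp_j(q_i . k_j)   (query-key agreement)
    #   E(K_t)      = (lam / 2) * ||K_t||^2           (regularizer on the keys)
    e_qk = -logsumexp(Q_s @ K_t.T, axis=-1).sum()
    e_k = 0.5 * lam * (K_t ** 2).sum()
    return e_qk + e_k

def update_teacher_keys(Q_s, K_t, step=0.01, lam=0.1):
    # Gradient of the log posterior (the negative total energy) w.r.t. K_t.
    # For the assumed energies this is softmax(Q_s K_t^T)^T Q_s - lam * K_t.
    attn = softmax(Q_s @ K_t.T, axis=-1)        # shape (n_queries, n_keys)
    grad_log_post = attn.T @ Q_s - lam * K_t    # shape (n_keys, dim)
    return K_t + step * grad_log_post           # one gradient-ascent step

rng = np.random.default_rng(0)
Q_s = rng.normal(size=(8, 4))   # toy student queries
K_t = rng.normal(size=(6, 4))   # toy teacher keys

energies = [energy(Q_s, K_t)]
for _ in range(5):
    K_t = update_teacher_keys(Q_s, K_t)
    energies.append(energy(Q_s, K_t))
```

With a small step size, each update lowers the assumed total energy, which mirrors the reading of Figure 3 that lower energy indicates less discrepancy between teacher and student.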

Figures (17)

Figure 1: Left: Learning effects of the traditional Teacher-Student Consistency Model (TSCM, e.g., SimGCD [wen2023parametric]) and our Flipped Classroom Consistency Model (FlipClass) on Stanford Cars [krause20133d]. Middle: Model comparison between TSCM and our FlipClass, where $\mathbb{D}^\text{new}$ refers to data belonging to new classes. Right: Illustration of the inner feedback mechanism in FlipClass, where teacher attention is adapted to the student, leading to the alignment of attention.
Figure 2: Exploring prior gaps between SSL and GCD on the SCars and CUB datasets. Left: Accuracy of sorted pseudo labels for old and new classes. Middle: Consistency loss trends over epochs, illustrating challenges in optimization and slower convergence for new classes. Right: Error categories [wen2023parametric], where "True Old" refers to predicting an "Old" class sample as another "Old" class, while "False Old" indicates predicting an "Old" class sample as some "New" class.
Figure 3: Left: Attention heatmaps for teacher and student across attention layers. Right: Energy trend over epochs, with lower energy indicating less discrepancy in pattern recognition between teacher and student.
Figure 4: Framework of FlipClass demonstrating teacher-student interaction, where the teacher's and student's attention are aligned by updating the teacher (Eq. \ref{eq:teacher_update}). Then $\mathcal{L}_\text{rep}$ and $\mathcal{L}_\text{cons}$ are combined for optimization.
Figure 5: Ablation study results for FlipClass, indicating the critical role of strong augmentations, attention alignment, and regularization in model performance across multiple datasets.
...and 12 more figures

Theorems & Definitions (12)

Theorem 4.1
Theorem A.1: Global Convergence (Zangwill): Energy
proof
Lemma A.2
proof
Lemma A.3
proof
Lemma A.4
proof
Lemma A.5
...and 2 more
