Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

Haopeng Li; Mohammed Bennamoun; Jun Liu; Hossein Rahmani; Qiuhong Ke

TL;DR

This work tackles VideoQA generalization by introducing uncertainty-aware curriculum learning (UCL) to progressively train on data of increasing difficulty, measured by data and predictive uncertainty rather than loss alone. It casts VideoQA as a stochastic computation graph, enabling probabilistic modeling of visual representations and deriving two uncertainty types, feature uncertainty $U_F$ and predictive uncertainty $U_P$, to guide training. The approach integrates with MASN and uses sampling-based variational inference to obtain uncertainty-aware predictions, achieving state-of-the-art results on TGIF-QA and NExT-QA, while providing meaningful uncertainty quantification and robustness analyses. The framework demonstrates improved generalization across multiple VideoQA models and datasets, with comprehensive ablations and hyper-parameter studies validating the benefits of probabilistic modeling and uncertainty-guided curriculum scheduling.
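As a rough illustration of how sampling can turn a stochastic visual representation into the two uncertainty scores, the sketch below draws Monte Carlo samples from a Gaussian feature and averages the resulting answer distributions. It is not the paper's formulation: the diagonal-Gaussian feature, the mean-variance proxy for $U_F$, the entropy proxy for $U_P$, and the names `mc_uncertainty` and `classifier` are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_uncertainty(mu, sigma, classifier, num_samples=20, seed=0):
    """Monte Carlo estimate of an answer distribution and two uncertainty proxies.

    ASSUMPTIONS (illustrative, not taken from the paper):
      - the visual feature is a diagonal Gaussian with mean `mu` and std `sigma`;
      - feature uncertainty is approximated by the mean variance of that Gaussian;
      - predictive uncertainty is approximated by the entropy of the
        sample-averaged answer distribution.
    `classifier` is a hypothetical function mapping a feature vector to logits.
    """
    rng = random.Random(seed)
    mean_probs = None
    for _ in range(num_samples):
        # Reparameterised sample: z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        probs = softmax(classifier(z))
        if mean_probs is None:
            mean_probs = [0.0] * len(probs)
        mean_probs = [a + p / num_samples for a, p in zip(mean_probs, probs)]
    feature_u = sum(s * s for s in sigma) / len(sigma)
    predictive_u = entropy(mean_probs)
    return mean_probs, feature_u, predictive_u


# Toy two-option "answer head": logits depend on the first feature dimension only.
toy_classifier = lambda z: [z[0], -z[0]]
probs, u_f, u_p = mc_uncertainty([1.0], [2.0], toy_classifier)
```

With `sigma` set to zero every sample equals the mean, so the procedure reduces to a single deterministic forward pass and the feature-uncertainty proxy is exactly zero.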

Abstract

While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex data. Recognizing that conventional self-paced CL methods rely on training loss for difficulty measurement, which might not accurately reflect the intricacies of video-question pairs, we introduce the concept of uncertainty-aware CL. Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty. Furthermore, we address the challenge posed by uncertainty by presenting a probabilistic modeling approach for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation graph, where the hidden representations are treated as stochastic variables. This yields two distinct types of uncertainty: one related to the inherent uncertainty in the data and another pertaining to the model's confidence. In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments. The findings affirm that our approach not only achieves enhanced performance but also effectively quantifies uncertainty in the context of VideoQA.
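The contrast between loss-based and uncertainty-based difficulty can be made concrete with a small sketch. It uses the classical hard-threshold form of self-paced learning (a sample is included only when its difficulty is at or below a threshold that grows during training) and simply swaps the per-sample loss for a per-sample uncertainty score. The abstract does not specify the paper's weighting rule or schedule, so the linear schedule and the function names below are assumptions for illustration only.

```python
def curriculum_weights(difficulty, threshold):
    """Hard self-paced weights: 1.0 for samples at or below the threshold, else 0.0.

    `difficulty` may hold per-sample losses (conventional self-paced CL) or
    per-sample uncertainty scores (the uncertainty-aware variant sketched here).
    """
    return [1.0 if d <= threshold else 0.0 for d in difficulty]


def weighted_loss(losses, weights):
    """Average loss over the currently selected samples (0.0 if none selected)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


def threshold_schedule(epoch, num_epochs, start, end):
    """ASSUMED linear schedule: threshold grows from `start` to `end`."""
    if num_epochs <= 1:
        return end
    frac = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return start + frac * (end - start)


# Toy batch: made-up uncertainty scores and losses for three video-question pairs.
uncertainties = [0.1, 0.5, 0.9]
losses = [0.2, 1.0, 2.0]

for epoch in range(3):
    tau = threshold_schedule(epoch, num_epochs=3, start=0.2, end=1.0)
    w = curriculum_weights(uncertainties, tau)
    print(epoch, w, weighted_loss(losses, w))
```

In this toy run only the most certain sample contributes at the first epoch, and all three contribute by the last, which is the "sure to uncertain" ordering the title refers to.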

Paper Structure (33 sections, 16 equations, 2 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Video Question Answering
Curriculum Learning
Uncertainty Modeling
The Proposed Method
Uncertainty-Based Curriculum Learning
Self-Paced Curriculum Learning Revisit
Uncertainty-Based Curriculum Learning
Probabilistic Modeling for VideoQA
Uncertainty-Aware Curriculum Learning for VQA
Experiments
Implementation Details
Comparisons with Existing Methods
Results on TGIF-QA
...and 18 more sections

Figures (2)

Figure 1: The Uncertainty-Accuracy curves for TGIF-Action and TGIF-Transition.
Figure 2: (Up) Examples of high predictive uncertainty. The uncertainty of each option is also provided (normalized to $[0,1]$). (Down) Examples of high uncertainty. The predictions are in orange, while the correct answers are in green.
