Human Action Anticipation: A Survey
Bolin Lai, Sam Toyer, Tushar Nagarajan, Rohit Girdhar, Shengxin Zha, James M. Rehg, Kris Kitani, Kristen Grauman, Ruta Desai, Miao Liu
TL;DR
This survey organizes the action-anticipation literature into seven fine-grained tasks, detailing their input/output specifications, evaluation metrics, model biases, and data modalities. It synthesizes a broad spectrum of approaches, from classic probabilistic and RNN-based methods to transformer-based and multimodal architectures, and highlights pretraining strategies and auxiliary objectives that improve forecast quality. The authors compare methods quantitatively on eleven benchmarks and outline key gaps, such as error accumulation and long-horizon modeling, as well as the potential of foundation-model-driven approaches. The work also maps a path forward for egocentric and exocentric forecasting, advocating richer multimodal fusion, language-integrated perception, and more nuanced evaluation standards to drive progress in real-world forecasting systems.
Abstract
Predicting future human behavior is an increasingly popular topic in computer vision, driven by interest in applications such as autonomous vehicles, digital assistants, and human-robot interaction. The literature on behavior prediction spans various tasks, including action anticipation, activity forecasting, intent prediction, and goal prediction, among others. Our survey aims to tie together this fragmented literature, covering recent technical innovations as well as the development of new large-scale datasets for model training and evaluation. We also summarize the widely used metrics for different tasks and provide a comprehensive performance comparison of existing approaches on eleven action anticipation datasets. This survey serves not only as a reference for contemporary methodologies in action anticipation, but also as a guide to future research directions in this evolving landscape.
