ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition

Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan N. Aakur

TL;DR

ProbRes tackles open-world egocentric activity recognition by combining structured commonsense priors with likelihood-based reasoning in a probabilistic residual search. The method builds a prior-informed search space from ConceptNet, structures VLM embeddings so that jumps between candidates stay semantically coherent, and applies a three-phase process (exploration, exploitation, residual refinement) followed by component-wise re-ranking. Across the L1–L3 settings on four benchmarks, ProbRes achieves state-of-the-art or competitive accuracy while drastically reducing the number of Vision-Language Model queries, illustrating the value of structured priors for scalable open-world reasoning. The work also provides a principled taxonomy of openness in egocentric recognition and highlights directions for improving semantic structuring and search efficiency in real-world deployments.

Abstract

Open-world egocentric activity recognition poses a fundamental challenge due to its unconstrained nature, requiring models to infer unseen activities from an expansive, partially observed search space. We introduce ProbRes, a Probabilistic Residual search framework based on jump-diffusion that efficiently navigates this space by balancing prior-guided exploration with likelihood-driven exploitation. Our approach integrates structured commonsense priors to construct a semantically coherent search space, adaptively refines predictions using Vision-Language Models (VLMs), and employs a stochastic search mechanism to efficiently locate high-likelihood activity labels without exhaustive enumeration. We systematically evaluate ProbRes across multiple openness levels (L0–L3), demonstrating its adaptability to increasing search space complexity. In addition to achieving state-of-the-art performance on benchmark datasets (GTEA Gaze, GTEA Gaze+, EPIC-Kitchens, and Charades-Ego), we establish a clear taxonomy for open-world recognition, delineating the challenges and methodological advancements necessary for egocentric activity understanding. Our results highlight the importance of structured search strategies, paving the way for scalable and efficient open-world activity recognition.
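
The search summarized above (prior-guided exploration, likelihood-driven exploitation, then local refinement) can be illustrated with a small toy sketch. This is not the authors' algorithm: the function `three_phase_search`, its parameters, the annealing-style acceptance rule, and the repeat-until-stable refinement loop are all assumptions made for illustration. The prior, likelihood, and neighbourhood functions are hypothetical stand-ins for ConceptNet-derived priors, VLM scores, and semantic neighbours.

```python
import math
import random


def three_phase_search(candidates, prior, likelihood, neighbors,
                       n_iters=12, explore_frac=0.5, temperature=0.5,
                       top_k=3, seed=0):
    """Toy prior-guided search; returns (top_k labels, number of scorer queries)."""
    rng = random.Random(seed)
    scores = {}  # cache: each label is sent to the (expensive) scorer at most once

    def query(label):
        if label not in scores:
            scores[label] = likelihood(label)
        return scores[label]

    # Phase 1 (exploration): sample labels in proportion to their prior weight.
    n_explore = max(1, int(n_iters * explore_frac))
    weights = [prior(label) for label in candidates]
    for label in rng.choices(candidates, weights=weights, k=n_explore):
        query(label)

    # Phase 2 (exploitation): jump between neighbours of the current label,
    # always accepting improvements and sometimes accepting worse moves.
    current = max(scores, key=scores.get)
    for _ in range(n_iters - n_explore):
        options = neighbors(current)
        if not options:
            break
        proposal = rng.choice(options)
        delta = query(proposal) - scores[current]
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = proposal

    # Phase 3 (local refinement): score every neighbour of the best label,
    # repeating until the best label stops changing.
    best = None
    while best != max(scores, key=scores.get):
        best = max(scores, key=scores.get)
        for label in neighbors(best):
            query(label)

    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], len(scores)


# Hypothetical toy problem: labels are (verb, noun) pairs.
verbs = ["pour", "cut", "open", "wash"]
nouns = ["milk", "bread", "jar", "cup"]
candidates = [(v, n) for v in verbs for n in nouns]
target = ("pour", "milk")  # stand-in for what the video actually shows


def toy_prior(label):
    # Assumed prior: some nouns are considered more plausible than others.
    return 2.0 if label[1] in {"milk", "cup"} else 1.0


def toy_likelihood(label):
    # Assumed scorer: one point per component that matches the target.
    return float((label[0] == target[0]) + (label[1] == target[1]))


def toy_neighbors(label):
    # Assumed neighbourhood: labels sharing a verb or a noun with `label`.
    return [c for c in candidates
            if c != label and (c[0] == label[0] or c[1] == label[1])]


ranked, n_queries = three_phase_search(
    candidates, toy_prior, toy_likelihood, toy_neighbors)
print(ranked[0], n_queries)
```

In this toy setup the target label always ends up ranked first, because the refinement loop keeps climbing through shared-component neighbours until no neighbour scores higher. The query count returned alongside the ranking is the quantity a query-efficient search would aim to keep well below the number of candidates.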

Paper Structure

This paper contains 13 sections, 5 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Taxonomy of Openness in Egocentric Activity Recognition. We define four levels of openness based on search space constraints: L0 (fixed activity set), L1 (known atomic concepts, unknown compositions), L2 (known domain, inferred activities), and L3 (fully unconstrained search).
  • Figure 2: (a) ProbRes framework for open-world egocentric activity recognition. The search space is structured using ConceptNet priors, enabling guided exploration. The model iteratively refines candidates via likelihood estimation, balancing exploration and exploitation, followed by local refinement and re-ranking. (b) Search trajectory visualization, showing how ProbRes navigates likelihood regions to reach high-confidence predictions near the ground truth.
  • Figure 3: Ablation studies illustrating the impact of (a) the number of search iterations, (b) the exploration-exploitation tradeoff and local refinement, and (c) knowledge-based priors and concept decomposition-based re-ranking on final performance.
  • Figure 4: Qualitative visualization of the ProbRes search trajectory across its exploration, exploitation, and final refinement phases.
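
The openness levels described in the Figure 1 caption can be made concrete with a short sketch of how the candidate space changes between levels. The level descriptions are copied from the caption. The `Openness` enum, the `search_space` helper, and the treatment of an activity as a (verb, noun) pair are assumptions for illustration only. L2 and L3 are left unimplemented because they depend on an external knowledge source that this sketch does not model.

```python
from enum import Enum
from itertools import product


class Openness(Enum):
    L0 = "fixed activity set"
    L1 = "known atomic concepts, unknown compositions"
    L2 = "known domain, inferred activities"
    L3 = "fully unconstrained search"


def search_space(level, activities=(), verbs=(), nouns=()):
    """Return the candidate labels for a level (hypothetical helper)."""
    if level is Openness.L0:
        # Closed set: only the activities listed in advance.
        return list(activities)
    if level is Openness.L1:
        # Known verbs and nouns, but any composition of them is allowed.
        return list(product(verbs, nouns))
    # L2 and L3 need concepts proposed by an external source, not modelled here.
    raise NotImplementedError(f"{level.name}: {level.value}")


known_activities = [("pour", "milk"), ("cut", "bread")]
verbs = ["pour", "cut", "open"]
nouns = ["milk", "bread", "jar", "cup"]

l0_space = search_space(Openness.L0, activities=known_activities)
l1_space = search_space(Openness.L1, verbs=verbs, nouns=nouns)
print(len(l0_space), len(l1_space))
```

Even in this small example, moving from L0 to L1 replaces a short list with the full product of verbs and nouns, which is why exhaustive scoring becomes less attractive as openness increases.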