
End-to-End Shared Attention Estimation via Group Detection with Feedback Refinement

Chihiro Nakatani, Norimichi Ukita, Jean-Marc Odobez

Abstract

This paper proposes an end-to-end shared attention estimation method via group detection. Most previous methods estimate shared attention (SA) without detecting the actual group of people focusing on it, or assume that there is a single SA point in a given image. These issues limit the applicability of SA detection in practice and degrade performance. To address them, we propose to achieve group detection and shared attention estimation simultaneously using a two-step process: (i) the generation of SA heatmaps from individual gaze attention heatmaps and group membership scalars estimated during group inference; (ii) a refinement of the initial group memberships that accounts for the initial SA heatmaps, followed by the final prediction of the SA heatmap. Experiments demonstrate that our method outperforms other methods in both group detection and shared attention estimation. Additional analyses validate the effectiveness of the proposed components. Code: https://github.com/chihina/sagd-CVPRW2026.
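The first step of the process described above, integrating individual gaze attention heatmaps weighted by group membership scalars into a shared attention heatmap, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, array shapes, and the sum-normalization are assumptions made here for clarity.

```python
import numpy as np

def shared_attention_heatmap(A, m):
    """Illustrative sketch of membership-weighted heatmap integration.

    A: (N, H, W) individual attention heatmaps for N people.
    m: (N,) group membership scalars in [0, 1] for one group token.
    Returns an (H, W) shared attention heatmap, normalized to sum to 1
    (normalization is an assumption, not taken from the paper).
    """
    # Weighted sum over the people axis: S = sum_n m[n] * A[n]
    S = np.tensordot(m, A, axes=1)
    total = S.sum()
    return S / total if total > 0 else S

# Toy usage: person 0 is a full member, person 1 is not.
A = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
m = np.array([1.0, 0.0])
S = shared_attention_heatmap(A, m)
```

With these toy inputs, only person 0's heatmap contributes, so the result is a uniform heatmap over the 4x4 grid.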

Paper Structure

This paper contains 29 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Difference between previous and our shared attention estimation methods. (a) Shared attention estimation using simple integration of the individual attention of all people in the scene as post-processing. (b) Direct shared attention estimation without group detection. (c) Our shared attention estimation via group detection, in which shared attention is estimated by integration of individual attention based on detected groups.
  • Figure 2: Overview of our network. First, the individual attention heatmaps $A$ are estimated for each person. They are then exploited to derive group memberships $\bm{M}$ per group token and integrated to infer the shared attention heatmap $\bm{S}$ associated with each group token. Both the group memberships and shared attention heatmaps are further refined in a second step.
  • Figure 3: Membership-based shared attention heatmap estimation. Individual attention heatmaps are integrated by weighting them with the group memberships obtained in the group detection step.
  • Figure 4: Overview of group detection. The dot product between the $e$-th updated group token and the $n$-th person token is computed as a group membership coefficient (i.e., $\bm{M}_{e,n}$).
  • Figure 5: Overview of refined group detection, in which a spatial argmax is applied to the initial SA heatmaps $\bm{S}$ to refine the memberships.
  • ...and 4 more figures
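The membership computation illustrated in Figure 4, a dot product between each updated group token and each person token, can be sketched as below. The sigmoid squashing of the dot products to (0, 1) is an assumption added here so the coefficients behave like membership weights; the paper's exact formulation may differ.

```python
import numpy as np

def group_memberships(group_tokens, person_tokens):
    """Illustrative sketch of dot-product group memberships (cf. Figure 4).

    group_tokens:  (E, D) array of E updated group tokens.
    person_tokens: (N, D) array of N person tokens.
    Returns M of shape (E, N), where M[e, n] is the membership of
    person n in group e. The sigmoid is an assumption for illustration.
    """
    logits = group_tokens @ person_tokens.T  # (E, N) pairwise dot products
    return 1.0 / (1.0 + np.exp(-logits))     # squash to (0, 1)

# Toy usage: two orthogonal group tokens, one aligned and one
# anti-aligned person token.
g = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[2.0, 0.0], [0.0, -2.0]])
M = group_memberships(g, p)
```

Here person 0 aligns with group 0 (membership above 0.5), while person 1 points away from group 1 (membership below 0.5).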