Leverage Task Context for Object Affordance Ranking

Haojie Huang, Hongchen Luo, Wei Zhai, Yang Cao, Zheng-Jun Zha

TL;DR

This paper proposes a novel Context-embed Group Ranking Framework whose task relation mining module and graph group update module deeply integrate task context and perform global relative relationship transmission. Experiments demonstrate the feasibility of the task-context-based affordance learning paradigm and the superiority of the model over state-of-the-art models in saliency ranking and multimodal object detection.

Abstract

Intelligent agents accomplish different tasks by utilizing various objects based on their affordances, but how to select appropriate objects according to task context is not well explored. Current studies treat objects within an affordance category as equivalent, ignoring that object affordances vary in priority across task contexts, which hinders accurate decision-making in complex environments. To enable agents to develop a deeper understanding of the objects required to perform tasks, we propose to leverage task context for object affordance ranking: given an image of a complex scene and a textual description of the affordance and task context, the goal is to reveal task-object relationships and clarify the priority rank of detected objects. To this end, we propose a novel Context-embed Group Ranking Framework with a task relation mining module and a graph group update module to deeply integrate task context and perform global relative relationship transmission. Because no such data exists, we construct the first large-scale task-oriented affordance ranking dataset, with 25 common tasks, over 50k images, and more than 661k objects. Experimental results demonstrate the feasibility of the task-context-based affordance learning paradigm and the superiority of our model over state-of-the-art models in saliency ranking and multimodal object detection. The source code and dataset will be made available to the public.
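To make the task formulation concrete, the following is a minimal sketch of the output side of affordance ranking: detected objects carry a task-conditioned score, and the priority rank is the sort order of those scores. It is not the paper's model. The `DetectedObject` type, the `rank_objects` helper, the task context "hold hot soup", and the score values are all hypothetical placeholders chosen for illustration; only the affordance "contain" appears in the paper's figures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedObject:
    """A detected object with a hypothetical task-conditioned score."""
    label: str
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format
    score: float  # assumed: higher means better suited to the task context


def rank_objects(objects: List[DetectedObject]) -> List[Tuple[int, DetectedObject]]:
    """Return (rank, object) pairs, where rank 1 is the highest priority."""
    ordered = sorted(objects, key=lambda o: o.score, reverse=True)
    return [(i + 1, obj) for i, obj in enumerate(ordered)]


# Illustrative inputs only: affordance "contain", hypothetical context "hold hot soup".
# The scores are made-up placeholders, not results from the paper.
candidates = [
    DetectedObject("cup", (10.0, 20.0, 60.0, 90.0), 0.55),
    DetectedObject("bowl", (80.0, 30.0, 160.0, 100.0), 0.90),
    DetectedObject("vase", (200.0, 10.0, 240.0, 120.0), 0.20),
]

ranking = rank_objects(candidates)
for rank, obj in ranking:
    print(rank, obj.label)
```

Changing the task context would change the scores and therefore the order, which is the behavior the abstract describes: the same affordance yields different object priorities under different contexts.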

Paper Structure

This paper contains 19 sections, 16 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Task Context represents the subsequent purpose of a task. For the same affordance, different task contexts lead to varying priority of objects.
  • Figure 2: Motivation. (a) This paper considers the impact of different task contexts on object affordances and achieves a more fine-grained understanding of objects through ranking. (b) We deeply fuse task context to group candidate objects and perform context conditioned graph update to obtain the correct relative ranking.
  • Figure 3: Overview of the Context-embed Group Ranking (CGR) model. It first extracts image features $F_I$ and task features $F_a, F_c$ separately, then aligns them through the TRM module (see the TRM section) to obtain gathered object queries $\tilde{O}$ and group tokens $\tilde{G}$. Subsequently, the GGU module (see the GGU section) groups $\tilde{O}$ using $\tilde{G}$ and aggregates group information, then performs global message passing with context features $T_c$ to determine the final ranking results.
  • Figure 4: Properties of the TAR dataset. (a) Ranking annotation samples for the affordances "contain" and "place" under different task contexts. (b) All affordance verbs and their corresponding task contexts. (c) Distribution of image and object counts across affordances. (d) Average object rank across task contexts within the affordance "contain".
  • Figure 5: Ranking results on the TAR dataset. We visualize the ranking results for the nine tasks and select IRSR [liu2021instance] and QAGNet [deng2024advancing] from saliency ranking and OVDetr [zang2022open] from multimodal object detection for comparison.
  • ...and 6 more figures