Open-World Amodal Appearance Completion

Jiayang Ao, Yanbei Jiang, Qiuhong Ke, Krista A. Ehinger

TL;DR

This paper introduces Open-World Amodal Appearance Completion, a training-free framework that accepts flexible text queries as input and generalizes amodal completion to arbitrary objects, whether they are specified by direct terms or by abstract queries.

Abstract

Understanding and reconstructing occluded objects is a challenging problem, especially in open-world scenarios where categories and contexts are diverse and unpredictable. Traditional methods, however, are typically restricted to closed sets of object categories, limiting their use in complex, open-world scenes. We introduce Open-World Amodal Appearance Completion, a training-free framework that expands amodal completion capabilities by accepting flexible text queries as input. Our approach generalizes to arbitrary objects specified by both direct terms and abstract queries. We term this capability reasoning amodal completion, where the system reconstructs the full appearance of the queried object based on the provided image and language query. Our framework unifies segmentation, occlusion analysis, and inpainting to handle complex occlusions and generates completed objects as RGBA elements, enabling seamless integration into applications such as 3D reconstruction and image editing. Extensive evaluations demonstrate the effectiveness of our approach in generalizing to novel objects and occlusions, establishing a new benchmark for amodal completion in open-world settings. The code and datasets will be released after paper acceptance.
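
The RGBA output mentioned above can be illustrated with a minimal sketch. It assumes the completed object is already available as an RGB pixel grid together with a binary amodal mask; the function name `to_rgba` and the nested-list image representation are hypothetical choices made for illustration and are not taken from the authors' code.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def to_rgba(completed_rgb: List[List[RGB]],
            amodal_mask: List[List[bool]]) -> List[List[RGBA]]:
    """Attach a binary amodal mask to an RGB grid as an alpha channel.

    Hypothetical helper: pixels inside the mask become fully opaque
    (alpha 255) and pixels outside become fully transparent (alpha 0).
    """
    if len(completed_rgb) != len(amodal_mask):
        raise ValueError("image and mask must have the same height")
    rgba_rows: List[List[RGBA]] = []
    for rgb_row, mask_row in zip(completed_rgb, amodal_mask):
        if len(rgb_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        rgba_rows.append([
            (r, g, b, 255 if inside else 0)
            for (r, g, b), inside in zip(rgb_row, mask_row)
        ])
    return rgba_rows


# Toy 2x2 example: only the left column belongs to the completed object.
image = [[(200, 180, 160), (10, 10, 10)],
         [(190, 170, 150), (12, 12, 12)]]
mask = [[True, False],
        [True, False]]
rgba = to_rgba(image, mask)
```

Because everything outside the object is transparent, a result of this form can be layered over another background for image editing or passed to downstream tools without a separate mask file.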

Paper Structure

This paper contains 18 sections, 8 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Examples of our open-world amodal completion using both specific (e.g., “polar bear”) and abstract (e.g., “What is the mammal in this image”) text queries. Our approach supports various applications, including image editing, novel view synthesis, and 3D reconstruction.
  • Figure 2: Overview of our framework. Starting with a text query, a VLM generates a visible mask to locate the target object in the input image. The framework then identifies all objects and background segments for occlusion analysis. An auto-generated prompt guides the inpainting model, which iteratively reconstructs the occluded object to produce a transparent RGBA amodal completion output.
  • Figure 3: Distribution of the top 50 most frequent categories in our evaluation dataset.
  • Figure 4: Visual comparisons of amodal completions across different methods. Ours consistently outperforms the others in realism, handling of complex occlusions, and plausibility of the completions. Top to bottom: examples from VG, COCO-A, free image, LAION.
  • Figure 5: Model preference of human evaluators by agreement levels. X-axis shows the number of images. 3/3 denotes full agreement among three evaluators per image, 1/3 indicates no consensus. “Ours” shows the strongest consensus on completion quality.
  • ...and 9 more figures