Placing Objects in Context via Inpainting for Out-of-distribution Segmentation

Pau de Jorge, Riccardo Volpi, Puneet K. Dokania, Philip H. S. Torr, Gregory Rogez

TL;DR

The paper tackles the challenge of evaluating and improving anomaly (OOD) segmentation for real-world deployment under limited and shifting data. It introduces Placing Objects in Context (POC), a diffusion-model–based inpainting pipeline that realistically inserts objects into images using region-aware placement and open-vocabulary labeling, enabling dynamic dataset generation with minimal setup. POC-generated data improves OOD fine-tuning performance across benchmarks and can extend Cityscapes with Pascal animal classes, achieving near-Pascal baselines and demonstrating a small synth2real gap. The work provides a practical, plug-and-play tool for building realistic anomaly datasets and extending annotations, with broad applicability to safe deployment of semantic segmentation in autonomous systems.

Abstract

When deploying a semantic segmentation model into the real world, it will inevitably encounter semantic classes that were not seen during training. To ensure a safe deployment of such systems, it is crucial to accurately evaluate and improve their anomaly segmentation capabilities. However, acquiring and labelling semantic segmentation data is expensive and unanticipated conditions are long-tail and potentially hazardous. Indeed, existing anomaly segmentation datasets capture a limited number of anomalies, lack realism or have strong domain shifts. In this paper, we propose the Placing Objects in Context (POC) pipeline to realistically add any object into any image via diffusion models. POC can be used to easily extend any dataset with an arbitrary number of objects. In our experiments, we present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods across several standardized benchmarks. POC is also effective for learning new classes. For example, we utilize it to augment Cityscapes samples by incorporating a subset of Pascal classes and demonstrate that models trained on such data achieve comparable performance to the Pascal-trained baseline. This corroborates the low synth2real gap of models trained on POC-generated images. Code: https://github.com/naver/poc
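The insertion loop described above (pick a plausible region, inpaint an object from a text prompt, segment it, update the labels) can be sketched roughly as follows. This is a toy illustration and not the authors' implementation: images are small integer grids, `inpaint` and `segment` are hypothetical callables standing in for a diffusion inpainting model and an open-vocabulary segmenter, and both the placement rule (the box must rest on a pixel of a chosen class) and the mask-only paste are assumptions made for this example.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]  # toy stand-in for an image or a label map


def pick_box(labels: Grid, valid_id: int, size: int, rng: random.Random) -> Tuple[int, int]:
    """Return the top-left corner of a size x size box whose bottom-centre
    pixel carries `valid_id` (assumed placement rule, e.g. 'object stands on road')."""
    h, w = len(labels), len(labels[0])
    candidates = [
        (r, c)
        for r in range(h - size + 1)
        for c in range(w - size + 1)
        if labels[r + size - 1][c + size // 2] == valid_id
    ]
    if not candidates:
        raise ValueError("no valid placement region")
    return rng.choice(candidates)


def place_object(
    image: Grid,
    labels: Grid,
    prompt: str,
    inpaint: Callable[[Grid, str], Grid],          # hypothetical inpainting model
    segment: Callable[[Grid, str], List[List[bool]]],  # hypothetical segmenter
    valid_id: int,
    new_id: int,
    size: int,
    rng: random.Random,
) -> Tuple[Grid, Grid]:
    """Insert one object and return (new_image, new_labels); inputs are not modified."""
    r0, c0 = pick_box(labels, valid_id, size, rng)
    crop = [row[c0:c0 + size] for row in image[r0:r0 + size]]
    new_crop = inpaint(crop, prompt)
    mask = segment(new_crop, prompt)
    out_img = [row[:] for row in image]
    out_lbl = [row[:] for row in labels]
    for i in range(size):
        for j in range(size):
            if mask[i][j]:
                # Assumption: only object pixels are pasted back and relabelled.
                out_img[r0 + i][c0 + j] = new_crop[i][j]
                out_lbl[r0 + i][c0 + j] = new_id
    return out_img, out_lbl


# Toy stand-ins so the sketch runs without any model.
def toy_inpaint(crop: Grid, prompt: str) -> Grid:
    out = [row[:] for row in crop]
    out[len(out) // 2][len(out[0]) // 2] = 9  # "draw" a one-pixel object
    return out


def toy_segment(crop: Grid, prompt: str) -> List[List[bool]]:
    return [[px == 9 for px in row] for row in crop]


image = [[0] * 6 for _ in range(4)]
labels = [[1] * 6, [1] * 6, [0] * 6, [0] * 6]  # 1 = sky, 0 = road (toy ids)
new_image, new_labels = place_object(
    image, labels, "carton box", toy_inpaint, toy_segment,
    valid_id=0, new_id=254, size=3, rng=random.Random(0),
)
```

In a real setting the two callables would wrap actual models, and `new_id` would be either a dedicated anomaly label (for OOD evaluation or fine-tuning) or the id of a new class to be learned.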

Paper Structure (21 sections, 30 figures, 4 tables)

This paper contains 21 sections, 30 figures, 4 tables.

Figures (30)

  • Figure 1: Samples from previous OOD datasets. FS Static has unrealistic OOD objects while RoadAnomaly and SMIYC datasets have strong domain shifts from Cityscapes. FS L&F (which manually inserts OOD objects) and StreetHazards (full simulation) have large set-up costs.
  • Figure 2: Left: Samples of our POC-generated datasets. Top to bottom, inserted anomalies are "sheep", "dumped furniture" and "carton box". Middle: AUPRC on different anomaly segmentation datasets. We evaluate RbA (Nayal et al., 2023) prior to fine-tuning, and after fine-tuning with COCO objects or POC-generated images. Fine-tuning with POC improves results on several benchmarks. Right: Beyond road scenes, POC can be applied seamlessly in diverse scenes. Clockwise, inserted objects are: "sea turtle", "person skiing", "white porcelain mug", "inflatable flamingo", "polar bear" and "rubber duck".
  • Figure 3: Illustration of our POC pipeline and applications. Our pipeline builds on top of inpainting and open-vocabulary segmentation models to insert arbitrary objects into images realistically. The resulting images can be used for different tasks.
  • Figure 4: Anomaly score maps. Per-pixel anomaly scores on POC-generated images obtained with M2A (Rai et al., 2023), before and after fine-tuning with COCO and POC data. COCO and POC fine-tuning yield notable improvements over the No ft. baseline, e.g., note the garbage bag or mattress in the second and third images.
  • Figure 5: Boxplots of anomaly scores. All datasets have consistently very high scores for OOD pixels while ID pixels of datasets with strong distribution shifts also have shifted scores. Thus, distribution shifts may lead to underestimated performance.
  • ...and 25 more figures
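
The captions above report AUPRC computed from per-pixel anomaly scores. As a rough illustration of what that metric measures, the sketch below computes step-wise average precision from a flat list of scores and OOD flags. It is a minimal version that assumes no tied scores and treats OOD pixels as the positive class; the function name and inputs are illustrative and not taken from the paper's code.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_ood: Sequence[bool]) -> float:
    """Step-wise area under the precision-recall curve.

    Pixels are ranked by decreasing anomaly score; each time an OOD pixel is
    reached, the precision at that rank is added, and the sum is divided by
    the number of OOD pixels. Assumes scores contain no ties.
    """
    if len(scores) != len(is_ood):
        raise ValueError("scores and is_ood must have the same length")
    n_pos = sum(1 for y in is_ood if y)
    if n_pos == 0:
        raise ValueError("at least one OOD pixel is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_ood[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Toy example: four pixels, two of them OOD.
ap = average_precision([0.9, 0.8, 0.7, 0.1], [True, False, True, False])
ap_perfect = average_precision([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
```

A detector that ranks every OOD pixel above every in-distribution pixel scores 1.0; each in-distribution pixel ranked above an OOD pixel lowers the value.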