TETRIS: Towards Exploring the Robustness of Interactive Segmentation
Andrey Moskalenko, Vlad Shakhuro, Anna Vorontsova, Anton Konushin, Anton Antonov, Alexander Krapukhin, Denis Shepelev, Konstantin Soshin
TL;DR
This paper tackles the robustness gap in click-based interactive segmentation by showing that real user clicks diverge from common baseline strategies. It introduces a differentiable, white-box adversarial-input framework to generate adversarial prompts and a high-resolution TETRIS benchmark to quantify robustness across multiple click trajectories. A formal robustness score based on IoU curves (minimizing and maximizing trajectories) reveals substantial sensitivity to click position, even for leading methods, and demonstrates dataset-specific rankings. The work provides a practical protocol and a valuable dataset to drive the development of more robust interactive segmentation systems for real-world use.
Abstract
Interactive segmentation methods rely on user inputs to iteratively update the selection mask. A click specifying the object of interest is arguably the simplest and most intuitive interaction type, and thus the most common choice for interactive segmentation. However, user clicking patterns in the interactive segmentation context remain unexplored. Accordingly, interactive segmentation evaluation strategies rely more on intuition and common sense than on empirical studies (e.g., assuming that users tend to click in the center of the area with the largest error). In this work, we conduct a user study to investigate real users' clicking patterns. This study reveals that the intuitive assumption made in the common evaluation strategy may not hold. As a result, interactive segmentation models may score highly on standard benchmarks, but this does not imply that they would perform well in a real-world scenario. To assess the applicability of interactive segmentation methods, we propose a novel evaluation strategy providing a more comprehensive analysis of a model's performance. To this end, we propose a methodology for finding extreme user inputs by direct optimization in a white-box adversarial attack on the interactive segmentation model. Based on the performance with such adversarial user inputs, we assess the robustness of interactive segmentation models with respect to click positions. In addition, we introduce a novel benchmark for measuring the robustness of interactive segmentation, and report the results of an extensive evaluation of dozens of models.
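To make the baseline assumption mentioned above concrete, the following is a minimal sketch of a "click in the center of the largest error region" simulator. It is an illustration only, not the authors' code: the function names `largest_error_region` and `baseline_click`, the representation of masks as nested lists of 0/1, the use of 4-connectivity, and the city-block distance to the region boundary are all assumptions made for this example.

```python
from collections import deque

# 4-connected neighbourhood (an assumption of this sketch).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def largest_error_region(pred, gt):
    """Return the largest 4-connected set of pixels where pred != gt.

    False positives and false negatives are kept in separate regions.
    """
    h, w = len(gt), len(gt[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or pred[r][c] == gt[r][c]:
                continue
            kind = gt[r][c]  # 1: missed object pixel, 0: spurious pixel
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # flood fill of one error component
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and pred[ny][nx] != gt[ny][nx]
                            and gt[ny][nx] == kind):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def baseline_click(pred, gt):
    """Simulate one click: (row, col, is_positive), or None if pred == gt."""
    region = largest_error_region(pred, gt)
    if not region:
        return None
    inside = set(region)
    dist, queue = {}, deque()
    # Pixels touching anything outside the region (including the image
    # border) get distance 1; a multi-source BFS fills in the rest.
    for y, x in region:
        if any((y + dy, x + dx) not in inside for dy, dx in NEIGHBORS):
            dist[(y, x)] = 1
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            p = (y + dy, x + dx)
            if p in inside and p not in dist:
                dist[p] = dist[(y, x)] + 1
                queue.append(p)
    # Click at the pixel farthest from the region boundary.
    y, x = max(region, key=lambda p: dist[p])
    return y, x, bool(gt[y][x])
```

For a 7x7 image whose ground truth is a 5x5 square and whose prediction is empty, `baseline_click` returns the center of the square, `(3, 3, True)`, as a positive click. A deterministic rule of this kind yields a single click trajectory per image, which is why it cannot by itself reveal how sensitive a model is to where the user actually clicks.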
