SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space

Viktoriia Zinkovich; Anton Antonov; Andrei Spiridonov; Denis Shepelev; Andrey Moskalenko; Daria Pugacheva; Elena Tutubalina; Andrey Kuznetsov; Vlad Shakhuro

TL;DR

This work studies the robustness of reasoning segmentation models to paraphrases that are semantically equivalent to the original query yet adversarial. It introduces SPARTA, a black-box method that uses reinforcement learning to optimize paraphrases in the latent space of a pretrained text autoencoder (SONAR), maximizing segmentation degradation as measured by the drop in IoU. An automatic evaluation protocol, combining LLM-based paraphrase detection with semantic similarity filtering, checks paraphrase quality and attack effectiveness; human studies confirm that the automatic scores agree with human judgments of paraphrase validity. On ReasonSeg and LLMSeg-40k, SPARTA outperforms baselines by up to 2x and shows that current reasoning segmentation models remain vulnerable to carefully crafted paraphrases, even under strict grammatical and semantic constraints. The work provides a foundation for evaluating and improving the robustness of multimodal vision-language systems, with implications for safer and more reliable AI deployments.

Abstract

Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation, where models generate segmentation masks based on textual queries. While prior work has primarily focused on perturbing image inputs, semantically equivalent textual paraphrases (crucial in real-world applications where users express the same intent in varied ways) remain underexplored. To address this gap, we introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query meaning while degrading segmentation performance. To evaluate the quality of adversarial paraphrases, we develop a comprehensive automatic evaluation protocol validated with human studies. Furthermore, we introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder, guided by reinforcement learning. SPARTA achieves significantly higher success rates, outperforming prior methods by up to 2x on both the ReasonSeg and LLMSeg-40k datasets. We use SPARTA and competitive baselines to assess the robustness of advanced reasoning segmentation models. We reveal that they remain vulnerable to adversarial paraphrasing, even under strict semantic and grammatical constraints. All code and data will be released publicly upon acceptance.
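
To make the attack objective concrete, the following is a minimal sketch of how an IoU drop could be scored for a candidate paraphrase. It is an illustration under stated assumptions, not the paper's implementation: the function names `iou` and `attack_reward`, the hard gating on validity, and the `min_similarity` threshold of 0.8 are all hypothetical, and the paper's actual reward and filtering may differ.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (hypothetical representation).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def attack_reward(
    iou_original: float,
    iou_paraphrase: float,
    similarity: float,
    is_paraphrase: bool,
    min_similarity: float = 0.8,  # assumed threshold, not from the paper
) -> float:
    """IoU drop caused by a paraphrase, zeroed out if the paraphrase is invalid."""
    if not is_paraphrase or similarity < min_similarity:
        return 0.0
    return max(0.0, iou_original - iou_paraphrase)


target = [[1, 1], [0, 0]]
pred_original = [[1, 1], [0, 0]]    # mask predicted for the original query
pred_paraphrase = [[1, 0], [0, 1]]  # mask predicted for the paraphrase

reward = attack_reward(
    iou(pred_original, target),
    iou(pred_paraphrase, target),
    similarity=0.9,
    is_paraphrase=True,
)
```

In this toy case the original query yields an IoU of 1.0 and the paraphrase an IoU of 1/3, so the reward is the difference between them. A paraphrase that fails the validity checks receives no reward regardless of how much it degrades the mask, which mirrors the requirement that attacks preserve meaning.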
