PTQ4RIS: Post-Training Quantization for Referring Image Segmentation

Xiaoyan Jiang; Hang Yang; Kaiying Zhu; Xihe Qiu; Shibo Zhao; Sifan Zhou

TL;DR

PTQ4RIS aims to enable on-device RIS through RIS-specific post-training quantization (PTQ). It introduces Dual-Region Quantization (DRQ) for the visual encoder, to handle non-Gaussian post-Softmax and post-GeLU activations, and Reorder-based Outlier-Retained Quantization (RORQ) for the text encoder, to manage activation outliers. Combined with a coordinated, fine-grained PTQ workflow, the method preserves cross-modal segmentation performance at 8 bits and remains robust at 6 and 4 bits, outperforming existing PTQ strategies designed for single modalities. Extensive ablations validate the contribution of each component, and results on three RIS benchmarks show near full-precision (FP) performance on key datasets, highlighting the practical viability of RIS quantization for edge robotics. Code and a video are provided to support reproducibility and real-world deployment.

Abstract

Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to pursue top-performing models and disregard practical deployment on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization (PTQ) framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization, and then propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in the visual and text encoders, respectively. Extensive experiments on three benchmarks with different bit-width settings (from 8 to 4 bits) demonstrate its superior performance. Importantly, PTQ4RIS is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications. Code and video are available at https://github.com/gugu511yy/PTQ4RIS.
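
The abstract does not spell out how DRQ works. As a rough illustration of the general idea behind quantizing two value ranges separately, the sketch below compares a single uniform grid with a two-region grid on softmax-like values. The function names, the split point of 0.05, and the even division of levels between the two regions are all assumptions made for illustration; this is not the paper's formulation.

```python
def uniform_quantize(xs, lo, hi, levels):
    """Quantize-dequantize xs onto `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    out = []
    for x in xs:
        clipped = min(max(x, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


def dual_region_quantize(xs, lo, split, hi, bits):
    """Hypothetical two-region scheme: half of the 2**bits levels cover
    [lo, split], the other half cover (split, hi], each with its own step."""
    half = 2 ** (bits - 1)
    out = []
    for x in xs:
        if x <= split:
            out.extend(uniform_quantize([x], lo, split, half))
        else:
            out.extend(uniform_quantize([x], split, hi, half))
    return out


# Softmax-like values: mostly tiny, with one dominant entry (made-up numbers).
probs = [0.01, 0.02, 0.92]
print(uniform_quantize(probs, 0.0, 1.0, 2 ** 4))       # single 4-bit grid
print(dual_region_quantize(probs, 0.0, 0.05, 1.0, 4))  # two 3-bit grids
```

With a single 4-bit grid over [0, 1] the step is 1/15, so 0.01 and 0.02 both round to 0. With the assumed split at 0.05, the lower region has a step of 0.05/7 and keeps the two values distinct, at the cost of a coarser grid for the large values.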
