PTQ4RIS: Post-Training Quantization for Referring Image Segmentation

Xiaoyan Jiang, Hang Yang, Kaiying Zhu, Xihe Qiu, Shibo Zhao, Sifan Zhou

TL;DR

PTQ4RIS aims to enable on-device RIS through RIS-specific post-training quantization (PTQ): Dual-Region Quantization (DRQ) for the visual encoder, which handles non-Gaussian post-Softmax/GeLU activations, and Reorder-based Outlier-Retained Quantization (RORQ) for the text encoder, which manages activation outliers. Combined in a coordinated, fine-grained PTQ workflow, the method preserves cross-modal segmentation performance at 8-bit and remains robust at 6- and 4-bit settings, outperforming existing PTQ strategies designed for single modalities. Ablations validate the contribution of each component, and results on three RIS benchmarks show near-full-precision (FP) performance on key datasets, which supports the practical viability of RIS quantization for edge robotics. The authors provide code and a video to support reproducibility and real-world deployment.

Abstract

Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to pursue top-performing models and disregard practical deployment on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization and then propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in the visual and text encoders, respectively. Extensive experiments on three benchmarks with different bit settings (from 8 to 4 bits) demonstrate its superior performance. Importantly, PTQ4RIS is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications. Code and video are available at https://github.com/gugu511yy/PTQ4RIS.
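
To make the dual-region idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract only states that DRQ targets non-Gaussian post-Softmax/GeLU activations. The function names (`uniform_quantize`, `dual_region_quantize`), the fixed `split` value, and the choice to spend one bit on selecting the region are all assumptions made for illustration. The sketch compares a single-range round-to-nearest quantizer with a two-range quantizer on softmax outputs, where most values cluster near zero and a few are large.

```python
import numpy as np

def uniform_quantize(x, lo, hi, n_bits):
    """Round-to-nearest uniform quantization onto 2**n_bits levels in [lo, hi]."""
    if hi <= lo:  # degenerate range: every value maps to lo
        return np.full_like(x, lo)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo  # de-quantized ("fake quantized") values

def dual_region_quantize(x, split, n_bits):
    """Hypothetical two-region quantizer (assumption, not the paper's exact rule).

    One bit selects the region (x <= split or x > split); the remaining
    n_bits - 1 bits index a uniform grid inside that region.
    """
    lo, hi = float(x.min()), float(x.max())
    small = x <= split
    out = np.empty_like(x)
    out[small] = uniform_quantize(x[small], lo, split, n_bits - 1)
    out[~small] = uniform_quantize(x[~small], split, hi, n_bits - 1)
    return out

# Synthetic post-Softmax activations: long-tailed, concentrated near zero.
rng = np.random.default_rng(0)
logits = 2.0 * rng.standard_normal((8, 64))
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

n_bits = 4
split = 0.05  # illustrative threshold; a real method would calibrate it
single = uniform_quantize(probs, float(probs.min()), float(probs.max()), n_bits)
dual = dual_region_quantize(probs, split, n_bits)

print("single-region MSE:", float(np.mean((single - probs) ** 2)))
print("dual-region MSE:  ", float(np.mean((dual - probs) ** 2)))
```

Which variant yields the lower error depends on the data and on where `split` is placed, so the sketch prints both errors rather than asserting a winner. What the two-region layout does guarantee is a finer grid for the densely populated small-value region at the same total bit budget.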

Paper Structure

This paper contains 15 sections, 3 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) The pipeline of RIS. (b) Comparison of OIoU for various quantization methods (RTN, PTQ4ViT [37], RepQ-ViT [38]) and our PTQ4RIS on the RefCOCO+ testB split.
  • Figure 2: Dual-region quantization under 8-bit.
  • Figure 3: Reorder-based outlier-retained quantization.