Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation

Bharat Srikishan; Anika Tabassum; Srikanth Allu; Ramakrishnan Kannan; Nikhil Muralidhar

TL;DR

PaSeR introduces a non-cascading, cost-aware reinforcement learning framework for image segmentation that selects among multiple models on a per-patch basis to balance accuracy and computation. By basing its decisions on the small model’s predictions and entropy, PaSeR achieves competitive segmentation performance with substantially lower compute, quantified via the IoU per GigaFlop metric. Across battery material phase segmentation and Noisy MNIST, PaSeR delivers large improvements in efficiency (IoU/GF) while maintaining IoU close to SOTA, and demonstrates robustness to unseen noise and compatibility with complementary models. The approach offers a practical pathway for deploying high-quality segmentation in resource-constrained edge environments and provides a reusable metric and training framework for cost-aware multi-model pipelines.
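The decision signal described above (the small model's prediction and its entropy, summarized per patch) can be illustrated with a minimal NumPy sketch. The function names, the 4x4 toy image, and the use of a patch-wise mean are assumptions made for illustration; the summary does not specify how PaSeR aggregates entropy within a patch.

```python
import numpy as np

def entropy_map(probs, eps=1e-12):
    """Per-pixel Shannon entropy (in nats) of class probabilities shaped (C, H, W)."""
    p = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=0)

def patch_means(values, patch):
    """Mean of an (H, W) map over non-overlapping patch x patch tiles."""
    h, w = values.shape
    assert h % patch == 0 and w % patch == 0, "map must tile evenly"
    tiles = values.reshape(h // patch, patch, w // patch, patch)
    return tiles.mean(axis=(1, 3))

# Toy two-class output of a small model on a 4x4 image (illustrative values):
# the left half is confident, the right half is maximally uncertain.
probs = np.empty((2, 4, 4))
probs[0, :, :2] = 0.99
probs[0, :, 2:] = 0.5
probs[1] = 1.0 - probs[0]

e = entropy_map(probs)                    # shape (4, 4)
patch_entropy = patch_means(e, patch=2)   # shape (2, 2), one value per patch
```

In this toy case the right-hand patches have entropy ln 2, the maximum for two classes, so a cost-aware policy would plausibly consider sending them to a larger model while keeping the small model's output elsewhere.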

Abstract

Deep learning architectures have achieved state-of-the-art (SOTA) performance on computer vision tasks such as object detection and image segmentation. This may be attributed to the use of over-parameterized, monolithic deep learning architectures executed on large datasets. Although such architectures lead to increased accuracy, this is usually accompanied by a large increase in computation and memory requirements during inference. While this is a non-issue in traditional machine learning pipelines, the recent confluence of machine learning and fields like the Internet of Things has rendered such large architectures infeasible for execution in low-resource settings. In such settings, previous efforts have proposed decision cascades where inputs are passed through models of increasing complexity until desired performance is achieved. However, we argue that cascaded prediction leads to increased computational cost due to wasteful intermediate computations. To address this, we propose PaSeR (Parsimonious Segmentation with Reinforcement Learning), a non-cascading, cost-aware learning pipeline as an alternative to cascaded architectures. Through experimental evaluation on real-world and standard datasets, we demonstrate that PaSeR achieves better accuracy while minimizing computational cost relative to cascaded models. Further, we introduce a new metric, IoU/GigaFlop, to evaluate the balance between cost and performance. On the real-world task of battery material phase segmentation, PaSeR yields a minimum performance improvement of 174% on the IoU/GigaFlop metric with respect to baselines. We also demonstrate PaSeR's adaptability to complementary models trained on a noisy MNIST dataset, where it achieved a minimum performance improvement on IoU/GigaFlop of 13.4% over SOTA models. Code and data are available at https://github.com/scailab/paser.
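To make the IoU/GigaFlop metric concrete, here is a minimal sketch that assumes the metric is simply IoU divided by inference cost in GigaFLOPs, as the name suggests. The masks, FLOP counts, and helper names are illustrative assumptions, not values or code from the paper, and the abstract does not state how FLOPs are counted.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred, target).sum() / union)

def iou_per_gigaflop(iou_value, flops):
    """IoU divided by inference cost expressed in GigaFLOPs (1e9 FLOPs)."""
    return iou_value / (flops / 1e9)

def relative_improvement_pct(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Illustrative masks: 2 pixels shared, 4 pixels in the union.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = iou(pred, target)

# Two hypothetical models with the same IoU but different inference cost.
cheap = iou_per_gigaflop(score, flops=2e9)
heavy = iou_per_gigaflop(score, flops=8e9)
gain = relative_improvement_pct(cheap, heavy)
```

With equal IoU, the model that uses a quarter of the compute scores four times higher on this metric, which is the kind of gap the percentage improvements in the abstract summarize.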

Paper Structure (23 sections, 5 equations, 12 figures, 3 tables)

Introduction
Related Work
Problem Formulation
Experimental Setup
Baselines
Evaluation Metrics
Dataset Description
Results & Discussion
R1: Task Performance and Computational Efficiency vs. IDK-Cascade
R2: Performance Comparison with SOTA Segmentation Models
R3: Adaptability to Unseen Contexts (Battery Data)
R4: Adaptability to Complementary Models (Noisy MNIST)
R5: Sensitivity to Hyperparameters
Conclusion
A: Results & Discussion
...and 8 more sections

Figures (12)

Figure 1: Performance w.r.t IoU/GigaFlop metric (higher is better) of SOTA models and our proposed model on the battery material phase segmentation task.
Figure 2: Overview of PaSeR. The small UNet ($f_0$) yields the segmentation ($\hat{\mathbf{y}}_{f_0}$) and corresponding entropy map ($\mathbf{e}_{f_0}$) conditioned on the whole input image ($\mathbf{x}$). Then, $\mathbf{x}$ is divided into $P$ equal-sized patches. The RL policy directs each patch $\mathbf{x}^{(p)}$ of $\mathbf{x}$ to one of $f_0, f_1, f_2$ to maximize reward. Based on the RL actions, models $f_1$ and $f_2$ yield predictions for the corresponding image patches. All the predicted patches are then aggregated to yield the final segmentation.
Figure 3: Examples of types of noise added to MNIST data.
Figure 4: Model assignment confusion matrices for , and
Figure 5: (a) Distribution of entropy estimates with 5 and 20 Monte Carlo Dropout (MCD) samples. (b) IoU vs Mean Cost as $\lambda$ changes on battery material phase segmentation dataset.
...and 7 more figures
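The routing and aggregation steps in the Figure 2 caption can be sketched as follows. This is a toy illustration under stated assumptions: the three "models" are hypothetical threshold functions standing in for $f_0, f_1, f_2$, and the per-patch actions are hard-coded rather than produced by a trained RL policy.

```python
import numpy as np

def route_and_aggregate(x, actions, f0, larger_models, patch):
    """Start from f0's whole-image prediction, then overwrite each patch whose
    action selects a larger model with that model's patch prediction."""
    y = f0(x).copy()  # the small model always processes the whole image
    for (i, j), k in np.ndenumerate(actions):
        if k == 0:
            continue  # keep f0's prediction for this patch
        rows = slice(i * patch, (i + 1) * patch)
        cols = slice(j * patch, (j + 1) * patch)
        y[rows, cols] = larger_models[int(k)](x[rows, cols])
    return y

# Hypothetical stand-ins for the three segmentation models (not real networks).
f0 = lambda a: (a > 0.50).astype(np.uint8)
f1 = lambda a: (a > 0.25).astype(np.uint8)
f2 = lambda a: (a > 0.75).astype(np.uint8)

x = np.arange(16).reshape(4, 4) / 16.0   # toy 4x4 "image"
actions = np.array([[0, 1],              # one action per 2x2 patch,
                    [2, 0]])             # hard-coded for illustration
y = route_and_aggregate(x, actions, f0, {1: f1, 2: f2}, patch=2)

# Number of patches assigned to each model, e.g. for a cost summary.
patches_per_model = np.bincount(actions.ravel(), minlength=3)
```

Each larger model runs only on the patches assigned to it, so no patch passes through a sequence of models, which is the contrast with cascades that the abstract draws.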
