OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Teng Wang; Rong Shan; Jianghao Lin; Junjie Wu; Tianyi Xu; Jianping Zhang; Wenteng Chen; Changwang Zhang; Zhaoxiang Wang; Weinan Zhang; Jun Wang

TL;DR

OSCAR reframes agentic composed image retrieval (CIR) as principled trajectory optimization, replacing heuristic search with a two-stage mixed-integer program (MIP) that yields optimal tool-call trajectories and set-theoretic compositions. An offline phase constructs a golden library of demonstrations that guide a VLM planner during online inference, enabling efficient, single-pass CIR with robust generalization from only 10% of the training data. Empirically, OSCAR achieves state-of-the-art results on CIRCO, CIRR, FashionIQ, and a private industrial benchmark, while maintaining strong performance across diverse VLM backbones. This optimization-guided framework offers a scalable, reusable approach to complex multimodal reasoning in retrieval tasks.

Abstract

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To address both limitations, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on trial-and-error exploration, OSCAR employs an offline-online paradigm. In the offline phase, we model CIR, via atomic retrieval selection and composition, as a two-stage mixed-integer programming problem, and derive optimal trajectories that maximize ground-truth coverage for training samples through Boolean set operations. These trajectories are stored in a golden library and serve as in-context demonstrations that steer the VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms state-of-the-art (SOTA) baselines. Notably, it achieves superior performance using only 10% of the training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.
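The offline idea described above (select atomic retrievals for recall, then compose them with set operations for precision) can be illustrated with a toy sketch. This is not the paper's formulation: it swaps the mixed-integer program for exhaustive enumeration over a tiny search space, and the function names (`select_atoms`, `compose`), the tool names, the role vocabulary (`skip`, `intersect`, `exclude`), and the image ids are all hypothetical, chosen only for illustration.

```python
from itertools import combinations, product


def select_atoms(atom_results, ground_truth, budget):
    """Stage 1 (recall-oriented, assumed form): pick at most `budget` atomic
    retrievals whose union covers the most ground-truth images.
    Brute force stands in for a MIP solver here."""
    names = sorted(atom_results)
    best, best_cover = (), set()
    for k in range(1, budget + 1):
        for subset in combinations(names, k):
            pool = set().union(*(atom_results[n] for n in subset))
            cover = pool & ground_truth
            # Strict improvement only, so smaller subsets win ties.
            if len(cover) > len(best_cover):
                best, best_cover = subset, cover
    return best, best_cover


def compose(atom_results, selected, covered):
    """Stage 2 (precision-oriented, assumed form): use the remaining atoms as
    intersect/exclude filters to shrink the candidate pool without losing
    any ground-truth image already covered in stage 1."""
    pool = set().union(*(atom_results[n] for n in selected))
    others = [n for n in sorted(atom_results) if n not in selected]
    best_roles, best_set = {}, pool
    for roles in product(("skip", "intersect", "exclude"), repeat=len(others)):
        result = set(pool)
        for name, role in zip(others, roles):
            if role == "intersect":
                result &= atom_results[name]
            elif role == "exclude":
                result -= atom_results[name]
        # Feasible only if every covered ground-truth image survives.
        if covered <= result and len(result) < len(best_set):
            best_roles = {n: r for n, r in zip(others, roles) if r != "skip"}
            best_set = result
    return best_roles, best_set


# Hypothetical atomic retrieval results (tool name -> retrieved image ids).
atom_results = {
    "text_search": {1, 2, 3, 4},
    "image_search": {4, 5, 7},
    "style_filter": {1, 5, 10},
    "negative_text": {2, 4, 6},
}
ground_truth = {3, 7}

selected, covered = select_atoms(atom_results, ground_truth, budget=2)
roles, final_set = compose(atom_results, selected, covered)
```

In this toy instance, stage 1 needs the union of two retrievals to reach both ground-truth images, and stage 2 then removes distractors by excluding the two remaining atoms. A resulting pair such as `(selected, roles)` is the kind of record one might store as a demonstration, though the actual trajectory format of the golden library is not specified here.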

Paper Structure (38 sections, 20 equations, 4 figures, 9 tables)

Introduction
Related Works
Methodology
Preliminaries
Atomic Retrieval Construction
Recall-Oriented Selection MIP
Precision-Oriented Composition MIP
Optimization-Steered Inference for CIR
Discussion
Experiment
Experiment Setups
Datasets
Metrics
Baselines
Implementation Details
...and 23 more sections

Figures (4)

Figure 1: Illustration of the limitations of existing image retrieval methods: (a) single-model myopia of unified embedding retrieval, and (b) suboptimal orchestration of heuristic agentic retrieval.
Figure 2: The overall framework of our proposed OSCAR.
Figure 3: Performance comparison w.r.t. different numbers of demonstrations (i.e., the number of shots) for inference-time steering. "m" and "R" denote mAP and Recall, respectively. The red dashed line denotes the zero-shot performance of OSCAR (i.e., mAP@50 on CIRCO, Recall@50 on CIRR and FashionIQ).
Figure 4: Case studies on the FashionIQ (left) and CIRR (right) datasets. The left part shows the correct tool-call trajectory, with the ground-truth image ranked first. The right part illustrates the effectiveness of the golden library, which helps the agent avoid its previous wrong tool calls and retrieve the ground-truth image.
