ShelfRectNet: Single View Shelf Image Rectification with Homography Estimation

Onur Berk Tore, Ibrahim Samil Yalciner, Server Calap

TL;DR

ShelfRectNet tackles the practical problem of rectifying retail shelf images from a single view by directly regressing a 4-point homography using a ConvNeXt-Nano backbone and normalized coordinate regression. It introduces a synthetic augmentation strategy that samples homographies from the training distribution and a new dataset, ShelfRectSet, for realistic benchmarking, achieving a mean corner error of 1.298 pixels on the test set. The authors report ablations showing the benefits of augmentation, range normalization, and the 4-point parameterization, and compare against classical and deep-learning baselines to demonstrate competitive accuracy and speed. The authors state that the dataset and code will be released publicly to support future retail-vision research.

Abstract

Estimating homography from a single image remains a challenging yet practically valuable task, particularly in domains like retail, where only one viewpoint is typically available for shelf monitoring and product alignment. In this paper, we present a deep learning framework that predicts a 4-point parameterized homography matrix to rectify shelf images captured from arbitrary angles. Our model leverages a ConvNeXt-based backbone for enhanced feature representation and adopts normalized coordinate regression for improved stability. To address data scarcity and promote generalization, we introduce a novel augmentation strategy by modeling and sampling synthetic homographies. Our method achieves a mean corner error of 1.298 pixels on the test set. When compared with both classical computer vision and deep learning-based approaches, our method demonstrates competitive performance in both accuracy and inference speed. Together, these results establish our approach as a robust and efficient solution for real-world single-view rectification. To encourage further research in this domain, we will make our dataset, ShelfRectSet, and code publicly available.
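To make the terms in the abstract concrete, the following is a minimal sketch of how a 4-point parameterization relates to a 3x3 homography and how a mean corner error can be computed. It is not the authors' implementation. The function names, the choice to normalize offsets by image width and height, and the example offsets are all assumptions made for illustration; the abstract does not specify the exact normalization scheme.

```python
import numpy as np


def normalize_offsets(offsets_px, width, height):
    """Scale pixel offsets by image size (an assumed normalization scheme)."""
    return np.asarray(offsets_px, dtype=float) / np.array([width, height], dtype=float)


def denormalize_offsets(offsets_norm, width, height):
    """Invert normalize_offsets: map normalized offsets back to pixels."""
    return np.asarray(offsets_norm, dtype=float) * np.array([width, height], dtype=float)


def homography_from_4pt(src, dst):
    """Solve for the 3x3 homography mapping four src points to four dst points.

    Uses the direct linear system with the bottom-right entry fixed to 1.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, pts):
    """Apply H to an (N, 2) array of points using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]


def mean_corner_error(pred_corners, gt_corners):
    """Average Euclidean distance (in pixels) between matching corners."""
    diff = np.asarray(pred_corners, dtype=float) - np.asarray(gt_corners, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


# Hypothetical example: a 640x480 image and made-up corner offsets in pixels.
width, height = 640, 480
corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=float)
offsets_px = np.array([[10, 5], [-8, 12], [-4, -9], [6, -3]], dtype=float)

# A network would regress the normalized offsets; here they are simply computed.
offsets_norm = normalize_offsets(offsets_px, width, height)
target_corners = corners + denormalize_offsets(offsets_norm, width, height)

H = homography_from_4pt(corners, target_corners)
warped = apply_homography(H, corners)
error = mean_corner_error(warped, target_corners)
```

In this sketch the eight regressed numbers are the x and y offsets of the four image corners, which determine the homography uniquely as long as no three of the corners are collinear. Regressing offsets in a bounded, normalized range rather than raw matrix entries is the kind of stabilization the abstract refers to, although the precise range used by the authors is not given here.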

Paper Structure

This paper contains 15 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Two representative retail shelf images, one exhibiting significant perspective distortion (left) and the other captured from a fronto-parallel viewpoint (right).
  • Figure 2: Mean displacement values (in pixels) for each of the four corners across the entire ShelfRectSet, illustrating the distribution of annotation adjustments.
  • Figure 3: The left image displays the original single-viewpoint capture with initial corner placements marked by red circles. On the right, the green circles indicate the interactively adjusted corner locations used to compute the rectifying homography, transforming the image to a fronto-parallel perspective.
  • Figure 4: Qualitative comparison of our model's predictions against the ground truth. Each group from left to right displays the input image, the ground-truth homography overlay, and our model's predicted homography overlay.