Subtle Motion Blur Detection and Segmentation from Static Image Artworks

Ganesh Samarth, Sibendu Paul, Solale Tabarestani, Caren Chen

TL;DR

This work proposes SMBlurDetect, a unified framework that combines high-quality, motion-blur-specific dataset generation with an end-to-end detector capable of zero-shot detection at multiple granularities, and that achieves strong zero-shot generalization.

Abstract

Streaming services serve hundreds of millions of viewers worldwide, and visual assets such as thumbnails, box art, and cover images are critical for engagement. Subtle motion blur remains a pervasive quality issue in these assets, reducing visual clarity and negatively affecting user trust and click-through rates. However, motion blur detection from static images is underexplored: existing methods and datasets focus on severe blur and lack the fine-grained, pixel-level annotations needed for quality-critical applications. Benchmarks such as GOPRO and NFS are dominated by strong synthetic blur and often contain residual blur in their sharp references, leading to ambiguous supervision. We propose SMBlurDetect, a unified framework that combines high-quality, motion-blur-specific dataset generation with an end-to-end detector capable of zero-shot detection at multiple granularities. Our pipeline synthesizes realistic motion blur from super-high-resolution aesthetic images using controllable camera- and object-motion simulations over SAM-segmented regions, enhanced with alpha-aware compositing and balanced sampling to generate subtle, spatially localized blur with precise ground-truth masks. We train a U-Net-based detector with ImageNet-pretrained encoders using a hybrid mask-centric and image-centric strategy that incorporates curriculum learning, hard negatives, focal loss, blur frequency channels, and resolution-aware augmentation. Our method achieves strong zero-shot generalization, reaching 89.68% accuracy on GOPRO (vs. 66.50% for the baseline) and 59.77% mean IoU on CUHK (vs. 9.00% for the baseline), a 6.6x improvement in segmentation. Qualitative results show accurate localization of subtle blur artifacts, enabling automated filtering of low-quality frames and precise region-of-interest extraction for intelligent cropping.
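The abstract names focal loss as part of training and mean IoU as the segmentation metric, but does not spell out either formula. As a minimal sketch, assuming the standard binary focal loss of Lin et al. (with the common defaults alpha = 0.25 and gamma = 2, which are not values reported for SMBlurDetect) and assuming mean IoU is the blur-class IoU averaged per image (the paper's exact averaging convention is not stated here), the two could be written as follows. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    # tanh form of the logistic function avoids overflow in exp for large logits
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def binary_focal_loss(logits, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean per-pixel binary focal loss (Lin et al.) for a blur-mask logit map.

    alpha and gamma are assumed defaults, not values taken from the paper.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target)
    p = sigmoid(logits)
    p_t = np.where(target == 1, p, 1.0 - p)  # probability assigned to the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)  # class-balancing weight
    # (1 - p_t) ** gamma down-weights easy, confidently classified pixels
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))
    return float(loss.mean())


def mean_iou(pred_masks, gt_masks):
    """Blur-class IoU averaged over images; an empty union is scored as 1.0 (assumption)."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        union = np.logical_or(pred, gt).sum()
        inter = np.logical_and(pred, gt).sum()
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))
```

With gamma = 0 and alpha = 0.5 the focal loss reduces to half the ordinary binary cross-entropy; raising gamma is what focuses training on hard pixels, which matters when subtle blur occupies only a small part of each image.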

Paper Structure (28 sections, 10 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Examples from GOPRO [gopro] and NFS [nfs] in which “sharp” images still contain subtle motion blur, illustrating why existing datasets are unreliable for training motion-blur detectors for quality-critical applications.
  • Figure 2: End-to-end SMBlurDetect pipeline overview. The system comprises three main components: (1) Dataset Preparation: high-quality images from LAION-5B [laion] are processed through SAM-based segmentation to extract foreground instance masks for critical regions (faces, hands, hair), with hybrid Mask-Centric and Image-Centric sampling strategies controlled by the $mask_{ratio}$ parameter. (2) Motion Blur Synthesis: six physically motivated blur types (straight, curved, zoom with rotation, random-walk, edge-ring, rolling) are applied using exposure-based temporal integration and PSF convolution, with alpha-aware compositing and adaptive edge feathering to generate photorealistic blur with precise ground-truth masks. (3) Dual-Head U-Net Architecture: a ResNet-50 ImageNet-pretrained encoder with decoder skip connections produces binary blur segmentation (Mask Head) and continuous blur-intensity maps (Regression Head), trained with a composite loss through a three-stage progressive curriculum, enabling accurate multi-granularity detection of subtle motion blur in artworks.
  • Figure 3: Foreground instance masks highlighting the regions most susceptible to motion blur, for example, faces, hair, and hands.
  • Figure 4: Our photorealistic blur augmentation pipeline implements six distinct blur types, each designed to model a specific motion scenario. For every instance, one blur type is applied at varying strengths, enabling multiple motion patterns to coexist within a single image.
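The synthesis steps named in the Figure 2 and Figure 4 captions (exposure-based temporal integration, PSF convolution, alpha-aware compositing with edge feathering) can be illustrated for the simplest of the six types, straight motion. This is a minimal sketch under stated assumptions, not the authors' implementation: it builds a line PSF by accumulating sample positions across the exposure, blurs a premultiplied foreground layer and its matte with that PSF, and composites the result over the original image. The helper names, the fixed box-filter feathering (a stand-in for the paper's adaptive feathering), and the 0.5 threshold used to derive the ground-truth mask are all hypothetical choices.

```python
import numpy as np


def line_psf(length: int, angle_deg: float) -> np.ndarray:
    """Straight-line motion PSF built by temporal integration of sample positions."""
    size = length if length % 2 == 1 else length + 1  # odd size keeps a center pixel
    psf = np.zeros((size, size), dtype=np.float64)
    c = size // 2
    theta = np.deg2rad(angle_deg)
    # Each sample is one instant of the exposure; accumulating them integrates over time.
    for t in np.linspace(-(length - 1) / 2, (length - 1) / 2, 4 * length):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))
        psf[y, x] += 1.0
    return psf / psf.sum()  # normalize so overall brightness is preserved


def convolve(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve an HxW or HxWxC image with a square, odd-sized kernel (edge padding)."""
    k = kernel.shape[0] // 2
    flipped = kernel[::-1, ::-1]  # flip so this is convolution rather than correlation
    pad = ((k, k), (k, k)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(flipped.shape[0]):
        for dx in range(flipped.shape[1]):
            if flipped[dy, dx] != 0.0:
                out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def composite_motion_blur(sharp: np.ndarray, mask: np.ndarray, length: int = 9,
                          angle_deg: float = 0.0, feather: int = 3):
    """Blur the masked foreground along a straight path and composite it over `sharp`.

    Returns the blurred image and a binary ground-truth blur mask.
    """
    box = np.full((feather, feather), 1.0 / feather**2)
    soft = np.clip(convolve(mask.astype(np.float64), box), 0.0, 1.0)  # feathered matte
    if sharp.ndim == 3:
        soft = soft[..., None]  # broadcast the matte over color channels
    psf = line_psf(length, angle_deg)
    fg = convolve(sharp * soft, psf)  # blurred premultiplied foreground
    alpha = convolve(soft, psf)  # matte blurred with the same PSF (alpha-aware)
    out = fg + (1.0 - alpha) * sharp  # "over" compositing onto the sharp image
    matte = alpha[..., 0] if alpha.ndim == 3 else alpha
    gt_mask = matte > 0.5  # hypothetical threshold for the ground-truth blur mask
    return out, gt_mask
```

Because only the premultiplied foreground and its matte are blurred, the background stays sharp while the object smears along the motion direction, which is what yields spatially localized blur with a matching mask. A fuller pipeline would also need to fill the background revealed behind the moving object; this sketch ignores that.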