SnapPix: Efficient-Coding–Inspired In-Sensor Compression for Edge Vision

Weikai Lin, Tianrui Ma, Adith Boloor, Yu Feng, Ruofan Xing, Xuan Zhang, Yuhao Zhu

TL;DR

SnapPix tackles the energy bottleneck of edge sensing by performing in-sensor compression via coded exposure (CE). It introduces a decorrelation-based, task-agnostic CE pattern learned to minimize redundancy and a tile-repetitive exposure scheme co-designed with a Vision Transformer (ViT) backbone, plus lightweight hardware augmentations to support CE with negligible area impact. The approach yields energy savings ranging from 1.4x to 15.4x and outperforms task-specific and video-based baselines on action recognition and video reconstruction, while maintaining competitive accuracy. This work enables energy-efficient, multi-task edge vision with practical hardware support and open-source tooling for broader adoption.

Abstract

Energy-efficient image acquisition on the edge is crucial for enabling remote sensing applications where the sensor node has weak compute capabilities and must transmit data to a remote server/cloud for processing. To reduce the edge energy consumption, this paper proposes a sensor-algorithm co-designed system called SnapPix, which compresses raw pixels in the analog domain inside the sensor. We use coded exposure (CE) as the in-sensor compression strategy as it offers the flexibility to sample, i.e., selectively expose pixels, both spatially and temporally. SnapPix has three contributions. First, we propose a task-agnostic strategy to learn the sampling/exposure pattern based on the classic theory of efficient coding. Second, we co-design the downstream vision model with the exposure pattern to address the pixel-level non-uniformity unique to CE-compressed images. Finally, we propose lightweight augmentations to the image sensor hardware to support our in-sensor CE compression. Evaluating on action recognition and video reconstruction, SnapPix outperforms state-of-the-art video-based methods at the same speed while reducing the energy by up to 15.4x. We have open-sourced the code at: https://github.com/horizon-research/SnapPix.
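
As a rough illustration of the coded exposure described above, the following Python sketch integrates masked frames pixel-wise into one coded image, with a small mask tile repeated across the sensor. The helper names (`tile_mask`, `coded_exposure`), the 2x2 tile, and the toy frames are illustrative assumptions and are not taken from the SnapPix code base; plain lists are used only to keep the sketch self-contained.

```python
from typing import List

Frame = List[List[float]]  # H x W pixel intensities for one exposure slot
Mask = List[List[int]]     # H x W binary mask: 1 = pixel exposed in this slot


def tile_mask(tile: Mask, height: int, width: int) -> Mask:
    """Repeat a small mask tile across a height x width sensor."""
    th, tw = len(tile), len(tile[0])
    return [[tile[r % th][c % tw] for c in range(width)] for r in range(height)]


def coded_exposure(frames: List[Frame], masks: List[Mask]) -> Frame:
    """Integrate masked frames pixel-wise into a single coded image."""
    if len(frames) != len(masks):
        raise ValueError("need one mask per exposure slot")
    height, width = len(frames[0]), len(frames[0][0])
    coded = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for r in range(height):
            for c in range(width):
                # A pixel accumulates light only in slots where its mask bit is 1.
                coded[r][c] += mask[r][c] * frame[r][c]
    return coded


# Toy example (hypothetical values): 2 exposure slots, 2x2 tile, 4x4 sensor.
tiles = [
    [[1, 0], [0, 1]],  # slot 0 exposes the tile's diagonal
    [[0, 1], [1, 1]],  # slot 1 exposes the off-diagonal and the bottom-right pixel
]
frames = [
    [[1.0] * 4 for _ in range(4)],  # slot 0: constant scene of intensity 1
    [[2.0] * 4 for _ in range(4)],  # slot 1: constant scene of intensity 2
]
masks = [tile_mask(t, 4, 4) for t in tiles]
coded = coded_exposure(frames, masks)
print(coded[0])  # [1.0, 2.0, 1.0, 2.0]
print(coded[1])  # [2.0, 3.0, 2.0, 3.0]
```

In this toy setup `coded` has the same size as a single frame, so one image is read out per group of exposure slots rather than one per slot.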

Paper Structure

This paper contains 33 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: SnapPix reduces edge sensing energy through in-sensor compression by decorrelating output pixel values. This is inspired by the mammalian visual system, where the retina compresses information by decorrelating the retinal output neurons; signals carried through the optic nerve, while at a much lower bandwidth than at the initial stage of the retina, encode essential information that permits the downstream visual cortex to effectively perform visual tasks.
  • Figure 2: Coded exposure with 5 exposure slots. In each slot, pixels are selectively exposed, controlled by a coded mask. In the end, the values at all the exposure slots are integrated pixel-wise to form one single coded image.
  • Figure 3: Illustration of training for pixel decorrelation in coded images. A coded image is divided into tiles, each containing $P$ pixels. The CE mask (not shown) is optimized to decorrelate any pair of pixels within a tile. Zero-mean contrast encoding is applied, ensuring the mean pixel value of each tile is zero.
  • Figure 4: The end-to-end pipeline of SnapPix with in-sensor CE for compression and a ViT-based vision model for downstream tasks. The CE pattern is trained task-independently using decorrelation, while the downstream model is co-designed with CE patterns and pre-trained specifically for CE-encoded inputs.
  • Figure 5: Schematic of the proposed CE pixel. It is based on a stacked design that is commonly used in modern CMOS image sensors [oike2021evolution].
  • ...and 1 more figure
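
The decorrelation objective in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the loss form (mean squared Pearson correlation over all pixel pairs within a tile), the function names, and the synthetic tiles are illustrative choices, not the paper's actual training loss. Each tile is a flat list of $P$ pixel values from a coded image.

```python
import math
import random
from typing import List


def zero_mean(tile: List[float]) -> List[float]:
    """Zero-mean contrast encoding: subtract the tile's own mean."""
    mean = sum(tile) / len(tile)
    return [v - mean for v in tile]


def decorrelation_loss(tiles: List[List[float]]) -> float:
    """Mean squared Pearson correlation over all pixel pairs within a tile.

    Assumed loss form for illustration only; statistics are taken across tiles.
    """
    encoded = [zero_mean(t) for t in tiles]
    n, p = len(encoded), len(encoded[0])
    # Center each pixel position across all tiles.
    means = [sum(t[i] for t in encoded) / n for i in range(p)]
    centered = [[t[i] - means[i] for i in range(p)] for t in encoded]
    norms = [math.sqrt(sum(t[i] ** 2 for t in centered)) for i in range(p)]
    total, pairs = 0.0, 0
    for i in range(p):
        for j in range(i + 1, p):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # constant pixel position: correlation is undefined
            cov = sum(t[i] * t[j] for t in centered)
            total += (cov / (norms[i] * norms[j])) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Synthetic tiles with P = 4 (hypothetical data, fixed seed for repeatability).
rng = random.Random(0)
independent = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
# Copy pixel 0 into pixel 1 to create a redundant pixel pair.
redundant = [[t[0], t[0], t[2], t[3]] for t in independent]

independent_loss = decorrelation_loss(independent)
redundant_loss = decorrelation_loss(redundant)
print(independent_loss < redundant_loss)  # True
```

Because subtracting the tile mean ties the $P$ values together, this toy loss does not reach zero even for independent inputs; the comparison between the two values is what matters. In a learned setting, the CE mask would be the quantity adjusted to push such a loss down.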