Minimalist Vision with Freeform Pixels

Jeremy Klotz, Shree K. Nayar

TL;DR

The paper introduces minimalist vision with freeform pixels to solve lightweight vision tasks using far fewer measurements than traditional cameras. By modeling the camera as the first layer of a neural network, it learns task-specific freeform pixel shapes and downstream inference weights from data collected with a training camera, then fabricates the masks and deploys a hardware prototype that is self-powered by harvested light. Across tasks such as workspace monitoring, room lighting estimation, and traffic speed, the approach achieves performance on par with much higher-resolution baselines, while enabling privacy protection (limited identifiability) and energy autonomy through compact sensing. The work situates itself within deep optics and differentiable sensor design, and points to future gains via dynamic masking and learned optical mappings to broaden task coverage while maintaining privacy and low-power operation.

Abstract

A minimalist vision system uses the smallest number of pixels needed to solve a vision task. While traditional cameras use a large grid of square pixels, a minimalist camera uses freeform pixels that can take on arbitrary shapes to increase their information content. We show that the hardware of a minimalist camera can be modeled as the first layer of a neural network, where the subsequent layers are used for inference. Training the network for any given task yields the shapes of the camera's freeform pixels, each of which is implemented using a photodetector and an optical mask. We have designed minimalist cameras for monitoring indoor spaces (with 8 pixels), measuring room lighting (with 8 pixels), and estimating traffic flow (with 8 pixels). The performance demonstrated by these systems is on par with a traditional camera with orders of magnitude more pixels. Minimalist vision has two major advantages. First, it naturally tends to preserve the privacy of individuals in the scene since the captured information is inadequate for extracting visual details. Second, since the number of measurements made by a minimalist camera is very small, we show that it can be fully self-powered, i.e., function without an external power supply or a battery.
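
The claim that the camera hardware acts as the first layer of a network can be illustrated with a minimal sketch. It assumes the scene is discretized onto an H×W grid, that each freeform pixel's mask is a transmittance map with values in [0, 1], and that a pixel's output is the mask-weighted sum of the scene. The function names, array sizes, and the single linear inference layer are hypothetical and chosen only for illustration. This is not the authors' implementation.

```python
import numpy as np

def freeform_measure(scene, masks):
    """Return one measurement per freeform pixel (the "first layer").

    scene: (H, W) array of scene radiance (assumed discretization).
    masks: (K, H, W) array of mask transmittances. Values are clipped to
           [0, 1] because a physical mask can only attenuate light.
    """
    masks = np.clip(masks, 0.0, 1.0)
    # Contract the two spatial axes: each pixel integrates mask * scene.
    return np.tensordot(masks, scene, axes=([1, 2], [0, 1]))  # shape (K,)

def infer(measurements, weights, bias):
    """Hypothetical one-layer inference head acting on the K measurements."""
    return weights @ measurements + bias

rng = np.random.default_rng(0)
H, W, K = 16, 16, 8                    # illustrative sizes; 8 pixels as in the abstract
scene = rng.random((H, W))
masks = rng.random((K, H, W))          # stand-in for learned pixel shapes
weights = rng.standard_normal((3, K))  # 3 task outputs is an arbitrary choice
bias = np.zeros(3)

p = freeform_measure(scene, masks)     # K numbers leave the camera
out = infer(p, weights, bias)          # the remaining layers run on those numbers
```

In a training setup, `masks`, `weights`, and `bias` would be optimized jointly for the task, and the learned `masks` would then be fabricated as optical masks. The sketch shows only the forward pass.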

Paper Structure (19 sections, 4 equations, 8 figures, 2 tables)

This paper contains 19 sections, 4 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: A freeform pixel can have an arbitrary shape. (a) A single pixel in a traditional camera is square and captures light from a small patch in the scene. (b) A freeform pixel uses a detector and an optical mask to implement any pixel shape. (c) Example of an optical mask. While this mask is binary, a mask can have any continuous transmittance function.
  • Figure 2: A minimalist camera as a part of a network. (a) The optical effects within a freeform pixel include the attenuation due to the mask, the detector’s directional response, and its active area. (b) The detector output is amplified by a gain, degraded by readout and quantization noise, and clipped by the finite dynamic range of the detector. (c) The output $p_f$ of the freeform pixel is fed into the inference network, which uses the outputs of all the pixels of the camera to produce the task output.
  • Figure 3: Reduction in pixel count with freeform pixels. (a) The task is to count the number of patches in an image (up to 10), where the patches have random locations, brightnesses, and sizes. We trained minimalist cameras with an increasing number of freeform pixels (up to 128). (b) The learned freeform pixels for a 4-pixel minimalist camera. (c) The counting performance of a minimalist camera with these 4 freeform pixels is on par with that of a $32\times32$ baseline camera. This corresponds to a $256\times$ reduction in pixel count. Note that the $x$-axis is scaled logarithmically.
  • Figure 4: Hardware prototype of a minimalist camera. (a) The masks of the pixels are printed on a single transparency, and the corresponding detectors are arranged on the imaging board in (b). (c) The back of the imaging board shows the key components of the camera, including an amplifier for each pixel, a supercap, a multiplexer, and a microcontroller that is Bluetooth enabled. Attached to each side of the camera is a thin solar panel. The energy harvested from the four panels is sufficient for the camera to function in a fully self-powered mode in an indoor environment (see Figure 5).
  • Figure 5: Minimalist camera in fully self-powered mode. The prototype can be entirely powered by just the light falling on it. In a well-lit indoor environment, it can read out and wirelessly transmit measurements from 24 pixels at 30 frames per second. In this demonstration, the mask of each pixel is uniform in transmittance. A black sheet is moved over the array of pixels, and the wirelessly received measurements are displayed on a remote host shown below. Please see the supplemental video.
  • ...and 3 more figures
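
The detector effects listed in the Figure 2(b) caption (gain, readout noise, clipping, quantization) and the pixel-count arithmetic in the Figure 3 caption can be sketched as follows. The ordering of the steps, the parameter names, and the default values are assumptions made for illustration. The captions do not specify them, and the paper's actual sensor model may differ.

```python
import numpy as np

def detector_readout(p, gain=1.0, read_noise_std=0.01, full_scale=1.0,
                     bits=8, rng=None):
    """Hypothetical readout chain for freeform pixel outputs.

    p: raw optical measurement(s) before the electronics.
    All parameter values are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = gain * np.asarray(p, dtype=float)                  # amplification
    v = v + rng.normal(0.0, read_noise_std, size=v.shape)  # readout noise
    v = np.clip(v, 0.0, full_scale)                        # finite dynamic range
    levels = 2**bits - 1
    return np.round(v / full_scale * levels) / levels * full_scale  # quantization

# Pixel-count reduction quoted in the Figure 3 caption:
baseline_pixels = 32 * 32                       # 32x32 baseline camera
freeform_pixels = 4                             # 4-pixel minimalist camera
reduction = baseline_pixels // freeform_pixels  # 1024 / 4 = 256
```

With `read_noise_std=0.0` the function is deterministic: values above `full_scale` saturate and all others snap to the nearest of the `2**bits` quantization levels.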