OneNet: A Channel-Wise 1D Convolutional U-Net

Sanghyun Byun, Kayvan Shah, Ayushi Gang, Christopher Apton, Jacob Song, Woo Seong Chung

TL;DR

This work presents a streamlined alternative to the standard U-Net: a 1D convolutional encoder that retains accuracy while being better suited to edge applications. It also explores a fully 1D encoder-decoder that achieves a 71% reduction in size, albeit with some accuracy loss.

Abstract

Many state-of-the-art computer vision architectures leverage U-Net for its adaptability and efficient feature extraction. However, the multi-resolution convolutional design often leads to significant computational demands, limiting deployment on edge devices. We present a streamlined alternative: a 1D convolutional encoder that retains accuracy while enhancing its suitability for edge applications. Our novel encoder architecture achieves semantic segmentation through channel-wise 1D convolutions combined with pixel-unshuffle operations. By incorporating PixelShuffle, known for improving accuracy in super-resolution tasks while reducing computational load, OneNet captures spatial relationships without requiring 2D convolutions, reducing parameters by up to 47%. Additionally, we explore a fully 1D encoder-decoder that achieves a 71% reduction in size, albeit with some accuracy loss. We benchmark our approach against U-Net variants across diverse mask-generation tasks, demonstrating that it preserves accuracy effectively. Although focused on image segmentation, this architecture is adaptable to other convolutional applications. Code for the project is available at https://github.com/shbyun080/OneNet .
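The pixel-unshuffle and pixel-shuffle operations mentioned in the abstract can be illustrated with a minimal pure-Python sketch on nested lists. This is not the authors' implementation. The function names and the nested-list layout `[channel][row][column]` are assumptions made for illustration, and the channel ordering is chosen to follow the common deep-learning convention (output channel `c*r*r + i*r + j`). The point it shows is that downscaling by a factor `r` moves each `r x r` spatial neighbourhood into `r*r` channels without discarding any values, so a later convolution over channels can still see spatial neighbours.

```python
def pixel_unshuffle(x, r):
    """Rearrange [C][H][W] into [C*r*r][H//r][W//r] (illustrative helper)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    assert H % r == 0 and W % r == 0, "spatial size must be divisible by r"
    out = []
    for c in range(C):
        for i in range(r):          # row offset inside each r x r patch
            for j in range(r):      # column offset inside each r x r patch
                out.append([[x[c][h * r + i][w * r + j]
                             for w in range(W // r)]
                            for h in range(H // r)])
    return out


def pixel_shuffle(x, r):
    """Inverse rearrangement: [C*r*r][H][W] into [C][H*r][W*r]."""
    C2, H, W = len(x), len(x[0]), len(x[0][0])
    assert C2 % (r * r) == 0, "channel count must be divisible by r*r"
    C = C2 // (r * r)
    out = [[[None] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c in range(C):
        for i in range(r):
            for j in range(r):
                plane = x[c * r * r + i * r + j]
                for h in range(H):
                    for w in range(W):
                        out[c][h * r + i][w * r + j] = plane[h][w]
    return out


# A single-channel 4x4 image holding the values 0..15 in row-major order.
img = [[[row * 4 + col for col in range(4)] for row in range(4)]]
down = pixel_unshuffle(img, 2)   # 4 channels, each 2x2
restored = pixel_shuffle(down, 2)  # back to 1 channel, 4x4
```

Here `down[0]` holds the top-left pixel of every 2x2 patch, and `restored` is identical to `img`, which is what makes the pair usable as a lossless replacement for pooling and interpolation-based upsampling.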

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Channel-Wise 1D Convolution Block. (a) Encoder convolution block in which pixel-unshuffle downscaling replaces the max pooling operation, followed by a single spatial layer and two channel-wise layers. (b) Decoder convolution block with pixel-shuffle upscaling for tensor upsampling, followed by a spatial layer between two channel-wise layers.
  • Figure 2: Channel-Wise 1D Encoder-Decoder. OneNet employs a U-Net architecture with skip connections for segmentation tasks. The architecture shown is a 3-layer variant, chosen for simplicity. The encoder block replaces the max pool layer with pixel-unshuffle downscaling, and the image is downscaled immediately on input so that spatial relations can be captured. The decoder block replaces conventional upsampling methods with pixel-shuffle upscaling. In the architecture shown, 1D convolution is used for both encoder and decoder, with optional spatial convolution. To satisfy the spatial-preservation property in the decoder, we only decode to half resolution. The top layer of the decoder is implemented without batch normalization or ReLU to avoid zero-centering of the prediction head.
  • Figure 3: Comparison of Convolutional Blocks. (a) Traditional 2D convolutional block with max pooling. (b) MobileNet block with max pooling. (c) OneNet implementation with pixel-unshuffle downscaling followed by 1D convolution.
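As background for the two baseline blocks in Figure 3, the weight counts of a standard 2D convolution (a) and a depthwise-separable convolution of the kind used in MobileNet (b) follow from well-known formulas. The sketch below only covers those two baselines. The function names and the example channel sizes are assumptions chosen for illustration, and the OneNet block (c) is deliberately not counted because its exact layer configuration is not given in this section.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Weights in a standard k x k 2D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
standard = conv2d_params(64, 128, 3, bias=False)                 # 73728
separable = depthwise_separable_params(64, 128, 3, bias=False)   # 8768
```

The gap between the two counts is the kind of saving that motivates lightweight blocks in general. The 47% and 71% reductions reported in the abstract refer to the authors' full models and cannot be derived from this per-layer arithmetic.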