A Strong Inductive Bias: Gzip for binary image classification

Marco Scilipoti, Marina Fuster, Rodrigo Ramele

TL;DR

The paper investigates whether parameter-less models with strong inductive biases can outperform standard deep learning approaches in few-shot binary image classification. It introduces Gik, a gzip-based image k-nearest-neighbor (kNN) classifier that uses the Normalized Compression Distance (NCD) to quantify similarity between images and assigns labels via their nearest neighbors. Experimental results on two binary rice datasets show that Gik achieves high accuracy with a markedly smaller memory footprint, while deep networks require more training data to match or exceed its performance. These findings highlight the potential of inductive-bias-driven approaches for memory-constrained or data-limited vision tasks and motivate further work on formalizing and leveraging inductive biases in model selection and design.
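For reference, the Normalized Compression Distance is commonly defined as below, where C(·) is the length in bytes of the compressed input (here, gzip output) and xy is the concatenation of x and y. This is the standard form from the compression-distance literature; we assume Gik uses it, though the paper's own equations may differ in notation. Values near 0 mean that compressing the two inputs together costs little more than compressing the larger one alone (they share structure); values near 1 mean they share little.

```latex
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
```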

Abstract

Deep learning networks have become the de facto standard in Computer Vision for both industry and research. However, recent developments in a neighboring field, Natural Language Processing (NLP), have shown that there are areas where parameter-less models with strong inductive biases can serve as computationally cheaper and simpler alternatives. We propose such a model for binary image classification: a nearest-neighbor classifier combined with a general-purpose compressor such as Gzip. We compare it against popular deep learning networks such as ResNet, EfficientNet, and MobileNet, and show that, in a few-shot setting, it achieves better accuracy while using significantly less space (more than two orders of magnitude less). We believe this underlines the untapped potential of models with stronger inductive biases in few-shot scenarios.
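A minimal sketch of the compressor-plus-nearest-neighbor idea, assuming each image is serialized as a raw byte buffer (for example, a flattened 8-bit pixel array) and that gzip is applied to those bytes directly. The paper's actual preprocessing, choice of k, and tie-breaking may differ; the function names `compressed_len`, `ncd`, and `knn_predict` and the toy training data are hypothetical.

```python
import gzip
from collections import Counter
from collections.abc import Sequence


def compressed_len(data: bytes) -> int:
    """C(x): length in bytes of the gzip-compressed input."""
    return len(gzip.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(query: bytes, train: Sequence[tuple[bytes, str]], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training items under NCD.

    Assumption: plain majority vote; ties go to the label seen first
    among the sorted neighbors.
    """
    distances = sorted((ncd(query, img), label) for img, label in train)
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy usage: hypothetical byte buffers standing in for flattened images.
TRAIN = [
    (b"\x00\xff" * 512, "A"),
    (b"\x00\xff" * 480, "A"),
    (bytes(range(256)) * 4, "B"),
    (bytes(range(256)) * 3 + bytes(range(128)), "B"),
]
print(knn_predict(b"\x00\xff" * 500, TRAIN))
```

There is no training step and no learned parameters: the "model" is the stored training set plus the compressor, which is why its footprint is dominated by the (few-shot) training images themselves. The cost is paid at inference, since each prediction compresses the query against every stored example.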

Paper Structure

This paper contains 8 sections, 3 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Diagram of the Gik architecture
  • Figure 2: Sample of the rice categories tested.
  • Figure 3: Mean accuracy in binary classification across the classes Jasmine and Basmati
  • Figure 4: Mean accuracy in binary classification across the classes Karacadag and Arborio