Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision

Tianqin Li; George Liu; Tai Sing Lee

TL;DR

This paper introduces a structure-first learning paradigm that begins training with line drawings to bias models toward structural information, aiming for efficient, transferable, and human-aligned vision. By converting photographs to line drawings and augmenting with stylized sketches, the authors train two-stage curricula (Line→Color) that improve shape bias, attention focus, and data efficiency across classification, segmentation, and detection, while yielding compact representations that transfer well to lightweight models. The approach demonstrates consistent gains across CNN and transformer backbones, improves downstream task performance, and enhances distillation effectiveness, suggesting structure over texture as a robust inductive bias. Overall, the work provides a computational perspective on human-like perception and offers a practical strategy for building more robust, data-efficient vision systems.
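The two-stage Line→Color curriculum can be pictured as a simple per-epoch schedule. The sketch below is illustrative only: the function names, the `line_fraction` parameter, and the 50/50 split are hypothetical, since the summary does not state how long each stage lasts.

```python
def modality_for_epoch(epoch, line_epochs):
    """Return the input modality for a given epoch.

    Stage 1 (epochs before `line_epochs`) uses line drawings;
    stage 2 uses the original color photographs.
    """
    return "line" if epoch < line_epochs else "color"


def build_schedule(total_epochs, line_fraction=0.5):
    """Build a Line->Color schedule as a list of modality names.

    `line_fraction` is an assumed knob for the share of training
    spent on line drawings; it is not a value taken from the paper.
    """
    if total_epochs < 0:
        raise ValueError("total_epochs must be non-negative")
    if not 0.0 <= line_fraction <= 1.0:
        raise ValueError("line_fraction must lie in [0, 1]")
    line_epochs = int(round(total_epochs * line_fraction))
    return [modality_for_epoch(e, line_epochs) for e in range(total_epochs)]


if __name__ == "__main__":
    # A training loop would pick its data loader from this schedule.
    for epoch, modality in enumerate(build_schedule(6, line_fraction=0.5)):
        print(epoch, modality)
```

In a real training loop, the returned modality name would select between a line-drawing data loader and a color-photograph data loader, with model weights carried over from the first stage to the second.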

Abstract

Despite remarkable progress in computer vision, modern recognition systems remain fundamentally limited by their dependence on rich, redundant visual inputs. In contrast, humans can effortlessly understand sparse, minimal representations like line drawings, suggesting that structure, rather than appearance, underlies efficient visual understanding. In this work, we propose a novel structure-first learning paradigm that uses line drawings as an initial training modality to induce more compact and generalizable visual representations. We demonstrate that models trained with this approach develop a stronger shape bias, more focused attention, and greater data efficiency across classification, detection, and segmentation tasks. Notably, these models also exhibit lower intrinsic dimensionality, requiring significantly fewer principal components to capture representational variance, which mirrors observations of low-dimensional, efficient representations in the human brain. Beyond performance improvements, structure-first learning produces more compressible representations, enabling better distillation into lightweight student models. Students distilled from teachers trained on line drawings consistently outperform those trained from color-supervised teachers, highlighting the benefits of structurally compact knowledge. Together, our results support the view that structure-first visual learning fosters efficiency, generalization, and human-aligned inductive biases, offering a simple yet powerful strategy for building more robust and adaptable vision systems.
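The intrinsic-dimensionality claim can be made concrete with a small sketch. It assumes the measure is the number of principal components needed to reach a fixed share of the representational variance; the function name, the 0.9 threshold, and the synthetic feature matrices are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np


def num_components_for_variance(features, threshold=0.9):
    """Smallest number of principal components whose cumulative
    explained variance reaches `threshold` (a fraction in (0, 1]).

    `features` is an (n_samples, n_dims) array of model activations.
    """
    X = np.asarray(features, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2
    total = var.sum()
    if total == 0.0:
        return 0  # constant features carry no variance
    ratio = np.cumsum(var) / total
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(ratio))  # guard against round-off near threshold 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two sets of features (hypothetical data):
    compact = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 50))  # rank 2
    diffuse = rng.normal(size=(200, 50))                            # full rank
    print("compact:", num_components_for_variance(compact))
    print("diffuse:", num_components_for_variance(diffuse))
```

Under this measure, a lower count for the same threshold indicates a more compact representation, which is the sense in which the abstract describes structure-first models as needing fewer principal components.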
