Exploring the Hierarchical Reasoning Model for Small Natural-Image Classification Without Augmentation

Alexander V. Mantzaris

TL;DR

The paper evaluates the Hierarchical Reasoning Model (HRM) as a practical image classifier under a deliberately no-augmentation regime. The model uses two Transformer-style modules, a DEQ-based one-step gradient, deep supervision, and modern normalization and positional-encoding techniques (RMSNorm, Rotary Position Embeddings). The paper compares HRM with a conventional CNN baseline on MNIST, CIFAR-10, and CIFAR-100. HRM performs strongly on MNIST but shows substantial generalization gaps on CIFAR-10/100, which the analysis attributes to insufficient image-specific inductive bias. The results indicate that HRM trains stably with small parameter budgets. However, without augmentation or additional inductive structure, it underperforms simple convolutional architectures on small natural images. The work points to directions for improving HRM so that its theoretical advantages carry over to practical classification tasks. These include architectural changes that strengthen image priors and better regularization in the no-augmentation setting.
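A toy sketch can illustrate the two-module recurrence and the one-step gradient. It assumes the update schedule of the original HRM formulation. Within one segment, the low-level module $f_L$ is applied $N \cdot T$ times and the high-level module $f_H$ once every $T$ low-level steps. Under the one-step (DEQ-style) approximation, only the final $f_L$ and $f_H$ calls would carry gradients, while deep supervision applies a loss after every segment. The scalar stand-ins for $f_L$ and $f_H$ are hypothetical, as are the toy output head and the names `hrm_segment` and `deep_supervision`. A real implementation would use Transformer blocks and an autograd framework.

```python
from typing import Callable, List, Tuple

# Hypothetical scalar stand-ins for the Transformer-style modules f_L and f_H.
LowFn = Callable[[float, float, float], float]   # f_L(z_L, z_H, x) -> new z_L
HighFn = Callable[[float, float], float]         # f_H(z_H, z_L) -> new z_H


def hrm_segment(x: float, z_h: float, z_l: float, f_l: LowFn, f_h: HighFn,
                n_cycles: int = 2, t_steps: int = 2) -> Tuple[float, float, List[str]]:
    """Run one segment: n_cycles high-level updates, each after t_steps low-level updates.

    The returned log marks with '*' the calls that would be gradient-tracked under the
    one-step approximation (only the last f_L and the last f_H call). In an autograd
    framework all earlier calls would run without gradient tracking.
    """
    log: List[str] = []
    total = n_cycles * t_steps
    for i in range(total):
        last = i == total - 1
        z_l = f_l(z_l, z_h, x)
        log.append("f_L*" if last else "f_L")
        if (i + 1) % t_steps == 0:
            z_h = f_h(z_h, z_l)
            log.append("f_H*" if last else "f_H")
    return z_h, z_l, log


def deep_supervision(x: float, target: float, f_l: LowFn, f_h: HighFn,
                     segments: int = 4) -> List[float]:
    """Apply a loss after every segment while carrying the state (z_H, z_L) forward.

    In an autograd framework z_h and z_l would be detached between segments, so each
    segment's loss backpropagates only within that segment.
    """
    z_h, z_l = 0.0, 0.0
    losses: List[float] = []
    for _ in range(segments):
        z_h, z_l, _ = hrm_segment(x, z_h, z_l, f_l, f_h)
        y_hat = z_h                      # toy output head: read the prediction off z_H
        losses.append((y_hat - target) ** 2)
    return losses


if __name__ == "__main__":
    f_l = lambda zl, zh, x: 0.5 * zl + 0.25 * zh + 0.25 * x   # contraction toward x
    f_h = lambda zh, zl: 0.5 * zh + 0.5 * zl
    _, _, call_log = hrm_segment(1.0, 0.0, 0.0, f_l, f_h)
    print(call_log)
    print(deep_supervision(1.0, 1.0, f_l, f_h))
```

With $N = T = 2$, the log reads `f_L, f_L, f_H, f_L, f_L*, f_H*`. Only the last two calls would enter the backward pass, which is what keeps the memory cost of the one-step gradient constant in the number of recurrent steps.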

Abstract

This paper asks whether the Hierarchical Reasoning Model (HRM) can serve as a practical image classifier. The model combines two Transformer-style modules $(f_L,f_H)$, one-step (DEQ-style) gradient training, deep supervision, Rotary Position Embeddings, and RMSNorm. It is evaluated on MNIST, CIFAR-10, and CIFAR-100 under a deliberately raw regime: no data augmentation, the same optimizer family for all models with a one-epoch warmup followed by cosine decay to a floor, and label smoothing. HRM optimizes stably and performs well on MNIST ($\approx 98\%$ test accuracy), but on small natural images it overfits and generalizes poorly. On CIFAR-10, HRM reaches 65.0\% after 25 epochs, whereas a two-stage Conv--BN--ReLU baseline attains 77.2\% while training $\sim 30\times$ faster per epoch. On CIFAR-100, HRM achieves only 29.7\% test accuracy despite 91.5\% train accuracy, while the same CNN reaches 45.3\% test accuracy with 50.5\% train accuracy. Loss traces and error analyses indicate healthy optimization but insufficient image-specific inductive bias for HRM in this regime. It is concluded that, for small-resolution image classification without augmentation, HRM in its current form is not competitive with even simple convolutional architectures. This does not exclude the possibility that modifications to the model could improve it substantially.
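The training regime can be made concrete with a short sketch. It assumes a linear warmup over the first epoch, followed by cosine decay from the peak rate to a fixed floor. It also assumes the standard label-smoothing target, which places $1-\varepsilon$ on the true class plus $\varepsilon/K$ spread uniformly over all $K$ classes. The abstract does not give the peak rate, the floor, or $\varepsilon$, so the values below are hypothetical. The names `lr_at_step`, `smoothed_targets`, and `smoothed_cross_entropy` are likewise hypothetical.

```python
import math
from typing import List


def lr_at_step(step: int, steps_per_epoch: int, total_epochs: int,
               peak_lr: float, floor_lr: float) -> float:
    """Linear warmup over one epoch, then cosine decay from peak_lr down to floor_lr."""
    warmup = steps_per_epoch
    total = steps_per_epoch * total_epochs
    if step < warmup:
        # Ramp linearly so that the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # goes from 1 down to 0
    return floor_lr + (peak_lr - floor_lr) * cosine


def smoothed_targets(label: int, num_classes: int, eps: float = 0.1) -> List[float]:
    """Uniform eps/K mass on every class, plus 1 - eps on the true class."""
    targets = [eps / num_classes] * num_classes
    targets[label] += 1.0 - eps
    return targets


def smoothed_cross_entropy(logits: List[float], label: int, eps: float = 0.1) -> float:
    """Cross-entropy between softmax(logits) and the label-smoothed target."""
    m = max(logits)                                        # shift for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * (v - log_z) for t, v in zip(targets, logits))


if __name__ == "__main__":
    # Hypothetical setting: 100 steps per epoch, 25 epochs, peak 1e-3, floor 1e-5.
    for s in (0, 99, 100, 1300, 2499):
        print(s, lr_at_step(s, 100, 25, 1e-3, 1e-5))
    print(smoothed_cross_entropy([2.0, 0.5, -1.0], label=0))
```

A common motivation for decaying to a nonzero floor rather than to zero is that the final epochs keep making small updates instead of stalling. The abstract does not state whether the floor is an absolute rate or a fraction of the peak rate.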
