SoftStep: Learning Sparse Similarity Powers Deep Neighbor-Based Regression
Aviad Susman, Baihan Lin, Mayte Suárez-Fariñas, Joseph T Colonel
TL;DR
The paper addresses the underutilization of neighbor-based regression in deep learning by introducing SoftStep, a differentiable module that learns sparse instance-wise similarities to power nonlinear, neighbor-based regression heads. It provides theoretical links showing that mean squared error on neighbor-based predictions induces structuring constraints on embedding spaces (pairwise and triplet relationships) and demonstrates through extensive experiments that SoftStep-enhanced heads outperform traditional linear predictors across varied architectures and unstructured data domains. By unifying neighbor-based regression with differentiable similarity warping, the work highlights a bridge to sparse attention and representational alignment, offering a plug-in approach that improves expressiveness without sacrificing end-to-end training. The findings suggest SoftStep as a general mechanism for adaptive, sparse similarity in deep networks, with potential applications in attention, metric learning, and representational analysis beyond regression.
Abstract
Neighbor-based methods are a natural alternative to linear prediction for tabular data when relationships between inputs and targets exhibit complexity such as nonlinearity, periodicity, or heteroscedasticity. Yet in deep learning on unstructured data, nonparametric neighbor-based approaches are rarely implemented in lieu of simple linear heads. This is primarily due to the ability of systems equipped with linear regression heads to co-learn internal representations along with the linear head's parameters. To unlock the full potential of neighbor-based methods in neural networks we introduce SoftStep, a parametric module that learns sparse instance-wise similarity measures directly from data. When integrated with existing neighbor-based methods, SoftStep enables regression models that consistently outperform linear heads across diverse architectures, domains, and training scenarios. We focus on regression tasks, where we show theoretically that neighbor-based prediction with a mean squared error objective constitutes a metric learning algorithm that induces well-structured embedding spaces. We then demonstrate analytically and empirically that this representational structure translates into superior performance when combined with the sparse, instance-wise similarity measures introduced by SoftStep. Beyond regression, SoftStep is a general method for learning instance-wise similarity in deep neural networks, with broad applicability to attention mechanisms, metric learning, representational alignment, and related paradigms.
