RTify: Aligning Deep Neural Networks with Human Behavioral Decisions

Yu-Ang Cheng; Ivan Felipe Rodriguez; Sixuan Chen; Kohitij Kar; Takeo Watanabe; Thomas Serre

TL;DR

The paper addresses the mismatch between static accuracy metrics in vision models and the dynamic, time-resolved nature of human decisions. It introduces RTify, a differentiable framework that learns to align recurrent network dynamics to human reaction times by mapping hidden states to evidence and accumulating until a threshold is reached. The approach supports both supervised training on human RTs and self-penalized, ideal-observer optimization, and includes a differentiable, multi-class Wong-Wang module that can plug into CNNs. Across random dot motion and natural-image categorization tasks, RTify achieves superior fits to human RT distributions and reveals that human-like speed-accuracy trade-offs can emerge from self-penalized optimization, offering a pathway toward integrated, human-aligned vision-decision models.

Abstract

Current neural network models of primate vision focus on replicating overall levels of behavioral accuracy, often neglecting the rich, dynamic nature of perceptual decisions. Here, we introduce a novel computational framework to model the dynamics of human behavioral choices by learning to align the temporal dynamics of a recurrent neural network (RNN) to human reaction times (RTs). We describe an approximation that allows us to use human RTs to constrain the number of time steps an RNN takes to solve a task. The approach is extensively evaluated against various psychophysics experiments. We also show that the approximation can be used to optimize an "ideal-observer" RNN model to achieve an optimal tradeoff between speed and accuracy without human data. The resulting model is found to account well for human RT data. Finally, we use the approximation to train a deep learning implementation of the popular Wong-Wang decision-making model. The model is integrated with a convolutional neural network (CNN) model of visual processing and evaluated using both artificial and natural image stimuli. Overall, we present a novel framework that helps align current vision models with human behavior, bringing us closer to an integrated model of human vision.
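The accumulate-to-threshold readout mentioned in the TL;DR (hidden states mapped to evidence, which is summed until a threshold is reached) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear readout `weights`, the scalar `threshold`, and the function name `accumulate_to_threshold` are assumptions made for illustration, and the sketch shows only the forward decision rule, not the differentiable approximation used for training.

```python
from typing import List, Optional, Sequence, Tuple


def accumulate_to_threshold(
    hidden_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> Tuple[Optional[int], List[float]]:
    """Return the first time step at which accumulated evidence reaches threshold.

    hidden_states: one vector per time step (stand-in for RNN hidden states).
    weights: hypothetical linear readout mapping a hidden state to scalar evidence.
    threshold: hypothetical decision bound.
    """
    total = 0.0
    trajectory: List[float] = []
    decision_step: Optional[int] = None
    for step, h in enumerate(hidden_states, start=1):
        # Map the hidden state to a scalar evidence value (assumed linear).
        evidence = sum(w * x for w, x in zip(weights, h))
        total += evidence
        trajectory.append(total)
        # Record only the first crossing; this step count plays the role of the RT.
        if decision_step is None and total >= threshold:
            decision_step = step
    return decision_step, trajectory


# Toy input: weak evidence for two steps, then stronger evidence.
states = [[0.1, 0.0], [0.2, 0.1], [0.5, 0.4], [0.6, 0.5]]
step, traj = accumulate_to_threshold(states, weights=[1.0, 1.0], threshold=1.0)
print(step)  # 3
```

In this toy run the cumulative evidence first exceeds the bound at the third step, so the model "responds" after three time steps. If the bound is never reached, the function returns `None` for the decision step.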
