Normalizing Speed-accuracy Biases in 2D Pointing Tasks with Better Calculation of Effective Target Widths
Shota Yamanaka, I. Scott MacKenzie
TL;DR
The paper investigates how to normalize speed–accuracy biases in 2D pointing tasks by comparing univariate ($\sigma_x$) and bivariate ($\sigma_{xy}$) standard deviations for the effective width $W_e$ within an ISO-style Fitts' law framework. Using three bias instructions and a large crowdsourced dataset (n=342) across multiple amplitudes and widths, the authors show that univariate $\sigma_x$ yields higher model-fit ($R^2$) in mixed bias conditions, while TT-based effective width models with effective amplitude ($A_e$) offer the strongest throughput stability. A Monte Carlo simulation confirms these findings hold for smaller samples, with bivariate models rarely outperforming univariate ones as $N$ grows. The study argues for adopting ID$_{xTT}$ (and ID$_{xTT}A_e$ for TP stability) and TT task-axis definitions as practical defaults, contributing to standardized, bias-resistant evaluation of pointing performance in HCI. The work clarifies prior inconsistencies with Wobbrock et al. (2011) and provides guidance for reproducible reporting of $W_e$ calculations, task-axis choices, and bias normalization in future 2D Fitts' law studies.
Abstract
For evaluations of 2D target selection using Fitts' law, ISO 9241-411 recommends using the effective target width ($W_e$) calculated from the univariate standard deviation of selection coordinates. Related research proposed using a bivariate standard deviation; however, the proposal was tested under only a single speed-accuracy bias condition, which limited the assessment. We compared the univariate and bivariate techniques in a 2D Fitts' law experiment using three speed-accuracy biases and 346 crowdworkers. Calculating $W_e$ using the univariate standard deviation yielded higher model correlations across all bias conditions and produced more stable throughput among the biases. The findings were also consistent when using randomly sampled subsets of the participant data. We recommend that future research calculate $W_e$ using the univariate standard deviation for fair performance evaluations. We also found only trivial effects of using nominal versus effective amplitude and of using different perspectives of the task axis.
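The two ways of computing $W_e$ compared in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition $W_e = 4.133 \times SD$, a bivariate standard deviation that pools squared deviations in x and y around the mean selection point, and selection coordinates already rotated so that x lies along the task axis. The function names and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math
import statistics

def effective_width_univariate(xs):
    """W_e from the sample SD of selection coordinates along the task axis."""
    return 4.133 * statistics.stdev(xs)

def effective_width_bivariate(xs, ys):
    """W_e from a bivariate SD that pools x and y deviations around the centroid."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sum_sq = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in zip(xs, ys))
    return 4.133 * math.sqrt(sum_sq / (len(xs) - 1))

def effective_id(amplitude, w_e):
    """Effective index of difficulty in bits (Shannon formulation)."""
    return math.log2(amplitude / w_e + 1)

def throughput(amplitude, w_e, mean_mt_s):
    """Throughput in bits/s for one condition, given mean movement time in seconds."""
    return effective_id(amplitude, w_e) / mean_mt_s

# Made-up selection coordinates (pixels) relative to the target center.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]   # along the task axis
ys = [1.0, -1.0, 0.0, 2.0, -2.0]   # perpendicular to the task axis

w_x = effective_width_univariate(xs)
w_xy = effective_width_bivariate(xs, ys)
tp_x = throughput(amplitude=200.0, w_e=w_x, mean_mt_s=0.6)
tp_xy = throughput(amplitude=200.0, w_e=w_xy, mean_mt_s=0.6)
print(w_x, w_xy, tp_x, tp_xy)
```

Under these definitions the bivariate width can never be smaller than the univariate one for the same data, because it adds the perpendicular spread to the task-axis spread. For a fixed amplitude and movement time, it therefore gives an effective ID and throughput that are no higher than the univariate values.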
