Human Guided Learning of Transparent Regression Models

Lukas Pensel, Stefan Kramer

TL;DR

This work introduces HuGuR, a human-in-the-loop framework for permutation regression that builds interpretable models from binary, human-understandable order constraints. The model combines a gradient-boosted regressor with a constraint-derived feature space, yielding ŷ(x) = μ + ∑_{i=1}^{l} β_i g_{ρ_i}(x), and uses greedy gradient boosting to iteratively select informative constraints. A user study across nine real-world datasets shows HuGuR often outperforms naive and fixed-encoding baselines on small data and remains competitive with neural sequence encoders on larger data, while using far fewer parameters. The results support the value of interactive, constraint-guided modeling for transparency and performance in permutation-based tasks, with future work extending to trust studies and broader pattern domains.

Abstract

We present a human-in-the-loop (HIL) approach to permutation regression, the novel task of predicting a continuous value for a given ordering of items. The model is a gradient boosted regression model that incorporates simple human-understandable constraints of the form x < y, i.e., item x has to come before item y, as binary features. The approach, HuGuR (Human Guided Regression), lets a human explore the search space of such transparent regression models. Users can add, remove, and refine order constraints interactively, while the coefficients are calculated on the fly. We evaluate HuGuR in a user study and compare the performance of user-built models with multiple baselines on 9 data sets. The results show that the user-built models outperform the compared methods on small data sets and in general perform on par with the other methods, while being in principle understandable for humans. On larger data sets from the same domain, machine-induced models begin to outperform the user-built models. Further work will study the trust users place in models they have constructed themselves and how the scheme can be transferred to other pattern domains, such as strings, sequences, trees, or graphs.
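
To make the model form concrete, the following is a minimal sketch of how order constraints of the form x < y could serve as binary features in an additive model ŷ(x) = μ + ∑ β_i g_{ρ_i}(x). It is not the authors' implementation: the function names (`constraint_feature`, `greedy_boost`, `predict`), the squared-loss selection rule, and the toy data are all assumptions made for illustration, and the interactive refinement of constraints is not modeled.

```python
from itertools import permutations

import numpy as np


def constraint_feature(perm, a, b):
    """g_rho(perm): 1.0 if item a appears before item b in perm, else 0.0."""
    perm = list(perm)
    return 1.0 if perm.index(a) < perm.index(b) else 0.0


def featurize(perms, constraints):
    """Binary feature matrix with one row per permutation, one column per constraint."""
    return np.array(
        [[constraint_feature(p, a, b) for a, b in constraints] for p in perms]
    )


def greedy_boost(perms, y, candidates, n_steps=5, lr=1.0):
    """Greedy stagewise fit under squared loss (an assumed selection rule).

    Each step picks the candidate constraint whose single-feature
    least-squares fit removes the most residual sum of squares.
    """
    y = np.asarray(y, dtype=float)
    G = featurize(perms, candidates)
    mu = y.mean()
    residual = y - mu
    norms = (G ** 2).sum(axis=0)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    model = []  # list of ((a, b), beta) pairs
    for _ in range(n_steps):
        betas = np.where(norms > 0, (G.T @ residual) / safe, 0.0)
        gains = betas ** 2 * norms  # reduction in residual sum of squares
        j = int(np.argmax(gains))
        if gains[j] <= 1e-12:  # nothing left to explain
            break
        beta = lr * betas[j]
        model.append((candidates[j], beta))
        residual = residual - beta * G[:, j]
    return mu, model


def predict(perm, mu, model):
    """Additive prediction: mu plus the coefficients of all satisfied constraints."""
    return mu + sum(beta * constraint_feature(perm, a, b) for (a, b), beta in model)


# Toy data (hypothetical): the target is 3 if "a" precedes "b", else 1.
perms = list(permutations("abc"))
y = [1.0 + 2.0 * constraint_feature(p, "a", "b") for p in perms]
candidates = [("a", "b"), ("b", "a"), ("a", "c")]
mu, model = greedy_boost(perms, y, candidates)
```

Each entry of `model` pairs one constraint with one coefficient, so a fitted model can be read directly as a list of statements such as "a before b shifts the prediction by β". This readability is the property the abstract refers to as being in principle understandable for humans.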

Paper Structure

This paper contains 18 sections, 6 equations, 4 figures, 7 tables, 2 algorithms.

Figures (4)

  • Figure 1: In Human Guided Regression (HuGuR), the user chooses which part of a model to refine (step 1). The newly refined model (after step 2) still has to be parameterized on the training set (step 3). The resulting model is validated on the validation set (step 4), and its performance is monitored and presented to the user (step 5). Model refinement proceeds in cycles. Upon completion, the model chosen as the best is evaluated on the test set (step 6).
  • Figure 2: Exemplary use of HuGuR. The first step (i) is to select the hyperparameters: the number of constraints added in each step and the learning rate of the model. The number of constraints $l$ is an integer between 1 and 20, and the learning rate is a float between 0.000001 and 1. After the model is generated, we obtain the model view, which shows the constraints the model encompasses and lets us interact with them. Blue constraints indicate a positive impact on the target value; red constraints indicate a negative impact. Clicking on an active constraint (ii) deactivates it and replaces it by the $l$ most promising child constraints. Conversely, clicking on an inactive constraint (iii) activates it and removes all of its child constraints. Additional controls (iv) let us restart with other hyperparameters or reset to a previous iteration. The error history (v) lets us track progress over the iterations, and one can always jump back to the best model found so far (vi).
  • Figure 3: For each data set (data sets with small and large versions, such as $edm_5$, are combined into one plot), we examine the relation between the measured performance, here the $R^2$ score, and multiple properties of the user-built models. Each point represents one user-built model. A dotted line indicates a line of best fit with negative slope, while a solid line indicates one with positive slope.
  • Figure 4: Average model properties in relation to the average performance for each participant. A solid line represents a positively sloped line of best fit.

Theorems & Definitions (1)

  • Definition 1