Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off

Eoin M. Kenny, Julie A. Shah

TL;DR

The paper tackles regulatory constraints on large language models by developing a regulatable, prototype-based LLM designed to use human-defined concepts in a transparent manner within insurance liability tasks. It formalizes the Regulation Performance Trade-Off, which weighs regulatory compliance and interpretability against traditional predictive performance, and demonstrates that enforcing regulatability can incur a 7.34% drop in classification performance while still improving human task speed and appropriate confidence in deployment. The authors present a two-dataset evaluation (insurance liability and beer reviews) and a pilot with eight adjusters to assess real-world utility, showing that human-AI collaboration can benefit even under regulatory constraints. This work advances practical pathways for auditable, safer AI in high-stakes domains, while outlining limitations and directions for broader generalization and end-to-end training improvements.

Abstract

Regulation is increasingly cited as the most important and pressing concern in machine learning. However, it is currently unknown how to implement this, and perhaps more importantly, how it would affect model performance alongside human collaboration if actually realized. In this paper, we attempt to answer these questions by building a regulatable large-language model (LLM), and then quantifying how the additional constraints involved affect (1) model performance, alongside (2) human collaboration. Our empirical results reveal that it is possible to force an LLM to use human-defined features in a transparent way, but a "regulation performance trade-off" previously not considered reveals itself in the form of a 7.34% classification performance drop. Surprisingly, however, we show that despite this, such systems actually improve human task performance speed and appropriate confidence in a realistic deployment setting compared to no AI assistance, thus paving a way for fair, regulatable AI, which benefits users.

Paper Structure

This paper contains 31 sections, 6 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The Regulation Performance Trade-Off: A black-box LLM will learn to use the optimal feature set which minimizes its loss on the training data. In contrast, an interpretable LLM will often compromise performance by adding the constraint to only use a human-interpretable feature subset. Lastly, a regulatable LLM will further constrain this to be a feature set that is legally permissible. Naturally, these constraints will possibly lead to a degradation in performance. Note that there are exceptions, as e.g. what is considered interpretable can sometimes not degrade performance (kenny2023towards).
  • Figure 2: Our proposed framework for regulatable LLMs: A test instance has its sentences encoded and compared to prototypes representing regulatable concepts defined a priori by humans. The maximum activation for each concept is used as that concept's similarity score in the model's forward pass. Note that the test instance $x$ in this example is fabricated; it is not an example of real data.
  • Figure 3: Time Results: Each user's average time to complete statements with and without the AI assistant is shown. Statistically significant results were seen between those users who benefited from the AI and those who did not, with the two groups forming distinct clusters regardless of their baseline without the AI. Standard error shown. The dashed line represents User 3, who seemed averse to the AI overall.
  • Figure 4: Page 1 of user study
  • Figure 5: Page 2 of user study
  • ...and 4 more figures
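
The forward pass described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of cosine similarity, the final linear layer over concept activations, the toy 2-D vectors, and all function names (`cosine`, `concept_activations`, `class_scores`) are assumptions made for this example. The only parts taken from the caption are that sentences are encoded, compared to concept prototypes, and that the maximum activation per concept is kept.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_activations(sentence_vecs, prototypes):
    """For each concept prototype, keep the maximum similarity over all sentences."""
    return [max(cosine(s, p) for s in sentence_vecs) for p in prototypes]


def class_scores(activations, weights):
    """Hypothetical linear layer: one weight row per class over concept activations."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


# Toy stand-ins for encoded sentences and human-defined concept prototypes.
sentences = [[1.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, 0.0], [0.0, 1.0]]

acts = concept_activations(sentences, prototypes)
scores = class_scores(acts, weights)
print(acts, scores)
```

Because each class score is a weighted sum of per-concept activations, an auditor can read off which human-defined concepts contributed to a prediction, which is the kind of transparency the caption describes.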