Reframing Tax Law Entailment as Analogical Reasoning

Xinrui Zou, Ming Zhang, Nathaniel Weir, Benjamin Van Durme, Nils Holzenberger

TL;DR

This work reframes statutory reasoning as an analogy task by constructing quadruples $(S_1, C_1, S_2, C_2)$ from pairs of SARA statute-case instances, which scales up the available data and adds interpretability through cross-pair analogy labels. It evaluates several approaches: GPT prompting, vector-offset analogies over SBERT embeddings, binary classification with T5-Large, and retrieval-augmented reasoning that uses BM25 or DPR to fetch similar pairs. The analogy task remains challenging for current models. Vector-offset and GPT-based methods achieve modest gains, and retrieval-augmented approaches offer the most promising improvements (best accuracy around 57%), though these are not always statistically significant on small test sets. Overall, the study suggests that combining robust analogy capabilities with principled retrieval can advance statutory reasoning, and it highlights the need for stronger analogical models and larger evaluation suites in legal NLP.

Abstract

Statutory reasoning refers to the application of legislative provisions to a series of case facts described in natural language. We re-frame statutory reasoning as an analogy task, where each instance of the analogy task involves a combination of two instances of statutory reasoning. This increases the dataset size by two orders of magnitude, and introduces an element of interpretability. We show that this task is roughly as difficult for Natural Language Processing models as the original task. Finally, we come back to statutory reasoning, solving it with a combination of a retrieval mechanism and analogy models, and showing some progress over prior comparable work.
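
The combination of two statutory reasoning instances into one analogy instance can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that every ordered pair of distinct statute-case instances yields one quadruple, and that a quadruple is labeled analogy exactly when both instances carry the same entailment label (the rule described in the caption of Figure 2). The toy identifiers and the `build_quadruples` helper are hypothetical.

```python
from itertools import permutations

# Toy stand-ins for labeled statute-case pairs; identifiers are invented.
pairs = [
    ("S1", "C1", "entailment"),
    ("S2", "C2", "contradiction"),
    ("S3", "C3", "entailment"),
    ("S4", "C4", "contradiction"),
]


def build_quadruples(labeled_pairs):
    """Combine every ordered pair of distinct instances into a quadruple.

    Assumption: a quadruple is 'analogy' when both instances share a label,
    and 'not analogy' otherwise.
    """
    quadruples = []
    for (s_a, c_a, y_a), (s_b, c_b, y_b) in permutations(labeled_pairs, 2):
        label = "analogy" if y_a == y_b else "not analogy"
        quadruples.append((s_a, c_a, s_b, c_b, label))
    return quadruples


quadruples = build_quadruples(pairs)
print(len(pairs), "pairs ->", len(quadruples), "quadruples")
```

Under these assumptions, n labeled pairs produce n(n-1) quadruples, so the quadruple set grows quadratically with the number of original instances.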

Paper Structure (24 sections, 3 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Performing statutory reasoning as a combination of retrieval and analogy. To determine the label of $(S,C)$, candidate statute-case pairs are retrieved from a set of prototypes, and used to form quadruples. The quadruples are labeled as analogical or not. Analogy labels are aggregated through voting to yield labels for statutory reasoning.
  • Figure 2: Data generation. $(S_1, C_1)$ and $(S_3, C_3)$ are pairs from the SARA dataset labeled entailment; $(S_2, C_2)$ and $(S_4, C_4)$ are labeled contradiction. When forming quadruples, $(S_1, C_1, S_4, C_4)$ and $(S_2, C_2, S_3, C_3)$ are labeled not analogy (white cells); $(S_1, C_1, S_3, C_3)$ and $(S_2, C_2, S_4, C_4)$ are labeled analogy (green cells).
  • Figure 3: Vector offset methods.
  • Figure 4: Prediction accuracy as a function of the similarity threshold, for different methods of computing similarity scores. A similarity score larger than the threshold will be labeled as analogy, otherwise not analogy.
  • Figure 5: Statutory reasoning as retrieval and analogy with models from Section \ref{sec:binary-classification}. Left: BERT-based model. Right: T5-Large-based model.
  • ...and 1 more figure
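
The pipeline in the captions of Figures 1 and 4 (retrieve prototypes, label each quadruple by thresholding a similarity score, then vote) can be sketched as follows. This is a hedged illustration under several assumptions, not the paper's implementation: token overlap stands in for BM25 or DPR, `toy_score` is an invented placeholder for a real analogy scorer, and a "not analogy" decision is taken to vote for the opposite of the prototype's label. All function names and example texts are hypothetical.

```python
from collections import Counter

# Assumption: binary labels, so "not analogy" implies the opposite label.
FLIP = {"entailment": "contradiction", "contradiction": "entailment"}

# Invented prototypes: (statute, case, label).
PROTOTYPES = [
    ("a married couple may file a joint return",
     "alice and bob are married and file a joint return", "entailment"),
    ("a married couple may file a joint return",
     "carol is single and files alone", "contradiction"),
    ("a dependent must be under the age of 19",
     "dan is 12 years old", "entailment"),
    ("a dependent must be under the age of 19",
     "erin is 45 years old", "contradiction"),
]


def tokens(text):
    return set(text.lower().split())


def overlap(a, b):
    """Jaccard token overlap; a crude stand-in for BM25 or DPR scoring."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(statute, case, prototypes, k=3):
    """Return the k prototypes most similar to the query pair."""
    query = f"{statute} {case}"
    ranked = sorted(prototypes,
                    key=lambda p: overlap(query, f"{p[0]} {p[1]}"),
                    reverse=True)
    return ranked[:k]


def make_threshold_model(score_fn, threshold):
    """Label a quadruple as analogy when its score exceeds the threshold."""
    def model(s1, c1, s2, c2):
        return score_fn(s1, c1, s2, c2) > threshold
    return model


def toy_score(s1, c1, s2, c2):
    """Placeholder score: how alike the two statute-case overlaps are."""
    return 1.0 - abs(overlap(s1, c1) - overlap(s2, c2))


def predict(statute, case, prototypes, analogy_model, k=3):
    """Aggregate analogy decisions over retrieved prototypes by voting.

    Assumes a non-empty prototype set; an odd k avoids ties.
    """
    votes = Counter()
    for s2, c2, label in retrieve(statute, case, prototypes, k):
        same = analogy_model(statute, case, s2, c2)
        votes[label if same else FLIP[label]] += 1
    return votes.most_common(1)[0][0]


model = make_threshold_model(toy_score, threshold=0.8)
print(predict("a married couple may file a joint return",
              "frank and gina are married and file a joint return",
              PROTOTYPES, model))
```

Passing the analogy model in as a callable keeps the retrieval and voting steps independent of how quadruples are scored, so a vector-offset scorer or a trained classifier could be substituted for `toy_score`.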