Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding

Maria Mihaela Trusca; Liesbeth Allein

TL;DR

This paper simulates how readers perceive content with varying toxicity levels by generating diverse interpretations of out-of-context sentences. The proposed toxicity-controlled decoding improves alignment with human-written interpretations in both syntax and semantics while reducing model prediction uncertainty.

Abstract

Interpretations of a single sentence can vary, particularly when its context is lost. This paper aims to simulate how readers perceive content with varying toxicity levels by generating diverse interpretations of out-of-context sentences. By modeling toxicity, we can anticipate misunderstandings and reveal hidden toxic meanings. Our proposed decoding strategy explicitly controls toxicity in the set of generated interpretations by (i) aligning interpretation toxicity with the input, (ii) relaxing toxicity constraints for more toxic input sentences, and (iii) promoting diversity in toxicity levels within the set of generated interpretations. Experimental results show that our method improves alignment with human-written interpretations in both syntax and semantics while reducing model prediction uncertainty.
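
The three criteria above can be illustrated with a toy selection step over candidate interpretations that have already been scored for toxicity. This is only a sketch under stated assumptions, not the decoding strategy proposed in the paper: the function name `select_interpretations`, the linear tolerance rule, and the parameters `base_tolerance`, `relax`, and `min_gap` are all hypothetical, and the toxicity scores in [0, 1] are assumed to come from some external classifier.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (interpretation text, toxicity score in [0, 1])


def select_interpretations(
    input_toxicity: float,
    candidates: List[Candidate],
    k: int = 3,
    base_tolerance: float = 0.1,  # hypothetical: allowed gap for a non-toxic input
    relax: float = 0.4,           # hypothetical: extra slack per unit of input toxicity
    min_gap: float = 0.05,        # hypothetical: minimum spacing between chosen scores
) -> List[Candidate]:
    """Toy illustration of criteria (i)-(iii); not the authors' algorithm."""
    # (ii) The tolerance widens as the input sentence becomes more toxic.
    tolerance = base_tolerance + relax * input_toxicity

    # (i) Keep only candidates whose toxicity lies within the tolerance of the input.
    aligned = [c for c in candidates if abs(c[1] - input_toxicity) <= tolerance]

    # Prefer candidates closest to the input toxicity.
    aligned.sort(key=lambda c: abs(c[1] - input_toxicity))

    # (iii) Greedily keep candidates whose toxicity differs from every
    # already-selected candidate by at least min_gap, up to k items.
    selected: List[Candidate] = []
    for text, tox in aligned:
        if all(abs(tox - chosen_tox) >= min_gap for _, chosen_tox in selected):
            selected.append((text, tox))
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    pool = [("a", 0.10), ("b", 0.12), ("c", 0.30), ("d", 0.90)]
    for text, tox in select_interpretations(0.2, pool):
        print(f"{text}: {tox:.2f}")
```

In this sketch, a mildly toxic input keeps only candidates near its own score, while a highly toxic input admits a wider range of toxicity levels; the spacing rule prevents the selected set from collapsing onto near-identical scores.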