Why do objects have many names? A study on word informativeness in language use and lexical systems

Eleonora Gualdoni, Gemma Boleda

Abstract

Human lexicons contain many different words that speakers can use to refer to the same object, e.g., "purple" or "magenta" for the same shade of color. On the one hand, studies on language use have explored how speakers adapt their referring expressions to communicate successfully in context, without focusing on properties of the lexical system. On the other hand, studies in language evolution have discussed how competing pressures for informativeness and simplicity shape lexical systems, without tackling in-context communication. We aim to bridge the gap between these traditions and explore why a soft mapping between referents and words is a good solution for communication, taking into account both in-context communication and the structure of the lexicon. We propose a simple measure of informativeness for words and lexical systems, grounded in a visual space, and analyze color naming data for English and Mandarin Chinese. We conclude that optimal lexical systems are those where multiple words can apply to the same referent, conveying different amounts of information. Such systems allow speakers to maximize communication accuracy and minimize the amount of information they convey when communicating about referents in context.

Paper Structure

This paper contains 19 sections, 3 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: To allow successful identification of a target color chip (in the black frame) within a grid of candidates, a general term like purple is sufficient when the context is not challenging (top). A more specific name like magenta is needed when the distractors compete more with the target (bottom). Data from Monroe2017.
  • Figure 2: Denotation in the CIELAB color space of the words purple and magenta (a) and 蓝 "blue" and 海 "ocean" (b). Note that the numbers of objects differ, which we control for when computing $I$. A color chip called 海 "ocean" (b) would not be located in the top, lighter part of the 蓝 "blue" denotation region; more specific names denote objects occupying smaller volumes in a visual feature space. Smaller volumes correspond to more information provided by the word to a listener, and higher utterance costs for a speaker. Best viewed in color.
  • Figure 3: Relationship between context ease and word informativeness ($I_w$) in the portion of Monroe2017's English dataset considered in Table \ref{tab:model_eng_rep}. Communication in easier contexts can be successful with less informative words.
  • Figure 4: Words like "bright" or "dark" have non-convex denotation regions, resulting in low informativeness ($I_w$) scores.
  • Figure 5: Typicality effects in language production. A word with high $I$ like mint (panel a) can be used when the context is not hard, if the target is very typical for that word. A word with low $I$ like blue (panel b) can resolve the ambiguity in a very hard context, if the target is much more typical for the color than the distractors are.
  • ...and 3 more figures