Generating bilingual example sentences with large language models as lexicography assistants

Raphael Merx; Ekaterina Vylomova; Kemal Kurniawan

TL;DR

The paper demonstrates that in-context learning can successfully align LLMs with individual annotator preferences. It also explores using pre-trained language models to rate examples automatically, finding that sentence perplexity serves as a good proxy for “typicality” and “intelligibility” in higher-resourced languages.

Abstract

We present a study of LLMs' performance in generating and rating example sentences for bilingual dictionaries across languages with varying resource levels: French (high-resource), Indonesian (mid-resource), and Tetun (low-resource), with English as the target language. We evaluate the quality of LLM-generated examples against the GDEX (Good Dictionary EXample) criteria: typicality, informativeness, and intelligibility. Our findings reveal that while LLMs can generate reasonably good dictionary examples, their performance degrades significantly for lower-resourced languages. We also observe high variability in human preferences for example quality, reflected in low inter-annotator agreement rates. To address this, we demonstrate that in-context learning can successfully align LLMs with individual annotator preferences. Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages. Our study also contributes a novel dataset of 600 ratings for LLM-generated sentence pairs, and provides insights into the potential of LLMs in reducing the cost of lexicographic work, particularly for low-resource languages.
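
As a rough illustration of the perplexity-based rating mentioned in the abstract, the sketch below computes sentence perplexity from per-token log-probabilities. The function name and the example values are hypothetical. In practice the log-probabilities would come from a pre-trained language model, and the specific model and any normalization used in the paper are not stated in this abstract.

```python
import math
from typing import Sequence


def sentence_perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp of the mean negative natural-log probability per token.

    Lower perplexity means the model finds the sentence less surprising,
    which is the property the abstract links to typicality and
    intelligibility.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities (assumed, not taken from the paper).
typical_sentence = [math.log(0.5)] * 4    # every token fairly predictable
unusual_sentence = [math.log(0.05)] * 4   # every token surprising

ppl_typical = sentence_perplexity(typical_sentence)
ppl_unusual = sentence_perplexity(unusual_sentence)
```

Under this sketch, candidate examples for a headword could be ranked by ascending perplexity, with the caveat from the abstract that the proxy is reported to work well only for higher-resourced languages.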

Paper Structure (37 sections, 2 figures, 7 tables)

Introduction
Background
LLMs for synthetic data generation.
Automated extraction and generation of dictionary examples.
Research gap.
LLM generation of bilingual example sentences
Methodology for generation
Word selection
Example generation
Annotator selection and training
Annotation
Quality of LLM-generated examples
Per language
Per LLM
Per GDEX criteria
...and 22 more sections

Figures (2)

Figure 1: Overview of our process for generating example sentence pairs using LLMs.
Figure 2: Rating distributions (GPT-4o and Llama 3.1 combined) for GDEX criteria and translation correctness.
