LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts

Yang Liu, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li, Lingyong Yan

TL;DR

LM-Lexicon is a definition modeling approach that combines data clustering, semantic expert learning, and model merging in a sparse mixture-of-experts architecture, achieving substantial improvements over existing methods on five widely used benchmarks.

Abstract

We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
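
The domain-level routing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings, the domain labels (`domain_a`, `domain_b`), the cosine-similarity criterion, and the stand-in expert functions are all assumptions made for illustration. The sketch only shows the general pattern of sending a whole input to one expert chosen by its nearest cluster centroid, as opposed to making a routing decision per token.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def route(embedding: Vector, centroids: Dict[str, List[float]]) -> str:
    """Pick one domain for the whole input: the most similar centroid."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))


# Hypothetical toy clusters; real clusters would come from clustering
# embeddings of the training definitions.
clusters: Dict[str, List[Vector]] = {
    "domain_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "domain_b": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
centroids = {label: centroid(vectors) for label, vectors in clusters.items()}

# Stand-ins for domain experts; each would be a fine-tuned small model.
experts: Dict[str, Callable[[str], str]] = {
    "domain_a": lambda term: f"[domain_a expert] definition of {term}",
    "domain_b": lambda term: f"[domain_b expert] definition of {term}",
}


def define(term: str, embedding: Vector) -> str:
    """Route the input once, then let the selected expert produce the output."""
    return experts[route(embedding, centroids)](term)
```

In this sketch the routing decision is made once per input, so every token of a given definition is produced by the same expert.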

Paper Structure

This paper contains 43 sections, 4 equations, 12 figures, 12 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Four examples of the term, context (input), and definition (output) for the definition modeling task.
  • Figure 2: Diagram of the LM-Lexicon (i.e., Specialize-then-Synthesize) framework.
  • Figure 3: Four-cluster UMAP plot of 10K random definitions of terms in 3D-EX (see the experimental setup section). Each cluster is manually assigned a [label] according to its major constituents.
  • Figure 4: Best-of-N repeated sampling results (BLEU) on five benchmarks, evaluated by an oracle verifier.
  • Figure 5: Scaling performance gains and human evaluation results. Left: test performance on 3D-EX with a varying number of experts. Right: human evaluation results across five criteria.
  • ...and 7 more figures