LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts

Yang Liu; Jiaye Yang; Weikang Li; Jiahui Liang; Yang Li; Lingyong Yan

TL;DR

LM-Lexicon is a definition modeling approach that combines data clustering, semantic expert learning, and model merging in a sparse mixture-of-experts architecture, achieving substantial improvements over existing methods on five widely used benchmarks.

Abstract

We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements (+7% BLEU score compared with the prior state-of-the-art model) over existing methods on five widely used benchmarks. Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
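To make the contrast between domain-level and token-level routing concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each expert is represented by a cluster centroid and that routing picks the centroid with the highest cosine similarity; the function names, the mean pooling step, and the toy two-dimensional embeddings are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid(vector, centroids):
    """Index of the centroid most similar to `vector`."""
    return max(range(len(centroids)), key=lambda k: cosine(vector, centroids[k]))


def route_domain_level(token_embeddings, centroids):
    """Assumed domain-level routing: pool the whole input, pick ONE expert."""
    return nearest_centroid(mean_vector(token_embeddings), centroids)


def route_token_level(token_embeddings, centroids):
    """Token-level routing for comparison: each token picks its own expert."""
    return [nearest_centroid(t, centroids) for t in token_embeddings]


# Toy data (hypothetical): two expert centroids and a three-token input.
centroids = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3]]

domain_choice = route_domain_level(tokens, centroids)
token_choices = route_token_level(tokens, centroids)
print(domain_choice, token_choices)
```

In this toy case the domain-level router sends the entire input to a single expert, while the token-level router splits the same input across both experts. That difference is the design choice the abstract contrasts; how the paper's router is actually parameterized is not stated in this excerpt.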

Paper Structure (43 sections, 4 equations, 12 figures, 12 tables, 1 algorithm)


Introduction
Related Work
Upcycling to Mixture-of-Experts.
Definition Modeling.
Methodology
Overview of LM-Lexicon
Learning Domain-specific Semantic Experts
Dataset Construction.
Clustering.
Experts Training.
Merging Experts into a Unified MoE
Model Merging.
Experiments
Implementation Details
Datasets.
...and 28 more sections

Figures (12)

Figure 1: Four examples of the term, context (input), and definition (output) for the definition modeling task.
Figure 2: Diagram of LM-Lexicon (i.e., Specialize-then-Synthesize) framework.
Figure 3: Four-cluster UMAP plot of 10K random definitions of terms in 3D-EX (see the experimental setup section). Each cluster is manually assigned a [label] based on its major constituents.
Figure 4: Best-of-N repeated sampling results (BLEU) on five benchmarks, evaluated by an oracle verifier.
Figure 5: Scaling performance gains and human evaluation results. Left: test performance on 3D-EX with a varying number of experts. Right: human evaluation results across five criteria.
...and 7 more figures
