Hebrew Diacritics Restoration using Visual Representation

Yair Elboher, Yuval Pinter

TL;DR

This work reframes Hebrew diacritization as a zero-shot word-level classification problem and introduces DiVRit, a visual-language model that processes undiacritized Hebrew words as images. By coupling a visual candidate encoder (based on Hebrew PIXEL) with a contextual text-based encoder (AlephBertGimmel-small) and training with a diacritization-focused objective, the method selects the best diacritized candidate from a generated set, without relying on explicit linguistic rules. Experimental results show competitive performance, especially in oracle settings where the correct candidate is guaranteed to be present, and reveal the importance of pretraining, finetuning duration, and candidate-generation quality for generalization. The findings highlight the potential of visual representations to advance diacritization tasks and point toward extending the approach to other diacritics-rich languages and modalities.

Abstract

Diacritics restoration in Hebrew is a fundamental task for ensuring accurate word pronunciation and disambiguating textual meaning. Despite the language's high degree of ambiguity when unvocalized, recent machine learning approaches have significantly advanced performance on this task. In this work, we present DIVRIT, a novel system for Hebrew diacritization that frames the task as a zero-shot classification problem. Our approach operates at the word level, selecting the most appropriate diacritization pattern for each undiacritized word from a dynamically generated candidate set, conditioned on the surrounding textual context. A key innovation of DIVRIT is its use of a Hebrew Visual Language Model, which processes undiacritized text as an image, allowing diacritic information to be embedded directly within the input's vector representation. Through a comprehensive evaluation across various configurations, we demonstrate that the system effectively performs diacritization without relying on complex, explicit linguistic analysis. Notably, in an "oracle" setting where the correct diacritized form is guaranteed to be among the provided candidates, DIVRIT achieves a high level of accuracy. Furthermore, strategic architectural enhancements and optimized training methodologies yield significant improvements in the system's overall generalization capabilities. These findings highlight the promising potential of visual representations for accurate and automated Hebrew diacritization.
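
To make the word-level selection step concrete, the following is a minimal sketch of choosing one candidate from a set by comparing embeddings. It assumes the context encoder and the candidate encoder have already produced fixed-size vectors, and it uses cosine similarity as the score. The scoring function, the toy vectors, and the names `cosine` and `select_candidate` are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_candidate(context_vec, candidate_vecs):
    """Score every candidate against the context and return (best index, scores).

    Assumption: similarity between the two embeddings is the score.
    The paper's actual scoring layer may differ.
    """
    scores = [cosine(context_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy embeddings (hypothetical): one context vector, three candidate vectors.
context_vec = [0.9, 0.1, 0.0]
candidate_vecs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

best_index, scores = select_candidate(context_vec, candidate_vecs)
print(best_index, [round(s, 3) for s in scores])
```

Because the candidates are scored rather than predicted from a fixed label set, the same procedure applies to any candidate set, including one built for a word never seen in training.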

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Architecture of the diacritizer. A context encoder (right) captures contextual information, while a visual candidate encoder (left) processes an aligned portion of each candidate from the candidate generator. The resulting embeddings are then compared in a scoring layer.
  • Figure 2: KNN-based candidate generator. The figure illustrates an example of the algorithm on a specific OOV word ($k=5, c=2$). Similar diacritization patterns are colored with the same color, and the output candidate set shows each pattern applied to the input word. The green pattern is then removed to yield a $c$-sized candidate set.
  • Figure 3: Coverage rate per candidate set size, i.e. the fraction of times the correct candidate appears in the KNN-generated set of each size.
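
The coverage rate described for Figure 3 can be computed as in the sketch below. It assumes each word comes with an ordered candidate list and that the size-c set is the first c entries of that list. The ordering assumption, the placeholder strings, and the function name `coverage_rate` are hypothetical and serve only to illustrate the definition.

```python
def coverage_rate(candidate_lists, gold_forms, set_size):
    """Fraction of words whose gold form appears among the first `set_size` candidates."""
    if len(candidate_lists) != len(gold_forms):
        raise ValueError("one candidate list is required per gold form")
    hits = sum(
        1
        for candidates, gold in zip(candidate_lists, gold_forms)
        if gold in candidates[:set_size]
    )
    return hits / len(gold_forms)


# Placeholder data (hypothetical): four words, three ordered candidates each.
candidate_lists = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["p", "q", "r"],
    ["m", "n", "o"],
]
# The last gold form is absent from its candidate list at every size.
gold_forms = ["a", "y", "r", "missing"]

rates = {size: coverage_rate(candidate_lists, gold_forms, size) for size in (1, 2, 3)}
print(rates)
```

Coverage is an upper bound on end-to-end accuracy at a given set size, since the selector cannot pick a correct form that the generator never proposed. The oracle setting corresponds to forcing coverage to 1.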