What has LeBenchmark Learnt about French Syntax?

Zdravko Dugonjić, Adrien Pupier, Benjamin Lecouteux, Maximin Coavoux

TL;DR

The paper investigates whether LeBenchmark, a French wav2vec2-style acoustic model, encodes syntactic information even though it is trained only on raw speech. It applies layer-wise linear probes to 24 representations to predict part-of-speech tags and unlabeled dependency relations, evaluated on the Orféo treebank. The results show that syntactic information is most extractable in the middle layers (around layers 14–15), with a sharp drop in the final layers, and that local syntactic cues are easier to recover than longer-distance dependencies. These findings shed light on how self-supervised learning (SSL) speech models encode structure in French and suggest that the middle layers are the most promising for syntax-oriented downstream tasks.

Abstract

The paper reports on a series of experiments that probe LeBenchmark, a pretrained acoustic model trained on 7k hours of spoken French, for syntactic information. Pretrained acoustic models are increasingly used for downstream speech tasks such as automatic speech recognition, speech translation, spoken language understanding, and speech parsing. They are trained on very low-level information (the raw speech signal) and have no explicit lexical knowledge. Despite that, they obtain reasonable results on tasks that require higher-level linguistic knowledge. As a result, an emerging question is whether these models encode syntactic information. We probe each representation layer of LeBenchmark for syntax, using the Orféo treebank, and observe that it has learnt some syntactic information. Our results show that syntactic information is most easily extractable from the middle layers of the network, after which a very sharp decrease is observed.
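
To make the idea of layer-wise probing concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It fits one linear classifier per layer on frozen features and compares held-out accuracy across layers. The features are synthetic: `synthetic_layer`, the toy signal schedule that peaks at layer 14, and the ridge-regression probe are all hypothetical choices made for illustration. In a real setup the features would be word-level vectors pooled from each layer of the acoustic model, and the labels would come from the treebank.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, ridge=1e-3):
    """Fit a linear map (plus bias) from features to one-hot labels by ridge regression."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    Y = np.eye(n_classes)[y]                         # one-hot targets
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(W, X, y):
    """Accuracy of the linear probe W on features X with gold labels y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))

def synthetic_layer(rng, y, class_means, signal):
    """Hypothetical stand-in for one layer's word-level representations."""
    noise = rng.normal(size=(len(y), class_means.shape[1]))
    return signal * class_means[y] + noise

rng = np.random.default_rng(0)
n_layers, n_classes, dim = 24, 5, 16
class_means = np.eye(n_classes, dim)                 # one direction per toy "POS tag"
y_train = rng.integers(0, n_classes, size=1000)
y_test = rng.integers(0, n_classes, size=500)

accuracies = []
for layer in range(1, n_layers + 1):
    # Toy assumption: label signal peaks at layer 14 and vanishes at both ends.
    signal = max(0.0, 4.0 * (1.0 - abs(layer - 14) / 10.0))
    X_train = synthetic_layer(rng, y_train, class_means, signal)
    X_test = synthetic_layer(rng, y_test, class_means, signal)
    W = fit_linear_probe(X_train, y_train, n_classes)
    accuracies.append(probe_accuracy(W, X_test, y_test))

best_layer = int(np.argmax(accuracies)) + 1          # layers are numbered from 1
print(f"best layer: {best_layer}, accuracy: {accuracies[best_layer - 1]:.3f}")
```

The probe is kept deliberately simple: because it is linear and the features are frozen, a high accuracy at a given layer suggests the information is readily extractable from that layer rather than learnt by the probe itself.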

Paper Structure (17 sections, 4 figures)

Figures (4)

  • Figure 1: Illustration of the relative head position annotation scheme.
  • Figure 2: POS tagging task accuracy per layer.
  • Figure 3: Relative head distance prediction task accuracy (UAS) per layer.
  • Figure 4: Per-category evaluation of the best layer (layer 14) on the relative head distance prediction task with two evaluation metrics: accuracy (UAS) and F-score.
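
The captions above refer to a relative head position scheme and to UAS. As a rough illustration, the sketch below shows one common way to turn a dependency tree into per-token labels and to score predictions. It is an assumption, not a reproduction of Figure 1: the sign convention (head index minus token index), the `ROOT` label, and the three-word example are all hypothetical.

```python
ROOT = "ROOT"  # hypothetical label for the token whose head is the root

def encode_relative_heads(heads):
    """Map 1-based head indices (0 = root) to relative offsets, one label per token."""
    labels = []
    for position, head in enumerate(heads, start=1):
        labels.append(ROOT if head == 0 else head - position)
    return labels

def decode_relative_heads(labels):
    """Invert encode_relative_heads: recover 1-based head indices from labels."""
    return [
        0 if label == ROOT else position + label
        for position, label in enumerate(labels, start=1)
    ]

def uas(gold_heads, predicted_heads):
    """Unlabeled attachment score: fraction of tokens with the correct head."""
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)

# Toy sentence "le chat dort": "le" attaches to "chat", "chat" to "dort", "dort" is the root.
gold_heads = [2, 3, 0]
gold_labels = encode_relative_heads(gold_heads)

# A hypothetical prediction that attaches "chat" to "le" instead of "dort".
predicted_labels = [1, -1, ROOT]
predicted_heads = decode_relative_heads(predicted_labels)
score = uas(gold_heads, predicted_heads)
```

Framing head prediction as a per-token classification over offsets is what lets a simple probe be applied to it, since each word representation then receives exactly one label.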