Fast and General Automatic Differentiation for Finite-State Methods
Lucas Ondel Yang, Tina Raissi, Martin Kocour, Pablo Riera, Caio Corro
TL;DR
The paper tackles the bottleneck of automatic differentiation for semiring-based dynamic programming in structured prediction. It introduces the morphism-trick, which flattens the backward pass when the semiring's additive monoid is isomorphic to the real line, enabling semiring-agnostic vector-Jacobian products and memory-efficient gradients at scale for finite-state methods. The authors present a general dynamic-programming formulation for the weight $\nu(A)$ of a weighted finite-state automaton (WFSA) and derive a morphism-based differentiation strategy, which yields orders-of-magnitude speedups over standard AD approaches and is released as open source in TensorAutomata.jl. The approach extends to several semiring families, including log-semirings and multi-valued semirings, with caveats for idempotent semirings, broadening the applicability of gradient-based learning in structured prediction tasks.
Abstract
We propose a new method, which we call the ``morphism-trick'', to integrate custom implementations of vector-Jacobian products into automatic differentiation software, applicable to a wide range of semiring-based computations. Our approach leads to efficient and semiring-agnostic implementations of the backward pass of dynamic programming algorithms. For the particular case of finite-state methods, we introduce an algorithm that computes and differentiates the $\oplus$-sum of the weights of all paths of a finite-state automaton. Results show that, with minimal effort from the user, our novel library computes the gradient of a function with respect to the weights of a finite-state automaton orders of magnitude faster than state-of-the-art automatic differentiation systems. Implementations are made available via an open-source library distributed under a permissive license.
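To make the $\oplus$-sum of all paths' weights concrete, the following is a minimal Python sketch and not the paper's algorithm or the TensorAutomata.jl API. It assumes an acyclic automaton whose states are numbered in topological order, and every name in it (`Semiring`, `path_sum`, the toy arcs) is hypothetical. It runs the same forward recursion in the log semiring and in the real semiring, and checks that `exp` maps one result onto the other, which is the kind of semiring morphism the method's name refers to.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a semiring (plus, times, zero, one)."""
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


LOG = Semiring(logaddexp, lambda a, b: a + b, -math.inf, 0.0)
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)

Arc = Tuple[int, int, float]  # (source state, destination state, weight)


def path_sum(n_states: int, arcs: List[Arc], start: int, final: int,
             K: Semiring) -> float:
    """Forward recursion: semiring sum of the weights of all start-to-final paths.

    Assumes every arc satisfies src < dst (topologically ordered, acyclic).
    """
    alpha = [K.zero] * n_states
    alpha[start] = K.one
    # Visiting arcs by increasing source guarantees alpha[src] is complete
    # before it is propagated along an outgoing arc.
    for src, dst, w in sorted(arcs):
        alpha[dst] = K.plus(alpha[dst], K.times(alpha[src], w))
    return alpha[final]


# Toy automaton with two paths from state 0 to state 2: 0->2 and 0->1->2.
real_arcs = [(0, 1, 0.5), (0, 2, 0.2), (1, 2, 0.4)]
log_arcs = [(s, d, math.log(w)) for s, d, w in real_arcs]

real_total = path_sum(3, real_arcs, 0, 2, REAL)
log_total = path_sum(3, log_arcs, 0, 2, LOG)
```

The function `path_sum` never inspects which semiring it receives, which is the sense in which such a forward pass is semiring-agnostic; only the `Semiring` instance changes between the two calls.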
