Nonconvex Nonsmooth Multicomposite Optimization and Its Applications to Recurrent Neural Networks

Lingzi Jin, Xiao Wang, Xiaojun Chen

Abstract

We consider a class of nonconvex nonsmooth multicomposite optimization problems where the objective function consists of a Tikhonov regularizer and a composition of multiple nonconvex nonsmooth component functions. Such optimization problems arise from tangible applications in machine learning and beyond. To define and compute its first-order and second-order d(irectional)-stationary points effectively, we first derive the closed-form expression of the tangent cone for the feasible region of its constrained reformulation. Building on this, we establish its equivalence with the corresponding constrained and $\ell_1$-penalty reformulations in terms of global optimality and d-stationarity. The equivalence offers indirect methods to attain the first-order and second-order d-stationary points of the original problem in certain cases. We apply our results to the training process of recurrent neural networks (RNNs).

Paper Structure

This paper contains 13 sections, 3 theorems, 120 equations, 1 figure.

Key Result

Proposition 3.1

Under Assumption as1, $\Psi$ is directionally differentiable on $\mathbb{R}^{n}$, and the directional derivative of the objective function of (1.1) along any $d_{\theta} \in \mathbb{R}^{n}$ is given by a displayed formula (not recovered in this listing), where $u_{\ell}:= \psi_{\ell-1}( \theta, u_{1}, \dots, u_{\ell-1} )$ for all $\ell \in [L]$, and $d_{u_{\ell}} := \psi_{\ell-1}^{\prime} ( \theta, u_{1},\dots,u_{\ell-1}; d_{\theta}, d_{u_{1}} , \dots, d_
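
The layer-by-layer recursion in Proposition 3.1 (each $d_{u_{\ell}}$ is the directional derivative of $\psi_{\ell-1}$ evaluated along the previously computed directions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a scalar ReLU recurrence $u_{\ell} = \max(w u_{\ell-1} + v x_{\ell}, 0)$ with $\theta = (w, v)$, a squared loss on the last state, a Tikhonov term $\lambda \|\theta\|^2$, and the hypothetical helper names `relu_dd`, `objective`, and `objective_dd`.

```python
def relu_dd(z, dz):
    """Directional derivative of t -> max(t, 0) at z along dz."""
    if z > 0:
        return dz
    if z < 0:
        return 0.0
    return max(dz, 0.0)  # at the kink only the positive part of dz survives


def objective(theta, xs, y, lam):
    """Assumed toy objective: (u_L - y)^2 + lam * ||theta||^2."""
    w, v = theta
    u = 0.0
    for x in xs:
        u = max(w * u + v * x, 0.0)  # u_l = relu(w * u_{l-1} + v * x_l)
    return (u - y) ** 2 + lam * (w * w + v * v)


def objective_dd(theta, d_theta, xs, y, lam):
    """Directional derivative of the toy objective at theta along d_theta."""
    w, v = theta
    dw, dv = d_theta
    u, du = 0.0, 0.0
    for x in xs:
        z = w * u + v * x
        dz = dw * u + w * du + dv * x  # derivative of the smooth inner map
        u, du = max(z, 0.0), relu_dd(z, dz)  # propagate state and direction
    return 2.0 * (u - y) * du + 2.0 * lam * (w * dw + v * dv)


# A point where every pre-activation sits exactly at the ReLU kink.
theta, xs, y, lam = (1.0, 0.0), [1.0, -1.0, 2.0], 1.0, 0.1
d = (0.0, 1.0)
minus_d = (0.0, -1.0)

# One-sided difference quotient along d as a sanity check.
t = 1e-6
fd = (objective((theta[0] + t * d[0], theta[1] + t * d[1]), xs, y, lam)
      - objective(theta, xs, y, lam)) / t

print(objective_dd(theta, d, xs, y, lam), fd)
print(objective_dd(theta, minus_d, xs, y, lam))
```

In this toy instance the directional derivative is $-4$ along $d$ and $0$ along $-d$, so it is not linear in the direction. This is the kind of behaviour that motivates working with d-stationarity rather than gradient-based stationarity.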

Figures (1)

  • Figure 1: Architectures of the RNN in \ref{form2-eq} and the DNN in DNN-CHP

Theorems & Definitions (21)

  • proof
  • Remark 2.1
  • proof
  • Example 3.1
  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • proof
  • proof
  • ...and 11 more