Adjoint state method


The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained-optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^* = \overline{A}^\top$ is used.

When the initial problem consists of calculating the product $s^\top x$ and $x$ must satisfy $Ax = b$, the dual problem can be realized as calculating the product $r^\top b$ ($= s^\top x$), where $r$ must satisfy $A^* r = s$. Here $r$ is called the adjoint state vector.
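The following Python sketch makes this duality concrete on a small random system; the matrix $A$ and the vectors $b$ and $s$ are illustrative placeholders, not from any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # an illustrative, well-conditioned matrix
b = rng.standard_normal(n)
s = rng.standard_normal(n)

# Direct route: solve A x = b for the state x, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint route: solve A* r = s (A* = A^T for real A), then evaluate r^T b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b

assert np.isclose(direct, adjoint)  # s^T x == r^T b
```

The payoff comes when the same functional must be evaluated for many right-hand sides $b$: the adjoint state $r$ is computed once, and each further evaluation $r^\top b$ costs only an inner product.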

General case

The original adjoint calculation method goes back to Jean Céa,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u \in \mathcal{U}$ and an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), so the considered objective is $j(v) = J(u_v, v)$, where $u_v$ is the solution of the state equation given the optimization variables $v$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + \nabla_u J(u_v, v)\, \nabla_v u_v.$$

Unfortunately, the term $\nabla_v u_v$ is often very hard to differentiate analytically since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

minimize $j(v) = J(u_v, v)$
subject to $D_v(u_v) = 0$

has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$

where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot , \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$\begin{cases} d_u \mathcal{L}(u, v, \lambda; \delta_u) = d_u J(u, v; \delta_u) + \langle \delta_u, D_v^*(\lambda) \rangle = 0 & \forall \delta_u \in \mathcal{U}, \\ d_v \mathcal{L}(u, v, \lambda; \delta_v) = d_v J(u, v; \delta_v) + \langle d_v D_v(u; \delta_v), \lambda \rangle = 0 & \forall \delta_v \in \mathcal{V}, \\ d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = \langle D_v(u), \delta_\lambda \rangle = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}$$

where $d_x F(x; \delta_x)$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $D_v(u) = 0$, the state equation, whose solution is $u_v$. The first equation is the so-called adjoint state equation,

$$\langle \delta_u, D_v^*(\lambda) \rangle = - d_u J(u_v, v; \delta_u) \quad \forall \delta_u \in \mathcal{U},$$

because the operator involved is the adjoint operator of $D_v$, namely $D_v^*$. Resolving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ is then given by $\langle \nabla j(v), \delta_v \rangle = d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation with $u = u_v$ and $\lambda = \lambda_v$), so it can easily be identified by successively resolving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
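In finite dimensions the recipe is: (1) solve the direct state equation for $u_v$, (2) solve the adjoint equation for $\lambda_v$, (3) assemble the gradient from the partial derivatives of $J$ and $D_v$. The Python sketch below applies this to an illustrative state equation and objective (the specific $D$ and $J$ are arbitrary choices made for the demonstration, not taken from the text) and validates one gradient component against a finite difference.

```python
import numpy as np
from scipy.optimize import fsolve

def D(u, v):
    # Illustrative state equation residual: D(u, v) = 0 defines u_v.
    # Component-wise cubic, monotone in u, hence uniquely solvable for v > 0.
    return u**3 + v * u - 1.0

def J(u, v):
    # Illustrative objective functional.
    return 0.5 * np.sum(u**2) + np.sum(v * u)

def j(v):
    u = fsolve(lambda u: D(u, v), np.ones_like(v))
    return J(u, v)

def grad_j(v):
    u = fsolve(lambda u: D(u, v), np.ones_like(v))  # 1. direct solve: u_v
    dD_du = np.diag(3 * u**2 + v)                   # Jacobian of D w.r.t. u
    dD_dv = np.diag(u)                              # Jacobian of D w.r.t. v
    dJ_du = u + v                                   # partial of J w.r.t. u
    dJ_dv = u                                       # partial of J w.r.t. v
    lam = np.linalg.solve(dD_du.T, -dJ_du)          # 2. adjoint solve
    return dJ_dv + dD_dv.T @ lam                    # 3. assemble gradient

v = np.array([1.0, 2.0, 3.0])
g = grad_j(v)

# Finite-difference validation of the first gradient component.
eps = 1e-6
v_pert = v.copy(); v_pert[0] += eps
assert np.isclose(g[0], (j(v_pert) - j(v)) / eps, rtol=1e-4)
```

Note that the cost of `grad_j` is two solves (one direct, one adjoint) regardless of the dimension of $v$, which is the property advertised above.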

Example: Linear case

In a real finite-dimensional linear programming context, the objective function could be $J(u, v) = \langle Au, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.

The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle Au, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.

The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$d_u [\langle B_v u - b, \lambda \rangle](u; \delta_u) = - \langle A^\top v, \delta_u \rangle \iff \langle B_v \delta_u, \lambda \rangle = - \langle A^\top v, \delta_u \rangle \iff \langle B_v^\top \lambda + A^\top v, \delta_u \rangle = 0 \iff B_v^\top \lambda = - A^\top v.$$

Thus, we can write symbolically $\lambda_v = - B_v^{-\top} A^\top v$. The gradient would be

$$\langle \nabla j(v), \delta_v \rangle = \langle A u_v, \delta_v \rangle + \langle \nabla_v B_v : \lambda_v \otimes u_v, \delta_v \rangle,$$

where $\nabla_v B_v = \left( \partial B_{ij} / \partial v_k \right)$ is a third-order tensor, $\lambda_v \otimes u_v = \lambda_v u_v^\top$ is the dyadic product between the direct and adjoint states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
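As an illustrative Python sketch of this example, assume the parameterization $B_v = B_0 + \sum_k v_k C_k$, so that $\partial B_v / \partial v_k = C_k$ is known analytically; the matrices $B_0$ and $C_k$ below are arbitrary assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
B0 = m * np.eye(m)                              # illustrative base matrix
C = 0.1 * rng.standard_normal((n, m, m))        # C[k] = dB/dv_k (assumed known)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)

def j(v):
    return (A @ np.linalg.solve(B(v), b)) @ v   # j(v) = <A u_v, v>

def grad_j(v):
    u = np.linalg.solve(B(v), b)                # direct state:  B_v u = b
    lam = np.linalg.solve(B(v).T, -(A.T @ v))   # adjoint state: B_v^T lam = -A^T v
    # gradient component k: (A u)_k + sum_ij (dB/dv_k)_ij lam_i u_j
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

v = rng.standard_normal(n)
g = grad_j(v)

# Finite-difference validation of the first gradient component.
eps = 1e-6
e0 = np.zeros(n); e0[0] = eps
assert np.isclose(g[0], (j(v + e0) - j(v)) / eps, rtol=1e-4)
```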

Numerical consideration for the self-adjoint case

If the operator $B_v$ is self-adjoint, $B_v = B_v^\top$, the direct state equation and the adjoint state equation have the same left-hand side. Since explicitly inverting a matrix is numerically very slow, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for each resolution. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
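A minimal sketch of this reuse with SciPy's LU routines, assuming a symmetric $B_v$; all data below are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m, n = 4, 3
M = rng.standard_normal((m, m))
Bv = M @ M.T + m * np.eye(m)           # a symmetric (self-adjoint) B_v
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
v = rng.standard_normal(n)

lu, piv = lu_factor(Bv)                # O(m^3), performed once
u = lu_solve((lu, piv), b)             # direct state,  O(m^2)
lam = lu_solve((lu, piv), -(A.T @ v))  # adjoint state, O(m^2):
                                       # B_v^T = B_v, so the same factors apply
```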


References

  1. ↑ Template:Cite journal
  2. ↑ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Advances in Neural Information Processing Systems 31. Available online
  3. ↑ Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International 167 (2): 495–503.
  4. ↑ Template:Cite journal
  5. ↑ Template:Cite web
  6. ↑ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût". ESAIM: Mathematical Modelling and Numerical Analysis 20 (3): 371–402.
External links

  • A well-written explanation by Errico: "What is an adjoint model?"
  • Another well-written explanation with worked examples, written by Bradley [1]
  • A more technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course [2]
  • MIT notes [3]

