Adjoint state method


The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained-optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^* = \overline{A}^\top$ is used.

When the initial problem consists of calculating the product $s^\top x$ and $x$ must satisfy $Ax = b$, the dual problem can be realized as calculating the product $r^\top b$ ($= s^\top x$), where $r$ must satisfy $A^* r = s$. Here $r$ is called the adjoint state vector.
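The following Python sketch makes this duality concrete on a small random system; the matrix $A$ and the vectors $b$ and $s$ are illustrative placeholders, not from any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # an illustrative, well-conditioned matrix
b = rng.standard_normal(n)
s = rng.standard_normal(n)

# Direct route: solve A x = b for the state x, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint route: solve A* r = s (A* = A^T for real A), then evaluate r^T b.
r = np.linalg.solve(A.T, s)
adjoint = r @ b

assert np.isclose(direct, adjoint)  # s^T x == r^T b
```

The payoff comes when the same functional must be evaluated for many right-hand sides $b$: the adjoint state $r$ is computed once, and each further evaluation $r^\top b$ costs only an inner product.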

General case

The original adjoint calculation method goes back to Jean Céa,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u \in \mathcal{U}$ and an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), so the considered objective is $j(v) = J(u_v, v)$, where $u_v$ is the solution of the state equation given the optimization variables $v$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + \nabla_u J(u_v, v)\, \nabla_v u_v.$$

Unfortunately, the term $\nabla_v u_v$ is often very hard to differentiate analytically since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

minimize $j(v) = J(u_v, v)$
subject to $D_v(u_v) = 0$

has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$

where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot , \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$\begin{cases} d_u \mathcal{L}(u, v, \lambda; \delta_u) = d_u J(u, v; \delta_u) + \langle \delta_u, D_v^*(\lambda) \rangle = 0 & \forall \delta_u \in \mathcal{U}, \\ d_v \mathcal{L}(u, v, \lambda; \delta_v) = d_v J(u, v; \delta_v) + \langle d_v D_v(u; \delta_v), \lambda \rangle = 0 & \forall \delta_v \in \mathcal{V}, \\ d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = \langle D_v(u), \delta_\lambda \rangle = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}$$

where $d_x F(x; \delta_x)$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $D_v(u) = 0$, the state equation, whose solution is $u_v$. The first equation is the so-called adjoint state equation,

$$\langle \delta_u, D_v^*(\lambda) \rangle = - d_u J(u_v, v; \delta_u) \quad \forall \delta_u \in \mathcal{U},$$

because the operator involved is the adjoint operator of $D_v$, namely $D_v^*$. Resolving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ is then given by $\langle \nabla j(v), \delta_v \rangle = d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation with $u = u_v$ and $\lambda = \lambda_v$), so it can easily be identified by successively resolving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
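In finite dimensions the recipe is: (1) solve the direct state equation for $u_v$, (2) solve the adjoint equation for $\lambda_v$, (3) assemble the gradient from the partial derivatives of $J$ and $D_v$. The Python sketch below applies this to an illustrative state equation and objective (the specific $D$ and $J$ are arbitrary choices made for the demonstration, not taken from the text) and validates one gradient component against a finite difference.

```python
import numpy as np
from scipy.optimize import fsolve

def D(u, v):
    # Illustrative state equation residual: D(u, v) = 0 defines u_v.
    # Component-wise cubic, monotone in u, hence uniquely solvable for v > 0.
    return u**3 + v * u - 1.0

def J(u, v):
    # Illustrative objective functional.
    return 0.5 * np.sum(u**2) + np.sum(v * u)

def j(v):
    u = fsolve(lambda u: D(u, v), np.ones_like(v))
    return J(u, v)

def grad_j(v):
    u = fsolve(lambda u: D(u, v), np.ones_like(v))  # 1. direct solve: u_v
    dD_du = np.diag(3 * u**2 + v)                   # Jacobian of D w.r.t. u
    dD_dv = np.diag(u)                              # Jacobian of D w.r.t. v
    dJ_du = u + v                                   # partial of J w.r.t. u
    dJ_dv = u                                       # partial of J w.r.t. v
    lam = np.linalg.solve(dD_du.T, -dJ_du)          # 2. adjoint solve
    return dJ_dv + dD_dv.T @ lam                    # 3. assemble gradient

v = np.array([1.0, 2.0, 3.0])
g = grad_j(v)

# Finite-difference validation of the first gradient component.
eps = 1e-6
v_pert = v.copy(); v_pert[0] += eps
assert np.isclose(g[0], (j(v_pert) - j(v)) / eps, rtol=1e-4)
```

Note that the cost of `grad_j` is two solves (one direct, one adjoint) regardless of the dimension of $v$, which is the property advertised above.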

Example: Linear case

In a real finite-dimensional linear programming context, the objective function could be $J(u, v) = \langle Au, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.

The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle Au, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.

The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$d_u [\langle B_v u - b, \lambda \rangle](u; \delta_u) = - \langle A^\top v, \delta_u \rangle \iff \langle B_v \delta_u, \lambda \rangle = - \langle A^\top v, \delta_u \rangle \iff \langle B_v^\top \lambda + A^\top v, \delta_u \rangle = 0 \iff B_v^\top \lambda = - A^\top v.$$

Thus, we can write symbolically $\lambda_v = - B_v^{-\top} A^\top v$. The gradient would be

$$\langle \nabla j(v), \delta_v \rangle = \langle A u_v, \delta_v \rangle + \langle \nabla_v B_v : \lambda_v \otimes u_v, \delta_v \rangle,$$

where $\nabla_v B_v = \left( \partial B_{ij} / \partial v_k \right)$ is a third-order tensor, $\lambda_v \otimes u_v = \lambda_v u_v^\top$ is the dyadic product between the direct and adjoint states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
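As an illustrative Python sketch of this example, assume the parameterization $B_v = B_0 + \sum_k v_k C_k$, so that $\partial B_v / \partial v_k = C_k$ is known analytically; the matrices $B_0$ and $C_k$ below are arbitrary assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
B0 = m * np.eye(m)                              # illustrative base matrix
C = 0.1 * rng.standard_normal((n, m, m))        # C[k] = dB/dv_k (assumed known)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)

def j(v):
    return (A @ np.linalg.solve(B(v), b)) @ v   # j(v) = <A u_v, v>

def grad_j(v):
    u = np.linalg.solve(B(v), b)                # direct state:  B_v u = b
    lam = np.linalg.solve(B(v).T, -(A.T @ v))   # adjoint state: B_v^T lam = -A^T v
    # gradient component k: (A u)_k + sum_ij (dB/dv_k)_ij lam_i u_j
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

v = rng.standard_normal(n)
g = grad_j(v)

# Finite-difference validation of the first gradient component.
eps = 1e-6
e0 = np.zeros(n); e0[0] = eps
assert np.isclose(g[0], (j(v + e0) - j(v)) / eps, rtol=1e-4)
```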

Numerical consideration for the self-adjoint case

If the operator $B_v$ is self-adjoint, $B_v = B_v^\top$, the direct state equation and the adjoint state equation have the same left-hand side. Since explicitly inverting a matrix is numerically very slow, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for each resolution. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
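A minimal sketch of this reuse with SciPy's LU routines, assuming a symmetric $B_v$; all data below are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m, n = 4, 3
M = rng.standard_normal((m, m))
Bv = M @ M.T + m * np.eye(m)           # a symmetric (self-adjoint) B_v
A = rng.standard_normal((n, m))
b = rng.standard_normal(m)
v = rng.standard_normal(n)

lu, piv = lu_factor(Bv)                # O(m^3), performed once
u = lu_solve((lu, piv), b)             # direct state,  O(m^2)
lam = lu_solve((lu, piv), -(A.T @ v))  # adjoint state, O(m^2):
                                       # B_v^T = B_v, so the same factors apply
```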


References

  1. ↑ Template:Cite journal
  2. ↑ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Advances in Neural Information Processing Systems 31. Available online
  3. ↑ Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International 167 (2): 495–503.
  4. ↑ Template:Cite journal
  5. ↑ Template:Cite web
  6. ↑ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût". ESAIM: Mathematical Modelling and Numerical Analysis 20 (3): 371–402.
External links

  • A well-written explanation by Errico: "What is an adjoint model?"
  • Another well-written explanation with worked examples, written by Bradley [1]
  • A more technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course [2]
  • MIT notes [3]

