Iterative Solver

  1. Scientific Computing || Week 7: Introduction to spectral methods

  2. Solving the Discrete Poisson Equation using Jacobi, SOR, Conjugate Gradients, and the FFT

  3. Iteration methods

  4. Numerical Methods for Elliptic Equations

  5. Iterative Methods: Part 2

  6. FFTW Tutorial

  7. Finite difference method for 1D Poisson equation with mixed boundary conditions

Iterative Solver

Iterative methods use successive approximations to obtain increasingly accurate solutions to a linear system at every iteration step. These methods start with an initial guess and generate a sequence of approximations, in which the k-th approximation is derived from the previous ones. There are two main types of iterative methods: stationary methods, which are easy to understand and simple to implement but not very effective, and non-stationary methods, which are based on constructing a sequence of orthogonal vectors. We recommend the text by Barrett et al., which provides a good discussion of the implementation of iterative methods for solving linear systems of equations.

Stationary Methods: Gauss-Seidel

The iterative methods work by splitting the matrix \(\mathbf{A}\) into

\[\mathbf{A}=\mathbf{M}-\mathbf{P}\]

Equation

\[\mathbf{A}\mathbf{u}=\mathbf{b}\]

becomes

\[\mathbf{M}\mathbf{u}=\mathbf{P}\mathbf{u}+\mathbf{b}\]

The solution at the \((k+1)\)-th iteration is given by

\[\mathbf{M}\mathbf{u}^{(k+1)}=\mathbf{P}\mathbf{u}^{(k)}+\mathbf{b}\]

If we take the difference between the above two equations, we get the evolution of the error as \(\epsilon^{(k+1)}=\mathbf{M}^{-1}\mathbf{P}\epsilon^{(k)}\). For the solution to converge to the exact solution and the error to go to zero, the largest eigenvalue (in magnitude) of the iteration matrix \(\mathbf{M}^{-1}\mathbf{P}\) should be less than 1. The approximate number of iterations required for the error \(\epsilon\) to fall below some specified tolerance \(\delta\) is given by

\[Q=\cfrac{\text{ln}(\delta)}{\text{ln}(\lambda_{1})}\]

where \(Q\) is the number of iterations and \(\lambda_{1}\) is the maximum eigenvalue of the iteration matrix \(\mathbf{M}^{-1}\mathbf{P}\). If the convergence tolerance is specified to be \(1\times10^{-5}\), then the number of iterations for convergence will be \(Q=1146\) and \(Q=23,020\) for \(\lambda_{1}=0.99\) and \(\lambda_{1}=0.9995\) respectively. Therefore, the maximum eigenvalue of the iteration matrix should be as small as possible for fast convergence. When we implement iterative methods, we do not form the matrices \(\mathbf{M}\) and \(\mathbf{P}\) explicitly; the equation \(\mathbf{M}\mathbf{u}^{(k+1)}=\mathbf{P}\mathbf{u}^{(k)}+\mathbf{b}\) is just the matrix representation of the iterative method. In matrix terms, the Gauss-Seidel method can be expressed as

\[\begin{split}\begin{array}{l} \mathbf{M}= \mathbf{D}+\mathbf{L},\quad \mathbf{P}=-\mathbf{U}\\ \mathbf{u}^{(k+1)}=(\mathbf{D}+\mathbf{L})^{-1}\left(\mathbf{b}-\mathbf{U}\mathbf{u}^{(k)}\right) \end{array}\end{split}\]

where \(\mathbf{D}\), \(\mathbf{L}\) and \(\mathbf{U}\) represent the diagonal, the strictly lower-triangular, and the strictly upper-triangular parts of \(\mathbf{A}\) respectively, so that \(\mathbf{A}=\mathbf{D}+\mathbf{L}+\mathbf{U}\). We do not explicitly construct the matrices \(\mathbf{D}\), \(\mathbf{L}\) and \(\mathbf{U}\); instead we use an update formula based on data vectors (i.e., we only ever need the vector obtained when the matrix operates on a vector).

The update formulas for the Gauss-Seidel iteration, if we sweep from left to right and from bottom to top, can be written as

\[\begin{split}\begin{array}{l} d=\cfrac{-2}{\Delta x^2}+\cfrac{-2}{\Delta y^2}\\ r_{i,j}=f_{i,j}-\cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} -\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}} \\ u_{i,j}^{(k+1)}=u_{i,j}^{(k)}+\cfrac{r_{i,j}}{d} \end{array}\end{split}\]

where we define an operator

\[Au_{i,j}=\cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} +\cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}}\]

Jacobi’s Method

Iterative Methods for Solving \(\mathbf{A}\mathbf{x}=\mathbf{b}\) - Jacobi’s Method

Two assumptions are made for the Jacobi method:

  1. The system given by

\[\begin{split}\begin{array}{l} a_{11}x_{1}+a_{12}x_{2}+\cdots +a_{1n}x_{n}=b_{1}\\ a_{21}x_{1}+a_{22}x_{2}+\cdots +a_{2n}x_{n}=b_{2}\\ \cdots \\ a_{n1}x_{1}+a_{n2}x_{2}+\cdots +a_{nn}x_{n}=b_{n}\\ \end{array}\end{split}\]

has a unique solution.

  2. The coefficient matrix has no zeros on its main diagonal; that is, \(a_{11},a_{22},\cdots,a_{nn}\) are all nonzero.

Main idea of Jacobi:

To begin, solve the \(1^{st}\) equation for \(x_{1}\), the \(2^{nd}\) equation for \(x_{2}\), and so on, to obtain the rewritten equations:

\[\begin{split}\begin{array}{l} x_{1}=\cfrac{1}{a_{11}}(b_{1}-a_{12}x_{2}-a_{13}x_{3}-\cdots-a_{1n}x_{n})\\ x_{2}=\cfrac{1}{a_{22}}(b_{2}-a_{21}x_{1}-a_{23}x_{3}-\cdots-a_{2n}x_{n})\\ \cdots \\ x_{n}=\cfrac{1}{a_{nn}}(b_{n}-a_{n1}x_{1}-a_{n2}x_{2}-\cdots-a_{n,n-1}x_{n-1})\\ \end{array}\end{split}\]

Then make an initial guess of the solution \(x^{(0)}=(x_{1}^{(0)},x_{2}^{(0)},x_{3}^{(0)},\cdots,x_{n}^{(0)})\). Substitute these values into the right-hand side of the rewritten equations to obtain the first approximation, \((x_{1}^{(1)},x_{2}^{(1)},x_{3}^{(1)},\cdots,x_{n}^{(1)})\). This accomplishes one iteration. In the same way, the second approximation \((x_{1}^{(2)},x_{2}^{(2)},x_{3}^{(2)},\cdots,x_{n}^{(2)})\) is computed by substituting the first approximation's \(x\)-values into the right-hand side of the rewritten equations.

By repeated iterations, we form a sequence of approximations \(\mathbf{x}^{(k)}=(x_{1}^{(k)},x_{2}^{(k)},x_{3}^{(k)},\cdots,x_{n}^{(k)})^{T},\ k=1,2,3,\cdots\)

For each \(k \ge 1\), generate the components \(x_{i}^{(k)}\) of \(\mathbf{x}^{(k)}\) from \(\mathbf{x}^{(k-1)}\) by

\[x_{i}^{(k)}=\cfrac{1}{a_{ii}} \bigg[\sum_{j=1,j\ne i}^{n}(-a_{ij}x_{j}^{(k-1)})+b_{i}\bigg]\]

for

\[i=1,2,\cdots,n\]
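
A minimal Python sketch of this componentwise update (the function name, the stopping test, and the iteration cap are illustrative choices, not taken from the text):

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Componentwise Jacobi iteration for A x = b (nonzero diagonal assumed)."""
    n = len(b)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_new = np.empty_like(x)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x
```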

The Jacobi Method in Matrix Form

\[\begin{split}\mathbf{A}=\begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n} \\ a_{21}&a_{22}&\cdots&a_{2n} \\ \vdots&\vdots & \ddots &\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \\ \end{bmatrix}\end{split}\]

and

\[\begin{split}\mathbf{b}=\begin{bmatrix} b_{1}\\ b_{2}\\\vdots\\ b_{n} \end{bmatrix}\end{split}\]

for

\[\begin{split}\mathbf{x}=\begin{bmatrix} x_{1}\\ x_{2}\\\vdots\\ x_{n} \end{bmatrix}\end{split}\]

We split \(\mathbf{A}\) into

\[\mathbf{A}=\mathbf{L}+\mathbf{D}+\mathbf{U}\]

where

\[\begin{split}\mathbf{L}=\begin{bmatrix} 0&0&\cdots&0 \\ a_{21}&0&\cdots&0 \\ \vdots&\vdots & \ddots &\vdots\\ a_{n1}&a_{n2}&\cdots&0\\ \end{bmatrix}\quad \mathbf{D}=\begin{bmatrix} a_{11}&0&\cdots&0 \\ 0&a_{22}&\cdots&0 \\ \vdots&\vdots & \ddots &\vdots\\ 0&0&\cdots&a_{nn}\\ \end{bmatrix}\quad \mathbf{U}=\begin{bmatrix} 0&a_{12}&\cdots&a_{1n} \\ 0&0&\cdots&a_{2n} \\ \vdots&\vdots & \ddots &\vdots\\ 0&0&\cdots&0\\ \end{bmatrix}\quad\end{split}\]

\(\mathbf{A}\mathbf{x}=\mathbf{b}\) is transformed into

\[\begin{split}\begin{array}{l} (\mathbf{L}+\mathbf{D}+\mathbf{U})\mathbf{x}=\mathbf{b}\\ {\mathbf{D}}\mathbf{x}=\mathbf{b}-(\mathbf{L}+\mathbf{U})\mathbf{x}\\ \mathbf{x}= {\mathbf{D}^{-1}}\mathbf{b}-{\mathbf{D}^{-1}}(\mathbf{L}+\mathbf{U})\mathbf{x}\\ \end{array}\end{split}\]

where

\[\begin{split}\mathbf{D}^{-1}=\begin{bmatrix} \cfrac{1}{a_{11}} &0&\cdots&0 \\ 0&\cfrac{1}{a_{22}}&\cdots&0 \\ \vdots&\vdots & \ddots &\vdots\\ 0&0&\cdots&\cfrac{1}{a_{nn}}\\ \end{bmatrix}\quad\end{split}\]

The matrix form of Jacobi iterative method is

\[\begin{split}\mathbf{x}^{(k+1)}= -{\mathbf{D}^{-1}}(\mathbf{L}+\mathbf{U})\mathbf{x}^{(k)}+{\mathbf{D}^{-1}}\mathbf{b}\quad k=1,2,3\\\end{split}\]
\[\begin{split}\begin{array}{l} a_{11}x_{1}^{(k+1)}+a_{12}x_{2}^{(k)}+\cdots +a_{1n}x_{n}^{(k)}=b_{1}\\ a_{21}x_{1}^{(k)}+a_{22}x_{2}^{(k+1)}+\cdots +a_{2n}x_{n}^{(k)}=b_{2}\\ \cdots \\ a_{n1}x_{1}^{(k)}+a_{n2}x_{2}^{(k)}+\cdots +a_{nn}x_{n}^{(k+1)}=b_{n}\\ \end{array}\end{split}\]
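
The matrix form of the iteration can be coded directly with NumPy; a minimal sketch (for large sparse problems one would not store \(\mathbf{A}\) densely, but the structure of the update is the same):

```python
import numpy as np

def jacobi_matrix_form(A, b, x0, n_iter=100):
    """Jacobi iteration x_{k+1} = -D^{-1}(L + U) x_k + D^{-1} b."""
    d = np.diag(A)                      # diagonal of A (the matrix D)
    LU = A - np.diag(d)                 # L + U, i.e. the off-diagonal part of A
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = (b - LU @ x) / d            # elementwise division applies D^{-1}
    return x
```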

Gauss-Seidel iterative method

For each \(k \ge 1\), generate the components \(x_{i}^{(k)}\) of \(\mathbf{x}^{(k)}\) from \(\mathbf{x}^{(k-1)}\) by

\[x_{i}^{(k)}=\cfrac{1}{a_{ii}} \bigg[-\sum_{j=1}^{i-1}(a_{ij}x_{j}^{(k)})-\sum_{j=i+1}^{n}(a_{ij}x_{j}^{(k-1)})+b_{i}\bigg]\]

Namely,

\[\begin{split}\begin{array}{l} a_{11}x_{1}^{(k+1)}+a_{12}x_{2}^{(k)}+a_{13}x_{3}^{(k)}+\cdots +a_{1n}x_{n}^{(k)}=b_{1}\\ a_{21}x_{1}^{(k+1)}+a_{22}x_{2}^{(k+1)}+a_{23}x_{3}^{(k)}+\cdots +a_{2n}x_{n}^{(k)}=b_{2}\\ a_{31}x_{1}^{(k+1)}+a_{32}x_{2}^{(k+1)}+a_{33}x_{3}^{(k+1)}+\cdots +a_{3n}x_{n}^{(k)}=b_{3}\\ \cdots \\ a_{n1}x_{1}^{(k+1)}+a_{n2}x_{2}^{(k+1)}+a_{n3}x_{3}^{(k+1)}+\cdots +a_{nn}x_{n}^{(k+1)}=b_{n}\\ \end{array}\end{split}\]

Matrix-based formula

The solution is obtained iteratively via

\[\begin{split}\begin{array}{l} (\mathbf{L}+\mathbf{D})\mathbf{x}^{(k+1)}+\mathbf{U}\mathbf{x}^{(k)}=\mathbf{b}\\ (\mathbf{L}+\mathbf{D})\mathbf{x}^{(k+1)}=\mathbf{b}-\mathbf{U}\mathbf{x}^{(k)}\\ \mathbf{L}_{*}=\mathbf{L}+\mathbf{D}\\ \mathbf{L}_{*}\mathbf{x}^{(k+1)}=\mathbf{b}-\mathbf{U}\mathbf{x}^{(k)}\\ \end{array}\end{split}\]

Why the matrix-based formula works

The system of linear equations may be rewritten as:

\[\begin{split}\begin{array}{l} \mathbf{A}\mathbf{x}=\mathbf{b}\\ (\mathbf{L}_{*}+\mathbf{U})\mathbf{x}=\mathbf{b}\\ \mathbf{L}_{*}\mathbf{x}+\mathbf{U}\mathbf{x}=\mathbf{b}\\ \mathbf{L}_{*}\mathbf{x}=\mathbf{b}-\mathbf{U}\mathbf{x}\\ \mathbf{x}=\mathbf{L}_{*}^{-1}(\mathbf{b}-\mathbf{U}\mathbf{x})\\ \end{array}\end{split}\]

The Gauss–Seidel method now solves the left-hand side of this expression for \(\mathbf{x}\), using the previous value of \(\mathbf{x}\) on the right-hand side. Analytically, this may be written as:

\[\begin{split}\mathbf{x}^{(k+1)}=\mathbf{L}_{*}^{-1}(\mathbf{b}-\mathbf{U}\mathbf{x}^{(k)})\\\end{split}\]
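
This forward-substitution view translates into a short sketch using SciPy's triangular solver (assuming SciPy is available; names are illustrative):

```python
import numpy as np
from scipy.linalg import solve_triangular

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel as x_{k+1} = L*^{-1} (b - U x_k), with L* = L + D."""
    L_star = np.tril(A)                 # L + D: lower triangle including diagonal
    U = np.triu(A, k=1)                 # strictly upper-triangular part
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_new = solve_triangular(L_star, b - U @ x, lower=True)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x
```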

Solving the 1D Poisson equation using finite differences

Consider the 1D Poisson equation

\[\cfrac{d^{2}u}{dx^{2}} =f(x)=-1\]

on \(\Omega=[0,1]\) with boundary conditions

\[\begin{split}\begin{array}{l} u(0)=0\\ u'(1)=\cfrac{du}{dx}\bigg|_{x=1}=0 \end{array}\end{split}\]

which has analytical solution

\[u=x-\cfrac{1}{2}x^{2}\]

Finite differences

We will use this specific example to investigate various approaches to solving partial differential equations with finite differences, in which we discretize the domain by defining \(N\) equally spaced points

\[\begin{split}\begin{array}{l} \cfrac{u_{i+1}-2u_{i}+u_{i-1}}{\Delta x^{2}} =f_{i}\\ {u_{i+1}-2u_{i}+u_{i-1}}={\Delta x^{2}}f_{i}\\ \end{array}\end{split}\]

The boundary conditions require special care. For \(x=0\) we have a Dirichlet boundary condition which allows us to fix the value \(u_{1}=0\). For \(x=1\) we have a Neumann boundary condition \(du/dx = 0\). This is a symmetry boundary condition, so we can imagine a ’ghost’ point \(u_{N+1}\) which is always equal to \(u_{N-1}\). Here we use the simplest (first-order) treatment of this condition and tie the boundary value to its neighbour; the second-order ghost-point relation \(u_{N}=u_{N-1}+\tfrac{1}{2}\Delta x^{2}\) is derived in the conjugate-gradient section below. For point \(x_{N}\) this gives:

\[u_{N}=u_{N-1}\]
\[\begin{split}\begin{array}{l} \cfrac{u_{i+1}-2u_{i}+u_{i-1}}{\Delta x^{2}} =f_{i}\\ {u_{i+1}-2u_{i}+u_{i-1}}={\Delta x^{2}}f_{i}=b_{i}\\ {u_{i+1}-2u_{i}+u_{i-1}}=b_{i}\\ u_{i-1}-2u_{i}+u_{i+1}=b_{i}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{2-1}-2u_{2}+u_{2+1}=b_{2}\\ u_{3-1}-2u_{3}+u_{3+1}=b_{3}\\ \cdots\\ u_{i-1}-2u_{i}+u_{i+1}=b_{i}\\ \cdots\\ u_{N-1-1}-2u_{N-1}+u_{N-1+1}=b_{N-1}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{1}-2u_{2}+u_{3}=b_{2}\\ u_{2}-2u_{3}+u_{4}=b_{3}\\ \cdots\\ u_{i-1}-2u_{i}+u_{i+1}=b_{i}\\ \cdots\\ u_{N-2}-2u_{N-1}+u_{N}=b_{N-1}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} -2u_{2}+u_{3}=b_{2}-u_{1}=\hat{b}_{2}\\ u_{2}-2u_{3}+u_{4}=b_{3}=\hat{b}_{3}\\ \cdots\\ u_{i-1}-2u_{i}+u_{i+1}=b_{i}=\hat{b}_{i}\\ \cdots\\ u_{N-3}-2u_{N-2}+u_{N-1}=b_{N-2}=\hat{b}_{N-2}\\ u_{N-2}-2u_{N-1}=b_{N-1}-u_{N}=\hat{b}_{N-1}\\ \end{array}\end{split}\]
\[\begin{split}\begin{bmatrix} -2&1 &0 &\cdots&\cdots&0\\ 1& -2 & 1&\ddots&&\vdots\\ 0&1& -2 & 1&\ddots&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&0 \\ \vdots&&\ddots&1& -2 & 1\\ 0&\cdots&\cdots& 0&1&-2 \\ \end{bmatrix} \begin{bmatrix} u_{2}\\u_{3}\\\vdots\\u_{i}\\\vdots\\u_{N-2}\\u_{N-1} \end{bmatrix}= \begin{bmatrix} \hat{b}_{2}\\\hat{b}_{3}\\\vdots\\\hat{b}_{i}\\\vdots\\\hat{b}_{N-2}\\\hat{b}_{N-1} \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{A}= \begin{bmatrix} -2&1 &0 &\cdots&\cdots&0\\ 1& -2 & 1&\ddots&&\vdots\\ 0&1& -2 & 1&\ddots&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&0 \\ \vdots&&\ddots&1& -2 & 1\\ 0&\cdots&\cdots& 0&1&-2 \\ \end{bmatrix}\end{split}\]
\[\begin{split}\begin{array}{l} a_{ii}=-2\\ a_{i,i+1}=1\\ a_{i+1,i}=1\\ \text{else}\\ a_{i,j}=0\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} -2u_{2}^{(k+1)}+u_{3}^{(k)}=\hat{b}_{2}\\ u_{2}^{(k+1)}-2u_{3}^{(k+1)}+u_{4}^{(k)}=\hat{b}_{3}\\ u_{3}^{(k+1)}-2u_{4}^{(k+1)}+u_{5}^{(k)}=\hat{b}_{4}\\ \cdots\\ u_{i-1}^{(k+1)}-2u_{i}^{(k+1)}+u_{i+1}^{(k)}=\hat{b}_{i}\\ \cdots\\ u_{N-3}^{(k+1)}-2u_{N-2}^{(k+1)}+u_{N-1}^{(k)}=\hat{b}_{N-2}\\ u_{N-2}^{(k+1)}-2u_{N-1}^{(k+1)}=\hat{b}_{N-1}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{i-1}-2u_{i}+u_{i+1}}{\Delta x^{2}} =f_{i}\\ \displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k+1)}+u_{i+1}^{(k)}}{\Delta x^{2}} =f_{i}\\ \displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+2u_{i}^{(k)}-2u_{i}^{(k+1)}+u_{i+1}^{(k)}}{\Delta x^{2}} =f_{i}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}-2(u_{i}^{(k+1)}-u_{i}^{(k)})+u_{i+1}^{(k)}}{\Delta x^{2}} =f_{i}\\ \displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+u_{i+1}^{(k)}}{\Delta x^{2}} -\cfrac{2}{\Delta x^{2}}(u_{i}^{(k+1)}-u_{i}^{(k)})=f_{i}\\ -\cfrac{2}{\Delta x^{2}}(u_{i}^{(k+1)}-u_{i}^{(k)})=f_{i}-\displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+u_{i+1}^{(k)}}{\Delta x^{2}}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{i}^{(k+1)}-u_{i}^{(k)}=\cfrac{1}{-\cfrac{2}{\Delta x^{2}}} \Bigg\{f_{i}-\displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+u_{i+1}^{(k)}}{\Delta x^{2}}\Bigg\}\\ u_{i}^{(k+1)}=u_{i}^{(k)}+\cfrac{1}{-\cfrac{2}{\Delta x^{2}}} \Bigg\{f_{i}-\displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+u_{i+1}^{(k)}}{\Delta x^{2}}\Bigg\}\\ \end{array}\end{split}\]

Let

\[r_{i}=f_{i}-\displaystyle \cfrac{u_{i-1}^{(k+1)}-2u_{i}^{(k)}+u_{i+1}^{(k)}}{\Delta x^{2}}\]

then

\[\begin{split}\begin{array}{l} d_{i}= -\cfrac{2}{\Delta x^{2}}\\ u_{i}^{(k+1)}=u_{i}^{(k)}+\cfrac{1}{-\cfrac{2}{\Delta x^{2}}}r_{i}\\ u_{i}^{(k+1)}=u_{i}^{(k)}+\cfrac{1}{d_{i}}r_{i}\\ \end{array}\end{split}\]
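
The update above translates almost line for line into Python; a sketch for the 1D problem of this section (the left boundary is the Dirichlet value \(u_{1}=0\), the right boundary uses the relation \(u_{N}=u_{N-1}\) stated above and is refreshed after every sweep; array names and the sweep count are illustrative):

```python
import numpy as np

def gauss_seidel_1d(f, dx, n_sweeps=5000):
    """Gauss-Seidel sweeps u_i <- u_i + r_i / d_i for u'' = f on a uniform grid."""
    N = len(f)
    u = np.zeros(N)                          # u[0] stays 0 (Dirichlet at x = 0)
    d = -2.0 / dx**2
    for _ in range(n_sweeps):
        for i in range(1, N - 1):            # sweep the interior, left to right
            r = f[i] - (u[i-1] - 2.0*u[i] + u[i+1]) / dx**2
            u[i] += r / d
        u[-1] = u[-2]                        # right boundary: u_N = u_{N-1}
    return u
```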

Solving the 2D Poisson equation using finite differences

\[\begin{split}\displaystyle \cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} +\cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}} =f_{i,j}\\\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{2-1,j}-2u_{2,j}+u_{2+1,j}}{\Delta x^{2}} +\cfrac{u_{2,j-1}-2u_{2,j}+u_{2,j+1}}{\Delta y^{2}} =f_{2,j}\\ \displaystyle \cfrac{u_{3-1,j}-2u_{3,j}+u_{3+1,j}}{\Delta x^{2}} +\cfrac{u_{3,j-1}-2u_{3,j}+u_{3,j+1}}{\Delta y^{2}} =f_{3,j}\\ \cdots \\ \displaystyle \cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} +\cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}} =f_{i,j}\\ \cdots \\ \displaystyle \cfrac{u_{m-1-1,j}-2u_{m-1,j}+u_{m-1+1,j}}{\Delta x^{2}} +\cfrac{u_{m-1,j-1}-2u_{m-1,j}+u_{m-1,j+1}}{\Delta y^{2}} =f_{m-1,j}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{1,j}-2u_{2,j}+u_{3,j}}{\Delta x^{2}} +\cfrac{u_{2,j-1}-2u_{2,j}+u_{2,j+1}}{\Delta y^{2}} =f_{2,j}\\ \displaystyle \cfrac{u_{2,j}-2u_{3,j}+u_{4,j}}{\Delta x^{2}} +\cfrac{u_{3,j-1}-2u_{3,j}+u_{3,j+1}}{\Delta y^{2}} =f_{3,j}\\ \cdots \\ \displaystyle \cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} +\cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}} =f_{i,j}\\ \cdots \\ \displaystyle \cfrac{u_{m-2,j}-2u_{m-1,j}+u_{m,j}}{\Delta x^{2}} +\cfrac{u_{m-1,j-1}-2u_{m-1,j}+u_{m-1,j+1}}{\Delta y^{2}} =f_{m-1,j}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{1,2}-2u_{2,2}+u_{3,2}}{\Delta x^{2}} +\cfrac{u_{2,1}-2u_{2,2}+u_{2,3}}{\Delta y^{2}} =f_{2,2}\\ \displaystyle \cfrac{u_{2,2}-2u_{3,2}+u_{4,2}}{\Delta x^{2}} +\cfrac{u_{3,1}-2u_{3,2}+u_{3,3}}{\Delta y^{2}} =f_{3,2}\\ \cdots \\ \displaystyle \cfrac{u_{i-1,2}-2u_{i,2}+u_{i+1,2}}{\Delta x^{2}} +\cfrac{u_{i,1}-2u_{i,2}+u_{i,3}}{\Delta y^{2}} =f_{i,2}\\ \cdots \\ \displaystyle \cfrac{u_{m-2,2}-2u_{m-1,2}+u_{m,2}}{\Delta x^{2}} +\cfrac{u_{m-1,1}-2u_{m-1,2}+u_{m-1,3}}{\Delta y^{2}} =f_{m-1,2}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{1,3}-2u_{2,3}+u_{3,3}}{\Delta x^{2}} +\cfrac{u_{2,2}-2u_{2,3}+u_{2,4}}{\Delta y^{2}} =f_{2,3}\\ \displaystyle \cfrac{u_{2,3}-2u_{3,3}+u_{4,3}}{\Delta x^{2}} +\cfrac{u_{3,2}-2u_{3,3}+u_{3,4}}{\Delta y^{2}} =f_{3,3}\\ \cdots \\ \displaystyle \cfrac{u_{i-1,3}-2u_{i,3}+u_{i+1,3}}{\Delta x^{2}} +\cfrac{u_{i,2}-2u_{i,3}+u_{i,4}}{\Delta y^{2}} =f_{i,3}\\ \cdots \\ \displaystyle \cfrac{u_{m-2,3}-2u_{m-1,3}+u_{m,3}}{\Delta x^{2}} +\cfrac{u_{m-1,2}-2u_{m-1,3}+u_{m-1,4}}{\Delta y^{2}} =f_{m-1,3}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{1,n-1}-2u_{2,n-1}+u_{3,n-1}}{\Delta x^{2}} +\cfrac{u_{2,n-2}-2u_{2,n-1}+u_{2,n}}{\Delta y^{2}} =f_{2,n-1}\\ \displaystyle \cfrac{u_{2,n-1}-2u_{3,n-1}+u_{4,n-1}}{\Delta x^{2}} +\cfrac{u_{3,n-2}-2u_{3,n-1}+u_{3,n}}{\Delta y^{2}} =f_{3,n-1}\\ \cdots \\ \displaystyle \cfrac{u_{i-1,n-1}-2u_{i,n-1}+u_{i+1,n-1}}{\Delta x^{2}} +\cfrac{u_{i,n-2}-2u_{i,n-1}+u_{i,n}}{\Delta y^{2}} =f_{i,n-1}\\ \cdots \\ \displaystyle \cfrac{u_{m-2,n-1}-2u_{m-1,n-1}+u_{m,n-1}}{\Delta x^{2}} +\cfrac{u_{m-1,n-2}-2u_{m-1,n-1}+u_{m-1,n}}{\Delta y^{2}} =f_{m-1,n-1}\\ \end{array}\end{split}\]
\[\begin{split}\mathbf{u}=\begin{bmatrix} u_{2,2}\\u_{3,2}\\\vdots\\u_{m-1,2}\\ u_{2,3}\\u_{3,3}\\\vdots\\u_{m-1,3}\\ \vdots\\ u_{2,n-1}\\u_{3,n-1}\\\vdots\\u_{m-1,n-1}\\ \end{bmatrix}\quad \mathbf{f}=\begin{bmatrix} f_{2,2}\\f_{3,2}\\\vdots\\f_{m-1,2}\\ f_{2,3}\\f_{3,3}\\\vdots\\f_{m-1,3}\\ \vdots\\ f_{2,n-1}\\f_{3,n-1}\\\vdots\\f_{m-1,n-1}\\ \end{bmatrix}\quad\end{split}\]
\[\begin{split}\displaystyle \cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} +\cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}} =f_{i,j}\\\end{split}\]
\[\begin{split}\cfrac{1}{\Delta x^{2}}u_{i-1,j} -(\cfrac{2}{\Delta x^{2}}+\cfrac{2}{\Delta y^{2}})u_{i,j} +\cfrac{1}{\Delta x^{2}}u_{i+1,j} +\cfrac{1}{\Delta y^{2}}u_{i,j-1} +\cfrac{1}{\Delta y^{2}}u_{i,j+1} =f_{i,j}\\\end{split}\]
\[\begin{split}\begin{array}{l} a=\cfrac{1}{\Delta x^{2}}, b=-(\cfrac{2}{\Delta x^{2}}+\cfrac{2}{\Delta y^{2}}), c=\cfrac{1}{\Delta x^{2}}, d=\cfrac{1}{\Delta y^{2}}, e=\cfrac{1}{\Delta y^{2}}\\ au_{i-1,j}+bu_{i,j}+cu_{i+1,j}+du_{i,j-1}+eu_{i,j+1} =f_{i,j}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} bu_{2,2}+cu_{3,2}+eu_{2,3}=f_{2,2}-au_{1,2}-du_{2,1}=\hat{f}_{2,2}\\ au_{2,2}+bu_{3,2}+cu_{4,2}+eu_{3,3}=f_{3,2}-du_{3,1}=\hat{f}_{3,2}\\ au_{3,2}+bu_{4,2}+cu_{5,2}+eu_{4,3}=f_{4,2}-du_{4,1}=\hat{f}_{4,2}\\ \cdots \\ au_{i-1,2}+bu_{i,2}+cu_{i+1,2}+eu_{i,3}=f_{i,2}-du_{i,1}=\hat{f}_{i,2}\\ \cdots \\ au_{m-2,2}+bu_{m-1,2}+cu_{m,2}+eu_{m-1,3}=f_{m-1,2}-du_{m-1,1}=\hat{f}_{m-1,2}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} au_{i-1,j}+bu_{i,j}+cu_{i+1,j}+du_{i,j-1}+eu_{i,j+1}=f_{i,j}\\ au_{i-1,3}+bu_{i,3}+cu_{i+1,3}+du_{i,2}+eu_{i,4}=f_{i,3}\\ bu_{2,3}+cu_{3,3}+du_{2,2}+eu_{2,4}=f_{2,3}-au_{1,3}=\hat{f}_{2,3}\\ du_{2,2}+bu_{2,3}+cu_{3,3}+eu_{2,4}=f_{2,3}-au_{1,3}=\hat{f}_{2,3}\\ au_{2,3}+bu_{3,3}+cu_{4,3}+du_{3,2}+eu_{3,4}=f_{3,3}=\hat{f}_{3,3}\\ du_{3,2}+au_{2,3}+bu_{3,3}+cu_{4,3}+eu_{3,4}=f_{3,3}=\hat{f}_{3,3}\\ au_{3,3}+bu_{4,3}+cu_{5,3}+du_{4,2}+eu_{4,4}=f_{4,3}=\hat{f}_{4,3}\\ \cdots\\ au_{m-2,3}+bu_{m-1,3}+cu_{m,3}+du_{m-1,2}+eu_{m-1,4}=f_{m-1,3}=\hat{f}_{m-1,3}\\ du_{m-1,2}+au_{m-2,3}+bu_{m-1,3}+cu_{m,3}+eu_{m-1,4}=f_{m-1,3}=\hat{f}_{m-1,3}\\ \end{array}\end{split}\]
\[\begin{split}\begin{bmatrix} b& c &0&\cdots &e &\cdots&\cdots&&0\\ a& b &c&\ddots&\ &\ddots&&&\vdots\\ 0&a& b &c&\ddots &&\ddots&&\vdots\\ \vdots&\ddots &\ddots&\ddots&&\ddots&&&e\\ d& &\ddots&\ddots&&\ddots&&&\vdots\\ \vdots&\ddots & & &\ddots&a& b &c&0\\ \vdots& &\ddots&&&\ddots&a& b &c\\ 0&\cdots &&d&\cdots&&0&a& b\\ \end{bmatrix}\end{split}\]
\[\begin{split}au_{i-1,j}^{(k+1)}+bu_{i,j}^{(k+1)}+cu_{i+1,j}^{(k)}+du_{i,j-1}^{(k+1)}+eu_{i,j+1}^{(k)} =f_{i,j}\\\end{split}\]
\[\begin{split}\displaystyle \cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k+1)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k+1)}+u_{i,j+1}^{(k)}}{\Delta y^{2}} =f_{i,j}\\\end{split}\]
\[\begin{split}\begin{array}{l} \displaystyle \cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k+1)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k+1)}+u_{i,j+1}^{(k)}}{\Delta y^{2}} =f_{i,j}\\ \displaystyle \cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}-2(u_{i,j}^{(k+1)}-u_{i,j}^{(k)})+u_{i+1,j}^{(k)}}{\Delta x^{2}} +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}-2(u_{i,j}^{(k+1)}-u_{i,j}^{(k)})+u_{i,j+1}^{(k)}}{\Delta y^{2}} =f_{i,j}\\ \displaystyle \cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}}+\cfrac{-2}{\Delta x^{2}}(u_{i,j}^{(k+1)}-u_{i,j}^{(k)})\\ +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}} +\cfrac{-2}{\Delta y^{2}}(u_{i,j}^{(k+1)}-u_{i,j}^{(k)}) =f_{i,j}\\ \displaystyle \cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}} +\bigg\{\cfrac{-2}{\Delta x^{2}}+\cfrac{-2}{\Delta y^{2}}\bigg\}(u_{i,j}^{(k+1)}-u_{i,j}^{(k)}) =f_{i,j}\\ \end{array}\end{split}\]
\[\begin{split}\displaystyle \bigg\{\cfrac{-2}{\Delta x^{2}}+\cfrac{-2}{\Delta y^{2}}\bigg\}(u_{i,j}^{(k+1)}-u_{i,j}^{(k)}) =f_{i,j}-\cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} -\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}}\\\end{split}\]
\[\begin{split}\begin{array}{l} u_{i,j}^{(k+1)}-u_{i,j}^{(k)} =\cfrac{f_{i,j}-\cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} -\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}}}{\cfrac{-2}{\Delta x^{2}}+\cfrac{-2}{\Delta y^{2}}} \\ u_{i,j}^{(k+1)} =u_{i,j}^{(k)}+\cfrac{f_{i,j}-\cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} -\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}}}{\cfrac{-2}{\Delta x^{2}}+\cfrac{-2}{\Delta y^{2}}} \\ \end{array}\end{split}\]

Let

\[\begin{split}\begin{array}{l} r_{i,j}=f_{i,j}-\Bigg[\cfrac{u_{i-1,j}^{(k+1)}-2u_{i,j}^{(k)}+u_{i+1,j}^{(k)}}{\Delta x^{2}} +\cfrac{u_{i,j-1}^{(k+1)}-2u_{i,j}^{(k)}+u_{i,j+1}^{(k)}}{\Delta y^{2}}\Bigg]\\ d_{i,j}=\cfrac{-2}{\Delta x^{2}}+\cfrac{-2}{\Delta y^{2}} \end{array}\end{split}\]

then

\[u_{i,j}^{(k+1)}=u_{i,j}^{(k)}+\cfrac{r_{i,j}}{d_{i,j}}\]
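
A single Gauss-Seidel sweep of this update over the interior points, left to right and bottom to top (a sketch; the boundary values of \(u\) are assumed to have been set by the caller):

```python
import numpy as np

def gauss_seidel_sweep_2d(u, f, dx, dy):
    """In-place Gauss-Seidel sweep: u[i, j] <- u[i, j] + r[i, j] / d."""
    d = -2.0 / dx**2 - 2.0 / dy**2
    m, n = u.shape
    for j in range(1, n - 1):                # bottom to top
        for i in range(1, m - 1):            # left to right
            r = (f[i, j]
                 - (u[i-1, j] - 2.0*u[i, j] + u[i+1, j]) / dx**2
                 - (u[i, j-1] - 2.0*u[i, j] + u[i, j+1]) / dy**2)
            u[i, j] += r / d
    return u
```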

Non-Stationary Methods: Conjugate Gradient Algorithm

  1. Conjugate Gradient Method (Soochow University)

  2. Numerical Analysis, Spring 2020 (Soochow University)

  3. Linear Conjugate Gradient Algorithm

  4. Gradient Descent

  5. How to explain gradient descent intuitively

  6. One-Dimensional Gradient Descent

  7. Easiest Way to Understand Gradient Descent Step by Step

  8. The Concept of Conjugate Gradient Descent in Python

  9. Complete Step-by-step Conjugate Gradient Algorithm from Scratch

  10. Conjugate Gradient Descent

  11. Descent method — Steepest descent and conjugate gradient in Python

  12. Finite difference method for 1D Poisson equation with mixed boundary conditions

Non-stationary methods differ from stationary methods in that the iteration matrix changes at every iteration. These methods work by forming a basis from the sequence of matrix powers applied to the initial residual; this basis is called the Krylov subspace and is given by \(\mathcal{K}_{n}(\mathbf{A},\mathbf{b})=\text{span}\{\mathbf{b},\mathbf{A}\mathbf{b},\mathbf{A}^{2}\mathbf{b},\cdots,\mathbf{A}^{n-1}\mathbf{b}\}\). The approximate solution to the linear system is found by minimizing the residual over this subspace. Here we discuss the conjugate gradient method, which is one of the most effective methods for symmetric positive definite systems.

The conjugate gradient method proceeds by generating vector sequences of successive approximate solutions, of residuals corresponding to those approximations, and of search directions used in updating both the approximations and the residuals. The approximate solution \(\mathbf{u}^{(k)}\) is updated at every iteration by a scalar multiple \(\alpha_{k}\) of the search direction vector \(\mathbf{p}^{(k)}\):

\[\mathbf{u}^{(k+1)}=\mathbf{u}^{(k)}+\alpha_{k}\mathbf{p}^{(k)}\]

Correspondingly, the residuals \(\mathbf{r}^{(k)}\) are updated as

\[\mathbf{r}^{(k+1)}=\mathbf{r}^{(k)}-\alpha_{k}\mathbf{q}^{(k)}\]

where

\[\mathbf{q}^{(k)}=\mathbf{A}\mathbf{p}^{(k)}\]

The search directions are then updated using the residuals

\[\mathbf{p}^{(k+1)}=\mathbf{r}^{(k+1)}+\beta_{k}\mathbf{p}^{(k)}\]

where the choice \(\beta_{k}=\cfrac{{\mathbf{r}^{(k+1)}}^{\mathbf{T}}\mathbf{r}^{(k+1)}}{{\mathbf{r}^{(k)}}^{\mathbf{T}}\mathbf{r}^{(k)}}\) ensures that \(\mathbf{p}^{(k+1)}\) is conjugate (\(\mathbf{A}\)-orthogonal) to all previous search directions and that \(\mathbf{r}^{(k+1)}\) is orthogonal to all previous residuals.

Conjugate gradient algorithm

  1. Given \(\mathbf{b}\)

  2. Given matrix operator \(\mathbf{A}\)

  3. \(\mathbf{u}^{(0)}=\mathbf{b}\)

  4. \(\mathbf{r}^{(0)}=\mathbf{b}-\mathbf{A}\mathbf{u}^{(0)}\)

  5. \(\mathbf{p}^{(0)}=\mathbf{r}^{(0)}\)

  6. \(k=0\)

  7. \(\rho_{0}={\mathbf{r}^{(0)}}^{\mathbf{T}}\mathbf{r}^{(0)}\)

  8. while tolerance not met (and \(k<N\)) do

  9. \(\mathbf{q}^{(k)}=\mathbf{A}\mathbf{p}^{(k)}\)

  10. \(\alpha_{k}=\cfrac{\rho_{k}}{{\mathbf{p}^{(k)}}^{\mathbf{T}}\mathbf{q}^{(k)}}\)

  11. \(\mathbf{u}^{(k+1)}=\mathbf{u}^{(k)}+\alpha_{k}\mathbf{p}^{(k)}\)

  12. \(\mathbf{r}^{(k+1)}=\mathbf{r}^{(k)}-\alpha_{k}\mathbf{q}^{(k)}\)

  13. \(\rho_{k+1}={\mathbf{r}^{(k+1)}}^{\mathbf{T}}\mathbf{r}^{(k+1)}\)

  14. \(\beta_{k}=\cfrac{\rho_{k+1}}{\rho_{k}}\)

  15. \(\mathbf{p}^{(k+1)}=\mathbf{r}^{(k+1)}+\beta_{k}\mathbf{p}^{(k)}\)

  16. check convergence; continue if necessary

  17. \(k\gets k+1\)

  18. end while
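
A direct transcription of steps 1-18 into Python; the matrix is passed as a callable `apply_A`, so it never has to be assembled explicitly (the function name and stopping test are illustrative):

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient for A u = b, where apply_A(v) returns A v."""
    u = b.copy()                       # step 3: initial guess u^(0) = b
    r = b - apply_A(u)                 # step 4: initial residual
    p = r.copy()                       # step 5: initial search direction
    rho = r @ r                        # step 7
    max_iter = len(b) if max_iter is None else max_iter
    for _ in range(max_iter):          # step 8
        q = apply_A(p)                 # step 9
        alpha = rho / (p @ q)          # step 10
        u = u + alpha * p              # step 11
        r = r - alpha * q              # step 12
        rho_new = r @ r                # step 13
        if np.sqrt(rho_new) < tol:     # step 16: convergence check
            break
        p = r + (rho_new / rho) * p    # steps 14-15
        rho = rho_new
    return u
```

For an explicitly stored matrix one would call, for example, `conjugate_gradient(lambda v: A @ v, b)`.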

Linear Conjugate Gradient Algorithm

Steepest Descent Algorithm

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}=\mathbf{r}_{0}\), calculate

\[\alpha =\cfrac{(\mathbf{r}_{0},\mathbf{r}_{0})}{(\mathbf{A}\mathbf{r}_{0},\mathbf{r}_{0})}\]

update

\[\mathbf{x}_{1}=\mathbf{x}_{0}+\alpha \mathbf{r}_{0}\]
  3. Repeat

The Method of Steepest Descent

Suppose we want to find the minimizer of an objective function, having the quadratic form:

\[f(\mathbf{x})=\cfrac{1}{2} \mathbf{x}^{\text{T}}\mathbf{A}\mathbf{x}-\mathbf{x}^{\text{T}} \mathbf{b}\]

where \(\mathbf{A}\) is an \(n\times n\) symmetric positive definite matrix. Since \(\mathbf{A}\) is positive definite, the quadratic function \(f(\mathbf{x})\) has a unique minimizer, which is a stationary point:

\[f(\mathbf{x}^{*})=\text{min}f(\mathbf{x})\quad\Leftrightarrow \quad \nabla f(\mathbf{x}^{*})=\mathbf{A}\mathbf{x}^{*}-\mathbf{b}=0\]

Calculate the gradient first

\[\begin{split}\begin{array}{l} \displaystyle f(\mathbf{x})=\cfrac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n}a_{kj}x_{k}x_{j}-\sum_{j=1}^{n}x_{j}b_{j}\\ \displaystyle \cfrac{\partial f}{\partial x_{l}}=\cfrac{1}{2}\sum_{k=1}^{n}a_{kl}x_{k} +\cfrac{1}{2}\sum_{j=1}^{n}a_{lj}x_{j}-b_{l}=\sum_{k=1}^{n}a_{kl}x_{k}-b_{l}\\ \nabla f=\text{grad } f=\mathbf{A}\mathbf{x}-\mathbf{b} \end{array}\end{split}\]
  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}\), for example, \(\mathbf{p}=\mathbf{r}_{0}\) direction

\[\mathbf{x}_{1}= \mathbf{x}_{0}+\alpha \ast \mathbf{p}\]
\[\begin{split}\begin{array}{l} \nabla f=\text{grad } f=\mathbf{A}\mathbf{x}-\mathbf{b}\\ \mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\\ \nabla f\bigg|_{x=x_{1}}\cdot\mathbf{p}=-(\mathbf{r}_{1},\mathbf{p})=-(\mathbf{b}-\mathbf{A}\mathbf{x}_{1},\mathbf{p})=0\\ (\mathbf{b}-\mathbf{A}\mathbf{x}_{1},\mathbf{p})= (\mathbf{b}-\mathbf{A}\mathbf{x}_{0}-\alpha\mathbf{A}\mathbf{p},\mathbf{p})= (\mathbf{r}_{0}-\alpha\mathbf{A}\mathbf{p},\mathbf{p})=0\\ \Rightarrow \alpha =\cfrac{(\mathbf{r}_{0},\mathbf{p})}{(\mathbf{A}\mathbf{p},\mathbf{p})} \end{array}\end{split}\]

Gradient

\[\begin{split}\begin{array}{l} \nabla f=\text{grad } f=\mathbf{A}\mathbf{x}-\mathbf{b}\\ -\nabla f=-\text{grad } f=\mathbf{b}-\mathbf{A}\mathbf{x}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} f(\mathbf{x})=\cfrac{1}{2} \mathbf{x}^{\text{T}}\mathbf{A}\mathbf{x}-\mathbf{x}^{\text{T}} \mathbf{b}\\ f(\mathbf{x}_{1})=\cfrac{1}{2} \mathbf{x}_{1}^{\text{T}}\mathbf{A}\mathbf{x}_{1}-\mathbf{x}_{1}^{\text{T}} \mathbf{b}\\ \mathbf{x}_{1}= \mathbf{x}_{0}+\alpha \ast \mathbf{p}\\ f(\mathbf{x}_{1})=\hat{f}(\alpha)=\cfrac{1}{2} {(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}^{\text{T}}\mathbf{A}{(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}-{(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}^{\text{T}} \mathbf{b}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} f(\mathbf{x}_{1})=\hat{f}(\alpha)=\cfrac{1}{2} {(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}^{\text{T}}\mathbf{A}{(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}-{(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}^{\text{T}} \mathbf{b}\\ \cfrac{d\hat{f}}{d\alpha} =\cfrac{1}{2}\mathbf{p}^{\text{T}}\mathbf{A}{(\mathbf{x}_{0}+\alpha \ast \mathbf{p})} +\cfrac{1}{2} {(\mathbf{x}_{0}+\alpha \ast \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}-\mathbf{p}^{\text{T}}\mathbf{b}\\ \cfrac{d\hat{f}}{d\alpha} =\cfrac{1}{2}\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0} +\cfrac{1}{2}\alpha \mathbf{p}^{\text{T}}\mathbf{A}{(\mathbf{p})} +\cfrac{1}{2} {(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p} +\cfrac{1}{2}\alpha {( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p} -\mathbf{p}^{\text{T}}\mathbf{b}\\ \cfrac{d\hat{f}}{d\alpha} =\cfrac{1}{2}\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0} +\cfrac{1}{2} {(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p} +\alpha {( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p} -\mathbf{p}^{\text{T}}\mathbf{b}\\ \end{array}\end{split}\]
\[{\displaystyle \left(\mathbf {AB} \right)^{\operatorname {T} }=\mathbf {B} ^{\operatorname {T} }\mathbf {A} ^{\operatorname {T} }.}\]
\[\begin{split}\begin{array}{l} {\displaystyle \left(\mathbf {AB} \right)^{\operatorname {T} }=\mathbf {B} ^{\operatorname {T} }\mathbf {A} ^{\operatorname {T} }} \\ (\mathbf{p}^{\text{T}}(\mathbf{A}\mathbf{x}_{0}))^{\text{T}}=(\mathbf{A}\mathbf{x}_{0})^{\text{T}}(\mathbf{p}^{\text{T}})^{\text{T}} =(\mathbf{A}\mathbf{x}_{0})^{\text{T}}\mathbf{p}=\mathbf{x}_{0}^{\text{T}}\mathbf{A}^{\text{T}}\mathbf{p} \end{array}\end{split}\]

Since \(\mathbf{A}\) is an \(n\times n\) symmetric positive definite matrix, we have

\[\begin{split}\mathbf {A}=\mathbf {A}^{\text{T}}\\\end{split}\]
\[\mathbf{x}_{0}^{\text{T}}\mathbf{A}^{\text{T}}\mathbf{p}=\mathbf{x}_{0}^{\text{T}}\mathbf{A}\mathbf{p}\]

Here again, \(\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0}\), \({(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p}\) and \(\mathbf{p}^{\text{T}}\mathbf{b}\) are scalars, and a scalar equals its own transpose, so

\[\begin{split}\begin{array}{l} (\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0})^{\text{T}}=\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0}\\ ({(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p})^{\text{T}}={(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p}\\ (\mathbf{p}^{\text{T}}\mathbf{b})^{\text{T}}=\mathbf{p}^{\text{T}}\mathbf{b} \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \cfrac{d\hat{f}}{d\alpha} =\cfrac{1}{2}\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0} +\cfrac{1}{2} {(\mathbf{x}_{0})}^{\text{T}}\mathbf{A}\mathbf{p} +\alpha {( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p} -\mathbf{p}^{\text{T}}\mathbf{b}=0\\ \cfrac{d\hat{f}}{d\alpha} =\mathbf{p}^{\text{T}}\mathbf{A}\mathbf{x}_{0} +\alpha {( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p} -\mathbf{p}^{\text{T}}\mathbf{b}=0\\ \alpha {( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}=\mathbf{p}^{\text{T}}(\mathbf{b}-\mathbf{A}\mathbf{x}_{0})\\ \alpha=\cfrac{\mathbf{p}^{\text{T}}(\mathbf{b}-\mathbf{A}\mathbf{x}_{0})} {{( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}} \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\\ -\nabla f(\mathbf{x}_{0})=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}=\mathbf{r}_{0}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \alpha=\cfrac{\mathbf{p}^{\text{T}}(\mathbf{b}-\mathbf{A}\mathbf{x}_{0})} {{( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}} \\ \alpha=\cfrac{\mathbf{p}^{\text{T}}(-\nabla f(\mathbf{x}_{0}))} {{( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}} \\ \alpha=\cfrac{\mathbf{p}^{\text{T}}(\mathbf{r}_{0})} {{( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}} \\ \end{array}\end{split}\]

Inner product

\[\begin{split}\left \langle \mathbf{x},\mathbf{y} \right \rangle= \mathbf{x}^{\text{T}}\mathbf{y}\\\end{split}\]

then

\[\alpha=\cfrac{\mathbf{p}^{\text{T}}(\mathbf{r}_{0})} {{( \mathbf{p})}^{\text{T}}\mathbf{A}\mathbf{p}}= \cfrac{\left \langle \mathbf{p},\mathbf{r}_{0} \right \rangle}{\left \langle \mathbf{p},\mathbf{A}\mathbf{p} \right \rangle} =\cfrac{\left \langle \mathbf{r}_{0},\mathbf{p} \right \rangle}{\left \langle \mathbf{A}\mathbf{p},\mathbf{p} \right \rangle}\]

We can also write it as

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla f(\mathbf{x}_{0})\), calculate

\[\begin{split}\begin{array}{l} \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \\ \mathbf{x}_{1}= \mathbf{x}_{0}+\alpha_{0} \ast \mathbf{p}_{0}\\ \end{array}\end{split}\]
  3. Go along the selected direction \(\mathbf{p}_{1}=\mathbf{r}_{1}=-\nabla f(\mathbf{x}_{1})=\mathbf{b}-\mathbf{A}\mathbf{x}_{1}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{1} =\cfrac{(\mathbf{r}_{1},\mathbf{p}_{1})}{(\mathbf{A}\mathbf{p}_{1},\mathbf{p}_{1})} \\ \mathbf{x}_{2}= \mathbf{x}_{1}+\alpha_{1} \ast \mathbf{p}_{1}\\ \end{array}\end{split}\]
  4. Go along the selected direction \(\mathbf{p}_{k}=\mathbf{r}_{k}=-\nabla f(\mathbf{x}_{k})=\mathbf{b}-\mathbf{A}\mathbf{x}_{k}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{k} =\cfrac{(\mathbf{r}_{k},\mathbf{p}_{k})}{(\mathbf{A}\mathbf{p}_{k},\mathbf{p}_{k})} \\ \mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\\ \end{array}\end{split}\]
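
The repeated step above, written as a loop in Python (a sketch assuming a symmetric positive definite \(\mathbf{A}\) stored as a NumPy array; the stopping test is an illustrative choice):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent: at each step go along p_k = r_k = b - A x_k."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        r = b - A @ x                         # residual = -grad f = search direction
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))       # alpha_k = (r_k, p_k)/(A p_k, p_k)
        x = x + alpha * r                     # x_{k+1} = x_k + alpha_k p_k
    return x
```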

A-orthogonality

Let \(\mathbf{A}\) be a symmetric and positive definite matrix. Two vectors \(\mathbf{u}\) and \(\mathbf{v}\) are \(\mathbf{A}\)-orthogonal if

\[\mathbf{u}^{\text{T}}\mathbf{A}\mathbf{v}=0,\quad \mathbf{u}\ne\mathbf{v}\]

Suppose that \(\{\mathbf{p}_{k}\}\) is a sequence of \(n\) mutually conjugate directions. Then the \(\mathbf{p}_{k}\) form a basis of \(\mathbf{R}^{n}\), so we can expand the solution \(\mathbf{x}_{*}\) of \(\mathbf{A}\mathbf{x}=\mathbf{b}\) in this basis:

\[\mathbf{x}_{*} = \sum_{i=1}^{n}\alpha_{i}\mathbf{p}_{i}\]

The coefficients follow from

\[\mathbf{b}= \mathbf{A}\mathbf{x}_{*} = \sum_{i=1}^{n}\alpha_{i}\mathbf{A}\mathbf{p}_{i}\]

by multiplying with \(\mathbf{p}_{k}^{\text{T}}\) and using the conjugacy of the directions:

\[\mathbf{p}_{k}^{\text{T}}\mathbf{b}=\sum_{i=1}^{n}\alpha_{i}\mathbf{p}_{k}^{\text{T}}\mathbf{A}\mathbf{p}_{i}=\alpha_{k}\mathbf{p}_{k}^{\text{T}}\mathbf{A}\mathbf{p}_{k} \quad\Rightarrow\quad \alpha_{k}=\cfrac{\mathbf{p}_{k}^{\text{T}}\mathbf{b}}{\mathbf{p}_{k}^{\text{T}}\mathbf{A}\mathbf{p}_{k}}\]

The conjugate gradient method

If the matrix \(\mathbf{A}\) is symmetric and positive definite, we say that two non-zero vectors \(\mathbf{u}\) and \(\mathbf{v}\) are conjugate (with respect to \(\mathbf{A}\)) if

\[(\mathbf{u},\mathbf{v})_{\mathbf{A}}=(\mathbf{A}\mathbf{u},\mathbf{v})=\mathbf{v}^{\text{T}}\mathbf{A}\mathbf{u}= \sum_{j=1}^{n}\sum_{k=1}^{n}a_{jk}u_{j}v_{k}=0\]

So, two vectors are conjugate if they are orthogonal with respect to this inner product. Being conjugate is a symmetric relation: if \(\mathbf{u}\) is conjugate to \(\mathbf{v}\), then \(\mathbf{v}\) is conjugate to \(\mathbf{u}\). (Note: This notion of conjugate is not related to the notion of complex conjugate.)

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}\), calculate

\[\mathbf{x}_{1}= \mathbf{x}_{0}+\alpha_{0} \ast \mathbf{p}_{0}, \quad \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})}\]
  3. A new residual \(\mathbf{r}_{1}=\mathbf{b}-\mathbf{A}\mathbf{x}_{1}\) is obtained from \(\mathbf{x}_{1}\), and based on \(\mathbf{r}_{1}\), a projection is made onto \(\mathbf{p}_{0}\) to obtain a new search direction \(\mathbf{p}_{1}\)

\[\begin{array}{l} \mathbf{p}_{1}= \mathbf{r}_{1}-\beta_{1} \ast \mathbf{p}_{0}, \quad \beta_{1} =\cfrac{(\mathbf{A}\mathbf{p}_{0},\mathbf{r}_{1})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \end{array}\]
  4. Repeat

\[\mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\to \mathbf{r}_{k+1}\to \mathbf{p}_{k+1}\to \mathbf{x}_{k+2}\]
\[\begin{split}\left\{\begin{array}{ll} \mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}, & \mathbf{p}_{0}=\mathbf{r}_{0}\\ \mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\to &\mathbf{r}_{k+1}\to\mathbf{p}_{k+1} \end{array}\right. \Rightarrow \left\{\begin{array}{ll} \mathbf{r}_{0},\mathbf{r}_{1},\cdots,\mathbf{r}_{n}\\ \mathbf{p}_{0},\mathbf{p}_{1},\cdots,\mathbf{p}_{n}\\ \end{array}\right.\end{split}\]

Suppose we want to find the minimizer of an objective function, having the quadratic form:

\[f(\mathbf{x})=\cfrac{1}{2} \mathbf{x}^{\text{T}}\mathbf{A}\mathbf{x}-\mathbf{x}^{\text{T}} \mathbf{b}\]

Gradient

\[\begin{split}\begin{array}{l} \nabla f=\text{grad } f=\mathbf{A}\mathbf{x}-\mathbf{b}\\ -\nabla f=-\text{grad } f=\mathbf{b}-\mathbf{A}\mathbf{x}\\ \end{array}\end{split}\]

We can also write it as

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla f(\mathbf{x}_{0})\), calculate

\[\begin{split}\begin{array}{l} \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \\ \mathbf{x}_{1}= \mathbf{x}_{0}+\alpha_{0} \ast \mathbf{p}_{0}\\ \end{array}\end{split}\]
  3. A new residual \(-\nabla f(\mathbf{x}_{1})=\mathbf{r}_{1}=\mathbf{b}-\mathbf{A}\mathbf{x}_{1}\) is obtained from \(\mathbf{x}_{1}\), and based on \(\mathbf{r}_{1}\), a projection is made onto \(\mathbf{p}_{0}\) to obtain a new search direction \(\mathbf{p}_{1}\)

\[\begin{array}{l} \mathbf{p}_{1}= \mathbf{r}_{1}-\beta_{1} \ast \mathbf{p}_{0}, \quad \beta_{1} =\cfrac{(\mathbf{A}\mathbf{p}_{0},\mathbf{r}_{1})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \end{array}\]
\[\begin{split}\begin{array}{l} \alpha_{1} =\cfrac{(\mathbf{r}_{1},\mathbf{p}_{1})}{(\mathbf{A}\mathbf{p}_{1},\mathbf{p}_{1})} \\ \mathbf{x}_{2}= \mathbf{x}_{1}+\alpha_{1} \ast \mathbf{p}_{1}\\ \end{array}\end{split}\]
  4. A new residual \(-\nabla f(\mathbf{x}_{k})=\mathbf{r}_{k}=\mathbf{b}-\mathbf{A}\mathbf{x}_{k}\) is obtained from \(\mathbf{x}_{k}\), and based on \(\mathbf{r}_{k}\), a projection is made onto \(\mathbf{p}_{k-1}\) to obtain a new search direction \(\mathbf{p}_{k}\)

\[\begin{array}{l} \mathbf{p}_{k}= \mathbf{r}_{k}-\beta_{k} \ast \mathbf{p}_{k-1}, \quad \beta_{k} =\cfrac{(\mathbf{A}\mathbf{p}_{k-1},\mathbf{r}_{k})}{(\mathbf{A}\mathbf{p}_{k-1},\mathbf{p}_{k-1})} \end{array}\]
\[\begin{split}\begin{array}{l} \alpha_{k} =\cfrac{(\mathbf{r}_{k},\mathbf{p}_{k})}{(\mathbf{A}\mathbf{p}_{k},\mathbf{p}_{k})} \\ \mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\\ \end{array}\end{split}\]

Example 1

Solve \(\mathbf{A}\mathbf{x}=\mathbf{b}\)

where

\[\begin{split}\mathbf{A} =\begin{bmatrix} 3&2 \\ 2&6 \end{bmatrix}\quad \mathbf{x}=\begin{bmatrix} x_{1}\\x_{2} \end{bmatrix}\quad \mathbf{b}=\begin{bmatrix} 2\\-8 \end{bmatrix}\end{split}\]

The quadratic form:

\[f(\mathbf{x})=\cfrac{1}{2}\mathbf{x}^{\text{T}}\mathbf{A}\mathbf{x}-\mathbf{b}^{\text{T}}\mathbf{x}\]
\[\begin{split}\begin{array}{l} f(\mathbf{x})=\cfrac{1}{2}\begin{bmatrix} x_{1}&x_{2} \end{bmatrix} \begin{bmatrix} a_{11}& a_{12}\\ a_{21}&a_{22} \end{bmatrix}\begin{bmatrix} x_{1}\\x_{2} \end{bmatrix} -\begin{bmatrix} b_{1}&b_{2} \end{bmatrix}\begin{bmatrix} x_{1}\\x_{2} \end{bmatrix}\\ f(\mathbf{x})=\cfrac{1}{2}\begin{bmatrix} x_{1}&x_{2} \end{bmatrix}\begin{bmatrix} a_{11}x_{1}+a_{12}x_{2}\\a_{21}x_{1}+a_{22}x_{2} \end{bmatrix}-b_{1}x_{1}-b_{2}x_{2}\\ f(\mathbf{x})=\cfrac{1}{2}(a_{11}x_{1}^{2}+a_{12}x_{1}x_{2}+a_{21}x_{1}x_{2}+a_{22}x_{2}^{2})-b_{1}x_{1}-b_{2}x_{2} \end{array}\end{split}\]

In general

\[f(\mathbf{x})=\cfrac{1}{2}a_{ij}x_{i}x_{j}-b_{j}x_{j}\]

The exact solution

\[\begin{split}\mathbf{x}=\begin{bmatrix} x_{1}\\x_{2} \end{bmatrix}= \begin{bmatrix} 2\\-2 \end{bmatrix}\end{split}\]

Using the steepest descent method

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

\[\begin{split}\mathbf{x}_{0}= \begin{bmatrix} 0\\0 \end{bmatrix}\end{split}\]
\[\begin{split}-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}=\mathbf{b}= \begin{bmatrix} 2\\-8 \end{bmatrix}\end{split}\]
  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla f(\mathbf{x}_{0})\), calculate

\[\begin{split}\begin{array}{l} \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \\ \mathbf{r}_{0}=[2,-8]^{\text{T}}\\ \mathbf{A}\mathbf{p}_{0}=[-10,-44]^{\text{T}}\\ (\mathbf{r}_{0},\mathbf{p}_{0})=2\times 2+(-8)\times(-8)=68\\ (\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})=332\\ \alpha_{0}=68/332=0.2048\\ \mathbf{x}_{1}= \mathbf{x}_{0}+\alpha_{0} \ast \mathbf{p}_{0}=[0.4096,-1.6386]^{\text{T}}\\ \end{array}\end{split}\]
  3. Go along the selected direction \(\mathbf{p}_{1}=\mathbf{r}_{1}=-\nabla f(\mathbf{x}_{1})=\mathbf{b}-\mathbf{A}\mathbf{x}_{1}=[4.0482,1.0120]^{\text{T}}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{1} =\cfrac{(\mathbf{r}_{1},\mathbf{p}_{1})}{(\mathbf{A}\mathbf{p}_{1},\mathbf{p}_{1})} \\ \mathbf{x}_{2}= \mathbf{x}_{1}+\alpha_{1} \ast \mathbf{p}_{1}\\ \end{array}\end{split}\]

Using the conjugate gradient method

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

\[\begin{split}\mathbf{x}_{0}= \begin{bmatrix} 0\\0 \end{bmatrix}\end{split}\]
\[\begin{split}-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}=\mathbf{b}= \begin{bmatrix} 2\\-8 \end{bmatrix}\end{split}\]
  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla f(\mathbf{x}_{0})=[2,-8]^{\text{T}}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \\ \mathbf{r}_{0}=[2,-8]^{\text{T}}\\ \mathbf{A}\mathbf{p}_{0}=[-10,-44]^{\text{T}}\\ (\mathbf{r}_{0},\mathbf{p}_{0})=2\times 2+(-8)\times(-8)=68\\ (\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})=332\\ \alpha_{0}=68/332=0.2048\\ \mathbf{x}_{1}= \mathbf{x}_{0}+\alpha_{0} \ast \mathbf{p}_{0}=[0.4096,-1.6386]^{\text{T}}\\ \end{array}\end{split}\]
  3. A new residual \(-\nabla f(\mathbf{x}_{1})=\mathbf{r}_{1}=\mathbf{b}-\mathbf{A}\mathbf{x}_{1}=[4.0482,1.0120]^{\text{T}}\) is obtained from \(\mathbf{x}_{1}\), and based on \(\mathbf{r}_{1}\), a projection is made onto \(\mathbf{p}_{0}\) to obtain a new search direction \(\mathbf{p}_{1}\)

\[\begin{split}\begin{array}{l} \quad \beta_{1} =\cfrac{(\mathbf{A}\mathbf{p}_{0},\mathbf{r}_{1})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})}=-0.2561 \\ \mathbf{p}_{1}= \mathbf{r}_{1}-\beta_{1} \ast \mathbf{p}_{0} =[4.5603,-1.0364]^{\text{T}} \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \alpha_{1} =\cfrac{(\mathbf{r}_{1},\mathbf{p}_{1})}{(\mathbf{A}\mathbf{p}_{1},\mathbf{p}_{1})}=0.3487 \\ \mathbf{x}_{2}= \mathbf{x}_{1}+\alpha_{1} \ast \mathbf{p}_{1}=[2,-2]^{\text{T}}\\ \end{array}\end{split}\]
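
The two iterations above can be reproduced in a few lines of NumPy, confirming that the second conjugate-gradient iterate lands exactly on the solution \([2,-2]^{\text{T}}\):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

x0 = np.zeros(2)
r0 = b - A @ x0                               # r0 = p0 = [2, -8]^T
p0 = r0
alpha0 = (r0 @ p0) / ((A @ p0) @ p0)          # 68 / 332 = 0.2048
x1 = x0 + alpha0 * p0                         # [0.4096, -1.6386]^T

r1 = b - A @ x1                               # [4.0482, 1.0120]^T
beta1 = ((A @ p0) @ r1) / ((A @ p0) @ p0)     # -0.2561
p1 = r1 - beta1 * p0                          # [4.5603, -1.0364]^T
alpha1 = (r1 @ p1) / ((A @ p1) @ p1)          # 0.3487
x2 = x1 + alpha1 * p1
print(x2)                                     # [ 2. -2.]
```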

Solving the 1D Poisson equation using conjugate gradient method

Consider the 1D Poisson equation

\[\cfrac{d^{2}u}{dx^{2}} =f(x)=-1\]

on \(\Omega=[0,1]\) with boundary conditions

\[\begin{split}\begin{array}{l} u(0)=0\\ u'(1)=\cfrac{du}{dx}\bigg|_{x=1}=0 \end{array}\end{split}\]

which has analytical solution

\[u=x-\cfrac{1}{2}x^{2}\]

We will use this specific example to investigate various approaches to solving partial differential equations with finite differences, in which we discretize the domain by defining \(N\) equally spaced points

\[\begin{split}\begin{array}{l} \cfrac{u_{i+1}-2u_{i}+u_{i-1}}{\Delta x^{2}} =f_{i}\\ {u_{i+1}-2u_{i}+u_{i-1}}={\Delta x^{2}}f_{i}\\ -u_{i-1}+2u_{i}-u_{i+1}=-{\Delta x^{2}}f_{i}={\Delta x^{2}}\\ \end{array}\end{split}\]

The boundary conditions require special care. For \(x = 0\) we have a Dirichlet boundary condition which allows us to fix the value \(u_{1} = 0\). For \(x = 1\) we have a Neumann boundary condition \(du/dx = 0\). This is a symmetry boundary condition, so that in this case we can imagine a ’ghost’ point \(u_{N+1}\) which is always equal to \(u_{N-1}\). This leads to the expression for point \(x_{N}\):

\[\begin{split}\begin{array}{l} u_{N}=u_{N-1}+\cfrac{1}{2}\Delta x^{2}\\ \end{array}\end{split}\]

then

\[\begin{split}\begin{array}{l} -u_{i-1}+2u_{i}-u_{i+1}={\Delta x^{2}}\\ -u_{1}+2u_{2}-u_{3}={\Delta x^{2}}\\ -u_{2}+2u_{3}-u_{4}={\Delta x^{2}}\\ \cdots\\ -u_{N-3}+2u_{N-2}-u_{N-1}={\Delta x^{2}}\\ -u_{N-2}+2u_{N-1}-u_{N}={\Delta x^{2}}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} 0+2u_{2}-u_{3}={\Delta x^{2}}\\ -u_{2}+2u_{3}-u_{4}={\Delta x^{2}}\\ \cdots\\ -u_{N-3}+2u_{N-2}-u_{N-1}={\Delta x^{2}}\\ -u_{N-2}+2u_{N-1}-(u_{N-1}+\cfrac{1}{2}\Delta x^{2} )={\Delta x^{2}}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} 0+2u_{2}-u_{3}={\Delta x^{2}}\\ -u_{2}+2u_{3}-u_{4}={\Delta x^{2}}\\ \cdots\\ -u_{N-3}+2u_{N-2}-u_{N-1}={\Delta x^{2}}\\ -u_{N-2}+u_{N-1}=\cfrac{3}{2}{\Delta x^{2}}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} 2u_{2}-u_{3}={\Delta x^{2}}\\ -u_{2}+2u_{3}-u_{4}={\Delta x^{2}}\\ \cdots\\ -u_{N-3}+2u_{N-2}-u_{N-1}={\Delta x^{2}}\\ -u_{N-2}+u_{N-1}=\cfrac{3}{2}{\Delta x^{2}}\\ \end{array}\end{split}\]

This can be written as

\[\mathbf{A}\mathbf{u}=\mathbf{b}\]

where (for \(N=5\))

\[\begin{split}\mathbf{A}= \begin{bmatrix} 2& -1 & 0& 0& 0\\ -1& 2 & -1& 0& 0\\ 0&-1& 2 & -1& 0\\ 0&0&-1& 2 & -1\\ 0&0 &0 &-1 &1 \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{u}=\begin{bmatrix} u_{2}\\u_{3}\\u_{4}\\u_{5}\\u_{6} \end{bmatrix}=\mathbf{v}= \begin{bmatrix} v_{1}\\v_{2}\\v_{3}\\v_{4}\\v_{5} \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{b}= \begin{bmatrix} \Delta x^{2}\\\Delta x^{2}\\\Delta x^{2}\\\Delta x^{2}\\\cfrac{3}{2}\Delta x^{2} \end{bmatrix}\end{split}\]
The grid consists of \(N_{\text{Points}}=7\) equally spaced points \(x_{1},x_{2},\dots,x_{7}\) with spacing \(\Delta x\); the unknowns \(v_{1},\dots,v_{5}\) sit at the interior points \(x_{2},\dots,x_{6}\), with the boundary values at \(x_{1}\) and \(x_{7}\) handled as described above.
\[\begin{split}\begin{array}{l} N_{\text{Points}}=7\\ N=N_{\text{Points}}-2=5\\ \Delta x=\cfrac{1}{N_{\text{Points}}-1} =\cfrac{1}{7-1}=\cfrac{1}{6}\\ \Delta x^{2}=\cfrac{1}{36}\\ \cfrac{3}{2}\Delta x^{2}=\cfrac{3}{72}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} x_{1}=0\Delta x=0\\ x_{2}=1\Delta x=1/6\\ x_{3}=2\Delta x=2/6\\ x_{4}=3\Delta x=3/6\\ x_{5}=4\Delta x=4/6\\ x_{6}=5\Delta x=5/6\\ x_{7}=6\Delta x=1\\ \end{array}\end{split}\]

Solve \(\mathbf{A}\mathbf{x}=\mathbf{b}\)

where

\[\begin{split}\mathbf{A}= \begin{bmatrix} 2& -1 & 0& 0& 0\\ -1& 2 & -1& 0& 0\\ 0&-1& 2 & -1& 0\\ 0&0&-1& 2 & -1\\ 0&0 &0 &-1 &1 \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{v}= \begin{bmatrix} v_{1}\\v_{2}\\v_{3}\\v_{4}\\v_{5} \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{b}= \begin{bmatrix} \Delta x^{2}\\\Delta x^{2}\\\Delta x^{2}\\\Delta x^{2}\\\cfrac{3}{2}\Delta x^{2} \end{bmatrix} = \begin{bmatrix} \cfrac{1}{36} \\\cfrac{1}{36} \\\cfrac{1}{36} \\\cfrac{1}{36} \\\cfrac{3}{72} \end{bmatrix}\end{split}\]

The quadratic form:

\[f(\mathbf{v})=\cfrac{1}{2}\mathbf{v}^{\text{T}}\mathbf{A}\mathbf{v}-\mathbf{b}^{\text{T}}\mathbf{v}\]

The exact solution:

\[\begin{split}u(x)=x-\cfrac{1}{2}x^{2}\\\end{split}\]
\[\begin{split}\mathbf{u}=\begin{bmatrix} u(0)\\u(1/6)\\u(2/6)\\u(3/6)\\u(4/6)\\u(5/6)\\u(1) \end{bmatrix} =\begin{bmatrix} 0\\11/72\\20/72\\27/72\\32/72\\35/72\\36/72 \end{bmatrix} =\begin{bmatrix} 0\\0.1528\\0.2778\\0.3750\\0.4444\\0.4861\\0.5000 \end{bmatrix}\end{split}\]
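
Assembling this \(5\times5\) system in NumPy and solving it directly reproduces the interior values \(11/72,\dots,35/72\) of the exact solution; this gives a reference against which the iterative solves below can be checked:

```python
import numpy as np

dx = 1.0 / 6.0
A = np.array([[ 2., -1.,  0.,  0.,  0.],
              [-1.,  2., -1.,  0.,  0.],
              [ 0., -1.,  2., -1.,  0.],
              [ 0.,  0., -1.,  2., -1.],
              [ 0.,  0.,  0., -1.,  1.]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5]) * dx**2

v = np.linalg.solve(A, b)
x = np.arange(1, 6) * dx            # interior points x_2, ..., x_6
print(v)
print(x - 0.5 * x**2)               # exact u(x) = x - x^2/2 at the same points
```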

another form

\[\cfrac{u_{i-1}-2u_{i}+u_{i+1}}{\Delta x^{2}} =f_{i}=-1\]
\[\begin{split}\begin{array}{l} \cfrac{u_{1}-2u_{2}+u_{3}}{\Delta x^{2}} =f_{2}=-1\\ \cfrac{u_{2}-2u_{3}+u_{4}}{\Delta x^{2}} =f_{3}=-1\\ \cdots\\ \cfrac{u_{i-1}-2u_{i}+u_{i+1}}{\Delta x^{2}} =f_{i}=-1\\ \cdots\\ \cfrac{u_{N-3}-2u_{N-2}+u_{N-1}}{\Delta x^{2}} =f_{N-2}=-1\\ \cfrac{u_{N-2}-2u_{N-1}+u_{N}}{\Delta x^{2}} =f_{N-1}=-1\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{1}=0\\ u_{N}=u_{N-1}+\cfrac{1}{2}\Delta x^{2} \\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{1}=0\\ \cfrac{u_{1}-2u_{2}+u_{3}}{\Delta x^{2}} =f_{2}=-1\\ \cfrac{-2u_{2}+u_{3}}{\Delta x^{2}} =f_{2}-\cfrac{u_{1}}{\Delta x^{2}}=-1-\cfrac{u_{1}}{\Delta x^{2}}=-1\\ \cfrac{-2u_{2}+u_{3}}{\Delta x^{2}} =\hat{f}_{2}=f_{2}-\cfrac{u_{1}}{\Delta x^{2}}=-1-\cfrac{u_{1}}{\Delta x^{2}}=-1\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} u_{N}=u_{N-1}+\cfrac{1}{2}\Delta x^{2} \\ \cfrac{u_{N-2}-2u_{N-1}+u_{N}}{\Delta x^{2}} =f_{N-1}=-1\\ \cfrac{u_{N-2}-2u_{N-1}+u_{N-1}+\cfrac{1}{2}\Delta x^{2}}{\Delta x^{2}} =f_{N-1}=-1\\ \cfrac{u_{N-2}-u_{N-1}}{\Delta x^{2}} =\hat{f}_{N-1}=f_{N-1}-\cfrac{1}{2}=-\cfrac{3}{2}\\ \end{array}\end{split}\]

Let

\[\begin{split}\mathbf{u}=\begin{bmatrix} u_{2}\\u_{3}\\\vdots\\u_{N-2}\\u_{N-1} \end{bmatrix}\end{split}\]

then

\[\mathbf{A}\mathbf{u}=\mathbf{f}\]

where

\[\begin{split}\mathbf{f}=\begin{bmatrix} \hat{f}_{2}\\f_{3}\\\vdots\\f_{N-2}\\\hat{f}_{N-1} \end{bmatrix}=\begin{bmatrix} -1\\-1\\\vdots\\-1\\-3/2 \end{bmatrix}\end{split}\]
\[\begin{split}\mathbf{A}\mathbf{u}=\begin{bmatrix} \cfrac{-2u_{2}+u_{3}}{\Delta x^{2}}\\ \cfrac{u_{2}-2u_{3}+u_{4}}{\Delta x^{2}}\\ \vdots\\ \cfrac{u_{N-3}-2u_{N-2}+u_{N-1}}{\Delta x^{2}}\\ \cfrac{u_{N-2}-u_{N-1}}{\Delta x^{2}} \end{bmatrix}\end{split}\]

using the Method of Steepest Descent

Let \(\mathbf{b}=\mathbf{f}\), then

\[g(\mathbf{u})=\cfrac{1}{2} \mathbf{u}^{\text{T}}\mathbf{A}\mathbf{u}-\mathbf{b}^{\text{T}}\mathbf{u}\]
  1. Given any initial value \(\mathbf{u}_{0}\) calculate the residual \(-\nabla g(\mathbf{u}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{u}_{0}\).

  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla g(\mathbf{u}_{0})\), calculate

\[\begin{split}\begin{array}{l} \alpha_{0} =\cfrac{(\mathbf{r}_{0},\mathbf{p}_{0})}{(\mathbf{A}\mathbf{p}_{0},\mathbf{p}_{0})} \\ \mathbf{u}_{1}= \mathbf{u}_{0}+\alpha_{0} \ast \mathbf{p}_{0}\\ \end{array}\end{split}\]

where

\[\begin{split}\mathbf{A}\mathbf{p}_{0}=\begin{bmatrix} \cfrac{-2p_{2}^{(0)}+p_{3}^{(0)}}{\Delta x^{2}}\\ \cfrac{p_{2}^{(0)}-2p_{3}^{(0)}+p_{4}^{(0)}}{\Delta x^{2}}\\ \vdots\\ \cfrac{p_{N-3}^{(0)}-2p_{N-2}^{(0)}+p_{N-1}^{(0)}}{\Delta x^{2}}\\ \cfrac{p_{N-2}^{(0)}-p_{N-1}^{(0)}}{\Delta x^{2}} \end{bmatrix}\end{split}\]
  3. Go along the selected direction \(\mathbf{p}_{1}=\mathbf{r}_{1}=-\nabla g(\mathbf{u}_{1})=\mathbf{b}-\mathbf{A}\mathbf{u}_{1}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{1} =\cfrac{(\mathbf{r}_{1},\mathbf{p}_{1})}{(\mathbf{A}\mathbf{p}_{1},\mathbf{p}_{1})} \\ \mathbf{u}_{2}= \mathbf{u}_{1}+\alpha_{1} \ast \mathbf{p}_{1}\\ \end{array}\end{split}\]

where

\[\begin{split}\mathbf{A}\mathbf{p}_{1}=\begin{bmatrix} \cfrac{-2p_{2}^{(1)}+p_{3}^{(1)}}{\Delta x^{2}}\\ \cfrac{p_{2}^{(1)}-2p_{3}^{(1)}+p_{4}^{(1)}}{\Delta x^{2}}\\ \vdots\\ \cfrac{p_{N-3}^{(1)}-2p_{N-2}^{(1)}+p_{N-1}^{(1)}}{\Delta x^{2}}\\ \cfrac{p_{N-2}^{(1)}-p_{N-1}^{(1)}}{\Delta x^{2}} \end{bmatrix}\end{split}\]
  4. Go along the selected direction \(\mathbf{p}_{k}=\mathbf{r}_{k}=-\nabla g(\mathbf{u}_{k})=\mathbf{b}-\mathbf{A}\mathbf{u}_{k}\), calculate

\[\begin{split}\begin{array}{l} \alpha_{k} =\cfrac{(\mathbf{r}_{k},\mathbf{p}_{k})}{(\mathbf{A}\mathbf{p}_{k},\mathbf{p}_{k})} \\ \mathbf{u}_{k+1}= \mathbf{u}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\\ \end{array}\end{split}\]

where

\[\begin{split}\mathbf{A}\mathbf{p}_{k}=\begin{bmatrix} \cfrac{-2p_{2}^{(k)}+p_{3}^{(k)}}{\Delta x^{2}}\\ \cfrac{p_{2}^{(k)}-2p_{3}^{(k)}+p_{4}^{(k)}}{\Delta x^{2}}\\ \vdots\\ \cfrac{p_{N-3}^{(k)}-2p_{N-2}^{(k)}+p_{N-1}^{(k)}}{\Delta x^{2}}\\ \cfrac{p_{N-2}^{(k)}-p_{N-1}^{(k)}}{\Delta x^{2}} \end{bmatrix}\end{split}\]
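
The products \(\mathbf{A}\mathbf{p}_{k}\) written out above never require the matrix to be stored; a matrix-free sketch of this operator, usable with the `conjugate_gradient` routine given earlier or with the steepest-descent loop (function and argument names are illustrative):

```python
import numpy as np

def apply_A(p, dx):
    """Matrix-free A p for the scaled 1D Poisson operator derived above:
    Dirichlet u_1 = 0 on the left, ghost-point Neumann treatment on the right."""
    q = np.empty_like(p)
    q[0] = (-2.0 * p[0] + p[1]) / dx**2                  # row for u_2
    q[1:-1] = (p[:-2] - 2.0 * p[1:-1] + p[2:]) / dx**2   # interior rows
    q[-1] = (p[-2] - p[-1]) / dx**2                      # row for u_{N-1}
    return q
```

For example, with `dx = 1/6` and `f = np.array([-1., -1., -1., -1., -1.5])`, the call `conjugate_gradient(lambda v: apply_A(v, dx), f)` should reproduce the interior values \(11/72,\dots,35/72\) obtained earlier.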

The conjugate gradient method from code

  1. Given any initial value \(\mathbf{x}_{0}\) calculate the residual \(-\nabla f(\mathbf{x}_{0})=\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\).

  2. Go along the selected direction \(\mathbf{p}_{0}=\mathbf{r}_{0}=-\nabla f(\mathbf{x}_{0})\), calculate

\[\begin{split}\begin{array}{l} \mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}_{0}\\ \mathbf{p}_{0}=\mathbf{r}_{0}\\ \delta_{new}=(\mathbf{r}_{0},\mathbf{r}_{0})\\ \text{while_loop:}\\ \alpha_{k} =\cfrac{(\mathbf{r}_{k},\mathbf{r}_{k})}{(\mathbf{A}\mathbf{p}_{k},\mathbf{p}_{k})} \\ \mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\\ \mathbf{r}_{k+1}=\mathbf{b}-\mathbf{A}\mathbf{x}_{k+1}\\ \delta_{old}=\delta_{new}\\ \delta_{new}=(\mathbf{r}_{k+1},\mathbf{r}_{k+1})\\ \beta_{k}=\cfrac{\delta_{new}}{\delta_{old}}=\cfrac{(\mathbf{r}_{k+1},\mathbf{r}_{k+1})}{(\mathbf{r}_{k},\mathbf{r}_{k})}\\ \mathbf{p}_{k+1}=\mathbf{r}_{k+1}+\beta_{k}\mathbf{p}_{k} \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} \mathbf{r}_{k}=\mathbf{b}-\mathbf{A}\mathbf{x}_{k}\\ \mathbf{r}_{k+1}=\mathbf{b}-\mathbf{A}\mathbf{x}_{k+1}\\ \mathbf{r}_{k+1}-\mathbf{r}_{k}=-\mathbf{A}(\mathbf{x}_{k+1}-\mathbf{x}_{k})\\ \mathbf{x}_{k+1}= \mathbf{x}_{k}+\alpha_{k} \ast \mathbf{p}_{k}\\ \mathbf{r}_{k+1}-\mathbf{r}_{k}=-\mathbf{A}(\mathbf{x}_{k+1}-\mathbf{x}_{k})=-\alpha_{k} \mathbf{A} \mathbf{p}_{k}\\ \mathbf{r}_{k+1}-\mathbf{r}_{k}=-\alpha_{k} \mathbf{A} \mathbf{p}_{k}\\ \mathbf{r}_{k+1}=\mathbf{r}_{k}-\alpha_{k} \mathbf{A} \mathbf{p}_{k}\\ \end{array}\end{split}\]

Solving the 2D Poisson equation

Consider the 2D Poisson equation

\[\cfrac{\partial^{2}u}{\partial x^{2}}+\cfrac{\partial^{2}u}{\partial y^{2}} =f(x,y)=-1\]

on \(\Omega=[0,1]\times[0,1]\) with boundary conditions

\[\begin{split}\begin{array}{l} u(0,y)=0\\ \cfrac{\partial u}{\partial x}\bigg|_{x=1}=0\\ \cfrac{\partial u}{\partial y}\bigg|_{y=0}=\cfrac{\partial u}{\partial y}\bigg|_{y=1}=0 \end{array}\end{split}\]

which has analytical solution

\[u(x,y)=x-\cfrac{1}{2}x^{2}\]

We will use this specific example to investigate various approaches to solving partial differential equations with finite differences, in which we discretize the domain by defining \(N\) equally spaced points

\[\begin{split}\begin{array}{l} \cfrac{u_{i+1,j}-2u_{i,j}+u_{i-1,j}}{\Delta x^{2}} + \cfrac{u_{i,j+1}-2u_{i,j}+u_{i,j-1}}{\Delta y^{2}} =f_{i,j}\\ \cfrac{u_{i-1,j}-2u_{i,j}+u_{i+1,j}}{\Delta x^{2}} + \cfrac{u_{i,j-1}-2u_{i,j}+u_{i,j+1}}{\Delta y^{2}} =f_{i,j}\\ \end{array}\end{split}\]

The boundary conditions require special care. For \(x = 0\) we have a Dirichlet boundary condition which allows us to fix the value \(u_{1} = 0\). For \(x = 1\) we have a Neumann boundary condition \(du/dx = 0\). This is a symmetry boundary condition, so that in this case we can imagine a ’ghost’ point \(u_{M+1,j}\) which is always equal to \(u_{M-1,j}\). This leads to the expression for point \(x_{M,j}\)

\[\begin{split}\begin{array}{l} u_{M,j}=u_{M-1,j}+\cfrac{1}{2}\Delta x^{2}\\ \end{array}\end{split}\]

then

\[\begin{split}\begin{array}{l} j=2\\ \cfrac{u_{i-1,2}-2u_{i,2}+u_{i+1,2}}{\Delta x^{2}} + \cfrac{u_{i,2-1}-2u_{i,2}+u_{i,2+1}}{\Delta y^{2}} =f_{i,2}\\ \cfrac{u_{i-1,2}-2u_{i,2}+u_{i+1,2}}{\Delta x^{2}} + \cfrac{u_{i,1}-2u_{i,2}+u_{i,3}}{\Delta y^{2}} =f_{i,2}\\ \cfrac{u_{i-1,2}-2u_{i,2}+u_{i+1,2}}{\Delta x^{2}} + \cfrac{-u_{i,2}+u_{i,3}}{\Delta y^{2}} =f_{i,2}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} j=N-1\\ \cfrac{u_{i-1,N-1}-2u_{i,N-1}+u_{i+1,N-1}}{\Delta x^{2}} + \cfrac{u_{i,N-1-1}-2u_{i,N-1}+u_{i,N-1+1}}{\Delta y^{2}} =f_{i,N-1}\\ \cfrac{u_{i-1,N-1}-2u_{i,N-1}+u_{i+1,N-1}}{\Delta x^{2}} + \cfrac{u_{i,N-2}-2u_{i,N-1}+u_{i,N}}{\Delta y^{2}} =f_{i,N-1}\\ \cfrac{u_{i-1,N-1}-2u_{i,N-1}+u_{i+1,N-1}}{\Delta x^{2}} + \cfrac{u_{i,N-2}-u_{i,N-1}}{\Delta y^{2}} =f_{i,N-1}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} i=2\\ \cfrac{u_{2-1,j}-2u_{2,j}+u_{2+1,j}}{\Delta x^{2}} + \cfrac{u_{2,j-1}-2u_{2,j}+u_{2,j+1}}{\Delta y^{2}} =f_{2,j}\\ \cfrac{u_{1,j}-2u_{2,j}+u_{3,j}}{\Delta x^{2}} + \cfrac{u_{2,j-1}-2u_{2,j}+u_{2,j+1}}{\Delta y^{2}} =f_{2,j}\\ \cfrac{-2u_{2,j}+u_{3,j}}{\Delta x^{2}} + \cfrac{u_{2,j-1}-2u_{2,j}+u_{2,j+1}}{\Delta y^{2}} =f_{2,j}\\ \end{array}\end{split}\]
\[\begin{split}\begin{array}{l} i=M-1\\ \cfrac{u_{M-1-1,j}-2u_{M-1,j}+u_{M-1+1,j}}{\Delta x^{2}} + \cfrac{u_{M-1,j-1}-2u_{M-1,j}+u_{M-1,j+1}}{\Delta y^{2}} =f_{M-1,j}\\ \cfrac{u_{M-2,j}-2u_{M-1,j}+u_{M,j}}{\Delta x^{2}} + \cfrac{u_{M-1,j-1}-2u_{M-1,j}+u_{M-1,j+1}}{\Delta y^{2}} =f_{M-1,j}\\ \cfrac{u_{M-2,j}-2u_{M-1,j}+u_{M-1,j}+\cfrac{1}{2}\Delta x^{2}}{\Delta x^{2}} + \cfrac{u_{M-1,j-1}-2u_{M-1,j}+u_{M-1,j+1}}{\Delta y^{2}} =f_{M-1,j}\\ \cfrac{u_{M-2,j}-u_{M-1,j}}{\Delta x^{2}} + \cfrac{u_{M-1,j-1}-2u_{M-1,j}+u_{M-1,j+1}}{\Delta y^{2}} =f_{M-1,j}-\cfrac{1}{2}\\ \end{array}\end{split}\]
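
Putting the pieces together, a Gauss-Seidel sketch for this 2D problem (assumptions: \(f=-1\) everywhere, \(u=0\) at \(x=0\), the ghost-point relation \(u_{M,j}=u_{M-1,j}+\tfrac{1}{2}\Delta x^{2}\) at \(x=1\), and the zero-gradient treatment in \(y\) used in the equations above; grid size and sweep count are arbitrary choices):

```python
import numpy as np

def solve_poisson_2d(M=7, N=7, n_sweeps=5000):
    """Gauss-Seidel for u_xx + u_yy = -1 on [0,1]x[0,1]; u[i, j] has i along x."""
    dx, dy = 1.0 / (M - 1), 1.0 / (N - 1)
    f = -np.ones((M, N))
    u = np.zeros((M, N))                       # row i = 0 stays 0: Dirichlet at x = 0
    d = -2.0 / dx**2 - 2.0 / dy**2
    for _ in range(n_sweeps):
        for j in range(1, N - 1):
            for i in range(1, M - 1):
                r = (f[i, j]
                     - (u[i-1, j] - 2.0*u[i, j] + u[i+1, j]) / dx**2
                     - (u[i, j-1] - 2.0*u[i, j] + u[i, j+1]) / dy**2)
                u[i, j] += r / d
        u[-1, :] = u[-2, :] + 0.5 * dx**2      # ghost-point Neumann at x = 1 (f = -1)
        u[:, 0] = u[:, 1]                      # du/dy = 0 at y = 0
        u[:, -1] = u[:, -2]                    # du/dy = 0 at y = 1
    return u
```

The converged values should match \(u(x,y)=x-\tfrac{1}{2}x^{2}\) at the grid points, independently of \(y\).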