Gram-Schmidt Orthogonalization and Projection

import numpy as np

Gram-Schmidt Procedure

Suppose we have a subspace \(\mathbb{S}\) of \(\mathbb{R}^n\) whose basis consists of \(k\) vectors \(\vec{v}_1,\vec{v}_2, \ldots , \vec{v}_k\).

\[ \mathbb{S} = \operatorname{span}\left\{ \vec{v}_1,\vec{v}_2, \ldots , \vec{v}_k \right\}. \]

The Gram-Schmidt procedure computes from this basis a new basis for \(\mathbb{S}\) that is orthonormal (that is, the vectors of of unit length and mutually orthogonal). That is,

\[ \operatorname{gram\_schmidt}\left(\left\{ \vec{v}_1,\vec{v}_2, \ldots , \vec{v}_k \right\}\right) \rightarrow \left\{ \vec{w}_1,\vec{w}_2, \ldots , \vec{w}_k \right\} \]

where

\(|\vec{w}_i| = 1\), for all \(i\),
\(\vec{w}_i \cdot \vec{w}_j = 0\) for all \(i\ne j\).

The procedure usually constructs the vectors by, setting \(\vec{w}_1 = \vec{v}_1/|\vec{v}_1|\), and then recursively building the rest of the vectors by

\[ \vec{w}_j = \operatorname{perp}_{\operatorname{span}\{\vec{w}_1, \ldots, \vec{w}_{j-1}\}}(\vec{v}_j), \quad 1< j \le k. \]

It follows that, for each \(1\le i \le k\)

\[ \operatorname{span} \left\{ \vec{v}_1, \vec{v}_2 \ldots, \vec{v}_i \right\} = \operatorname{span} \left\{ \vec{w}_1, \vec{w}_2 \ldots, \vec{w}_i \right\} \]

Here is an implementation of the Gram-Schmidt procedure. Even though we state the input is a set of linearly independent vectors, it can be used even if the vectors are linearly dependent. There will just be some zero vectors returned in the columns of \(A\) corresponding to the linearly dependent vectors. Therefore, these columns can be ignored.

Gram-Schmidt Procedure

def gram_schmidt(A):
    '''input: A set of linearly independent vectors stored
              as the columns of matrix A
       outpt: An orthongonal basis for the column space of A.'''
    # get the number of vectors.
    A = np.copy(A).astype(np.float64) # create a local instance of the array
    n = A.shape[1]
    for j in range(n):
        # For the vector in column j, find the perpendicular
        # of the projection onto the previous orthogonal vectors.
        for k in range(j):
            A[:, j] -= np.dot(A[:, k], A[:, j]) * A[:, k]
        # If original vectors aren't lin indep then we can check for this:
        # (1)
        if np.isclose(np.linalg.norm(A[:, j]), 0, rtol=1e-15, atol=1e-14, equal_nan=False):
            A[:, j] = np.zeros(A.shape[0])
        else:    
            A[:, j] = A[:, j] / np.linalg.norm(A[:, j])
    return A

if the perp is 0, then this column is in the span of the previous ones, so the original set of vectors is not linearly independent. We can't normalize the 0 vector so we need to treat this separately. Also, to see if the norm, which is a float, is 0 we us the .isclose function.

Example

Use the Gram-Schmidt procedure on the set

\[\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} , \quad \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\} \]

A = np.array([[1, 1, 0], [-1, 2, 1], [0, 1, 1]]).T
print(gram_schmidt(A))

[[ 0.70710678 -0.63960215  0.30151134]
 [ 0.70710678  0.63960215 -0.30151134]
 [ 0.          0.42640143  0.90453403]]

The columns of this matrix form the orthonormal basis.

As a check, we can work through the Gram-Schmidt process by hand and determine that

\[ \vec{w}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix} , \quad \vec{w}_2= \begin{bmatrix} -3/\sqrt{22} \\ 3/\sqrt{22} \\ 2/\sqrt{22} \end{bmatrix} , \quad \vec{w}_3= \begin{bmatrix} 1/\sqrt{11} \\ -1/\sqrt{11} \\ 3/\sqrt{11} \end{bmatrix} \]

# check against hand calculated solution:
np.array([[1/np.sqrt(2),1/np.sqrt(2),0],[-3/np.sqrt(22), 3/np.sqrt(22), 2/np.sqrt(22)],[1/np.sqrt(11), -1/np.sqrt(11), 3/np.sqrt(11) ]]).T

array([[ 0.70710678, -0.63960215,  0.30151134],
       [ 0.70710678,  0.63960215, -0.30151134],
       [ 0.        ,  0.42640143,  0.90453403]])

Another thing we can check is that the vectors returned by gram_schmidt are orthonormal. This means that the matrix of these vectors must be orthogonal. Recall, a matrix \(Q\) is orthogonal if \(Q^TQ = I\).

# check if Q^T * Q is I
Q = gram_schmidt(A)
print(np.round(Q@Q.T,8)) # (1)

We round to account for floating point approximation errors.

[[ 1.  0. -0.]
 [ 0.  1.  0.]
 [-0.  0.  1.]]

Example

Use the Gram-Schmidt procedure on the set

\[\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\} \]

Note: the third vector is in the span of the first two.

A = np.array([[1, 1, 0, 1], [1, 0, 1, 1], [2, 1, 1, 2], [1, 0, 0, 1]]).T
print(gram_schmidt(A))

[[ 0.57735027  0.25819889  0.          0.31622777]
 [ 0.57735027 -0.51639778  0.         -0.63245553]
 [ 0.          0.77459667  0.         -0.63245553]
 [ 0.57735027  0.25819889  0.          0.31622777]]

The third column is the zero vector and this indicates that \(\vec{v}_3\) is in the span of the previous two vectors. Therefore, we can ignore vector \(\vec{v}_3\).

As a check, we can work through the Gram-Schmidt process by hand and determine that

\[ \vec{w}_1 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 0 \\ 1/\sqrt{3} \end{bmatrix} , \quad \vec{w}_2= \begin{bmatrix} 1/\sqrt{15} \\ -2/\sqrt{15} \\ 3/\sqrt{15} \\ 1/\sqrt{15} \end{bmatrix} , \quad \vec{w}_3= \begin{bmatrix} 1/\sqrt{10} \\ -2/\sqrt{10} \\ -2/\sqrt{10} \\ 1/\sqrt{10} \end{bmatrix} \]

# check against handwritten solution:
w1 = np.array([1,1,0,1])/np.sqrt(3)
w2 = np.array([1,-2,3,1])/np.sqrt(15)
w3 = np.array([1,-2,-2,1])/np.sqrt(10)
print(np.vstack([w1,w2,w3]).T)

[[ 0.57735027  0.25819889  0.31622777]
 [ 0.57735027 -0.51639778 -0.63245553]
 [ 0.          0.77459667 -0.63245553]
 [ 0.57735027  0.25819889  0.31622777]]

Gram-Schmidt via QR Decomposition

The Gram-Schmidt can be found hidden in the SciPy's QR decomposition. We'll cover a number of matrix decompositions methods in the next section, but for now we note that QR decomposition is a factorization of the form

\[A = QR\]

where \(Q\) is orthogonal (i.e. columns are orthonormal), and \(R\) is upper triangular. This means that column \(i\) of \(A\) is a linear combination of the first \(i\) columns of \(Q\). Column \(i\) of \(R\) tells us exactly what this linear combination is. This is true for each column \(i\).

The function scipy.linalg.qr returns the two matrices \(Q\) and \(R\) associated with matrix \(A\).

Let's redo the two examples above.

Example

Use scipy.linalg.qr to find the Gram-Schmidt basis for the set

\[\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} , \quad \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\} \]

First form a column matrix, then return the matrix \(Q\) from QR decomposition.

A = np.array([[1, 1, 0], [-1, 2, 1], [0, 1, 1]]).T
Q, R = la.qr(A)
Q

array([[-0.70710678,  0.63960215,  0.30151134],
       [-0.70710678, -0.63960215, -0.30151134],
       [-0.        , -0.42640143,  0.90453403]])

This is the same collection of orthonormal vectors we found above (up to scaling by \(-1\)).

For bases that don't span all of \(\mathbb{R}^n\), QR decomposition adds extra linearly independent columns. We can detect this by looking at the matrix \(R\) that gets returned.

Example

Use scipy.linalg.qr to find the Gram-Schmidt basis for the set

\[\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 2 \\ 1 \\ 1 \\ 1 \end{bmatrix} , \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\} \]

First form a column matrix, then return the matrix \(Q\) from QR decomposition.

A = np.array([[1, 1, 0, 1], [1, 0, 1, 1], [2, 1, 1, 2], [1, 0, 0, 1]]).T
Q, R = la.qr(A)
Q

array([[-0.57735027,  0.25819889, -0.66225429, -0.40177015],
       [-0.57735027, -0.51639778, -0.07871036,  0.62753859],
       [-0.        ,  0.77459667, -0.07871036,  0.62753859],
       [-0.57735027,  0.25819889,  0.74096465, -0.22576844]])

We may think we are done here, but recall the original set of vectors were linearly dependent. We can see this by looking at \(R\). We just round entries to 3 decimal places to make it easier to read.

R.round(3)

array([[-1.732, -1.155, -2.887, -1.155],
       [ 0.   ,  1.291,  1.291,  0.516],
       [ 0.   ,  0.   , -0.   ,  0.079],
       [ 0.   ,  0.   ,  0.   , -0.628]])

The third column has a 0 on the diagonal. This tells us that the third column of \(A\) is in the span of the first two columns of \(Q\), hence it is in the span of the first two columns of \(A\). Therefore, the columns of \(A\) were not linearly independent. We can remove this column from \(A\) and redo the calculation.

A = np.array([[1, 1, 0, 1], [1, 0, 1, 1], [1, 0, 0, 1]]).T
Q, R = la.qr(A)
Q.round(8)

array([[-0.57735027,  0.25819889,  0.31622777, -0.70710678],
       [-0.57735027, -0.51639778, -0.63245553,  0.        ],
       [-0.        ,  0.77459667, -0.63245553,  0.        ],
       [-0.57735027,  0.25819889,  0.31622777,  0.70710678]])

A square matrix is returned but we only care about the first 3 columns. These 3 columns is the same collection of orthonormal vectors we found above.

Projection onto a Subspace

Suppose we have an orthogonal basis \(\{ \vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k \}\) for a subspace \(\mathbb{S}\). The the projection of a vector \(\vec{v}\) onto \(\mathbb{S}\) is given by:

\[ \operatorname{proj}_{\mathbb{S}}(\vec{v}) = \frac{\vec{v}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \frac{\vec{v}\cdot\vec{w}_2}{\vec{w}_2\cdot\vec{w}_2}\vec{w}_2 + \cdots + \frac{\vec{v}\cdot\vec{w}_k}{\vec{w}_k\cdot\vec{w}_k}\vec{w}_k \]

We used this in the Gram-Schmidt procedure above. Here we just implement the projection and perpendicular as standalone functions in python so we can use them as needed.

Projection onto a Subspace

def proj(v, A):
    '''input : v is a vector, and A is a matrix whose
               column vectors are form an orthogonal basis.
       output: the projection of v onto the subspace spanned
               columns of A.'''
    num_rows = A.shape[0]
    num_cols = A.shape[1]
    w = np.zeros(num_rows)
    for i in range(num_cols):
        coli = A[:,i]
        w += (np.dot(v,coli)/np.dot(coli,coli))*coli
    return w

def perp(v,A):
    '''input : v is a vector, and A is a matrix whose
               column vectors are form an orthogonal basis.
       output: the perpendicular of the projection of v onto
               the subspace spanned columns of A.'''
    return v - proj(v,A)

Example

Let \(\mathbb{S} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1\end{bmatrix} \right\}\) and let \(\vec{x} = \begin{bmatrix} 2\\5\\-7\\3 \end{bmatrix}\). Determine \(\operatorname{proj}_{\mathbb{S}}(\vec{x})\) and \(\operatorname{perp}_{\mathbb{S}}(\vec{x})\).

Note: Basis is already orthogonal, so we can project directly onto it.

A = np.array([[1,1,1,1],[1,-1,1,-1]]).T
v = np.array([2,5,-7,3])
print('projection of v onto columnspace of A: ', proj(x,A))
print('perpendicular of v onto columnspace of A: ', perp(x,A))

projection of v onto columnspace of A:  [-2.5  4.  -2.5  4. ]
perpendicular of v onto columnspace of A:  [ 4.5  1.  -4.5 -1. ]

In the next example, we use the Gram-Schmidt procedure to first find an orthogonal basis for the subspace, then we project the vector onto this.

Example

Project \(\vec{v} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\) onto the space with basis:

\[ \vec{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} , \quad \vec{v}_2= \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \]

The first thing to do would be to find an orthogonal basis, we could use Gram-Schmidt to do this (see above), but here we didn't normalize:

\[ \vec{w}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} , \quad \vec{w}_2= \begin{bmatrix} -3 \\ 3 \\ 2 \end{bmatrix} \]

Then projecting \(\vec{x}\) onto this basis we get

\[ \operatorname{proj}_{\mathbb{S}}(\vec{v}) = \frac{\vec{v}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \frac{\vec{v}\cdot\vec{w}_2}{\vec{w}_2\cdot\vec{w}_2}\vec{w}_2 \]

\[ = \begin{bmatrix} 2/2 - 6/22 \\ 2/2 + 6/22 \\ 2/11 \end{bmatrix} = \begin{bmatrix} 8/11 \\ 14/11 \\ 2/11 \end{bmatrix} = \begin{bmatrix} 0.7272727273 \\ 1.2727272727 \\ 0.1818181818 \end{bmatrix} \]

Also,

\[ \operatorname{perp}_{\mathbb{S}}(\vec{v}) = \vec{v} - \operatorname{proj}_{\mathbb{S}}(\vec{v}) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 8/11 \\ 14/11 \\ 2/11 \end{bmatrix} = \begin{bmatrix} 3/11 \\ -3/11 \\ 9/11 \end{bmatrix} = \begin{bmatrix} 0.2727272727 \\ -0.2727272727 \\ 0.8181818182 \end{bmatrix} \]

Using our python implementation, we can check that we get the same answer:

v1 = np.array([1,1,0])
v2 = np.array([-1,2,1])
A = np.vstack([v1,v2]).T
Q = gram_schmidt(A)
v = np.array([1,1,1])
print('projection of v onto columnspace of Q:', proj(v,Q))

projection of v onto columnspace of Q: [0.72727273 1.27272727 0.18181818]

print('perpendicular of v onto columnspace of Q:', perp(v,Q))

perpendicular of v onto columnspace of Q: [ 0.27272727 -0.27272727  0.81818182]

`scipy.linalg.orth`

There is a built-in command in SciPy which constructs an orthonormal basis for the column space of a matrix. Since such bases are not unique, the one returned by scipy.linalg.org is unlikely to be the one we obtain from the Gram-Schmidt procedure (or QR decomposition).

For example:

A = np.array([[1,1,0],[1,1,1]]).T
la.orth(A)

array([[-0.6571923 ,  0.26095647],
       [-0.6571923 ,  0.26095647],
       [-0.36904818, -0.92941026]])

There are instances where we just want any orthogonal basis for a space. In such cases we scipy.linalg.orth will suffice (instead of using Gram-Schmidt). For instance, to project a vector \(\vec{v}\) onto a subspace \(\mathbb{S}\), any orthogonal basis will suffice. The answer is independent of the basis used.

Example

Let's revisit our last example above. The only change we'll make is that we will use scipy.linalg.orth instead of gram_schmidt:

v1 = np.array([1.,1,0])
v2 = np.array([-1,2,1])
A = np.vstack([v1,v2]).T
Q = la.orth(A) # (1)
v = np.array([1,1,1])
print('projection of v onto columnspace of Q:', proj(v,Q))

Here we use scipy.linalg.orth instead of gram_schmidt.

projection of v onto columnspace of A: [0.72727273 1.27272727 0.18181818]

This is the same result we found above.

Another example of where we use scipy.linalg.orth instead of Gram-Schmidt is when we orthogonally diagonalize symmetric matrices in the Matrix Decomposition section.