Matrix Decompositions (NumPy)
Eigen Decomposition (diagonalization)
An \(n \times n\) (square) matrix \(A\) is diagonalizable if it can be decomposed (factored) as
\[
A = PDP^{-1}
\]
for some invertible matrix \(P\) and some diagonal matrix \(D\).
A beautiful result in linear algebra is that such a decomposition is possible if \(A\) has \(n\) linearly independent eigenvectors. Moreover, the columns of \(P\) are the eigenvectors of \(A\) and the diagonal entries of \(D\) are the corresponding eigenvalues.
Therefore, to use SciPy to diagonalize a matrix, we first compute the eigenvalues and eigenvectors with scipy.linalg.eig. This returns a 1D array of eigenvalues, which we can use to create the diagonal matrix \(D\) with numpy.diag. The same .eig call also returns the matrix of eigenvectors, and this is the matrix \(P\) we want.
Example
Diagonalize the matrix
\[
C = \begin{bmatrix} -3 & 5 & -5 \\ -7 & 9 & -5 \\ -7 & 7 & -3 \end{bmatrix}
\]
import numpy as np
import scipy.linalg as la

C = np.array([[-3,5,-5],[-7,9,-5],[-7,7,-3]])
evals, evecs = la.eig(C)
D = np.diag(evals.real) # (1)
P = evecs # (2)
Pinv = la.inv(P)
print(D)
print(P)
- create the diagonal matrix of eigenvalues. The .real just strips off the imaginary parts, since the eigenvalues are real numbers in this case.
- create the matrix of eigenvectors.
[[-3. 0. 0.]
[ 0. 4. 0.]
[ 0. 0. 2.]]
[[ 5.77350269e-01 -8.33283831e-16 -7.07106781e-01]
[ 5.77350269e-01 -7.07106781e-01 -7.07106781e-01]
[ 5.77350269e-01 -7.07106781e-01 3.33066907e-16]]
Let's check that \(PDP^{-1} = C\).
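One way to do this is to compute the product directly and compare it with the original matrix:
print(P @ D @ Pinv)                  # should print C (up to rounding)
print(np.allclose(P @ D @ Pinv, C))  # True up to floating-point error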
We can also use this idea to create a matrix with any eigenvalues and eigenvectors we want. For example, suppose we would like a \(3\times 3\) matrix with the following eigenvalues/eigenvectors.
An example of such a matrix would be \(PDP^{-1}\) where
In this case,
Example
Let's check that the matrix above has the desired eigenvalues and eigenvectors.
A = np.array([[2,2,-1],[5/4,1/2,-5/4],[3/2,-3,-1/2]])
evals, evecs = la.eig(A)
D = np.diag(evals.real)
P = evecs
Pinv = la.inv(P)
print(D)
print(P)
[[-2. 0. 0.]
[ 0. 3. 0.]
[ 0. 0. 1.]]
[[-1.68199171e-16 8.94427191e-01 7.07106781e-01]
[ 4.47213595e-01 4.47213595e-01 -5.76888806e-16]
[ 8.94427191e-01 -2.71947991e-16 7.07106781e-01]]
We can see that the columns are scaled versions of \(\vec{v}_1\), \(\vec{v}_3\), and \(\vec{v}_2\), respectively.
Diagonalization of Symmetric Matrices
If \(A\) is an \(n \times n\) symmetric matrix (i.e. \(A^T = A\)) then it can be diagonalized by an orthogonal matrix \(P\) (i.e. \(P^{-1} = P^T\)):
\[
A = PDP^{T}
\]
where \(D\) is the diagonal matrix of eigenvalues, and \(P\) is the matrix of (orthonormal) eigenvectors. A matrix that can be diagonalized by an orthogonal matrix is called orthogonally diagonalizable.
Let's summarize some important theorems about symmetric matrices:
- Eigenvectors corresponding to different eigenvalues of a symmetric matrix are orthogonal.
- Principal Axis Theorem: a matrix is symmetric if and only if it is orthogonally diagonalizable.
- All the eigenvalues of a symmetric matrix are real numbers.
Example
Orthogonally diagonalize the symmetric matrix
\[
A = \begin{bmatrix} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{bmatrix}
\]
Solution: First we summarize how we would do this one (check the results for yourself). The eigenvalues and eigenvectors are
In other words, \(\lambda_1\) has two corresponding eigenvectors (algebraic and geometric multiplicity are both 2), and \(\lambda_2\) has one corresponding eigenvector (algebraic and geometric multiplicity are both 1).
If we formed the matrix \(P\) whose columns are the eigenvectors, then \(P\) would diagonalize \(A\). However, \(P\) would not be orthogonal. We can see this since \(\vec{v}_{1,1}\) and \(\vec{v}_{1,2}\) are not orthogonal. What we do instead is find an orthonormal basis for the eigenspace \(\mathbb{S}_1 = \operatorname{span}\left\{ \vec{v}_{1,1},\vec{v}_{1,2}\right\}\). We use this orthonormal basis as the columns of \(P\) which correspond to \(\lambda_1\).
Let's now do the calculations.
A = np.array([[5,-4,-2],[-4,5,-2],[-2,-2,8]])
evals, evecs = la.eig(A)
evals = np.round(evals.real,2) # we know the eigenvalues are real numbers
print(evals)
print(evecs)
[ 9. -0. 9.]
[[ 0.74535599 -0.66666667 0. ]
[-0.59628479 -0.66666667 -0.4472136 ]
[-0.2981424 -0.33333333 0.89442719]]
Check if the matrix of eigenvectors is orthogonal by multiplying its transpose against itself; the product should be the identity matrix.
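A quick way to carry out this check, rounding to 12 decimal places to get rid of any rounding errors:
np.round(evecs.T @ evecs, 12)   # should be the identity matrix if the columns are orthonormal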
Note
Luckily, scipy.linalg.eig already returned an orthonormal collection of eigenvectors. It is not guaranteed to do this in general; we just lucked out here. However, for symmetric matrices we should use scipy.linalg.eigh to construct an orthonormal collection of eigenvectors. If we use .eig and the eigenvectors for a specific eigenvalue aren't orthogonal, then we have a bit more work to do; see the next example below for how to proceed in this case.
Since the matrix of eigenvectors is orthogonal, it is the matrix \(P\) we are looking for. However, notice that the repeated eigenvalue \(9\) appears in the first and third positions, so the corresponding eigenvectors are separated (appearing in the first and third columns). We can reorder these if we want.
Let's order the eigenvalues so the \(9\)'s appear next to each other, and the corresponding eigenvectors are sorted too.
idx = evals.argsort()[::-1] # (1)
print('sorted indices: ', idx)
evals = evals[idx] # resort evals by idx
evecs = evecs[:,idx] # resort evecs by idx
print(evals)
print(evecs)
- This looks at what it would take to sort evals in descending order (indicated by the [::-1], which reverses the ascending order produced by argsort), and then stores the index order which would put evals in descending order. We can then use this index list to sort both evals and evecs in the same way.
sorted indices:  [1 0 2]
[ 9. 9. -0.]
[[ 0.74535599 0. -0.66666667]
[-0.59628479 -0.4472136 -0.66666667]
[-0.2981424 0.89442719 -0.33333333]]
Now we have evals and evecs sorted.
Let's set \(P\) to be the matrix of eigenvectors and check it diagonalizes \(A\).
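One way to carry out this check, first confirming that \(P\) is orthogonal:
P = evecs
P.T @ P   # should be the identity matrix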
array([[ 1.00000000e+00, 1.04854397e-16, 8.94346325e-17],
[ 1.04854397e-16, 1.00000000e+00, -3.08395285e-18],
[ 8.94346325e-17, -3.08395285e-18, 1.00000000e+00]])
This is essentially the identity matrix (up to rounding error).
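Next, \(P^T A P\) should be the diagonal matrix of eigenvalues:
P.T @ A @ P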
array([[ 9.00000000e+00, -7.76318940e-17, -5.93963572e-16],
[-1.10732349e-16, 9.00000000e+00, -1.01092166e-15],
[-2.48253415e-16, -8.27511384e-16, 7.40148683e-17]])
This is essentially the diagonal matrix of eigenvalues (up to rounding error). We can confirm this using np.allclose, which checks whether corresponding entries of two arrays are equal (up to a tolerance).
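For example:
np.allclose(P.T @ A @ P, np.diag(evals))   # True, within the default tolerance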
Now that we have gotten a few of the subtleties out of the way, let's do another example and focus just on the computations. Again, this example has an eigenvalue with algebraic/geometric multiplicity 2. We do the calculations in two ways:
- (preferred) Since this is a symmetric matrix, we compute the eigenvectors using scipy.linalg.eigh, which returns an orthonormal collection of eigenvectors.
- Using scipy.linalg.eig, Python returns two eigenvectors associated with a single eigenvalue, and these vectors are not orthogonal. So we will illustrate how to fix this using scipy.linalg.orth, as well as using gram_schmidt.
Example
Orthogonally diagonalize the symmetric matrix
\[
A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}
\]
Method 1: scipy.linalg.eigh
A = np.array([[2,-1,-1],[-1,2,-1],[-1,-1,2]])
evals, evecs = la.eigh(A)
evals = evals.real
print(evals)
P = evecs
print(P)
[2.22044605e-15 3.00000000e+00 3.00000000e+00]
[[ 0.57735027 0.81649658 0. ]
[ 0.57735027 -0.40824829 -0.70710678]
[ 0.57735027 -0.40824829 0.70710678]]
Therefore, \(A = PDP^{T}\), where \(P\) is the matrix of eigenvectors above and \(D = \operatorname{diag}(0, 3, 3)\).
Method 2: Using scipy.linalg.eig
A = np.array([[2,-1,-1],[-1,2,-1],[-1,-1,2]])
evals, evecs = la.eig(A)
evals = evals.real
print(evals)
P = evecs
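Check whether these eigenvectors are orthonormal, using the same kind of check as before:
np.round(P.T @ P, 12)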
This isn't the identity, so the eigenbasis for \(\lambda = 3\) isn't an orthogonal basis. Let's fix this.
First sort the eigenvalues and eigenvectors so eigenvalues are in decreasing order.
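Reusing the argsort trick from the previous example:
idx = evals.argsort()[::-1]   # indices that sort evals in descending order
evals = evals[idx]
evecs = evecs[:,idx]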
Method 2.1: Using scipy.linalg.orth
Now grab the first two eigenvectors (those corresponding to \(\lambda=3\)) and find an orthonormal basis for the eigenspace. Then make a new matrix with the orthonormal bases for each eigenspace.
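A sketch of this step using scipy.linalg.orth, which returns an orthonormal basis for the column space of its input:
Q1 = la.orth(evecs[:,:2])              # orthonormal basis for the lambda = 3 eigenspace
P = np.column_stack((Q1, evecs[:,2]))  # the lambda = 0 eigenvector already has unit length
print(P)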
[[-0.6731049 -0.46216498 -0.57735027]
[ 0.73679906 -0.35184345 -0.57735027]
[-0.06369416 0.81400843 -0.57735027]]
Let's check that \(P\) diagonalizes \(A\).
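As before, compute \(P^T A P\):
P.T @ A @ P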
array([[ 3.00000000e+00, -5.31972712e-16, 3.42394613e-16],
[-7.64376814e-16, 3.00000000e+00, -3.32289241e-16],
[ 2.59546329e-16, -2.97933590e-16, -6.40987562e-17]])
This is the diagonal matrix of eigenvalues (up to rounding error).
Therefore, \(A = PDP^{T}\), where \(P\) is the matrix above and \(D = \operatorname{diag}(3, 3, 0)\).
Method 2.2: Using gram_schmidt
Since the Gram Schmidt procedure constructs an orthogonal basis by working through the columns of the input matrix from left to right, and since eigenspaces of a symmetric matrix are orthogonal to one another, Gram Schmidt will return an orthogonal matrix in which each grouping of vectors is still an eigenbasis for the corresponding eigenvalue. This means we don't have to separate off, and process, each eigenspace individually.
gram_schmidt is a function we built ourselves. See the Gram Schmidt Orthogonalization section for its implementation.
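A sketch of this approach, assuming gram_schmidt takes a matrix and returns the matrix whose columns are the orthonormalized columns (as implemented in that section):
P = gram_schmidt(evecs)
print(P)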
[[ 0.29329423 0.76200076 -0.57735027]
[-0.80655913 -0.12700013 -0.57735027]
[ 0.5132649 -0.63500064 -0.57735027]]
Check if \(P\) diagonalizes \(A\):
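As before:
P.T @ A @ P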
array([[3.00000000e+00, 1.48135454e-18, 2.55018705e-16],
[4.20044244e-17, 3.00000000e+00, 1.48298061e-16],
[3.66324758e-16, 2.96096777e-16, 7.39557099e-32]])
This looks like the diagonal matrix of eigenvalues. Let's get rid of the round-off errors to see they really are the same.
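For example, rounding to 10 decimal places:
np.round(P.T @ A @ P, 10)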
Therefore, \(A = PDP^{T}\) for this choice of \(P\) as well, with \(D = \operatorname{diag}(3, 3, 0)\).
Notice the matrix we found here for \(P\) is different from the ones we found above. This is because the two columns of \(P\) corresponding to \(\lambda = 3\) can be any orthonormal basis for that eigenspace, and there are many such bases.
Lastly, we'll do a random example. In this case it is unlikely that there will be repeated eigenvalues, so the matrix of eigenvectors will be the orthogonal matrix we are looking for.
Example
To generate a random symmetric matrix we start with a random \(5\times 5\) matrix \(B\) and then form \(BB^T\) which is symmetric.
B = np.random.randint(-3,3,size = (5,5))
A = B@B.T # random symmetric matrix
print(A)
evals, evecs = la.eigh(A)
evals = evals.real
print('\n eigenvalues : \n')
print(evals)
print('\n Matrix P of eigenvectors : \n')
P = evecs
print(P)
[[ 15 -8 -7 13 9]
[ -8 10 6 -10 -3]
[ -7 6 11 -7 -15]
[ 13 -10 -7 16 6]
[ 9 -3 -15 6 27]]
eigenvalues :
[ 0.31505478 1.77815469 4.83218655 21.1541805 50.92042348]
Matrix P of eigenvectors :
[[-0.1758979 0.52779507 0.61679654 -0.32435171 0.4526039 ]
[-0.44664489 -0.39834996 0.63566325 0.38465841 -0.29965944]
[ 0.76371882 -0.03150937 0.4440324 -0.20641406 -0.41948681]
[ 0.03737476 -0.74552798 0.06602194 -0.49640648 0.43819274]
[ 0.43000576 -0.07710293 0.11819728 0.6766149 0.58083721]]
Let's check that \(P\) is orthogonal.
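For example:
np.round(P.T @ P, 12)   # should be the 5x5 identity matrix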
Let's check that \(P\) diagonalizes \(A\).
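For example, rounding to suppress the floating-point noise:
print(np.round(P.T @ A @ P, 8))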
[[ 0.31505478 0. -0. 0. 0. ]
[ 0. 1.77815469 0. -0. -0. ]
[-0. -0. 4.83218655 -0. -0. ]
[ 0. -0. 0. 21.1541805 -0. ]
[ 0. -0. -0. -0. 50.92042348]]
This is the diagonal matrix of eigenvalues.
Jordan Decomposition
Not available in NumPy/SciPy. See notes in the SymPy section for how to do it there.
Singular-Value Decomposition (svd)
The theory of orthogonal diagonalization of symmetric matrices is extremely useful and powerful. SVD extends these ideas to matrices that are not symmetric, not diagonalizable, or perhaps not even square! In short, every matrix has a svd.
Definition of SVD
Let \(A\) be an \(m\times n\) matrix of rank \(r\). An svd of \(A\) is a decomposition of the form
\[
A = U \Sigma V^{T}
\]
where, in the full svd,

- \(U\) is an \(m\times m\) (square) orthogonal matrix,
- \(V\) is an \(n\times n\) (square) orthogonal matrix,
- \(\Sigma\) is an \(m\times n\) (rectangular) diagonal matrix (of \(r\) non-zero singular values \(\sigma_1, \ldots, \sigma_r\)),

and, in the condensed svd,

- \(U\) is an \(m\times r\) (rectangular) column orthogonal matrix,
- \(V\) is an \(n\times r\) (rectangular) column orthogonal matrix,
- \(\Sigma\) is an \(r\times r\) (square) diagonal matrix (of the \(r\) non-zero singular values \(\sigma_1, \ldots, \sigma_r\)).
Key Fact: \(U\) and \(V\) give orthonormal bases for all four fundamental subspaces:
- first \(~~~~~r~~~~~\) columns of \(U\): column space of \(A\) (\(= \operatorname{Col}(A)\))
- last \(~m-r~\) columns of \(U\): left nullspace of \(A\) (\(= \operatorname{Null}(A^T)\))
- first \(~~~~~r~~~~~\) columns of \(V\): row space of \(A\) (\(= \operatorname{Col}(A^T)\))
- last \(~n-r~\) columns of \(V\): nullspace of \(A\) (\(= \operatorname{Null}(A)\))
The diagonal (but rectangular) matrix \(\Sigma\) contains the square roots of the nonzero eigenvalues from \(A^TA\) (which are the same as the nonzero eigenvalues from \(AA^T\)). These are called the nonzero singular values of \(A\).
Computing an SVD
To compute \(U\), \(V\) and \(\Sigma\):
- the square roots of the non-zero eigenvalues of \(A^TA\) are the singular values \(\sigma_1, \ldots, \sigma_r\). \(\Sigma\) is the diagonal matrix made up of these values in descending order:
- the columns of \(V\) are the eigenvectors of \(A^TA\) arranged in descending order of their eigenvalues:
- the columns of \(U\) are the eigenvectors of \(AA^T\) arranged in descending order of their eigenvalues. However, they are usually computed by noticing that the first \(r\) columns of \(U\) are \(\vec{u}_i = \frac{1}{\sigma_i} A\vec{v}_i\) and the last \(m-r\) columns are an orthonormal basis for \(\operatorname{Null}(A^T)\). A short code sketch of this recipe follows this list.
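To illustrate the recipe, here is a minimal sketch using the \(2\times 3\) matrix from the example further below (it has rank \(r = 2 = m\), so extending \(U\) with a basis for \(\operatorname{Null}(A^T)\) is not needed):
A = np.array([[1., -1., 3.],
              [3.,  1., 1.]])
evals, V = la.eigh(A.T @ A)      # eigenvalues/eigenvectors of A^T A (ascending order)
idx = evals.argsort()[::-1]      # re-sort into descending order
evals, V = evals[idx], V[:,idx]
r = 2                            # rank of A
sigmas = np.sqrt(evals[:r])      # nonzero singular values
Sigma = np.zeros(A.shape)
Sigma[:r,:r] = np.diag(sigmas)
U = (A @ V[:,:r]) / sigmas       # u_i = (1/sigma_i) A v_i gives the first r columns of U
print(np.allclose(U @ Sigma @ V.T, A))   # True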
The condensed svd tosses away:
- the last \(m-r\) columns of \(U\) (i.e. an orthonormal basis of \(\operatorname{Null}(A^T)\))
- the last \(n-r\) columns of \(V\) (i.e. an orthonormal basis of \(\operatorname{Null}(A)\))
- the last \(m-r\) rows and \(n-r\) columns of \(\Sigma\) (i.e. the zero rows and columns)
since these don't affect the product. The condensed svd definitions of \(U\), \(V\) and \(\Sigma\) listed above reflect these changes (thus making the factors smaller, hence 'condensed').
Example
The \(2 \times 3\) matrix
\[
A = \begin{bmatrix} 1 & -1 & 3 \\ 3 & 1 & 1 \end{bmatrix}
\]
has rank \(2\) and decomposes as
where both the first and third matrices are orthogonal. The condensed svd is
We use scipy.linalg.svd to find a decomposition in Python. This function returns a 3-tuple \((U, d, V^T)\), where \(d\) is a 1D array of the singular values sorted in descending order (i.e. the diagonal entries of \(\Sigma\)), and \(U\) and \(V^T\) are the full \(m\times m\) and \(n \times n\) orthogonal matrices.
Warning
The matrices \(U\) and \(V\) are not unique. For instance, columns corresponding to the same singular value can be rearranged, and columns can be scaled by \(\pm 1\) (as long as the corresponding columns of the other factor are scaled to match). Therefore, scipy.linalg.svd may return something slightly different than you may have expected.
Example
Find a singular value decomposition of
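Calling la.svd with the matrix as input returns the 3-tuple directly:
A = np.array([[1,-1,3],[3,1,1]])
la.svd(A)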
(array([[-0.70710678, -0.70710678],
[-0.70710678, 0.70710678]]),
array([4. , 2.44948974]),
array([[-7.07106781e-01, 1.20868191e-16, -7.07106781e-01],
[ 5.77350269e-01, 5.77350269e-01, -5.77350269e-01],
[-4.08248290e-01, 8.16496581e-01, 4.08248290e-01]]))
Let's check that this gives a decomposition of \(A\).
U, d, Vt = la.svd(A)
# build sigma
Sig = np.zeros(A.shape) # construct mxn zeros matrix
Sig[:d.size,:d.size] = np.diag(d) # fill diagonal with d
print(U,'\n')
print(Sig,'\n')
print(Vt,'\n')
print(U@Sig@Vt)
[[-0.70710678 -0.70710678]
[-0.70710678 0.70710678]]
[[4. 0. 0. ]
[0. 2.44948974 0. ]]
[[-7.07106781e-01 1.20868191e-16 -7.07106781e-01]
[ 5.77350269e-01 5.77350269e-01 -5.77350269e-01]
[-4.08248290e-01 8.16496581e-01 4.08248290e-01]]
[[ 1. -1. 3.]
[ 3. 1. 1.]]
The product is indeed \(A\).
LU Decomposition
The LU decomposition of a matrix \(A\) is a factorization of the form
\[
A = PLU
\]
where \(P\) is a permutation matrix, \(L\) is a lower triangular matrix with \(1\)'s along the diagonal, and \(U\) is an upper triangular matrix.
A permutation matrix is constructed from the identity matrix \(I\) by permuting its rows.
The command scipy.linalg.lu
returns a 3-tuple \((P,L,U)\).
Example
Find the LU-decomposition of
\[
A = \begin{bmatrix} 2 & -1 & 4 \\ 4 & -1 & 6 \\ -1 & -1 & 3 \end{bmatrix}
\]
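Calling la.lu with the matrix as input returns the 3-tuple directly:
A = np.array([[2,-1,4],[4,-1,6],[-1,-1,3]])
la.lu(A)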
(array([[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.]]),
array([[ 1. , 0. , 0. ],
[-0.25, 1. , 0. ],
[ 0.5 , 0.4 , 1. ]]),
array([[ 4. , -1. , 6. ],
[ 0. , -1.25, 4.5 ],
[ 0. , 0. , -0.8 ]]))
Typically, one would call the function using tuple assignment.
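For example:
P, L, U = la.lu(A)
print(P)
print(L)
print(U)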
[[0. 0. 1.]
[1. 0. 0.]
[0. 1. 0.]]
[[ 1. 0. 0. ]
[-0.25 1. 0. ]
[ 0.5 0.4 1. ]]
[[ 4. -1. 6. ]
[ 0. -1.25 4.5 ]
[ 0. 0. -0.8 ]]
Let's check the product is \(A\).
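One quick check:
P @ L @ U   # should reproduce A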
QR Decomposition
The QR decomposition of a matrix \(A\) is a factorization of the form
\[
A = QR
\]
where \(Q\) is an orthogonal matrix, and \(R\) is an upper triangular matrix.
The command scipy.linalg.qr
returns a 2-tuple \((Q,R)\).
Example
Find the QR-decomposition of
\[
A = \begin{bmatrix} 2 & -1 & 4 \\ 4 & -1 & 6 \\ -1 & -1 & 3 \end{bmatrix}
\]
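Calling la.qr with the matrix as input returns the 2-tuple directly:
A = np.array([[2,-1,4],[4,-1,6],[-1,-1,3]])
la.qr(A)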
(array([[-0.43643578, 0.38939578, -0.81110711],
[-0.87287156, 0.03539962, 0.48666426],
[ 0.21821789, 0.92039002, 0.32444284]]),
array([[-4.58257569, 1.09108945, -6.32831882],
[ 0. , -1.34518542, 4.53115088],
[ 0. , 0. , 0.64888568]]))
Typically, one would call the function using tuple assignment.
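For example:
Q, R = la.qr(A)
print(Q)
print(R)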
[[-0.43643578 0.38939578 -0.81110711]
[-0.87287156 0.03539962 0.48666426]
[ 0.21821789 0.92039002 0.32444284]]
[[-4.58257569 1.09108945 -6.32831882]
[ 0. -1.34518542 4.53115088]
[ 0. 0. 0.64888568]]
Let's check the product is \(A\).
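One quick check:
Q @ R   # should reproduce A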