Solving Linear Systems (NumPy)

Most linear algebra courses start by considering how to solve a system of linear equations.

\[ \begin{align} a_{0,0}x_0 + a_{0,1}x_1 + \cdots + a_{0,n-1}x_{n-1} & = b_0 \\ a_{1,0}x_0 + a_{1,1}x_1 + \cdots + a_{1,n-1}x_{n-1} & = b_1 \\ & \vdots\\ a_{n-1,0}x_0 + a_{n-1,1}x_1 + \cdots + a_{n-1,n-1}x_{n-1} & = b_{n-1} \\ \end{align} \]

In matrix notation, this linear system of \(n\) equations in \(n\) unknowns (\(x_0, \ldots, x_{n-1}\)) can be written as

\[ A\vec{x} = \vec{b} \]

where

\[ A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,n-1}\\ a_{1,0} & a_{1,1} & \cdots & a_{1,n-1}\\ \vdots & \vdots & & \vdots\\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,n-1}\\ \end{bmatrix} , \quad \vec{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1}\end{bmatrix} , \quad \vec{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1}\end{bmatrix} \]

We'll look at how to use SciPy to numerically solve linear systems corresponding to square matrices. We'll also look at how to implement row operations (Gaussian elimination).

First we import the necessary libraries.

import numpy as np
import scipy.linalg as la

scipy.linalg.solve

Let's start with a \(3 \times 3\) linear system:

\[ \begin{align} x_1 + x_2 - 2x_3 & = 4 \\ 2x_2 + x_3 & = 3 \\ 2x_1 + x_2 - 5x_3 & = 7 \\ \end{align} \]

Verify (by substitution) that \(\vec{x} = [0,2,-1]\) is a solution.
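
Before turning to SciPy, we can do this substitution check in NumPy (a quick sketch; we reuse the same \(A\) and \(b\) that appear in the solve example below):

A = np.array([[1,1,-2],[0,2,1],[2,1,-5]])
x = np.array([0,2,-1])
A @ x # should reproduce the right-hand side
array([4, 3, 7])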

To solve a (square) linear system in Python we use the scipy.linalg.solve function.

A = np.array([[1,1,-2],[0,2,1],[2,1,-5]])
b = np.array([4,3,7])
la.solve(A,b) # (1)
  1. \(A\) must be a nonsingular square matrix to use .solve.
array([ 0.,  2., -1.])

If \(\vec{b}\) is a 1D array then .solve returns a 1D array. However, if \(\vec{b}\) is a 2D array then .solve returns the solution as a 2D array, as the following shows.

A = np.array([[1,1,-2],[0,2,1],[2,1,-5]])
b = np.array([4,3,7]).reshape(3,1)
la.solve(A,b)
array([[ 0.],
       [ 2.],
       [-1.]])

Note

scipy.linalg.solve requires the matrix to be square and invertible (\(\operatorname{det}(A) \ne 0\)).
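
If the matrix is singular, la.solve raises an error rather than returning an answer. A minimal sketch (the exact error message may vary between SciPy versions):

A_sing = np.array([[1,2],[2,4]]) # second row is twice the first, so det(A) = 0
try:
    la.solve(A_sing, np.array([1,1]))
except la.LinAlgError as err:
    print(err) # SciPy reports that the matrix is singular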

Example

Consider another example: $$ 2x_1 + 3x_2 = 11 $$ $$ 3x_1 + 6x_2 = 7 $$

A = np.array([[2,3],[3,6]])
b = np.array([11,7])
la.solve(A,b)
array([15.        , -6.33333333])

We can verify this answer is correct by checking that \(A\vec{x} = \vec{b}\).

x = la.solve(A,b)
A@x
array([11.,  7.])

Example

Consider two systems of linear equations with the same coefficient matrix: $$ 2x_1 + 3x_2 = 11 \quad \quad \quad 2x_1 + 3x_2 = -1$$ $$ 3x_1 + 6x_2 = 7 \quad \quad \quad 3x_1 + 6x_2 = -3$$ Solving these two systems individually is equivalent to solving the matrix equation

\[AX = B\]

where

\[ A = \begin{bmatrix}2 & 3 \\ 3 & 6 \end{bmatrix} , \quad \quad X = \begin{bmatrix} \vec{x}_1 & \vec{x}_2 \end{bmatrix} , \quad \quad B = \begin{bmatrix} \vec{b}_1 & \vec{b}_2 \end{bmatrix} = \begin{bmatrix} 11 & -1 \\ 7 & -3 \end{bmatrix} \]

We can use the .solve command where \(b\) is a matrix whose columns are the two right-hand sides.

A = np.array([[2,3],[3,6]])
b = np.array([[11,-1],[7,-3]])
la.solve(A,b)
array([[15.        ,  1.        ],
       [-6.33333333, -1.        ]])
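
As before, we can verify the answer by checking that \(AX\) reproduces \(B\):

X = la.solve(A,b)
A @ X
array([[11., -1.],
       [ 7., -3.]])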

Gaussian Elimination

The general procedure for solving a system of linear equations is Gaussian elimination. The goal is to apply row operations to "eliminate" all the variables except one in each equation (if possible); the resulting linear system has the same set of solutions as the original but is easier to solve. We generally do this on the matrix form of the system, and call the end result the reduced row echelon form (rref) of the matrix. The elementary row operations we use are:

Elementary Row Operations

  1. Replace row \(j\) with \(k\) times row \(i\) added to row \(j\). \(\quad R_j \leftarrow kR_i +R_j\)
  2. Multiply row \(i\) by a scalar \(k\). \(\quad R_i \leftarrow kR_i\)
  3. Swap rows \(i\) and \(j\). \(\quad R_i \leftrightarrow R_j\)

For example, consider the matrix

\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & -1 \\ 0 & -1 & 2 \\ \end{bmatrix} \]

Since we are working in Python, we will index rows starting at \(0\). In other words, an \(n\times m\) matrix has rows \(R_0\) to \(R_{n-1}\).

Here are the row operations taking it to rref:

\[ A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & -1 \\ 0 & -1 & 2 \\ \end{bmatrix} \xrightarrow[]{R_0 \leftrightarrow R_1} \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 0 \\ 0 & -1 & 2 \\ \end{bmatrix} \xrightarrow[]{R_1 \leftarrow -2R_0+R_1} \begin{bmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & -1 & 2 \\ \end{bmatrix} \]
\[ \xrightarrow[]{R_2 \leftarrow -R_1+R_2} \begin{bmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \\ \end{bmatrix} \xrightarrow[]{R_0 \leftarrow R_1+R_0} \begin{bmatrix} 1 & 0 & 1 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \\ \end{bmatrix} \xrightarrow[]{R_1 \leftarrow -R_1} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \\ \end{bmatrix} \]

NumPy (and SciPy) do not have row operations implemented. Not to worry, we can implement them ourselves. To make the implementation simpler we'll use elementary matrices and matrix multiplication. The main idea is as follows: to perform a row operation on a matrix \(A\) (such as a row swap), we can perform the row operation on the identity matrix first, then multiply this new matrix on the left of \(A\). This produces the same matrix as if we had applied the row operation directly to \(A\).

For example, suppose we want to do the following row operation as a multiplication.

\[ \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 0 \\ 0 & -1 & 2 \\ \end{bmatrix} \xrightarrow[]{R_1 \leftarrow -2R_0+R_1} \begin{bmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & -1 & 2 \\ \end{bmatrix} \]

First apply the row operation to the identity matrix:

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \xrightarrow[]{R_1 \leftarrow -2R_0+R_1} \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = E \]

Take this new matrix \(E\) (called an elementary matrix) and multiply it on the left of the matrix you want to perform the row operation on:

\[ E\begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 0 \\ 0 & -1 & 2 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 0 \\ 0 & -1 & 2 \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & -1 & 2 \\ \end{bmatrix} \]

Work through the product yourself to see how the multiplication carries out the row operation. The upshot: row operations can be performed by multiplying on the left by elementary matrices.
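
Here is that computation done numerically (a quick sketch, with M the matrix on the left above):

E = np.array([[1,0,0],[-2,1,0],[0,0,1]])
M = np.array([[1,1,-1],[2,1,0],[0,-1,2]])
E @ M # same result as adding -2 times row 0 to row 1 directly
array([[ 1,  1, -1],
       [ 0, -1,  2],
       [ 0, -1,  2]])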

The following is an implementation of the three elementary row operations. Notice that the main body of each function performs the row operation on the identity matrix to create the matrix \(E\), and then the return statement does the matrix multiplication, which performs the row operation on the original matrix. Two side effects of this approach are worth noting: each function returns a new array (the input matrix is not modified), and since np.eye returns a floating-point identity, the result of E @ A is a float array even when A has integer entries (as in the outputs below).

Elementary Row Operations
def add_row(A,k,i,j):
    "row j <- k times row i added to row j in matrix A."
    n = A.shape[0]
    E = np.eye(n)
    if i == j:
        E[j,j] = k + 1
    else:
        E[j,i] = k
    return E @ A

def scale_row(A,k,i):
    "row i <- k times row i in matrix A"
    n = A.shape[0]
    E = np.eye(n)
    E[i,i] = k
    return E @ A

def swap_row(A,i,j):
    "row i <-> row j in matrix A"
    n = A.shape[0]
    E = np.eye(n)
    if i != j:
        E[i,i], E[i,j], E[j,i], E[j,j] = 0, 1, 1, 0
    return E @ A   

Example

We'll redo the example above, this time using our Python functions.

A = np.array([[2,1,0],[1,1,-1],[0,-1,2]])
print(A)
print("Swap rows 0 and 1:")
A1 = swap_row(A,0,1)
print(A1)
print("Add -2 times row 0 to row 1:")
A2 = add_row(A1,-2,0,1)
print(A2)
print("Add -1 times row 1 to row 2:")
A3 = add_row(A2,-1,1,2)
print(A3)
print("Add 1 times row 1 to row 0:")
A4 = add_row(A3,1,1,0)
print(A4)
print("Multiply row 1 by -1:")
A5 = scale_row(A4,-1,1)
print(A5)
[[ 2  1  0]
 [ 1  1 -1]
 [ 0 -1  2]]
Swap rows 0 and 1:
[[ 1.  1. -1.]
 [ 2.  1.  0.]
 [ 0. -1.  2.]]
Add -2 times row 0 to row 1:
[[ 1.  1. -1.]
 [ 0. -1.  2.]
 [ 0. -1.  2.]]
Add -1 times row 1 to row 2:
[[ 1.  1. -1.]
 [ 0. -1.  2.]
 [ 0.  0.  0.]]
Add 1 times row 1 to row 0:
[[ 1.  0.  1.]
 [ 0. -1.  2.]
 [ 0.  0.  0.]]
Multiply row 1 by -1:
[[ 1.  0.  1.]
 [ 0.  1. -2.]
 [ 0.  0.  0.]]

Solving a Linear System with Gaussian Elimination

We'll consider the following linear system, which has more unknowns than equations. This means the coefficient matrix is not square, so in this case we couldn't use .solve even if we wanted to.

\[ \begin{align} 2x_1 + 4x_3 +4x_4 & = 14 \\ -x_1 - x_3 & = -3 \\ -3x_1 + 6x_4 & = 3 \\ \end{align} \]

Write it as an augmented matrix:

\[ \left[\begin{array}{@{}cccc|c@{}} 2 & 0 & 4 & 4 & 14\\ -1 & 0 & -1 & 0 & -3\\ -3 & 0 & 0 & 6 & 3\\ \end{array}\right] \]

Use row operations to take it to rref.

A = np.array([[2,0,4,4],[-1,0,-1,0],[-3,0,0,6]])
b = np.array([14,-3,3]).reshape(3,1)
aug_mat = np.hstack([A,b]) # (1)
print(aug_mat)
  1. .hstack allows us to horizontally stack two arrays to form the augmented matrix.
[[ 2  0  4  4 14]
 [-1  0 -1  0 -3]
 [-3  0  0  6  3]]
A1 = swap_row(aug_mat,0,1)
print(A1)
[[-1.  0. -1.  0. -3.]
 [ 2.  0.  4.  4. 14.]
 [-3.  0.  0.  6.  3.]]
A2 = add_row(A1,2,0,1)
A3 = add_row(A2,-3,0,2)
A4 = scale_row(A3,-1,0)
print(A4)
[[ 1.  0.  1.  0.  3.]
 [ 0.  0.  2.  4.  8.]
 [ 0.  0.  3.  6. 12.]]
A5 = scale_row(A4,1/3,2)
print(A5)
[[1. 0. 1. 0. 3.]
 [0. 0. 2. 4. 8.]
 [0. 0. 1. 2. 4.]]
A6 = add_row(A5,-2,2,1)
A7 = add_row(A6,-1,2,0)
print(A7)
[[ 1.  0.  0. -2. -1.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  1.  2.  4.]]
A8 = swap_row(A7,1,2)
print(A8)
[[ 1.  0.  0. -2. -1.]
 [ 0.  0.  1.  2.  4.]
 [ 0.  0.  0.  0.  0.]]

At this point we have reduced the matrix to rref. We see there are two free variables (\(x_2\) and \(x_4\)). The general solution is:

\[ \vec{x} = \begin{bmatrix} -1 \\ 0 \\ 4 \\ 0\end{bmatrix} + s\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0\end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ -2\\ 1\end{bmatrix} , \quad \text{ for } s, t\in \mathbb{R} \]
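
We can sanity-check this numerically: the particular solution should satisfy \(A\vec{x} = \vec{b}\), and \(A\) should send the two basis vectors to zero. A quick sketch using the coefficient matrix from above:

A = np.array([[2,0,4,4],[-1,0,-1,0],[-3,0,0,6]])
xp = np.array([-1,0,4,0]) # particular solution
v1 = np.array([0,1,0,0])  # direction for the free variable x_2
v2 = np.array([2,0,-2,1]) # direction for the free variable x_4
print(A @ xp, A @ v1, A @ v2)
[14 -3  3] [0 0 0] [0 0 0]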

rref

Neither NumPy nor SciPy has a built-in function to put a matrix in reduced row echelon form (rref). However, we can implement our own.

The function rref below returns a tuple of two elements: the first is the reduced row echelon form of the matrix, and the second is a tuple of the indices of the pivot columns. The length of this tuple is the rank of the matrix.

def rref(A, tol=1.0e-12):
    ''' input: a matrix (2D array)
       output: rref form of the matrix (a new array), and a tuple of pivot columns
    '''
    A = np.array(A, dtype=np.float64) # work on a float copy so the divisions below aren't truncated on integer input
    m, n = A.shape
    i, j = 0, 0
    jb = [] # list of pivot columns

    while i < m and j < n:
        # Find value and index of largest element in the remainder of column j
        k = np.argmax(np.abs(A[i:m, j])) + i
        p = np.abs(A[k, j])
        if p <= tol:
            # The column is negligible, zero it out
            A[i:m, j] = 0.0
            j += 1
        else:
            # Remember the column index
            jb.append(j)
            if i != k:
                # Swap the i-th and k-th rows
                A[[i, k], j:n] = A[[k, i], j:n]
            # Divide the pivot row i by the pivot element A[i, j]
            A[i, j:n] = A[i, j:n] / A[i, j]
            # Subtract multiples of the pivot row from all the other rows
            for k in range(m):
                if k != i:
                    A[k, j:n] -= A[k, j] * A[i, j:n]
            i += 1
            j += 1
    # Finished
    return A, tuple(jb)

Example

Compute the rref of the matrix

\[ \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & -1 \\ 0 & -1 & 2 \\ \end{bmatrix} \]
A = np.array([[2.,1,0],[1,1,-1],[0,-1,2]])
rref(A)
(array([[ 1.,  0.,  1.],
        [ 0.,  1., -2.],
        [ 0.,  0.,  0.]]),
 (0, 1))
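
The pivot tuple (0, 1) tells us columns 0 and 1 are the pivot columns, so the matrix has rank 2. We can cross-check this with NumPy's built-in rank function:

np.linalg.matrix_rank(A)
2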

Example

Compute the rref of the matrix

\[ \begin{bmatrix} 2 & 0 & 4 & 4 \\ -1 & 0 & -1 & 0 \\ -3 & 0 & 0 & 6 \\ \end{bmatrix} \]
A = np.array([[2.,0,4,4],[-1,0,-1,0],[-3,0,0,6]])
rref(A)
(array([[ 1.,  0.,  0., -2.],
        [ 0.,  0.,  1.,  2.],
        [ 0.,  0.,  0.,  0.]]),
 (0, 2))

Example

For the following linear system, find the rref of the augmented matrix.

\[ \begin{bmatrix} 2&0&4&4\\-1&0&-1&0\\-3&0&0&6 \end{bmatrix} \vec{x} = \begin{bmatrix} 14\\-3\\3 \end{bmatrix} \]
A = np.array([[2,0,4,4],[-1,0,-1,0],[-3,0,0,6]])
b = np.array([14,-3,3]).reshape(3,1)
aug_mat = np.hstack([A,b])
print(aug_mat)
rref(aug_mat)
[[ 2  0  4  4 14]
 [-1  0 -1  0 -3]
 [-3  0  0  6  3]]

(array([[ 1.,  0.,  0., -2., -1.],
        [ 0.,  0.,  1.,  2.,  4.],
        [ 0.,  0.,  0.,  0.,  0.]]),
 (0, 2))

Computing an Inverse with Gaussian Elimination

Recall, if we have an invertible matrix \(A\), then we can compute the inverse using Gaussian elimination as follows:

\[ [A | I] \xrightarrow[\text{to rref}]{\text{gaussian elim}} [I|B] \]

Matrix \(B\) is the inverse of \(A\).

Note

If you are wondering why the inverse can be computed in this way, it comes back to elementary matrices. Suppose \(n\) row operations are applied to take \(A\) to the identity \(I\), and let \(E_1, E_2, \ldots, E_n\) be the \(n\) elementary matrices corresponding to these row operations. Since row operations can be done by multiplying by elementary matrices, it follows that

\[ E_n\cdots E_2 E_1 A = I \]

Therefore, multiplying each side by \(A^{-1}\) on the right:

\[ A^{-1} = E_n\cdots E_2 E_1 I \]

This means \(A^{-1}\) is obtained by applying the same row operations to \(I\).

Example

Let's find the inverse of

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]
A = np.array([[1,2],[3,4]])
I = np.eye(2)
aug_mat = np.hstack([A,I])
print(aug_mat)
[[1. 2. 1. 0.]
 [3. 4. 0. 1.]]
B = add_row(aug_mat,-3,0,1)
print(B)
[[ 1.  2.  1.  0.]
 [ 0. -2. -3.  1.]]
B = add_row(B,1,1,0)
B = scale_row(B,-0.5,1)
print(B)
[[ 1.   0.  -2.   1. ]
 [ 0.   1.   1.5 -0.5]]

Therefore, we have

\[ A^{-1} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} \]

We can verify we found the inverse by checking that \(A A^{-1} = I\).

A_inv = B[:,2:]
A @ A_inv
array([[1., 0.],
       [0., 1.]])
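
Tying this back to the Note above: applying the same three row operations to the identity matrix alone also produces \(A^{-1}\). A quick sketch reusing our row-operation functions:

B2 = add_row(np.eye(2),-3,0,1) # R1 <- -3R0 + R1
B2 = add_row(B2,1,1,0)         # R0 <- R1 + R0
B2 = scale_row(B2,-0.5,1)      # R1 <- -0.5 R1
print(B2)
[[-2.   1. ]
 [ 1.5 -0.5]]

SciPy can also compute the inverse directly with scipy.linalg.inv, which agrees with our result:

la.inv(A)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])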

Nullspace

The nullspace of a matrix can be computed with scipy.linalg.null_space. This returns an orthonormal basis for the nullspace.

A = np.array([[1,0,0,1],[1,0,1,0]])
la.null_space(A)
array([[-0.33333333, -0.47140452],
       [-0.81649658,  0.57735027],
       [ 0.33333333,  0.47140452],
       [ 0.33333333,  0.47140452]])
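
We can verify that these basis vectors are in the nullspace by multiplying; every entry of the product should be zero up to floating-point round-off.

N = la.null_space(A)
A @ N # entries are zero up to machine precision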