Skip to content

Image Convolutions

import numpy as np
from scipy.ndimage.filters import convolve, convolve1d

Image processing techniques involve changing the colour of a pixel based on the colour of nearby pixels. The underlying mathematical operation used to do this is a convolution. First we'll cover the basics of what convolution is, staring with 1D arrays, then moving on to matrices. Then we will apply convolutions to image processing. We'll quickly uncover two of the most important operations in image processing: edge detection and gaussian blur.

1D convolution

Consider a 1D array \(a = [a_0, a_1, a_2, \ldots, a_n]\) of some arbitrary length \(n\), and a 1D kernel for the filter \(b = [b_0, b_1, b_2]\) of length 3. The convolution of \(a\) by kernel \(b\) is the sequence of length \(n\):

\[a*b = [c_0, c_1, \ldots, c_n]\]

where

\[c_k = a_{k-1}b_2 + a_{k}b_1+ a_{k+1}b_0, \quad k = 0, \ldots n\]

Below is a visual representation to help see what is going on. Notice the filter (the sequence we overlay on \(a\)) is the kernel \(b\) flipped. (This is just a consequence of the definition of \(c_k\) where the index on \(a\) is increasing, but the index on \(b\) is decreasing.)

For example, to determine \(c_3\) of the convolution we centre the filter over \(a_3\), then compute the sum of the overlapping products. (2(2) + 1(1) + 0(-1) = 3)

Image Image

Doing this for each entry, we get that \(c\) is

Image Image

The challenge is what to do at the left and right boundaries. There are a few options for extending the array, we've used a 'reflection' in our calculations above.

  • reflect (d c b a | a b c d | d c b a)
    The sequence is extended by reflecting about the edge of the last entry.

  • constant (k k k k | a b c d | k k k k)
    The sequence is extended by filling all values beyond the edge with some constant value k.

  • nearest (a a a a | a b c d | d d d d)
    The sequence is extended by replicating the last entry.

  • mirror (d c b | a b c d | c b a)
    The sequence is extended by reflecting about the centre of the last entry.

  • wrap (a b c d | a b c d | a b c d)
    The sequence is extended by wrapping around to the opposite edge.

SciPy can compute convolutions with scipy.ndimage.filters.convolve1d.

import numpy as np
from scipy.ndimage.filters import convolve1d
a = np.array([1,2,1,1,0,1,2,2,1])
w = np.array([-1,1,2])
convolve1d1d(a,w)
array([ 1,  3,  4,  3,  1, -1,  2,  5,  4])

A good picture to have in mind when trying to verify how the calculation works is

...,2, 1][1, 2, 1, 1, 0, 1, 2, 2, 1][1, 2...
      [2, 1,-1]

and then imagine the filter sliding to the right along the array. Here the original array is extended using 'reflect'.

Kernels can be any length, we've only done a length 3 example above. The default is to centre the kernel on the pixel, but left justification and right justification are also options.

2D convolution

scipy.ndimage.filters.convolve

Applying a 2D filter to a matrix via convolution is done much the same as in the 1D case. First the kernel is flipped to form the filter (matrix is rotated 180 degrees). Then the filter is overlayed on the matrix, centred on an entry, and the elementwise products are computed then summed. The resulting value is the corresponding value in the convolution.

For example consider the \(9\times 9\) matrix which has a centre \(5\times 5\) square of 0's and a width 2 boundary of 1's. The filter we want to apply is one that replaces a pixels value by a sum of its southern and eastern neighbours. That is, we will apply the filter:

\[\text{filter:} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix} \quad \text{kernel: } K = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \\ \end{bmatrix} \]

For example, to compute the value of the convolution at \((2,2)\) we overlay the filter on the image, centering it at pixel \((2,2)\), then compute elementwise product, and sum the resulting values.

Image Image

This means the value at \((2,2)\) in the convolution is \(2\).

Let's use scipy.ndimage.filters.convolve to compute all the entries in the convolution:

A = np.ones((9,9)) # build matrix of 1's
A[2:-2,2:-2]=np.zeros((5,5)) # carve out the matrix of 0's
K = np.array([[1,1,1],
              [1,1,0],
              [1,0,0]])
convolve(A,K)
array([[6., 6., 6., 6., 6., 6., 6., 6., 6.],
       [6., 5., 4., 3., 3., 3., 4., 5., 6.],
       [6., 4., 2., 1., 1., 1., 3., 5., 6.],
       [6., 3., 1., 0., 0., 0., 3., 5., 6.],
       [6., 3., 1., 0., 0., 0., 3., 5., 6.],
       [6., 3., 1., 0., 0., 0., 3., 5., 6.],
       [6., 4., 3., 3., 3., 3., 5., 6., 6.],
       [6., 5., 5., 5., 5., 5., 6., 6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6.]])

For the boundary values, the default is to 'reflect' the matrix to fill in missing values outside the boundary.

a   a b c
a | a b c . . .
d | d e f . . .
  | . . .
  | . . .

Common Kernels for Images

In the following examples we'll work with gray scale images. First let's load the necessary packages, and our function to convert to gray scale.

import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage

def rgb2gray(rgb):

    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b

    return gray

Discrete Differentiation Operators

To measure the change in pixel intensity we can look at how the colour value changes in both the \(x\) and \(y\) direction. Here we'll think of \(x\) as the first index (i.e. row, or vertical index) and \(y\) as the second index (i.e. column, or horizontal index). If we view a gray scale image as a function \(f(x,y)\) then can approximate the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) at index \((x,y)\) by

\[ f_x = \frac{\partial f}{\partial x} = \frac{f(x+\Delta x, y) - f(x-\Delta x,y) }{2\Delta x} \]
\[ f_y = \frac{\partial f}{\partial y} = \frac{f(x, y+\Delta y) - f(x,y-\Delta y) }{2\Delta y} \]

Here, \(\Delta x = \Delta y = 1\), since this is just the change in the index. We can also ignore the \(2\) in the denominator since this just scales the final pixel values.

Therefore,

\[ f_x(x,y) \approx f(x+1,y) - f(x-1,y) \]
\[ f_y(x,y) \approx f(x,y+1) - f(x,y-1) \]

Hence, we can compute these partials numerically by using the following discrete differentiation operator kernels.

\[ D_x = \begin{bmatrix} 1\\ 0\\ -1\\ \end{bmatrix} \quad \text{ and } \quad D_y = \begin{bmatrix} 1 & 0 & -1\\ \end{bmatrix} \]
Dx = np.array([1,0, -1], np.float32).reshape((3,1))
Dy = np.array([1, 0, -1], np.float32).reshape((1,3))

Let's apply them to this image of a cube.

img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_Dx = ndimage.filters.convolve(img_gray,Dx)
img_Dy = ndimage.filters.convolve(img_gray,Dy)

# draw images
fig = plt.figure(figsize=(12,12))
fig.add_subplot(1,3,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,3,2)
plt.imshow(img_Dx, cmap='gray')
plt.title("Dx")
fig.add_subplot(1,3,3)
plt.imshow(img_Dy, cmap='gray')
plt.title("Dy")

Image Image

These two operators are picking up intensity changes in the vertical and horizontal directions, respectively. Recall the gradient is \(\vec{\nabla}f = [f_x, f_y]\), and we have just displayed the values of \(f_x\) and \(f_y\) as images. We can combine them together into the gradient norm: \(|\vec{\nabla}f| = \sqrt{f_x^2+ f_y^2}\) (and we can also compute the angle of the gradient vector too, but we won't use that just yet).

def diff_filters(img):
    ''' input: a gray scale image
       output: the gray scale edge detection image, and direction matrix theta'''
    Dx = np.array([1,0, -1], np.float32).reshape((3,1))
    Dy = np.array([1, 0, -1], np.float32).reshape((1,3))

    Ix = ndimage.filters.convolve(img, Dx)
    Iy = ndimage.filters.convolve(img, Dy)

    G = np.hypot(Ix, Iy)
    G = G / G.max() * 255
    theta = np.arctan2(Iy, Ix)

    return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_diff = diff_filters(img_gray)[0]

# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_diff, cmap='gray')
plt.title("discrete differential")

Image Image

We can immediately see the effect of considering the change in intensity. We've detected edges in the original image. We'll now look at slight changes to these kernels, and uncover the Sobel Operator which is one of the most commonly used edge detection kernels.

Prewitt Operators

Prewitt Operators are another discrete differential operator which samples a few more pixel values in the vertical and horizontal directions.

\[ P_x = \begin{bmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ -1 & -1 & -1\\ \end{bmatrix} \quad \text{ and } \quad P_y = \begin{bmatrix} 1 & 0 & -1\\ 1 & 0 & -1\\ 1 & 0 & -1\\ \end{bmatrix} \]
def prewitt_filters(img):
    ''' input: a gray scale image
       output: the gray scale edge detection image, and direction matrix theta'''
    Px = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], np.float32)
    Py = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], np.float32)

    Ix = ndimage.filters.convolve(img, Px)
    Iy = ndimage.filters.convolve(img, Py)

    G = np.hypot(Ix, Iy)
    G = G / G.max() * 255
    theta = np.arctan2(Iy, Ix)

    return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_prewitt = prewitt_filters(img_gray)[0]

# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_prewitt, cmap='gray')
plt.title("prewitt")

Image Image

Sobel Operators

Sobel operators are the sum of the differential and Prewitt operators. It samples six of the neighboring pixels, giving double the weight to the vertical and horizontal neighbours.

\[ K_x = \begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1\\ \end{bmatrix} \quad \text{ and } \quad K_y = \begin{bmatrix} 1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1\\ \end{bmatrix} \]
def sobel_filters(img):
    ''' input: a gray scale image
       output: the gray scale edge detection image, and direction matrix theta'''
    Kx = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], np.float32)
    Ky = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], np.float32)

    Ix = ndimage.filters.convolve(img, Kx)
    Iy = ndimage.filters.convolve(img, Ky)

    G = np.hypot(Ix, Iy)
    G = G / G.max() * 255
    theta = np.arctan2(Iy, Ix)

    return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_sobel = sobel_filters(img_gray)[0]

# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_sobel, cmap='gray')
plt.title("sobel")

Image Image

Gaussian Operator

The Gaussian distribution

\[ g(x,y) = \frac{1}{2\pi \sigma^2} \exp{\left(\frac{-x^2-y^2}{2\sigma^2}\right)} \]

is used to construct an \(n \times n\) kernel. This means the closer a pixel is to the central pixel the more weight it contributes to its intensity. This has the effect of blurring (or smoothing) the image.

def gaussian_kernel(kernel_size, sigma=1):
    size = int(kernel_size) // 2
    x, y = np.mgrid[-size:size+1, -size:size+1]
    normal = 1 / (2.0 * np.pi * sigma**2)
    g =  np.exp(-((x**2 + y**2) / (2.0*sigma**2))) * normal
    return g
gaussian_kernel(3)
array([[0.05854983, 0.09653235, 0.05854983],
       [0.09653235, 0.15915494, 0.09653235],
       [0.05854983, 0.09653235, 0.05854983]])
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_gaussian3 = ndimage.filters.convolve(img_gray,gaussian_kernel(3))
img_gaussian5 = ndimage.filters.convolve(img_gray,gaussian_kernel(5))

# draw images
fig = plt.figure(figsize=(12,12))
fig.add_subplot(1,3,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,3,2)
plt.imshow(img_gaussian3, cmap='gray')
plt.title("gaussian 3x3")
fig.add_subplot(1,3,3)
plt.imshow(img_gaussian5, cmap='gray')
plt.title("gaussian 5x5")

Image Image

This is the blurring technique known as Gaussian Blur.

2D Convolution with Rank 1 Matrices

A rank 1 matrix can be written as the outer product of two vectors:

\[B = \vec{u}\vec{v}^T\]

The convolution of a matrix A with a rank 1 matrix B can be computed as

\[A*B = A*\vec{u}\vec{v}^T = (A*\vec{u})*\vec{v}^T\]

We call a filter separable if it has a rank 1 kernel, and hence can be written as the outer product of two vectors. Therefore, this means that for an \(n \times n\) separable filter, the \(n^2\) multiplications for a single pixel is replaced by \(n+n = 2n\) multiplications. This will mean an increase in speed for \(n \ge 3\).

The Prewitt, Sobel and Gaussian filters are all separable:

Prewitt Operators:

\[P_x = \begin{bmatrix} 1 & 1 & 1\\ 0 & 0 & 0\\ -1 & -1 & -1\\ \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ -1\\ \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \]
\[ P_y = \begin{bmatrix} 1 & 0 & -1\\ 1 & 0 & -1\\ 1 & 0 & -1\\ \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \end{bmatrix} \]

Sobel Operators:

\[ K_x = \begin{bmatrix} 1 & 2 & 1\\ 0 & 0 & 0\\ -1 & -2 & -1\\ \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ -1\\ \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \]
\[ K_y = \begin{bmatrix} 1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1\\ \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 1\\ \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \end{bmatrix} \]

Gaussian Operator:

\[g(x,y) = \frac{1}{2\pi\sigma^2} e^{\frac{-x^2-y^2}{2\sigma^2}} = \left(\frac{1}{\sqrt{2\pi}\sigma} e^{\frac{-x^2}{2\sigma^2}}\right)\left(\frac{1}{\sqrt{2\pi}\sigma} e^{\frac{-y^2}{2\sigma^2}}\right) \]

For example, let's apply the vertical Sobel operator as a product of two 1D vectors.

img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)

K_xu = np.array([1,0,-1]).reshape((3,1))
K_xv = np.array([1,2,1]).reshape((1,3))

G_1d = convolve(convolve(img_gray,K_xu),K_xv)

plt.imshow(G_1d,cmap="gray")

Image Image

To understand why separable filters are important from an optimization perspective, let's look at applying the Gaussian blur filter to this 4513x3385 image of a ferris wheel. We'll use a 7x7 Gaussian kernel. Applying the 2D kernel would mean doing \(3385(4513)7^2 = 731,962,245\) multiplications, but applying two 1D kernels would only require \(3385(4513)7(2) = 209,132,070\) multiplications. We haven't even counted the additions that have to be done in each method. Let's see that applying two 1D kernels is indeed faster.

def gaussian_kernel_1d(kernel_size, sigma=1):
    size = int(kernel_size) // 2
    x = np.arange(-size, size+1)
    normal = 1 / (np.sqrt(2.0 * np.pi) * sigma)
    g =  np.exp(-((x**2) / (2.0*sigma**2))) * normal
    return g

def gaussian_kernel(kernel_size, sigma=1):
    size = int(kernel_size) // 2
    x, y = np.mgrid[-size:size+1, -size:size+1]
    normal = 1 / (2.0 * np.pi * sigma**2)
    g =  np.exp(-((x**2 + y**2) / (2.0*sigma**2))) * normal
    return g    

First let's verify that the 2D kernel is the outer product of two 1D kernels.

gaussian_kernel_1d(3).reshape((3,1))@gaussian_kernel_1d(3).reshape((1,3))
array([[0.05854983, 0.09653235, 0.05854983],
       [0.09653235, 0.15915494, 0.09653235],
       [0.05854983, 0.09653235, 0.05854983]])
gaussian_kernel(3)
array([[0.05854983, 0.09653235, 0.05854983],
       [0.09653235, 0.15915494, 0.09653235],
       [0.05854983, 0.09653235, 0.05854983]])

Now we'll time the two methods.

import time

img = plt.imread('ferriswheel.jpg')
img_gray = rgb2gray(img)
plt.imshow(img_gray, cmap="gray")

# kernels
Kg1_u = gaussian_kernel_1d(7).reshape((7,1))
Kg1_v = gaussian_kernel_1d(7).reshape((1,7))
Kg2 = gaussian_kernel(7)

Applying two 1D filters:

start_time = time.time()
G_1d = convolve(convolve(img_gray,Kg1_u),Kg1_v)
print("--- %s seconds ---" % (time.time() - start_time))
--- 0.7608718872070312 seconds ---

Applying one 2D filter:

start_time = time.time()
G_2d = convolve(img_gray,Kg2)
print("--- %s seconds ---" % (time.time() - start_time))
--- 1.4453213214874268 seconds ---

It took about half the time to apply two 1D filters.

The moral of this story, is that if a filter is separable then we can apply it as a composition of 1D filters. Many filters won't be separable, however, we can use low rank approximation via SVD to find an approximation of the filter that is a sum of rank 1 matrices. We'll pick this up in the Image Filtering with SVD section.