Image Convolutions
Image processing techniques involve changing the colour of a pixel based on the colour of nearby pixels. The underlying mathematical operation used to do this is a convolution. First we'll cover the basics of what convolution is, staring with 1D arrays, then moving on to matrices. Then we will apply convolutions to image processing. We'll quickly uncover two of the most important operations in image processing: edge detection and gaussian blur.
1D convolution
Consider a 1D array \(a = [a_0, a_1, a_2, \ldots, a_n]\) of some arbitrary length \(n\), and a 1D kernel for the filter \(b = [b_0, b_1, b_2]\) of length 3. The convolution of \(a\) by kernel \(b\) is the sequence of length \(n\):
where
Below is a visual representation to help see what is going on. Notice the filter (the sequence we overlay on \(a\)) is the kernel \(b\) flipped. (This is just a consequence of the definition of \(c_k\) where the index on \(a\) is increasing, but the index on \(b\) is decreasing.)
For example, to determine \(c_3\) of the convolution we centre the filter over \(a_3\), then compute the sum of the overlapping products. (2(2) + 1(1) + 0(-1) = 3)
Doing this for each entry, we get that \(c\) is
The challenge is what to do at the left and right boundaries. There are a few options for extending the array, we've used a 'reflection' in our calculations above.
-
reflect (d c b a | a b c d | d c b a)
The sequence is extended by reflecting about the edge of the last entry. -
constant (k k k k | a b c d | k k k k)
The sequence is extended by filling all values beyond the edge with some constant value k. -
nearest (a a a a | a b c d | d d d d)
The sequence is extended by replicating the last entry. -
mirror (d c b | a b c d | c b a)
The sequence is extended by reflecting about the centre of the last entry. -
wrap (a b c d | a b c d | a b c d)
The sequence is extended by wrapping around to the opposite edge.
SciPy can compute convolutions with scipy.ndimage.filters.convolve1d
.
A good picture to have in mind when trying to verify how the calculation works is
...,2, 1][1, 2, 1, 1, 0, 1, 2, 2, 1][1, 2...
[2, 1,-1]
and then imagine the filter sliding to the right along the array. Here the original array is extended using 'reflect'.
Kernels can be any length, we've only done a length 3 example above. The default is to centre the kernel on the pixel, but left justification and right justification are also options.
2D convolution
scipy.ndimage.filters.convolve
Applying a 2D filter to a matrix via convolution is done much the same as in the 1D case. First the kernel is flipped to form the filter (matrix is rotated 180 degrees). Then the filter is overlayed on the matrix, centred on an entry, and the elementwise products are computed then summed. The resulting value is the corresponding value in the convolution.
For example consider the \(9\times 9\) matrix which has a centre \(5\times 5\) square of 0's and a width 2 boundary of 1's. The filter we want to apply is one that replaces a pixels value by a sum of its southern and eastern neighbours. That is, we will apply the filter:
For example, to compute the value of the convolution at \((2,2)\) we overlay the filter on the image, centering it at pixel \((2,2)\), then compute elementwise product, and sum the resulting values.
This means the value at \((2,2)\) in the convolution is \(2\).
Let's use scipy.ndimage.filters.convolve
to compute all the entries in the convolution:
A = np.ones((9,9)) # build matrix of 1's
A[2:-2,2:-2]=np.zeros((5,5)) # carve out the matrix of 0's
K = np.array([[1,1,1],
[1,1,0],
[1,0,0]])
convolve(A,K)
array([[6., 6., 6., 6., 6., 6., 6., 6., 6.],
[6., 5., 4., 3., 3., 3., 4., 5., 6.],
[6., 4., 2., 1., 1., 1., 3., 5., 6.],
[6., 3., 1., 0., 0., 0., 3., 5., 6.],
[6., 3., 1., 0., 0., 0., 3., 5., 6.],
[6., 3., 1., 0., 0., 0., 3., 5., 6.],
[6., 4., 3., 3., 3., 3., 5., 6., 6.],
[6., 5., 5., 5., 5., 5., 6., 6., 6.],
[6., 6., 6., 6., 6., 6., 6., 6., 6.]])
For the boundary values, the default is to 'reflect' the matrix to fill in missing values outside the boundary.
a a b c
a | a b c . . .
d | d e f . . .
| . . .
| . . .
Common Kernels for Images
In the following examples we'll work with gray scale images. First let's load the necessary packages, and our function to convert to gray scale.
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
def rgb2gray(rgb):
r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray
Discrete Differentiation Operators
To measure the change in pixel intensity we can look at how the colour value changes in both the \(x\) and \(y\) direction. Here we'll think of \(x\) as the first index (i.e. row, or vertical index) and \(y\) as the second index (i.e. column, or horizontal index). If we view a gray scale image as a function \(f(x,y)\) then can approximate the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) at index \((x,y)\) by
Here, \(\Delta x = \Delta y = 1\), since this is just the change in the index. We can also ignore the \(2\) in the denominator since this just scales the final pixel values.
Therefore,
Hence, we can compute these partials numerically by using the following discrete differentiation operator kernels.
Dx = np.array([1,0, -1], np.float32).reshape((3,1))
Dy = np.array([1, 0, -1], np.float32).reshape((1,3))
Let's apply them to this image of a cube.
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_Dx = ndimage.filters.convolve(img_gray,Dx)
img_Dy = ndimage.filters.convolve(img_gray,Dy)
# draw images
fig = plt.figure(figsize=(12,12))
fig.add_subplot(1,3,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,3,2)
plt.imshow(img_Dx, cmap='gray')
plt.title("Dx")
fig.add_subplot(1,3,3)
plt.imshow(img_Dy, cmap='gray')
plt.title("Dy")
These two operators are picking up intensity changes in the vertical and horizontal directions, respectively. Recall the gradient is \(\vec{\nabla}f = [f_x, f_y]\), and we have just displayed the values of \(f_x\) and \(f_y\) as images. We can combine them together into the gradient norm: \(|\vec{\nabla}f| = \sqrt{f_x^2+ f_y^2}\) (and we can also compute the angle of the gradient vector too, but we won't use that just yet).
def diff_filters(img):
''' input: a gray scale image
output: the gray scale edge detection image, and direction matrix theta'''
Dx = np.array([1,0, -1], np.float32).reshape((3,1))
Dy = np.array([1, 0, -1], np.float32).reshape((1,3))
Ix = ndimage.filters.convolve(img, Dx)
Iy = ndimage.filters.convolve(img, Dy)
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_diff = diff_filters(img_gray)[0]
# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_diff, cmap='gray')
plt.title("discrete differential")
We can immediately see the effect of considering the change in intensity. We've detected edges in the original image. We'll now look at slight changes to these kernels, and uncover the Sobel Operator which is one of the most commonly used edge detection kernels.
Prewitt Operators
Prewitt Operators are another discrete differential operator which samples a few more pixel values in the vertical and horizontal directions.
def prewitt_filters(img):
''' input: a gray scale image
output: the gray scale edge detection image, and direction matrix theta'''
Px = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], np.float32)
Py = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], np.float32)
Ix = ndimage.filters.convolve(img, Px)
Iy = ndimage.filters.convolve(img, Py)
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_prewitt = prewitt_filters(img_gray)[0]
# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_prewitt, cmap='gray')
plt.title("prewitt")
Sobel Operators
Sobel operators are the sum of the differential and Prewitt operators. It samples six of the neighboring pixels, giving double the weight to the vertical and horizontal neighbours.
def sobel_filters(img):
''' input: a gray scale image
output: the gray scale edge detection image, and direction matrix theta'''
Kx = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], np.float32)
Ky = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], np.float32)
Ix = ndimage.filters.convolve(img, Kx)
Iy = ndimage.filters.convolve(img, Ky)
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
return (G, theta)
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_sobel = sobel_filters(img_gray)[0]
# draw images
fig = plt.figure(figsize=(8,8))
fig.add_subplot(1,2,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,2,2)
plt.imshow(img_sobel, cmap='gray')
plt.title("sobel")
Gaussian Operator
The Gaussian distribution
is used to construct an \(n \times n\) kernel. This means the closer a pixel is to the central pixel the more weight it contributes to its intensity. This has the effect of blurring (or smoothing) the image.
def gaussian_kernel(kernel_size, sigma=1):
size = int(kernel_size) // 2
x, y = np.mgrid[-size:size+1, -size:size+1]
normal = 1 / (2.0 * np.pi * sigma**2)
g = np.exp(-((x**2 + y**2) / (2.0*sigma**2))) * normal
return g
array([[0.05854983, 0.09653235, 0.05854983],
[0.09653235, 0.15915494, 0.09653235],
[0.05854983, 0.09653235, 0.05854983]])
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
img_gaussian3 = ndimage.filters.convolve(img_gray,gaussian_kernel(3))
img_gaussian5 = ndimage.filters.convolve(img_gray,gaussian_kernel(5))
# draw images
fig = plt.figure(figsize=(12,12))
fig.add_subplot(1,3,1)
plt.imshow(img_gray, cmap='gray')
plt.title("original")
fig.add_subplot(1,3,2)
plt.imshow(img_gaussian3, cmap='gray')
plt.title("gaussian 3x3")
fig.add_subplot(1,3,3)
plt.imshow(img_gaussian5, cmap='gray')
plt.title("gaussian 5x5")
This is the blurring technique known as Gaussian Blur.
2D Convolution with Rank 1 Matrices
A rank 1 matrix can be written as the outer product of two vectors:
The convolution of a matrix A with a rank 1 matrix B can be computed as
We call a filter separable if it has a rank 1 kernel, and hence can be written as the outer product of two vectors. Therefore, this means that for an \(n \times n\) separable filter, the \(n^2\) multiplications for a single pixel is replaced by \(n+n = 2n\) multiplications. This will mean an increase in speed for \(n \ge 3\).
The Prewitt, Sobel and Gaussian filters are all separable:
Prewitt Operators:
Sobel Operators:
Gaussian Operator:
For example, let's apply the vertical Sobel operator as a product of two 1D vectors.
img = plt.imread('cube.jpg')
img_gray = rgb2gray(img)
K_xu = np.array([1,0,-1]).reshape((3,1))
K_xv = np.array([1,2,1]).reshape((1,3))
G_1d = convolve(convolve(img_gray,K_xu),K_xv)
plt.imshow(G_1d,cmap="gray")
To understand why separable filters are important from an optimization perspective, let's look at applying the Gaussian blur filter to this 4513x3385 image of a ferris wheel. We'll use a 7x7 Gaussian kernel. Applying the 2D kernel would mean doing \(3385(4513)7^2 = 731,962,245\) multiplications, but applying two 1D kernels would only require \(3385(4513)7(2) = 209,132,070\) multiplications. We haven't even counted the additions that have to be done in each method. Let's see that applying two 1D kernels is indeed faster.
def gaussian_kernel_1d(kernel_size, sigma=1):
size = int(kernel_size) // 2
x = np.arange(-size, size+1)
normal = 1 / (np.sqrt(2.0 * np.pi) * sigma)
g = np.exp(-((x**2) / (2.0*sigma**2))) * normal
return g
def gaussian_kernel(kernel_size, sigma=1):
size = int(kernel_size) // 2
x, y = np.mgrid[-size:size+1, -size:size+1]
normal = 1 / (2.0 * np.pi * sigma**2)
g = np.exp(-((x**2 + y**2) / (2.0*sigma**2))) * normal
return g
First let's verify that the 2D kernel is the outer product of two 1D kernels.
array([[0.05854983, 0.09653235, 0.05854983],
[0.09653235, 0.15915494, 0.09653235],
[0.05854983, 0.09653235, 0.05854983]])
array([[0.05854983, 0.09653235, 0.05854983],
[0.09653235, 0.15915494, 0.09653235],
[0.05854983, 0.09653235, 0.05854983]])
Now we'll time the two methods.
import time
img = plt.imread('ferriswheel.jpg')
img_gray = rgb2gray(img)
plt.imshow(img_gray, cmap="gray")
# kernels
Kg1_u = gaussian_kernel_1d(7).reshape((7,1))
Kg1_v = gaussian_kernel_1d(7).reshape((1,7))
Kg2 = gaussian_kernel(7)
Applying two 1D filters:
start_time = time.time()
G_1d = convolve(convolve(img_gray,Kg1_u),Kg1_v)
print("--- %s seconds ---" % (time.time() - start_time))
Applying one 2D filter:
start_time = time.time()
G_2d = convolve(img_gray,Kg2)
print("--- %s seconds ---" % (time.time() - start_time))
It took about half the time to apply two 1D filters.
The moral of this story, is that if a filter is separable then we can apply it as a composition of 1D filters. Many filters won't be separable, however, we can use low rank approximation via SVD to find an approximation of the filter that is a sum of rank 1 matrices. We'll pick this up in the Image Filtering with SVD section.