Image Processing Basics with NumPy

Getting Started with Images in Python

An image is a rectangular array of pixels, each of which is assigned a colour. For example, here is an image with 9 pixels, each with its own colour.

Image Image

Pixels are indexed by (row, column), counting from 0 at the top-left corner, so the top-left pixel has index \((0,0)\), the second pixel in the top row (light blue pixel) has index \((0,1)\), and so on. Pixel \((2,2)\), in the bottom-right corner, is green.

We can represent this image as a \(3\times 3\) matrix where each entry is a colour. Colours can be represented in many ways: HEX, RGB, HSL, CMYK. We'll represent a colour by a 3D vector of RGB values. Each value is an integer from 0 to 255. The components of this vector are known as the colour channels, and the higher the value, the more colour comes from that channel. So \([0,0,255]\) is the colour blue. Two special colours to note: \([255,255,255]\) is white, and \([0,0,0]\) is black.

For example, our image above is represented by the 3D array:

np.array([
          [ [234, 12, 16], [12, 168, 234], [255, 179, 0] ],
          [ [14, 242, 6],  [255, 1, 240],  [251, 255, 1] ],
          [ [1, 193, 255], [179, 9, 229],  [1, 255, 93] ]
         ])

To select a particular value in this array we can use its index. The indices \((i,j,c)\) are: row index i, column index j, and colour channel c. This means the value at \((1,1,0)\) is the amount of red in the centre pixel's colour, which is \(255\). Similarly, the value at \((1,1,1)\) is the amount of green in the centre pixel's colour, which is \(1\) (so there is basically no green in that pixel).
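As a quick check of the \((i,j,c)\) indexing, here is a minimal sketch that builds the same \(3\times 3\) image and reads off a few entries. The variable name `pixels` and the choice of `dtype=np.uint8` are assumptions made for illustration.

```python
import numpy as np

# the 3x3 example image from above; uint8 is assumed here because
# each channel value lies in the range 0 to 255
pixels = np.array([
    [[234, 12, 16], [12, 168, 234], [255, 179, 0]],
    [[14, 242, 6],  [255, 1, 240],  [251, 255, 1]],
    [[1, 193, 255], [179, 9, 229],  [1, 255, 93]],
], dtype=np.uint8)

print(pixels.shape)      # (rows, columns, channels)
print(pixels[1, 1])      # full RGB vector of the centre pixel
print(pixels[1, 1, 0])   # red component of the centre pixel
print(pixels[1, 1, 1])   # green component of the centre pixel
print(pixels[2, 2])      # RGB vector of the bottom-right pixel
```

Indexing with two indices returns the whole RGB vector of a pixel, while a third index picks out a single channel value.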

At first, it may be difficult to wrap your head around a 3D array. After all, visualizing in 3D can be challenging. However, it is best to view an image as a matrix of pixels, where the third dimension represents colour. Therefore, using RGB vectors to represent colour, one could view an image as 3 matrices, each one corresponding to an RGB channel.

Image Image

It may also help to view them as stacked in the 3rd dimension.

Image Image

Since we can represent images as matrices, this allows us to use all the tools from linear algebra that we have developed. Let's get started...

The photo we will be using is this ferris wheel.

First we load the main libraries we will need.

import numpy as np
import matplotlib.pyplot as plt

We will use NumPy arrays (np.array) to represent images, and matplotlib to display them. No additional packages are required for this. However, two common packages for working with images are imageio and PIL (Python Imaging Library). These libraries provide additional functionality, but our goal is to work directly on the NumPy array and write every image manipulation function ourselves, so we won't go into their more powerful features. In what follows we provide the code for each of three methods: the basic tools, which we'll just refer to as matplotlib, the imageio package, and the PIL package. Note, these titles are a little misleading because matplotlib and numpy are used in the background in all three cases.

Opening, Displaying and Saving Images in Python

To open an image in Python (as a 3D array) and display it on the screen:

# matplotlib
img = plt.imread('ferriswheel.jpg')

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img)

# imageio
import imageio

img = imageio.imread('ferriswheel.jpg')

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img)

# PIL
from PIL import Image

img = np.array(Image.open('ferriswheel.jpg'))

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img)

Image Image

To save an image to a file:

# matplotlib
path = 'saved-ferriswheel.jpg'
plt.imsave(path, img)

# imageio
path = 'saved-ferriswheel.jpg'
imageio.imsave(path, img)

# PIL
path = 'saved-ferriswheel.jpg'
pil_img = Image.fromarray(img) # convert from array to PIL Image type
pil_img.save(path)

Details of an Image

# matplotlib
img = plt.imread('ferriswheel.jpg')
print('# of dims: ',img.ndim)     # number of array dimensions
print('Img shape: ',img.shape)    # (height, width, channels)
print('Dtype: ',img.dtype)        # type of data stored in the array
print('type: ', type(img))        # type of the image object
print(img[20, 20])                # [R, G, B] values of the pixel at row 20, column 20
print(img[:, :, 2].min())         # minimum value in the blue channel

Output:

# of dims:  3
Img shape:  (3385, 4513, 3)
Dtype:  uint8
type:  <class 'numpy.ndarray'>
[ 62 139 211]
0

# imageio
img = imageio.imread('ferriswheel.jpg')
print('# of dims: ',img.ndim)     # number of array dimensions
print('Img shape: ',img.shape)    # (height, width, channels)
print('Dtype: ',img.dtype)        # type of data stored in the array
print('type: ', type(img))        # type of the image object
print(img[20, 20])                # [R, G, B] values of the pixel at row 20, column 20
print(img[:, :, 2].min())         # minimum value in the blue channel

Output:

# of dims:  3
Img shape:  (3385, 4513, 3)
Dtype:  uint8
type:  <class 'imageio.core.util.Array'>
[ 62 139 211]
0

# PIL
img = np.array(Image.open('ferriswheel.jpg'))
print('# of dims: ',img.ndim)     # number of array dimensions
print('Img shape: ',img.shape)    # (height, width, channels)
print('Dtype: ',img.dtype)        # type of data stored in the array
print('type: ', type(img))        # type of the image object
print(img[20, 20])                # [R, G, B] values of the pixel at row 20, column 20
print(img[:, :, 2].min())         # minimum value in the blue channel

Output:

# of dims:  3
Img shape:  (3385, 4513, 3)
Dtype:  uint8
type:  <class 'numpy.ndarray'>
[ 62 139 211]
0

Converting an Image to Gray Scale

To convert an image from colour to gray scale, we sum the RGB components and divide by the largest possible sum, \(3\times 255\). This returns a 2D array (matrix) of gray scale values between 0 and 1.

img = plt.imread('ferriswheel.jpg')
img_gray = img.sum(axis=-1) / (255*3)    # converting to grayscale

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img_gray, cmap='gray')
img = imageio.imread('ferriswheel.jpg')
img_gray = img.sum(axis=-1) / (255*3)    # converting to grayscale

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img_gray, cmap='gray')
img = np.array(Image.open('ferriswheel.jpg'))
img_gray = img.sum(axis=-1) / (255*3)    # converting to grayscale

# create figure to plot image to
plt.figure(figsize=(4,4))
plt.imshow(img_gray, cmap='gray')

Image Image
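One detail worth checking is why `img.sum(axis=-1)` is safe on `uint8` data. The sketch below uses a made-up single-pixel image (an assumption for illustration, not the ferris wheel photo): NumPy's `sum` accumulates small integer types in a wider integer type, whereas adding `uint8` channel arrays directly wraps around modulo 256.

```python
import numpy as np

# hypothetical 1x1 image with a single bright pixel
img = np.array([[[200, 100, 50]]], dtype=np.uint8)

# sum() accumulates in a wider integer type, so 200 + 100 + 50 does not wrap
total = img.sum(axis=-1)
img_gray = total / (255 * 3)   # rescaled to the range 0..1

# adding the uint8 channels directly stays in uint8 and wraps modulo 256
wrapped = img[:, :, 0] + img[:, :, 1] + img[:, :, 2]

print(total, total.dtype)
print(wrapped, wrapped.dtype)
print(img_gray)
```

If the channels are ever added by hand, converting to a float or wider integer type first avoids the wrap-around.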

Above we compressed the rgb vector into a scalar essentially by equally weighting the rgb channels:

\(\text{gray_value} = \overrightarrow{rgb} \cdot [\frac{1}{3},\frac{1}{3},\frac{1}{3}]\)

However, it is common to construct the gray scale image by weighting the channels as \(g>r>b\):

\(\text{gray_value} = \overrightarrow{rgb} \cdot [0.2989,0.5870,0.1140]\)

Let's make a function called rgb2gray to do this, then convert the colour image to gray scale.

def rgb2gray(rgb):

    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b

    return gray
img = plt.imread('ferriswheel.jpg')
img_gray = rgb2gray(img)

plt.figure(figsize=(4,4))
plt.imshow(img_gray, cmap="gray")

Image Image
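Both gray scale formulas above are dot products of each pixel's RGB vector with a weight vector, so they can also be written with NumPy's `@` operator, which contracts over the last axis of the image. A minimal sketch on a random stand-in image (the array and the name `weights` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# random stand-in for a colour image: 4 rows, 5 columns, 3 channels
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

weights = np.array([0.2989, 0.5870, 0.1140])

# (4, 5, 3) @ (3,) -> (4, 5): one gray value per pixel
gray_dot = img @ weights

# the same computation written channel by channel, as in rgb2gray
r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
gray_sum = 0.2989 * r + 0.5870 * g + 0.1140 * b

print(gray_dot.shape)
print(np.allclose(gray_dot, gray_sum))
```

Swapping `weights` for `[1/3, 1/3, 1/3]` gives the equally weighted version, up to the rescaling by 255.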

Separating an Image into Color Channels

To separate an image into its colour channels we can zero out the values in the other two channels. This keeps the image as an array of RGB values.

img = plt.imread('ferriswheel.jpg')
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0  # zero out GB channels
img_G[:, :, (0, 2)] = 0  # zero out RB channels
img_B[:, :, (0, 1)] = 0  # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
img = imageio.imread('ferriswheel.jpg')
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0  # zero out GB channels
img_G[:, :, (0, 2)] = 0  # zero out RB channels
img_B[:, :, (0, 1)] = 0  # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
img = np.array(Image.open('ferriswheel.jpg'))
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0  # zero out GB channels
img_G[:, :, (0, 2)] = 0  # zero out RB channels
img_B[:, :, (0, 1)] = 0  # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)

Image Image

Alternatively, we can split the 3D array into three 2D arrays, one for each colour channel.
Note, this is different from above, where we kept the entries as RGB vectors in which two of the three entries were 0, so matplotlib still treats the array as a colour image when plotting. Below, we extract each colour channel as a matrix. When plotting, matplotlib treats such a matrix as a single-channel image and maps its values through a colormap (for example cmap='gray'). We print out the shapes for emphasis.

img = plt.imread('ferriswheel.jpg')
print(img.shape)
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix
print(img_rch.shape)
img = imageio.imread('ferriswheel.jpg')
print(img.shape)
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix
print(img_rch.shape)
img = np.array(Image.open('ferriswheel.jpg'))
print(img.shape)
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix
print(img_rch.shape)
(3385, 4513, 3)
(3385, 4513)

We can then restack these three scalar matrices into a colour matrix.

img_stacked = np.stack([img_rch, img_gch, img_bch], axis = 2)
plt.imshow(img_stacked)

Image Image

If we used this method to extract colour channels, then we could restack with the zeros matrix to create the colour image with only one, or two colour channels.

Let's create an image with only the red and green channels.

img = plt.imread('ferriswheel.jpg')

zero = np.zeros(img.shape[:2], dtype=np.uint8) # zeros matrix with the image's height and width
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix

# stack only the red and green channels, zero out the blue channel
img_stacked = np.stack([img_rch, img_gch, zero], axis = 2)
plt.imshow(img_stacked)

Image Image

Now for a little fun. Let's swap the red and green channels.

img = plt.imread('ferriswheel.jpg')

img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix

img_stacked = np.stack([img_gch, img_rch, img_bch], axis = 2)
plt.imshow(img_stacked)

Image Image

Of course, we could have done this with array indexing alone. Here is a simple way to permute indices in an array.

img_perm = img[:,:,(1,0,2)]
plt.imshow(img_perm)

Image Image
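Index permutation and restacking give the same result, which is easy to confirm on a small array. The sketch below uses a random stand-in image (an assumption for illustration) and also shows that reversing the channel order with a slice swaps red and blue.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(2, 3, 3), dtype=np.uint8)  # stand-in image

# swap red and green by restacking the channel matrices
img_stacked = np.stack([img[:, :, 1], img[:, :, 0], img[:, :, 2]], axis=2)

# the same swap with a single indexing expression
img_perm = img[:, :, (1, 0, 2)]

# reversing the last axis turns RGB into BGR
img_bgr = img[:, :, ::-1]

print(np.array_equal(img_stacked, img_perm))
print(np.array_equal(img_bgr[:, :, 0], img[:, :, 2]))
```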

Cropping an Image

Use array slicing to crop the image.

img = plt.imread('ferriswheel.jpg')

h, w, c = img.shape # height, width, channels
print(h,w)
# crop to height,width = 1000,1300
img_crop = img[600:1600,1500:2800,:]
print(img_crop.shape)
plt.imshow(img_crop)
plt.show()
img = imageio.imread('ferriswheel.jpg')

h, w, c = img.shape # height, width, channels
print(h,w)
# crop to height,width = 1000,1300
img_crop = img[600:1600,1500:2800,:]
print(img_crop.shape)
plt.imshow(img_crop)
plt.show()
img = np.array(Image.open('ferriswheel.jpg'))

h, w, c = img.shape # height, width, channels
print(h,w)
# crop to height,width = 1000,1300
img_crop = img[600:1600,1500:2800,:]
print(img_crop.shape)
plt.imshow(img_crop)
plt.show()    

Image Image
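Slicing also makes it easy to crop relative to the image size rather than at fixed coordinates. As a sketch, the hypothetical helper `crop_center` below (not part of NumPy or of the code above) cuts a window of a given height and width out of the middle of an image.

```python
import numpy as np

def crop_center(img, height, width):
    """Return the central height x width window of an image array."""
    h, w = img.shape[:2]
    if height > h or width > w:
        raise ValueError("crop size exceeds image size")
    top = (h - height) // 2
    left = (w - width) // 2
    return img[top:top + height, left:left + width]

# stand-in image: 10 rows, 14 columns, 3 channels
img = np.arange(10 * 14 * 3, dtype=np.uint16).reshape(10, 14, 3)
img_crop = crop_center(img, 4, 6)

print(img_crop.shape)
# slicing returns a view, so call .copy() if the crop should be independent
print(np.shares_memory(img, img_crop))
```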

Negative of an Image

Since colour values are between 0 and 255, we can negate an image by taking colour value \(m\) to \(255-m\). We'll do this for both gray scale and colour images.

img = plt.imread('ferriswheel.jpg')

fig = plt.figure(figsize=(10, 10))

# gray
img_gray = rgb2gray(img)  # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')

# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')
img = imageio.imread('ferriswheel.jpg')

fig = plt.figure(figsize=(10, 10))

# gray
img_gray = rgb2gray(img)  # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')

# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')
img = np.array(Image.open('ferriswheel.jpg'))

fig = plt.figure(figsize=(10, 10))

# gray
img_gray = rgb2gray(img)  # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')

# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')

Image Image
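Two properties of this negative are easy to verify on a small stand-in array (an assumption for illustration): negating twice returns the original image, and for `uint8` data `255 - m` coincides with the bitwise complement computed by `np.invert`.

```python
import numpy as np

# stand-in "image" containing every possible uint8 value once
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

img_neg = 255 - img

print(img_neg.dtype)                              # stays uint8
print(np.array_equal(255 - img_neg, img))         # negating twice is the identity
print(np.array_equal(img_neg, np.invert(img)))    # same as flipping all 8 bits
```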

Blending (Summing) Two Images

To blend two images we add the arrays. However, the RGB values must stay in the range 0 to 255, so we use a weighted sum whose weights add up to 1. Also, the data type for the RGB values is uint8, while the weighted sum produces floating-point values, so we convert the entries back to uint8 prior to calling imshow.

Let's blend 30% of waves.jpg with 70% of ferriswheel.jpg. First we have to make the images have the same shape (when using PIL we use the resize command).

img = plt.imread('ferriswheel.jpg')
img0 = plt.imread('waves.jpg')

# crop both matrices so they have the same shape
h = min(img.shape[0],img0.shape[0])
w = min(img.shape[1],img0.shape[1])
img = img[-h:,-w:,:]  # crop image from bottom-right
img0 = img0[-h:,-w:,:] # crop image from bottom-right

print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8)   # Blending them in

plt.figure(figsize=(4, 4))
plt.imshow(img_blend)
img = imageio.imread('ferriswheel.jpg')
img0 = imageio.imread('waves.jpg')

# crop both matrices so they have the same shape
h = min(img.shape[0],img0.shape[0])
w = min(img.shape[1],img0.shape[1])
img = img[-h:,-w:,:]  # crop image from bottom-right
img0 = img0[-h:,-w:,:] # crop image from bottom-right

print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8)   # Blending them in

plt.figure(figsize=(4, 4))
plt.imshow(img_blend)
img = np.array(Image.open('ferriswheel.jpg'))
img0 = np.array(Image.open('waves.jpg').resize(img.shape[1::-1])) # (1)
print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8)   # Blending them in

plt.figure(figsize=(10, 10))
plt.imshow(img_blend)
  1. PIL's resize takes a size tuple (WIDTH, HEIGHT) and resizes the image to those dimensions. Notice the order: width first, then height, whereas the array shape is (height, width, channels). This is why we take the first two entries of the shape tuple in reverse order with img.shape[1::-1].

Image Image
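The weighted sum can be wrapped in a small function. The sketch below is one possible formulation, not the only one: the name `blend`, the parameter `alpha`, and the rounding and clipping steps are assumptions added for illustration (the code above truncates instead of rounding).

```python
import numpy as np

def blend(img_a, img_b, alpha=0.7):
    """Blend two equally shaped uint8 images as alpha*a + (1-alpha)*b."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    mixed = alpha * img_a.astype(np.float64) + (1.0 - alpha) * img_b.astype(np.float64)
    # round to the nearest integer and keep the result inside 0..255
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)

# two flat stand-in images
img_a = np.full((2, 2, 3), 200, dtype=np.uint8)
img_b = np.full((2, 2, 3), 100, dtype=np.uint8)

print(blend(img_a, img_b, alpha=0.5)[0, 0])
# for comparison, plain uint8 addition wraps around modulo 256
print((img_a + img_b)[0, 0])
```

Because the weights `alpha` and `1 - alpha` add up to 1, the blended values cannot leave the range spanned by the two inputs, which is what the unweighted sum fails to guarantee.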

Rotating and Reflecting an Image

To rotate or reflect an image we work directly on the matrix (numpy.array) itself. A few NumPy functions come in handy.

Rotate: numpy.rot90

Flip: numpy.flip, numpy.flipud, numpy.fliplr

Let's start by rotating 90 degrees clockwise. np.rot90 rotates counterclockwise by k quarter turns, so here we take k=-1.

img = plt.imread('ferriswheel.jpg')

img_rot = np.rot90(img,-1)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_rot)
plt.title('rotated')
img = imageio.imread('ferriswheel.jpg')

img_rot = np.rot90(img,-1)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_rot)
plt.title('rotated')
img = np.array(Image.open('ferriswheel.jpg'))

img_rot = np.rot90(img,-1)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_rot)
plt.title('rotated')

Image Image
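`np.rot90(m, k)` rotates by `k` quarter turns counterclockwise in the plane of the first two axes, which is why a negative `k` gives a clockwise rotation. A minimal sketch on a tiny single-channel stand-in array (an assumption for illustration):

```python
import numpy as np

# 2x3 stand-in image with a single value per pixel
m = np.array([[1, 2, 3],
              [4, 5, 6]])

m_cw = np.rot90(m, -1)   # 90 degrees clockwise

print(m_cw)
print(m.shape, '->', m_cw.shape)              # height and width swap
print(np.array_equal(m_cw, np.rot90(m, 3)))   # k=-1 is the same as k=3
print(np.array_equal(np.rot90(m, 4), m))      # four quarter turns: identity
```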

To flip vertically (i.e. across a horizontal midline) we use axis=0.

img = plt.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 0) # same as np.flipud(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = imageio.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 0) # same as np.flipud(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = np.array(Image.open('ferriswheel.jpg'))

img_flip = np.flip(img,axis = 0) # same as np.flipud(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')

Image Image

To flip horizontally (i.e. across a vertical midline) we use axis=1.

img = plt.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 1) # same as np.fliplr(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = imageio.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 1) # same as np.fliplr(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = np.array(Image.open('ferriswheel.jpg'))

img_flip = np.flip(img,axis = 1) # same as np.fliplr(img)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')

Image Image
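Both flips can also be written as slices with a step of -1, and flipping along both axes is the same as a 180 degree rotation. A short check on a random stand-in image (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(3, 4, 3), dtype=np.uint8)  # stand-in image

flip_v = np.flip(img, axis=0)   # vertical flip (rows reversed)
flip_h = np.flip(img, axis=1)   # horizontal flip (columns reversed)

print(np.array_equal(flip_v, img[::-1]))
print(np.array_equal(flip_h, img[:, ::-1]))
# flipping both axes equals rotating by two quarter turns
print(np.array_equal(np.flip(img, axis=(0, 1)), np.rot90(img, 2)))
```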

Just for fun, let's flip the colour channels. To do this take axis=2.

img = plt.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 2) # same as permuting colour channels (2,1,0)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = imageio.imread('ferriswheel.jpg')

img_flip = np.flip(img,axis = 2) # same as permuting colour channels (2,1,0)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('flipped')
img = np.array(Image.open('ferriswheel.jpg'))

img_flip = np.flip(img,axis = 2) # same as permuting colour channels (2,1,0)

# draw image
fig = plt.figure(figsize=(8,8))

fig.add_subplot(1,2,1)
plt.imshow(img)
plt.title('original')

fig.add_subplot(1,2,2)
plt.imshow(img_flip)
plt.title('colour flipped')

Image Image

Transpose of an Image

Since an image is essentially a matrix (of possibly multiple channels) we can take its transpose using the general transpose function on NumPy arrays: numpy.transpose.

img = plt.imread('ferriswheel.jpg')

img_trans = np.transpose(img, axes=(1,0,2))

plt.imshow(img_trans)
img = imageio.imread('ferriswheel.jpg')

img_trans = np.transpose(img, axes=(1,0,2))

plt.imshow(img_trans)
img = np.array(Image.open('ferriswheel.jpg'))

img_trans = np.transpose(img, axes=(1,0,2))

plt.imshow(img_trans)

Image Image
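The `axes=(1,0,2)` argument matters: it swaps rows and columns but leaves the colour channels as the last axis. Calling transpose without it reverses all three axes, which no longer has the layout of an image. The sketch below checks this on a random stand-in array (an assumption for illustration) and relates the transpose to a rotation followed by a flip.

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(2, 5, 3), dtype=np.uint8)  # stand-in image

img_trans = np.transpose(img, axes=(1, 0, 2))

print(img_trans.shape)          # rows and columns swapped, channels last
print(np.transpose(img).shape)  # all axes reversed: channels come first

# transposing equals a counterclockwise quarter turn followed by a vertical flip
print(np.array_equal(img_trans, np.flipud(np.rot90(img, 1))))
# swapaxes(0, 1) is an equivalent spelling
print(np.array_equal(img_trans, img.swapaxes(0, 1)))
```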

Padding with Black Space

We can add padding to an array with the numpy.pad function. You can read more about it in the NumPy documentation. We'll add a padding of 100 pixels to each side of the image and colour it black (rgb = (0,0,0)).

img = plt.imread('ferriswheel.jpg')

img0 = np.pad(img,((100,100),(100,100),(0,0)), mode='constant', constant_values=0)
plt.imshow(img0)
img = imageio.imread('ferriswheel.jpg')

img0 = np.pad(img,((100,100),(100,100),(0,0)), mode='constant', constant_values=0)
plt.imshow(img0)
img = np.array(Image.open('ferriswheel.jpg'))

img0 = np.pad(img,((100,100),(100,100),(0,0)), mode='constant', constant_values=0)
plt.imshow(img0)

Image Image
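Each pair in the second argument of `np.pad` gives the number of entries added before and after the corresponding axis, so `(0,0)` leaves the colour channels untouched. As a sketch on a small stand-in image (an assumption for illustration), the same call with `constant_values=255` produces a white border instead of a black one.

```python
import numpy as np

img = np.full((4, 6, 3), 100, dtype=np.uint8)  # flat gray stand-in image

# 2 pixels on the top and bottom, 3 on the left and right, none on channels
img_white = np.pad(img, ((2, 2), (3, 3), (0, 0)),
                   mode='constant', constant_values=255)

print(img.shape, '->', img_white.shape)
print(img_white[0, 0])    # a border pixel
print(img_white[2, 3])    # first pixel of the original image
```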

Colour Reduction

Colour reduction can be achieved by reducing the number of distinct values in each colour channel. Instead of 256 values per channel, we can keep only 4 by rounding each RGB value down to a multiple of 64. Below we create two colour-reduced images: img_0 has 4 values per colour channel: {0, 64, 128, 192}, and img_1 has 2 values per colour channel: {0, 128}.

img = plt.imread('ferriswheel.jpg')

# discretize pixel values: floor-divide by the step (//), then multiply back by the same step
img_0 = (img // 64) * 64
img_1 = (img // 128) * 128
img_all = np.concatenate((img, img_0, img_1), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_all)
img = imageio.imread('ferriswheel.jpg')

# discretize pixel values: floor-divide by the step (//), then multiply back by the same step
img_0 = (img // 64) * 64
img_1 = (img // 128) * 128
img_all = np.concatenate((img, img_0, img_1), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_all)
img = np.array(Image.open('ferriswheel.jpg'))

# discretize pixel values: floor-divide by the step (//), then multiply back by the same step
img_0 = (img // 64) * 64
img_1 = (img // 128) * 128
img_all = np.concatenate((img, img_0, img_1), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_all)

Image Image
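The same idea works for any number of levels that divides 256. The helper `quantize` below is a hypothetical generalization (its name and `levels` parameter are not from the code above): it computes the step size and applies the floor-divide-then-multiply trick.

```python
import numpy as np

def quantize(img, levels):
    """Reduce a uint8 image to `levels` values per channel.

    `levels` must be at least 2 and must divide 256.
    """
    if levels < 2 or 256 % levels != 0:
        raise ValueError("levels must be at least 2 and divide 256")
    step = 256 // levels
    return (img // step) * step

# stand-in "image" containing every possible uint8 value once
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

print(np.unique(quantize(img, 4)))
print(np.unique(quantize(img, 2)))
```

With `levels=4` and `levels=2` this reproduces img_0 and img_1 from the code above.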

Binarize an Image

A binary image is one where each rgb value is either 0 or 255. Below we turn an image into a binary image using two different thresholds (64 and 128), applied to every colour channel. A value larger than the threshold gets bumped up to 255; otherwise it gets dropped to 0.

img = plt.imread('ferriswheel.jpg')

img_64 = (img > 64) * 255
img_128 = (img > 128) * 255
fig = plt.figure(figsize=(8, 8))
img_all = np.concatenate((img, img_64, img_128), axis=1)
plt.imshow(img_all)
img = imageio.imread('ferriswheel.jpg')

img_64 = (img > 64) * 255
img_128 = (img > 128) * 255
fig = plt.figure(figsize=(8, 8))
img_all = np.concatenate((img, img_64, img_128), axis=1)
plt.imshow(img_all)
img = np.array(Image.open('ferriswheel.jpg'))

img_64 = (img > 64) * 255
img_128 = (img > 128) * 255
fig = plt.figure(figsize=(8, 8))
img_all = np.concatenate((img, img_64, img_128), axis=1)
plt.imshow(img_all)

Image Image
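One thing to watch is the data type: multiplying a boolean mask by the Python integer 255 typically yields a wider integer array rather than `uint8`. If a `uint8` result is wanted, for example before saving, one option is `np.where` followed by an explicit cast. The helper name `binarize` and its `threshold` parameter below are assumptions for illustration.

```python
import numpy as np

def binarize(img, threshold):
    """Map values above `threshold` to 255 and all others to 0, as uint8."""
    return np.where(img > threshold, 255, 0).astype(np.uint8)

# stand-in "image" containing every possible uint8 value once
img = np.arange(256, dtype=np.uint8).reshape(16, 16)

img_64 = binarize(img, 64)
img_128 = binarize(img, 128)

print(img_64.dtype, np.unique(img_64))
# number of values strictly above each threshold
print(np.count_nonzero(img_64), np.count_nonzero(img_128))
```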