Image Processing Basics with NumPy
Getting Started with Images in Python
An image consists of a rectangular array of pixels, each of which is assigned a colour. For example, here is an image with 9 pixels, each assigned a specific colour.
Pixels are indexed from left-to-right, top-to-bottom, so the top-left pixel has index \((0,0)\), the second pixel in the top row (light blue pixel) has index \((0,1)\), and so on. Pixel \((2,2)\) is the colour green.
We can represent this image as a \(3\times 3\) matrix where each entry is a colour. Colours can be represented in many ways: HEX, RGB, HSL, CMYK. We'll represent a colour by a 3D vector of RGB values, each an integer from 0 to 255. The components of this vector are known as the colour channels, and the higher the value, the more colour comes from that channel. So \([0,0,255]\) is the colour blue. Two special colours to note: \([255,255,255]\) is white, and \([0,0,0]\) is black.
For example, our image above is represented by the 3D array:
np.array([
[ [234, 12, 16], [12, 168, 234], [255, 179, 0] ],
[ [14, 242, 6], [255, 1, 240], [251, 255, 1] ],
[ [1, 193, 255], [179, 9, 229], [1, 255, 93] ]
])
To select a particular value in this array we can use its index. The indices \((i,j,c)\) are: row index i, column index j, and colour channel c. This means the value at \((1,1,0)\) is the amount of red in the centre pixel's colour, which is \(255\). Similarly, the value at \((1,1,1)\) is the amount of green in the centre pixel's colour, which is \(1\) (so there is basically no green in that pixel).
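As a quick check of the indexing convention, here is a small sketch that rebuilds the \(3\times 3\) example array and reads off the values just discussed. The variable name img is only illustrative.

```python
import numpy as np

# The 3x3 example image: the shape is (rows, columns, channels).
img = np.array([
    [[234, 12, 16], [12, 168, 234], [255, 179, 0]],
    [[14, 242, 6], [255, 1, 240], [251, 255, 1]],
    [[1, 193, 255], [179, 9, 229], [1, 255, 93]],
])

print(img.shape)     # (3, 3, 3)
print(img[1, 1, 0])  # red value of the centre pixel: 255
print(img[1, 1, 1])  # green value of the centre pixel: 1
print(img[2, 2])     # full RGB vector of the bottom-right pixel: [  1 255  93]
print(img[:, :, 0])  # the red channel as a 3x3 matrix
```

Leaving out the last index, as in img[2, 2], returns the whole RGB vector of that pixel.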
At first, it may be difficult to wrap your head around a 3D array. After all, visualizing in 3D can be challenging. However, it is best to view an image as a matrix of pixels, where the third dimension represents colour. Therefore, using RGB vectors to represent colour, one could view an image as 3 matrices, each one corresponding to an RGB channel.
It may also help to view them as stacked in the 3rd dimension.
Since we can represent images as matrices, this allows us to use all the tools from linear algebra that we have developed. Let's get started...
The photo we will be using is this ferris wheel.
First we load the main libraries we will need.
We will use np.array to represent images, and matplotlib to display images.
We can do this with no additional packages required. However, two common packages for working with images are imageio and PIL (Python Imaging Library). These libraries provide additional functionality, but our goal is to work directly on the NumPy array, writing any image manipulation function ourselves, so we won't go into any of their more powerful features. Therefore, in what follows we will provide the code to work with images using any of the three methods: the basic tools, which we'll just refer to as matplotlib; the imageio package; or the PIL package. Note that these titles are a little misleading because matplotlib and numpy are used in the background in all three cases.
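A minimal import cell might look as follows. The aliases np and plt are the usual conventions and match the names used by the code in this section; the two commented-out imports are only needed for the imageio and PIL variants and assume those packages are installed.

```python
import numpy as np               # arrays that hold the pixel data
import matplotlib.pyplot as plt  # reading, displaying and saving images

# Optional, only for the imageio and PIL variants (assumed to be installed):
# import imageio
# from PIL import Image
```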
Opening, Displaying and Saving Images in Python
We can open an image in Python as a 3D array and then display it on screen.
We can also save an image array to a file.
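Here is a sketch of the full round trip using only matplotlib. So that it runs without any input file, it first builds a small synthetic image and saves it, then opens and displays the saved file. The file name example.jpg and the temporary directory are assumptions made for illustration; with the real photo you would pass 'ferriswheel.jpg' to plt.imread directly.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

# Build a small synthetic RGB image so the example needs no input file.
h, w = 64, 96
img = np.zeros((h, w, 3), dtype=np.uint8)
img[:, :, 0] = np.linspace(0, 255, w).astype(np.uint8)           # red ramps left to right
img[:, :, 2] = np.linspace(0, 255, h).astype(np.uint8)[:, None]  # blue ramps top to bottom

# Save the array to a file (hypothetical name; note that JPEG is lossy).
path = os.path.join(tempfile.mkdtemp(), "example.jpg")
plt.imsave(path, img)

# Open the file as a 3D array and display it.
img_loaded = plt.imread(path)
print(img_loaded.shape, img_loaded.dtype)
plt.imshow(img_loaded)
```

In a notebook the figure appears automatically; in a script, call plt.show() afterwards. With PIL, np.array(Image.open(path)) and Image.fromarray(img).save(path) play the same roles, and imageio offers imageio.imread and imageio.imwrite.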
Details of an Image
# matplotlib
img = plt.imread('ferriswheel.jpg')
print('# of dims: ', img.ndim)   # number of array dimensions
print('Img shape: ', img.shape)  # (height, width, channels)
print('Dtype: ', img.dtype)      # type of data stored in the array
print('type: ', type(img))       # type of the image object itself
print(img[20, 20])               # [R, G, B] value of the pixel at row 20, column 20
print(img[:, :, 2].min())        # minimum value in the blue channel
# imageio
img = imageio.imread('ferriswheel.jpg')
print('# of dims: ', img.ndim)   # number of array dimensions
print('Img shape: ', img.shape)  # (height, width, channels)
print('Dtype: ', img.dtype)      # type of data stored in the array
print('type: ', type(img))       # type of the image object itself
print(img[20, 20])               # [R, G, B] value of the pixel at row 20, column 20
print(img[:, :, 2].min())        # minimum value in the blue channel
# PIL
img = np.array(Image.open('ferriswheel.jpg'))
print('# of dims: ', img.ndim)   # number of array dimensions
print('Img shape: ', img.shape)  # (height, width, channels)
print('Dtype: ', img.dtype)      # type of data stored in the array
print('type: ', type(img))       # type of the image object itself
print(img[20, 20])               # [R, G, B] value of the pixel at row 20, column 20
print(img[:, :, 2].min())        # minimum value in the blue channel
Converting an Image to Gray Scale
To convert an image from colour to gray scale, we sum the RGB components and rescale by the largest possible sum, \(3 \times 255\). This returns a 2D array (matrix) of gray scale values.
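A minimal sketch of this equal-weight conversion, assuming the rescaling constant is the largest possible channel sum, \(3 \times 255 = 765\), so that the gray values lie between 0 and 1. The tiny \(2\times 2\) image is a hypothetical stand-in for the photo.

```python
import numpy as np

# Hypothetical 2x2 RGB image standing in for the photo.
img = np.array([
    [[255, 255, 255], [0, 0, 0]],
    [[255, 0, 0], [30, 60, 90]],
], dtype=np.uint8)

# Sum the three channel values of each pixel, then rescale by 3 * 255.
img_gray = img.sum(axis=2) / (3 * 255)

print(img_gray.shape)  # (2, 2)
print(img_gray)        # white -> 1.0, black -> 0.0, pure red -> 1/3
```

Summing over axis=2 removes the channel axis, which is why the result is a matrix rather than a 3D array.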
Above we compressed the rgb vector into a scalar essentially by equally weighting the rgb channels:
\(\text{gray_value} = \overrightarrow{rgb} \cdot [\frac{1}{3},\frac{1}{3},\frac{1}{3}]\)
However, it is common to construct the gray scale image by weighting the channels as \(g>r>b\):
\(\text{gray_value} = \overrightarrow{rgb} \cdot [0.2989,0.5870,0.1140]\)
Let's make a function called rgb2gray to do this, then convert the colour image to gray scale.
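def rgb2gray(rgb):
    # split the image into its three colour channel matrices
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    # the weighted sum of the channels gives a 2D float array
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray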
img = plt.imread('ferriswheel.jpg')
img_gray = rgb2gray(img)
plt.figure(figsize=(4,4))
plt.imshow(img_gray, cmap="gray")
Separating an Image into Color Channels
To separate an image into its colour channels we can zero out the values in the other two channels. This keeps the image as an array of RGB values.
# matplotlib
img = plt.imread('ferriswheel.jpg')
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0 # zero out GB channels
img_G[:, :, (0, 2)] = 0 # zero out RB channels
img_B[:, :, (0, 1)] = 0 # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
# imageio
img = imageio.imread('ferriswheel.jpg')
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0 # zero out GB channels
img_G[:, :, (0, 2)] = 0 # zero out RB channels
img_B[:, :, (0, 1)] = 0 # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
# PIL
img = np.array(Image.open('ferriswheel.jpg'))
img_R, img_G, img_B = img.copy(), img.copy(), img.copy()
img_R[:, :, (1, 2)] = 0 # zero out GB channels
img_G[:, :, (0, 2)] = 0 # zero out RB channels
img_B[:, :, (0, 1)] = 0 # zero out RG channels
img_rgb = np.concatenate((img_R,img_G,img_B), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
Alternatively, we can split the 3D array into three 2D arrays, one for each colour channel.
Note that this differs from the approach above, where each entry remained an RGB vector with two of its three components set to 0, so the result was still treated as a colour image when plotting.
Below, we extract each colour channel as a matrix. This means when plotting python will treat each colour channel matrix as grayscale. We print out the shapes of each matrix for emphasis.
We can then restack these three scalar matrices into a colour matrix.
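The extraction and restacking steps can be sketched as follows. The small array is a hypothetical stand-in for the photo, and the names img_rch, img_gch and img_bch are chosen to match the code later in this section.

```python
import numpy as np

# Hypothetical 2x3 RGB image standing in for the photo.
img = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# Slice out each colour channel as a 2D matrix.
img_rch = img[:, :, 0]
img_gch = img[:, :, 1]
img_bch = img[:, :, 2]
print(img_rch.shape, img_gch.shape, img_bch.shape)  # (2, 3) (2, 3) (2, 3)

# Restack the three matrices along a new last axis to recover the colour image.
img_stacked = np.stack([img_rch, img_gch, img_bch], axis=2)
print(img_stacked.shape)                 # (2, 3, 3)
print(np.array_equal(img_stacked, img))  # True
```

When displaying one of the 2D channel matrices with plt.imshow, passing cmap='gray' gives a gray scale rendering.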
If we used this method to extract colour channels, then we could restack with the zeros matrix to create the colour image with only one, or two colour channels.
Let's create an image with only the red and green channels.
img = plt.imread('ferriswheel.jpg')
zero = np.zeros(img.shape[0:2]).astype(np.uint8) # zeros matrix
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix
# stack only the red and green channels, zero out the blue channel
img_stacked = np.stack([img_rch, img_gch, zero], axis = 2)
plt.imshow(img_stacked)
Now for a little fun. Let's swap the red and green channels.
img = plt.imread('ferriswheel.jpg')
img_rch = img[:,:,0] # grab just the red channel matrix
img_gch = img[:,:,1] # grab just the green channel matrix
img_bch = img[:,:,2] # grab just the blue channel matrix
img_stacked = np.stack([img_gch, img_rch, img_bch], axis = 2)
plt.imshow(img_stacked)
Of course, we could have done this with array indexing alone. Here is a simple way to permute indices in an array.
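One way to do this, sketched here on a small hypothetical array, is to index the channel axis with a list that gives the new channel order.

```python
import numpy as np

# Hypothetical 2x3 RGB image standing in for the photo.
img = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# New channel 0 is the old channel 1 (green), new channel 1 is the old
# channel 0 (red), and blue stays in place.
img_swapped = img[:, :, [1, 0, 2]]

print(img[0, 0])          # [0 1 2]
print(img_swapped[0, 0])  # [1 0 2]
```

Indexing with a list returns a new array, so the original image is left unchanged.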
Cropping an Image
Use array slicing to crop the image.
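A minimal sketch of cropping by slicing, using a small hypothetical array in place of the photo. The slice bounds are arbitrary illustrative values; for a real photo you would choose pixel ranges that frame the region of interest.

```python
import numpy as np

# Hypothetical 6x8 RGB image standing in for the photo.
img = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)

# Keep rows 1..3 and columns 2..6 (the stop index is exclusive),
# together with all three colour channels.
img_crop = img[1:4, 2:7, :]

print(img.shape)       # (6, 8, 3)
print(img_crop.shape)  # (3, 5, 3)
```

Basic slicing returns a view of the original array, so call img_crop.copy() if the crop will be modified independently.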
Negative of an Image
Since colour values are between 0 and 255, we can negate an image by taking colour value \(m\) to \(255-m\). We'll do this for both gray scale and colour images.
# matplotlib
img = plt.imread('ferriswheel.jpg')
fig = plt.figure(figsize=(10, 10))
# gray
img_gray = rgb2gray(img) # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')
# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')
# imageio
img = imageio.imread('ferriswheel.jpg')
fig = plt.figure(figsize=(10, 10))
# gray
img_gray = rgb2gray(img) # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')
# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')
# PIL
img = np.array(Image.open('ferriswheel.jpg'))
fig = plt.figure(figsize=(10, 10))
# gray
img_gray = rgb2gray(img) # convert to grayscale using function above
img_gray = 255 - img_gray
fig.add_subplot(1, 2, 1)
plt.imshow(img_gray, cmap='gray')
plt.title('Negative of Gray image')
# colour
img = 255 - img
fig.add_subplot(1, 2, 2)
plt.imshow(img)
plt.title('Negative of RGB image')
Blending (Summing) Two Images
To blend two images we just add the arrays. However, we must keep in mind that the RGB values have to stay in the range 0 to 255, so we'll use a weighted sum. Also, the data type for the RGB values is uint8, so once we sum the two arrays we need to convert the entries to the right data type prior to calling imshow.
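To see why a plain sum is not enough, here is a small sketch on two hypothetical uint8 arrays. The 70/30 weights mirror the blend used in this section.

```python
import numpy as np

a = np.array([200, 100], dtype=np.uint8)
b = np.array([100, 50], dtype=np.uint8)

# Plain addition stays in uint8 and wraps around modulo 256.
print(a + b)  # [ 44 150]

# A weighted sum whose weights add up to 1 stays within 0..255.
# The result is a float array, so we convert it back to uint8
# (astype truncates any fractional part).
blend = (0.7 * a + 0.3 * b).astype(np.uint8)
print(blend, blend.dtype)
```

The first entry of the plain sum should be 300, but it wraps around to 44 because uint8 can only hold values up to 255.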
Let's blend 30% of waves.jpg with 70% of ferriswheel.jpg. First we have to make the images have the same shape (when using PIL we use the resize command).
# matplotlib
img = plt.imread('ferriswheel.jpg')
img0 = plt.imread('waves.jpg')
# resize matrices so they have the same shape
h = min(img.shape[0],img0.shape[0])
w = min(img.shape[1],img0.shape[1])
img = img[-h:,-w:,:] # crop image from bottom-right
img0 = img0[-h:,-w:,:] # crop image from bottom-right
print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8) # Blending them in
plt.figure(figsize=(4, 4))
plt.imshow(img_blend)
# imageio
img = imageio.imread('ferriswheel.jpg')
img0 = imageio.imread('waves.jpg')
# resize matrices so they have the same shape
h = min(img.shape[0],img0.shape[0])
w = min(img.shape[1],img0.shape[1])
img = img[-h:,-w:,:] # crop image from bottom-right
img0 = img0[-h:,-w:,:] # crop image from bottom-right
print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8) # Blending them in
plt.figure(figsize=(4, 4))
plt.imshow(img_blend)
# PIL
img = np.array(Image.open('ferriswheel.jpg'))
img0 = np.array(Image.open('waves.jpg').resize(img.shape[1::-1])) # (1)
print(img.dtype)
# uint8
img_blend = (img * 0.7 + img0 * 0.3).astype(np.uint8) # Blending them in
plt.figure(figsize=(10, 10))
plt.imshow(img_blend)
- (1) PIL's resize takes the target size as a single (WIDTH, HEIGHT) tuple and resizes the image to those dimensions. Notice the order: width first, then height. This is why we have to reverse the order of the first two entries of the shape tuple.
Rotating and Reflecting an Image
To rotate or reflect an image we work directly on the matrix (numpy.array) itself. There are a number of NumPy functions that will come in handy.
Rotate:
- numpy.rot90(arr, k=1, axes=(0, 1)): Rotate an array arr by 90*k degrees counterclockwise in the plane specified by axes.
Flip:
- numpy.flip(arr, axis=0): Reverse the order of elements in an array along the given axis.
- numpy.fliplr(arr): Reverse the order of elements along axis 1 (left/right).
- numpy.flipud(arr): Reverse the order of elements along axis 0 (up/down).
Let's start by rotating 90 degrees clockwise. Here we take k=-1.
To flip vertically (i.e. across a horizontal midline) we use axis=0.
To flip horizontally (i.e. across a vertical midline) we use axis=1.
Just for fun, let's flip the colour channels. To do this take axis=2.
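These four operations can be sketched on a small hypothetical image whose red channel encodes each pixel's position, which makes the effect of every call easy to read off.

```python
import numpy as np

# Hypothetical 2x3 RGB image: pixel (i, j) has the value [10*i + j, 0, 100].
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[:, :, 0] = [[0, 1, 2], [10, 11, 12]]
img[:, :, 2] = 100

img_rot = np.rot90(img, k=-1)  # rotate 90 degrees clockwise
img_ud = np.flip(img, axis=0)  # flip vertically (same as np.flipud)
img_lr = np.flip(img, axis=1)  # flip horizontally (same as np.fliplr)
img_ch = np.flip(img, axis=2)  # reverse the channel order: RGB -> BGR

print(img_rot.shape)     # (3, 2, 3): height and width are swapped
print(img_rot[:, :, 0])  # red channel after the rotation
print(img_ud[0, 0])      # the old bottom-left pixel
print(img_lr[0, 0])      # the old top-right pixel
print(img_ch[0, 0])      # [100   0   0]
```

Because rot90 uses axes=(0, 1) by default, only the row and column axes are rotated and the colour channels are left alone.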
Transpose of an Image
Since an image is essentially a matrix (of possibly multiple channels) we can take its transpose using the general transpose function on NumPy arrays: numpy.transpose.
- numpy.transpose(arr, axes=(1, 0, 2)): Reverse or permute the axes of an array; returns the modified array.
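As a small sketch on a hypothetical image, the axes=(1, 0, 2) argument swaps the row and column axes while keeping the colour channel axis last, so each channel matrix is transposed separately.

```python
import numpy as np

# Hypothetical 2x3 RGB image with a recognisable red channel.
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[:, :, 0] = [[0, 1, 2], [10, 11, 12]]

# Swap the row and column axes but keep the channel axis in place.
img_t = np.transpose(img, axes=(1, 0, 2))

print(img_t.shape)     # (3, 2, 3)
print(img_t[:, :, 0])  # the matrix transpose of the original red channel
```

Calling np.transpose(img) without axes would reverse all three axes, which moves the channel axis to the front and no longer gives a valid (height, width, channels) image.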
Padding with Black Space
We can add padding to an array with the numpy.pad function. You can read more about it in the NumPy documentation. We'll add a padding of 100 pixels to each side of the image and colour it black (rgb = (0,0,0)).
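A minimal sketch of this padding on a small hypothetical image. A pad width of 2 is used instead of 100 only to keep the example small.

```python
import numpy as np

# Hypothetical 4x5 all-white RGB image standing in for the photo.
img = np.full((4, 5, 3), 255, dtype=np.uint8)

pad = 2  # the text uses 100 pixels; 2 keeps this example small

# One (before, after) pair per axis: pad the rows and columns, not the channels.
img_pad = np.pad(img, ((pad, pad), (pad, pad), (0, 0)),
                 mode="constant", constant_values=0)

print(img_pad.shape)  # (8, 9, 3)
print(img_pad[0, 0])  # [0 0 0], a black border pixel
print(img_pad[2, 2])  # [255 255 255], the original top-left pixel
```

The (0, 0) entry for the last axis matters: padding the channel axis as well would add extra colour channels.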
Colour Reduction
Colour reduction can be achieved by discretizing the number of colours in the image. Instead of 256 values per channel, we can reduce this to 4 values per channel by rounding each RGB value down to a multiple of 64. Below we create two colour-reduced images: img_0 has 4 values per colour channel: {0, 64, 128, 192}, and img_1 has 2 values per colour channel: {0, 128}.
# imageio
img = imageio.imread('ferriswheel.jpg')
# Discretize pixel values: integer division (//) by the step, then multiplication by the same step
img_0 = (img // 64) * 64
img_1 = (img // 128) * 128
img_all = np.concatenate((img, img_0, img_1), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_all)
# PIL
img = np.array(Image.open('ferriswheel.jpg'))
# Discretize pixel values: integer division (//) by the step, then multiplication by the same step
img_0 = (img // 64) * 64
img_1 = (img // 128) * 128
img_all = np.concatenate((img, img_0, img_1), axis=1)
plt.figure(figsize=(8, 8))
plt.imshow(img_all)
Binarize an Image
A binary image is one where each RGB value is either 0 or 255. Below we turn an image into a binary image, trying two different thresholds for the colour channels. A value larger than the threshold gets bumped up to 255; a value below it gets dropped to 0.
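A minimal sketch of this thresholding on a small hypothetical image. The helper name binarize and the thresholds 64 and 128 are assumptions made for illustration, as is the choice to send a value exactly equal to the threshold to 0.

```python
import numpy as np

# Hypothetical 2x2 RGB image standing in for the photo.
img = np.array([
    [[234, 12, 16], [12, 168, 234]],
    [[64, 128, 200], [100, 101, 99]],
], dtype=np.uint8)

def binarize(img, threshold):
    """Set values above `threshold` to 255 and all other values to 0."""
    return np.where(img > threshold, 255, 0).astype(np.uint8)

img_64 = binarize(img, 64)
img_128 = binarize(img, 128)

print(img_64[1, 0])   # [  0 255 255]
print(img_128[1, 0])  # [  0   0 255]
```

Each channel is thresholded independently, so a binarized pixel can still take one of eight colours (every combination of 0 and 255 across the three channels).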