What is image classification?

Image classification means assigning labels to an image based on a predefined set of classes.

Practically, this means that our task is to analyze an input image and return a label that categorizes the image. The label is always from a predefined set of possible categories.
For example, let’s assume that our set of possible categories includes:
categories = {tom, jerry}

Our classification system could assign multiple labels to the image via probabilities, such as:

jerry: 95%; tom: 5% for the image on the left side.

jerry: 10%; tom: 90% for the image on the right side.
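
One common way to obtain percentages like these is to pass a model's raw class scores through a softmax function. The sketch below is only an illustration: the raw scores are made-up values, and `softmax`, `raw_scores`, `probs`, and `predicted` are hypothetical names, not part of any particular library.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores a model might produce for a single image.
raw_scores = {"tom": 0.5, "jerry": 3.4}

probs = softmax(raw_scores)
# The predicted label is the category with the highest probability.
predicted = max(probs, key=probs.get)

for label, p in probs.items():
    print(f"{label}: {p:.0%}")
print("predicted:", predicted)
```

Every category in the predefined set receives a probability, and the single label returned for the image is the one with the highest value.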

What are the challenges in image classification?

Below are the challenges that we face in image classification:

1. Semantic gap: We can clearly see the difference between an image that contains a dog and an image that contains a cat. A computer, however, sees only a big matrix of numbers. The difference between how we perceive an image and how the computer sees it (as a matrix of numbers) is called the semantic gap.
Computer vision – difference between how humans see and computers see.
Viewpoint variation – a car captured from different angles

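2. Viewpoint variation: An object can be oriented in many different ways relative to the camera, depending on the angle from which it is photographed. The same car, for instance, looks different from each angle, yet it is still a car.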

Scale Variation – Same pack of fries captured from different distances

3. Scale variation: Have you ever ordered a small, medium, or large pack of fries at McDonald’s? They are all the same thing, a pack of fries, but in different sizes. Furthermore, the same pack of fries will look dramatically different when it is photographed up close versus when it is captured from farther away. Image classification methods must be tolerant of these types of scale variation.


4. Deformation: For those of you familiar with the television series Popeye, the image shows Olive Oyl. As we all know, Olive is elastic, stretchable, and capable of contorting her body into many different poses. We can treat these images of Olive as a type of object deformation: all of the images contain the Olive character, yet they are dramatically different from each other.

5. Occlusions: In the image on the left side, we have a picture of a cat. In the image on the right side, we have a photo of the same cat, but now she is resting underneath the covers, partly occluded from our view. The cat is still clearly in both images; she’s just more visible in one image than in the other. Image classification algorithms should still be able to detect and label the presence of the cat in both images.
Occlusion – the same object is more visible in one image than in the other
6. Illumination: The image on the left side was photographed with standard overhead lighting, while the image on the right side was captured with little lighting. We are still examining the same cupcake, but based on the lighting conditions, the cupcake looks dramatically different.
Illumination – objects look different in different lighting

7. Background clutter: Ever played a spot-the-bird game? If so, you know that the goal is to find the hidden bird before the other players do. However, these games are more than just entertaining children’s games; they are also a perfect representation of background clutter. You can clearly see the Himalayan black-lored tit in the image on the left side. The image on the right side, however, is very noisy and has a lot of background clutter. We are interested in one particular object in the image, but because of all the “noise”, it’s not easy to spot the bird. If it’s not easy for us, imagine how hard it is for a computer with no semantic understanding of the image.

Background clutter – the background noise makes it difficult to find the bird in the image on the right side
8. Intra-class variation: The canonical example of intra-class variation in computer vision is the diversity of dog breeds. Dogs come in many different breeds, some used by the military, some kept as pets, and some trained as guard dogs, but a dog is still a dog. Our image classification algorithms must be able to categorize all of these variations correctly.
    Intra-class variation – Different breeds of dogs
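
To make the semantic gap and a couple of the variations above more concrete, here is a minimal sketch of what a computer actually receives. It is an illustration under simple assumptions, not a real classification pipeline: the 4x4 grayscale image is made up, and `darken`, `upscale`, and `mean_abs_diff` are hypothetical helper names.

```python
# A tiny 4x4 grayscale "image": each number is a pixel intensity in 0..255.
# To the computer, this grid of numbers is all there is.
image = [
    [10,  10,  10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10,  10,  10, 10],
]

def darken(img, percent):
    """Simulate weaker lighting by keeping only `percent` % of each intensity."""
    return [[p * percent // 100 for p in row] for row in img]

def upscale(img, k):
    """Simulate a closer shot by repeating every pixel k times in each direction."""
    return [[p for p in row for _ in range(k)] for row in img for _ in range(k)]

def mean_abs_diff(a, b):
    """Average absolute per-pixel difference between two same-sized images."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

dim = darken(image, 30)   # illumination change: same object, different numbers
big = upscale(image, 2)   # scale change: same object, different matrix size

print("mean pixel change after darkening:", mean_abs_diff(image, dim))
print("size before:", len(image), "x", len(image[0]))
print("size after: ", len(big), "x", len(big[0]))
```

In both cases the object in the picture is unchanged, yet the numbers the computer sees are very different. A classifier has to return the same label regardless.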