To build an image classification neural network, you use a convolutional neural network. Convolutional neural networks are the type of neural network you use when dealing with image data, audio data, or more generally when you want to find patterns in data.

In this tutorial we use a Keras dataset of small images. There are 10 possible classifications in this dataset: an image can be a plane, a truck, a horse, etc.

The images come from the CIFAR-10 dataset: https://paperswithcode.com/dataset/cifar-10

60,000 images. 10 classes. 6,000 images per class.

Programming

These are the libraries we are going to need.

Let’s get the data in here too.

We get the training data and the testing data, and then we scale the pixel values down from 0-255 to 0-1. This is a good range for the activation values of the input layer.
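Here is a minimal sketch of that loading and scaling step. It assumes the data comes from `datasets.cifar10.load_data()` in `tensorflow.keras`; the helper names `scale_pixels` and `load_cifar10_scaled` are made up for this example.

```python
import numpy as np


def scale_pixels(images):
    """Map pixel values from the 0-255 range to floats in the 0-1 range."""
    return images.astype("float32") / 255.0


def load_cifar10_scaled():
    """Load CIFAR-10 through Keras and scale both image arrays.

    Assumption: TensorFlow is installed. The import sits inside the function
    so that scale_pixels can be used on its own without it.
    """
    from tensorflow.keras import datasets

    (training_images, training_labels), (testing_images, testing_labels) = (
        datasets.cifar10.load_data()
    )
    return (
        (scale_pixels(training_images), training_labels),
        (scale_pixels(testing_images), testing_labels),
    )
```

Calling `load_cifar10_scaled()` downloads the dataset on first use, so the first run takes a while.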

Now if we print the labels, they only go from 0 to 9.

You see, the labels are numbers. In CIFAR-10, for example, 6 corresponds to ‘frog’ and 9 is ‘truck’.

So what we do is make a list holding the string names for these number labels. I don’t need a lookup table; a regular list works because I can index it with the label.
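A sketch of such a list, in the standard CIFAR-10 label order. The short names ‘plane’ and ‘car’ stand in for the official ‘airplane’ and ‘automobile’, and the variable and function names are just a choice for this example.

```python
# Index i holds the name for label i (CIFAR-10 label order).
class_names = [
    "plane", "car", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def label_to_name(label):
    """Turn a numeric label (0-9) into its class name."""
    return class_names[label]


print(label_to_name(6))  # frog
print(label_to_name(9))  # truck
```

The Keras label arrays have shape (N, 1), so a single label is read as `training_labels[i][0]` before indexing the list.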

This is actually quite a challenging problem. A deer and a horse, a truck and a car: we can differentiate these things easily from specific qualities like size, shape and texture. We also mostly ignore the rotation and orientation of an object. We have to understand that an AI doesn’t know what things like antlers are, and admittedly the images are in quite low resolution: just 32x32 px.

Let’s look at the images now.

Yeah, it’s not too easy to recognise these images as a human.

The next step is to scale down the dataset to a smaller amount of images. This will make the program run faster.

Before, the training images and training labels were 50,000. Now they are 20,000. The testing images and testing labels used to be 10,000. Now they are 4,000.

A 60% reduction overall.

Not needed, but we want to save computer resources.
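The reduction is just array slicing. A small sketch, with a hypothetical helper `take_first` and tiny stand-in arrays instead of the real dataset:

```python
import numpy as np


def take_first(images, labels, count):
    """Keep only the first `count` images and their matching labels."""
    return images[:count], labels[:count]


# Stand-in arrays with the CIFAR-10 image shape, just to show the call.
demo_images = np.zeros((50, 32, 32, 3), dtype="float32")
demo_labels = np.zeros((50, 1), dtype="int64")

small_images, small_labels = take_first(demo_images, demo_labels, 20)
print(small_images.shape, small_labels.shape)  # (20, 32, 32, 3) (20, 1)
```

With the real data the counts would be 20000 for the training arrays and 4000 for the testing arrays.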

Now let’s build the neural network.

We start with a Sequential model, you know, the basic bare-bones framework.

The input layer is a convolutional layer, a Conv2D in fact. It has 32 filters, each using a 3x3 convolution matrix. The activation function is ReLU, and the input shape is 32x32x3: the resolution is 32px by 32px with 3 color channels.

Also, every time you have a convolutional layer here, you have a max pooling layer right after it, which simplifies the output of the previous layer. It counts as a hidden layer, since it is neither the input nor the output layer.

The max pooling layer has a 2x2 filter.

After that we have another convolutional layer, with 64 filters this time, the same 3x3 filter size and a ReLU activation function. Alongside it we also have a max pooling layer with a 2x2 filter.

After that we have another convolutional layer, with 64 filters and a 3x3 filter size.

Now after that we have a Flatten layer. This flattens the output of the convolutional layer before it into a 1-dimensional array.
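To see how long that 1-dimensional array ends up, we can follow the image size through the layers by hand. This sketch assumes the Keras defaults: no padding and stride 1 for the convolutions, and non-overlapping 2x2 windows for the pooling.

```python
def conv_out(size, kernel=3):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Side length after non-overlapping max pooling (any remainder is dropped)."""
    return size // window


side = 32
side = conv_out(side)  # first Conv2D:        32 -> 30
side = pool_out(side)  # first max pooling:   30 -> 15
side = conv_out(side)  # second Conv2D:       15 -> 13
side = pool_out(side)  # second max pooling:  13 -> 6
side = conv_out(side)  # third Conv2D:         6 -> 4

# The last convolutional layer has 64 filters, so Flatten sees 4 x 4 x 64 values.
flattened_length = side * side * 64
print(side, flattened_length)  # 4 1024
```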

And finally 2 dense layers. The last one is the output layer, with the softmax activation.

and so this is the entire network:
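A sketch of the layers described above, written with `tensorflow.keras`. The text does not give the size of the first dense layer, so the 64 units with ReLU there are an assumption. The input shape is declared with a separate `Input` object, which newer Keras versions prefer; passing `input_shape=(32, 32, 3)` to the first Conv2D is the older equivalent.

```python
from tensorflow.keras import layers, models


def build_model():
    """Build the convolutional network for 32x32 RGB images and 10 classes."""
    model = models.Sequential()
    model.add(layers.Input(shape=(32, 32, 3)))
    model.add(layers.Conv2D(32, (3, 3), activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
    model.add(layers.Flatten())
    # Assumption: 64 units; the text only says there are two dense layers.
    model.add(layers.Dense(64, activation="relu"))
    # One output per class; softmax turns the outputs into probabilities.
    model.add(layers.Dense(10, activation="softmax"))
    return model
```

Calling `build_model().summary()` prints the output shape of every layer, which is a quick way to check the network.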

The convolutional layers filter for features in an image. A horse has long legs, a cat has pointy ears, a plane has wings: the network looks for all these features. Max pooling then reduces the image to the essential information and passes it to the next convolutional layer. This repeats, then the result is flattened, and the dense layers add complexity. The output layer scales the results down to a probability for each classification.

Finally, compile the model and fit the training data to it.

The validation data here is the testing images and testing labels. Keras uses them to report the loss and accuracy on unseen images after each epoch.

Evaluate the model on the testing data before we save it.
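A sketch of the compile, fit and evaluate steps as one helper. The function name `train_and_evaluate` is made up, and the `adam` optimizer and the 10 epochs are assumptions, since the text does not name them. The `sparse_categorical_crossentropy` loss fits labels stored as plain integers from 0 to 9.

```python
def train_and_evaluate(model, training_images, training_labels,
                       testing_images, testing_labels, epochs=10):
    """Compile and train a Keras model, then return its test loss and accuracy."""
    model.compile(
        optimizer="adam",  # assumption: the text does not name an optimizer
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(
        training_images,
        training_labels,
        epochs=epochs,
        # Reported after each epoch; not used to update the weights.
        validation_data=(testing_images, testing_labels),
    )
    loss, accuracy = model.evaluate(testing_images, testing_labels)
    return loss, accuracy
```

The returned accuracy is a fraction between 0 and 1, so 0.64 means 64%.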

and now we save it:

ok run the program now.

The loss doesn’t matter a lot to us. It is the number the optimizer tries to minimize during training. What we are really interested in is the accuracy value.

It’s at 64%. Not the greatest, but still pretty impressive given that a guess at random would be only 10% accurate and the AI has hardly any idea about the concept of antlers.

Ok, we made the model. Let’s load it so we don’t have to train it every time we run the script.
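Saving and loading could look roughly like this. The file name `image_classifier.keras` is a placeholder, not necessarily the name used for this tutorial, and the two helper names are made up too.

```python
from tensorflow.keras import models

MODEL_PATH = "image_classifier.keras"  # hypothetical file name


def save_model(model, path=MODEL_PATH):
    """Write the trained model to disk."""
    model.save(path)


def load_saved_model(path=MODEL_PATH):
    """Read the trained model back from disk instead of training again."""
    return models.load_model(path)
```

Once the file exists, the training lines can be commented out and the script can start from `load_saved_model()`.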

Using real images

We are going to grab some images off of Yandex, scale them down manually and then feed them into the neural network to test it.

I use Photoshop. I crop the image, then go to Properties, click Image Size and change it to 32x32.

can you guess what the images are?

I put them into my Python directory.

ugh so disorganized.

To load the images into the script we use OpenCV and NumPy.

A problem here is that images loaded with OpenCV use the BGR color format by default, but up until now we have been using the RGB color format. To fix this, we need to convert the color format.
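A sketch of the load and convert step, assuming the `cv2` package (opencv-python) is installed. `load_rgb` is a made-up helper name. The small `bgr_to_rgb` function does the same channel swap with plain NumPy, just to show what the conversion amounts to.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of a height x width x 3 image: BGR becomes RGB."""
    return image[:, :, ::-1]


def load_rgb(path):
    """Load an image file with OpenCV and return it in RGB channel order."""
    import cv2  # assumption: opencv-python is installed

    image = cv2.imread(path)  # OpenCV returns None if the file cannot be read
    if image is None:
        raise FileNotFoundError(path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


# A single pure-blue pixel is [255, 0, 0] in BGR and [0, 0, 255] in RGB.
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(bgr_to_rgb(blue_bgr)[0, 0].tolist())  # [0, 0, 255]
```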

we can show using matplotlib

Now let’s predict the classification of this image.

We first convert the image into a NumPy array and divide it by 255, because remember we scaled our input data to be between 0 and 1. Then we call model.predict with that new NumPy image array.

Then from the softmax output, we want to grab the maximum value. What we actually need is its position, an index from 0 to 9, because that is how the labels are encoded. We then convert that index to a class string using our class names list.

argmax will get us the index of the maximum value
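Put together, the prediction step could look like this. `predict_class` is a made-up helper name, and `model` can be any object with a Keras-style `predict` method that returns one row of 10 probabilities per image.

```python
import numpy as np

# Same order as the CIFAR-10 labels 0-9.
class_names = [
    "plane", "car", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def predict_class(model, rgb_image):
    """Return the class name the model finds most likely for one 32x32 RGB image."""
    # The model expects a batch, so wrap the single image in a list,
    # and scale the pixels to 0-1 like the training data.
    batch = np.array([rgb_image]) / 255.0
    probabilities = model.predict(batch)
    index = int(np.argmax(probabilities))  # position of the highest probability
    return class_names[index]
```

For example, `predict_class(model, image)` would be called with the loaded model and an image that has already been converted to RGB.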

and that should be it.

For horse.jpg: it says:

for horse2.jpg: it says:

for plane.jpg: it says:

for car.jpg: it says:

for deer.jpg: it says:

40% winrate. Eh, alright I guess. The images I fed were not that great.