To build an image classification neural network, you use a convolutional neural network. Convolutional neural networks are the type of network you use when dealing with image data, audio data, or, more generally, whenever you want to find patterns in data.
In this tutorial we use a Keras dataset of images. There are 10 possible classifications in this dataset: an image can be a plane, a truck, a horse, and so on.
The images come from the CIFAR-10 dataset (https://paperswithcode.com/dataset/cifar-10): 60,000 images, 10 classes, 6,000 images per class.
Programming
These are the libraries we are going to need.
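A minimal sketch of those imports, assuming the TensorFlow/Keras stack plus NumPy, OpenCV and matplotlib that the rest of the tutorial refers to:

```python
# Libraries used throughout this tutorial (assumed setup: tensorflow, opencv-python,
# numpy, matplotlib installed).
import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import datasets, layers, models
```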
Let's get the data in here too.
We get the training data and the testing data, then scale the pixel values down from 0-255 to 0-1. That range works well as the activation values of the input layer.
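Something along these lines, using the CIFAR-10 loader that ships with Keras:

```python
# Load CIFAR-10 and scale pixel values from 0-255 down to 0-1.
(training_images, training_labels), (testing_images, testing_labels) = datasets.cifar10.load_data()
training_images, testing_images = training_images / 255.0, testing_images / 255.0
```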
Now if we print the labels, they only go from 0 to 9.
You see, the labels are just numbers, so you need to know the mapping: 5 corresponds to "dog", 8 to "ship", and so on.
So what we do is make a list holding the string names for these number labels. We don't need a lookup table; a regular list works because the label itself can be used as the index.
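Something like this ("plane" and "car" are shorthand for the official CIFAR-10 names "airplane" and "automobile", to match the names used above):

```python
# Label i maps to class_names[i].
class_names = ['plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
```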
This is actually quite a challenging problem. A deer and a horse, or a truck and a car: we can differentiate these things easily from specific qualities like size, shape and texture, and we mostly ignore the rotation and orientation of an object. We have to understand that the AI doesn't know what things like antlers are, and admittedly the images are quite low resolution, just 32x32 pixels.
Let's look at the images now.
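A quick way to do that is to plot a small grid with matplotlib, for example the first 16 training images with their class names:

```python
# Show the first 16 training images in a 4x4 grid, labelled with their class names.
plt.figure(figsize=(6, 6))
for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(training_images[i])
    plt.xlabel(class_names[training_labels[i][0]])
plt.show()
```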
Yeah, it's not too easy to recognise these images even as a human.
The next step is to scale down the dataset so we have a smaller amount of images, which makes the program's run time more manageable.
Before, the training images and training labels numbered 50,000; now they are 20,000. The testing images and testing labels used to be 10,000; now they are 4,000.
That's a 60% reduction overall.
It's not needed, but we want to save computer resources.
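A simple way to do this, under the assumption that we just keep the first slice of each array:

```python
# Keep only the first 20,000 training samples and 4,000 testing samples.
training_images = training_images[:20000]
training_labels = training_labels[:20000]
testing_images = testing_images[:4000]
testing_labels = testing_labels[:4000]
```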
Now let's build the neural network.
We start with a Sequential model, you know, the basic barebones structure where layers are stacked one after another.
The input layer is convolutional, Conv2D in fact. It has 32 filters, each using a 3x3 convolution kernel, a ReLU activation function, and finally an input shape of 32x32x3: a resolution of 32px by 32px with 3 color channels.
And also, every time you have a convolutional layer, you have a max pooling layer right after it, which simplifies (downsamples) the output of the previous layer. You could consider it a hidden layer, since it's not the input layer, but really it just belongs with the convolutional layer that comes right before it.
The max pooling layer uses a 2x2 window.
After that we have another convolutional layer, 64 filters this time, with the same 3x3 kernel and a ReLU activation function, again followed by max pooling with a 2x2 window.
Then comes another convolutional layer, 64 filters and a 3x3 kernel.
After that we have a Flatten layer, which flattens the output of the convolutional layer before it into a one-dimensional array.
And finally two dense layers, the second of which is the output layer with a softmax activation.
And so this is the entire network:
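A sketch of that architecture in Keras (the size of the hidden dense layer isn't stated above, so the 64 units here are my assumption):

```python
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))     # hidden dense layer; 64 units is an assumption
model.add(layers.Dense(10, activation='softmax'))  # output layer, one probability per class
```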
The convolutional layers filter for features in an image: a horse has long legs, a cat has pointy ears, a plane has wings, and the network looks for all these features. Max pooling then reduces the image to its essential information, which is passed to another convolutional layer, reduced again, then flattened, and the dense layers add the final complexity. The output layer scales the results into probabilities for the classifications.
At last, compile the model and fit the training data to it.
The validation data here, used to track the loss during training, is the testing images and testing labels.
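Roughly like this; the optimizer, loss and epoch count aren't stated above, so the Adam optimizer, sparse categorical cross-entropy (the labels are plain integers) and 10 epochs are assumptions on my part:

```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=10,
          validation_data=(testing_images, testing_labels))
```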
We also write a quick evaluation of the model before we save it.
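For example, evaluating on the testing set and printing the two numbers Keras returns:

```python
loss, accuracy = model.evaluate(testing_images, testing_labels)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")
```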
And now we save it:
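The filename is arbitrary; something like:

```python
model.save('image_classifier.keras')  # filename/extension is a placeholder choice
```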
OK, run the program now.
The loss doesn't really matter a lot; it's just the metric the computer optimises. What we are really interested in is the accuracy value.
It's at 64%. Not the greatest, but still pretty impressive given that a random guess would only be 10% accurate, and the AI has hardly any idea about the concept of antlers.
OK, we made the model. Let's load it from disk so we don't have to train it every time we run the script.
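Assuming the same placeholder filename used when saving:

```python
model = models.load_model('image_classifier.keras')
```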
Using real images
We are going to grab some images off of Yandex, scale them down manually, and then feed them into the neural network to test it.
I use Photoshop: crop the image down, then go to the properties, click Image Size and change it to 32x32.
Can you guess what the images are?
I put them into my Python directory.
Ugh, so disorganized.
To load the images into the script we use OpenCV and NumPy.
A problem here is that images loaded with OpenCV use the BGR channel order by default, while everything up to now has used RGB. So we need to convert the color channel order.
We can display the result using matplotlib.
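Putting those two steps together for one of the downloaded files (horse.jpg here, matching the filenames tested below):

```python
img = cv2.imread('horse.jpg')               # OpenCV reads images in BGR order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB to match the training data

plt.imshow(img)
plt.show()
```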
Now let's predict the classification of this image.
We first convert the image into a NumPy array and divide it by 255, because remember, we scaled our training and testing inputs to be between 0 and 1. Then we call model.predict with that new NumPy image array.
Then from the softmax output we want to grab the maximum value. Its position is an index from 0-9, because that's how the labels work, so we then need to convert that index into a class string via our class names list.
argmax will get us the index of the maximum value.
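So the prediction step looks roughly like this:

```python
# Scale to 0-1, add a batch dimension, run it through the model.
prediction = model.predict(np.array([img]) / 255.0)
index = np.argmax(prediction)              # index of the highest softmax probability
print(f"Prediction: {class_names[index]}")
```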
And that should be it.
For horse.jpg, it says:
For horse2.jpg, it says:
For plane.jpg, it says:
For car.jpg, it says:
For deer.jpg, it says:
A 40% hit rate. Eh, alright I guess; the images I fed it were not that great.