AI Starter — Train and test your first neural network classifier in Keras from scratch

Pallawi · 17 min read · Apr 11, 2019


Hello there,

How are you doing? Welcome to part 2 of the AI Starter series. I hope you have read part 1, where I explained the basics of machine learning, deep learning frameworks, and Keras code syntax.

In this blog, we will learn to build a multi-class classifier model using Dense layers (fully connected layers). We will build a model to correctly classify a test image into one of three classes: panda, dog, or cat.

At the top of the above image, you see a cat. This is a test image that goes into the classification model (which we will build in this article). The model has to predict the probability of the image being a cat image. The test image could be from any of the three classes: pandas, dogs, or cats.

Using convolution layers to build the model would be more efficient than using fully connected layers in every respect: memory utilization, speed, and accuracy. But before GPUs, when only CPUs were available, computation capabilities were very limited, and performing convolution was costly and did not scale. So it all started with the fully connected layer, where the cost of computation was comparatively low but the efficiency was limited.

When I talk about efficiency, I mean that with fully connected layers the number of classes a model could be trained on was very limited, the number and size of the training images had to be small, and training used to take a long time (days or even months).

Now you may ask a valid question: why are we using fully connected layers today? The answer is that I want you to experience the difference in performance, data requirements, and computation time when you use fully connected layers to build a model.

In this part, we will create a multi-class classification model using fully connected layers, and in the next article, part 3 of the AI Starter series, we will create a model using convolution layers.

There is a well-known proverb: "Experience is the best teacher." So let us begin the journey by doing things on our own.

Install necessary packages

1. Install Python and Python packages

We are going to write the code in Python, so I would suggest you install Python 3.6. You can follow the instructions to install Python. You will need the NumPy, Pillow, scikit-learn, OpenCV, matplotlib, and imutils libraries.

pip install numpy
pip install Pillow
pip install scikit-learn
pip install opencv-python
pip install imutils

To install matplotlib, follow this tutorial.

2. Install Keras

sudo pip install keras

After installing Keras, a folder named ".keras" will be created on your machine, where you can find the Keras configuration file. The path to the Keras configuration file is $HOME/.keras/keras.json. Make sure your backend is TensorFlow, as we will be using TensorFlow as the backend in all the tutorials.

I am using TensorFlow as the backend; Theano or CNTK can also be used. The backend affects how images are read and loaded, so make sure you know which backend you are using. You may also want to read about "image_data_format" at https://keras.io/backend/.

You can also use the code below to check which backend you are using.

import keras

backend_keras = keras.backend.backend()
print("Keras is using", backend_keras, "as the backend.")

Project structure

|------AI_STARTER_CLASSIFIER
| |------data
| | |------animals
| | | |------dogs
| | | |------cats
| | | |------pandas
| |------code
| | |------train.py
| | |------predict.py
| |------test_images
| | |------dogs.jpg
| | |------cats.jpg
| | |------pandas.jpg
| |------output
| | |------simple_multiclass_classifcation_lb.pickle
| | |------simple_multiclass_classifcation_model.model
| | |------training_performance.png

You can download the whole project from the below link.

https://drive.google.com/open?id=1EZr_gn7g7lK8EE66e5Qj4SGYiTyBkILX

The animals folder inside data has three sub-folders: dogs, cats, and pandas. Each folder has 1000 images. We will split these images into train and test sets later in the code.

In the code folder, you will find two files: train.py and predict.py. The file train.py defines the model, and we use it to train and evaluate the model. The file predict.py uses what the model has learned to predict results on test images.

The output folder has a pickle file, which is a serialized label binarizer containing the class names; it accompanies the model file. The model file is a serialized Keras model generated after training, which can be used in future inference scripts. The training performance file holds a plot of the training/validation performance for every epoch.

Always remember to follow the seven steps to build a deep learning model in Keras.

1. Analyze the dataset
2. Prepare the dataset
3. Create the model
4. Compile the model
5. Fit the model
6. Evaluate the model
7. Summary

Build your first model

We already have the dataset for the three classes: dogs, cats, and pandas. The dataset has been chosen with care so that you can train and test on your CPU. You can add more images later if you wish to train the model on a GPU.

So let us start building our model. Create a file named train.py and start building the model.

Each part is explained with a block of code, and the code is well commented. For each block, the line numbers in the explanation refer to that code block.

Step 1 - Import all the packages

  • matplotlib: This is the go-to plotting package for Python. That said, it does have its nuances, and if you're having trouble with it, refer to this blog post. On Line 3, we instruct matplotlib to use the "Agg" backend, enabling us to save plots to disk.
  • sklearn: The scikit-learn library will help us with binarizing our labels, splitting data for training/testing, and generating a training report.
  • Keras: It is a deep learning framework. keras.models: there are two types of models in Keras, the Sequential model and the functional model. The difference is that in the Sequential model the output of one layer can go only into the very next layer, while in the functional model the output of any layer can follow any flow of data from one layer to another. keras.layers has many types of layers, such as Conv2D, MaxPooling2D, Activation, Dropout, Flatten, and Dense; all of these layers run mathematical operations in the backend to extract features, learn features, and optimize the learnable parameters. The one we are using is the Dense layer (fully connected layer). keras.optimizers provides many optimizers, such as the one we are using in this tutorial, SGD (stochastic gradient descent).
  • imutils: pyimagesearch convenience functions. We’ll use the paths module to generate a list of image file paths for training.
  • numpy: NumPy is for numerical processing with Python. It is another go-to package.
  • cv2: This is OpenCV. Open Source Computer Vision Library.

…the remaining imports are built into your installation of Python! A minimal sketch of the full import block is shown below.
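This sketch reflects the packages listed above; the exact ordering and any extras in the downloadable project may differ:

# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# scikit-learn: label binarization, train/test splitting, training report
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Keras: the Sequential model, the Dense layer and the SGD optimizer
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# imutils paths module, numpy, OpenCV and the Python standard library
from imutils import paths
import numpy as np
import random
import pickle
import cv2
import os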

Step 2 - Load the image data from the disk

After you have imported all the dependencies, let us load the 3000 images into a numpy array data of shape (3000, 3072), where rows = 3000 (one per image) and columns = 3072 (image_height * image_width * number_of_channels = 32 * 32 * 3 = 3072), and the labels of all 3000 images into a numpy array labels of shape (3000,), a 1-D array with one label per image.

Line 2 - Give the path of the folder "animals", which has three sub-folders: cats, dogs, and pandas. Each folder has 1000 images.

Lines 7 and 8 - Initialize two empty lists, data and labels. The data list will later store all 3000 images; the labels list will store the corresponding labels.

Line 11 - Creates a list of 3000 image paths. It builds a single list of absolute image paths by joining the file name of each image with its respective folder name.

Lines 14, 15, and 16 - Count the total number of available images. This helps you decide how many of the available images to use for training and testing.

Line 19 - A very important line. It randomly shuffles the image path list. This helps the model train equally across all the classes instead of following a particular sequence (dog images first, then cats, then pandas). Shuffling is a very important step.

Line 22 - Loop over the shuffled image path list to load the image data and labels one by one.

Line 26 - Read the images one at a time using the OpenCV function imread. The image height and width can vary, but the number of channels will always be 3. OpenCV stacks channels in blue, green, red (BGR) order.

Line 31 - We know the images come in different shapes, so we resize all of them to a uniform shape of height = 32, width = 32, and channels = 3. This lets us fix an input size for our deep learning model. For resizing we use the OpenCV function resize. The total number of pixels in a single image is then 32 x 32 x 3 = 3072. You might wonder what we can learn from a 32x32x3 image. You are right that learning from such a small image is difficult, but using bigger images would demand much more computation. For now, to train on a CPU, test, and understand the concepts, we must keep going.

Line 36 - Since the aim is to store the images in a list, we need to flatten them. The flatten function converts a 32x32x3 image into a numpy array of shape (3072,).

Line 39 - We append each flattened array, one by one, to the data list we initialized at the beginning.

Lines 42 and 45 - Line 42 extracts the class (label) of an image from its filename. If you open the dogs folder, you will find the naming convention is "dogs_number.jpg"; the "dogs" in the filename is the class (label/ground truth) of the image, which tells us which animal is in the image. Line 45 appends the label to the labels list.

Line 48 - Scales the pixel intensities from the range [0, 255] down to [0, 1]. This is a data pre-processing step: the image is an 8-bit image, so the largest possible pixel intensity is 255, and we divide every value by 255 to normalize the data. Then we convert the list into a numpy array of shape (3000, 3072), where rows = 3000 and columns = 3072. By now you know that 3000 stands for the total number of images and 3072 for the number of pixels in each image. This is how the data of 3000 RGB images is loaded into a numpy array.

Line 49 - Converts the labels list to a numpy array of shape (3000,), a 1-D array with one entry per image. For example, labels[0] is the true class of the image data[0], labels[23] corresponds to data[23], and labels[100] corresponds to data[100].
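Putting the lines above together, here is a rough sketch of the loading step. The dataset path and the label-parsing rule (splitting the filename at the underscore, as in "dogs_number.jpg") are assumptions based on the project structure and naming convention described above:

# path to the "animals" folder that holds the cats, dogs and pandas sub-folders
dataset_path = "../data/animals"

# initialize the empty data and labels lists
data = []
labels = []

# build the list of all 3000 image paths, count them, and shuffle them
imagePaths = sorted(list(paths.list_images(dataset_path)))
print("Total number of images:", len(imagePaths))
random.seed(42)
random.shuffle(imagePaths)

# loop over the shuffled image paths
for imagePath in imagePaths:
    # read the image (OpenCV loads it in BGR order),
    # resize it to 32x32x3 and flatten it to a (3072,) vector
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (32, 32)).flatten()
    data.append(image)

    # extract the label from the filename, e.g. "dogs_00123.jpg" -> "dogs"
    label = os.path.basename(imagePath).split("_")[0]
    labels.append(label)

# scale pixel intensities from [0, 255] to [0, 1] and convert to numpy arrays
data = np.array(data, dtype="float") / 255.0  # shape (3000, 3072)
labels = np.array(labels)                     # shape (3000,)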

Step 3 - Creating the training and testing data split

Lines 4, 6, and 7 - We know that the total number of images is 3000, and we have loaded their data into a numpy array. We now must decide how many of the 3000 images to use for training and how many for testing. Scikit-learn does this for us with the function train_test_split(). We just pass in our data and labels arrays and specify the fraction of images to use as test data. In the code we set test_size = 0.25, which means we want 25 percent of the images to be treated as test images. You can see the resulting split and image counts below.

Number of training images = 2250 ,Number of training labels = 2250
Number of testing images = 750 , Number of testing labels = 750

Lines 12, 13, and 14 - Label binarization takes place here. Before these lines, each label is a string; we need to encode it in binary form. To binarize the labels, we use the scikit-learn label binarizer, found in its preprocessing module: lb = preprocessing.LabelBinarizer(). One-hot encoding is performed on the labels, so each label is represented as a vector.

[1, 0, 0] # corresponds to cats
[0, 1, 0] # corresponds to dogs
[0, 0, 1] # corresponds to panda

A call to fit_transform finds all unique class labels in trainY and then transforms them into one-hot encoded labels.

A call to just .transform on testY performs only the one-hot encoding step; the unique set of possible class labels was already determined by the call to .fit_transform.
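A minimal sketch of this step, assuming the data and labels arrays built in Step 2:

# reserve 25% of the images for testing, the rest for training
(trainX, testX, trainY, testY) = train_test_split(
    data, labels, test_size=0.25, random_state=42)

# one-hot encode the string labels, e.g. "cats" -> [1, 0, 0]
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)  # learns the unique classes, then transforms
testY = lb.transform(testY)        # reuses the classes already learned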

Step 4 - Create your Keras model

Line 2 - The model we are building is Sequential, which means the output of layer 1 can only go into layer 2, and the output of layer 2 only into layer 3. The output of layer 1 can never go directly into layer 3; no layer can be skipped.

Lines 6 to 8 - The next step is to define our neural network architecture using Keras. We will use a network with one input layer, two hidden layers, and one output layer.

The input layer has 3072 nodes. The first hidden layer has 1024 nodes and the second hidden layer has 512 nodes, followed by the output layer with 3 nodes. In the hidden layers, the activation function is sigmoid, which squashes the output values of the hidden layers into the range between 0 and 1. As we are performing multi-class classification, we cannot use a sigmoid activation in the output layer; instead we use softmax, which gives us a class-wise probability.

Line 11 - Prints the summary of your model to the console. You can see how the data flows through your network and the number of parameters to be learned during training.
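A sketch of the architecture described above:

# a 3072-1024-512-3 fully connected network
model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation="sigmoid"))
model.add(Dense(512, activation="sigmoid"))
model.add(Dense(3, activation="softmax"))

# print the layer-by-layer summary with parameter counts
model.summary()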

Step 5 - Compile the model

Line 2 - Here we initialize the learning rate to 0.01. The learning rate is the rate at which the model should learn; it is a small real value such as 0.1, 0.001, or 0.0001. Deciding how large the learning rate should be depends on experimentation. A naive method for choosing it is to try out a bunch of values and use the one that seems to work best, manually decreasing it over time when training no longer improves the loss. It controls how fast the weights are updated.

Line 3 - Here we define the number of epochs. An epoch is a unit of training: epochs = 75 means the model will see every training image 75 times. When every image in the training dataset has undergone forward and backward propagation once, we say one epoch is complete.

Lines 9 and 10 - Line 9 calls the Keras stochastic gradient descent (SGD) optimizer. It optimizes the model by reducing the loss calculated by the loss function (categorical cross-entropy). The job of the loss function is to measure the difference between the values predicted by the model being trained and the true values. This difference is called the loss; the smaller it is, the better. The behavior of the loss tells the optimizer what must be done to reduce it. We use accuracy as the metric here; the greater it is, the better. Unlike the loss function, metrics play no role in optimization.
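A sketch of these compile lines. Note that older standalone Keras spells the learning-rate argument lr, while newer versions use learning_rate:

# initialize the learning rate and the number of epochs to train for
INIT_LR = 0.01
EPOCHS = 75

# SGD optimizer with categorical cross-entropy loss and accuracy as the metric
opt = SGD(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])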

You can compile a network (model) as many times as you want. You need to compile the model again if you wish to change the loss function, optimizer, or metrics.

You need a compiled model to train (because training uses the loss function and the optimizer), but it is not necessary to compile the model when testing it on new data.

"Categorical cross-entropy" is the loss function used in the code. As described above, it measures the difference between the values predicted by the model and the expected values, and this loss guides the optimization of the model.

Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually, the “true” distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.

For example, suppose for a specific training instance, the label is B (out of the possible labels A, B, and C). The one-hot distribution for this training instance is, therefore:

Pr(Class A)  Pr(Class B)  Pr(Class C)
0.0 1.0 0.0

You can interpret the above “true” distribution to mean that the training instance has 0% probability of being class A, 100% probability of being class B, and 0% probability of being class C.

Now, suppose your machine learning algorithm predicts the following probability distribution:

Pr(Class A)  Pr(Class B)  Pr(Class C)
0.228 0.619 0.153

How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. Use this formula:

H(p, q) = -Σ p(x) · ln(q(x))

where p(x) is the wanted (true) probability and q(x) is the predicted probability. The sum runs over the three classes A, B, and C. In this case, the loss is 0.479:

H = - (0.0*ln(0.228) + 1.0*ln(0.619) + 0.0*ln(0.153)) = 0.479

So that is how “wrong” or “far away” your prediction is from the true distribution.
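You can verify this arithmetic with a few lines of numpy:

import numpy as np

p = np.array([0.0, 1.0, 0.0])        # "true" one-hot distribution
q = np.array([0.228, 0.619, 0.153])  # predicted distribution

# H(p, q) = -sum over classes of p(x) * ln(q(x))
H = -np.sum(p * np.log(q))
print(H)  # ~0.4797, the 0.479 computed above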

Step 6 - Train the model

Line 2 - fit is used to train the model for a given number of epochs.

The argument trainX is the numpy array of training data, and trainY is the numpy array of the corresponding training labels.

Similarly, testX and testY are the numpy arrays of validation image data and the corresponding labels, passed as validation_data. The model will not train on this data; it will only evaluate the loss and any model metrics on it at the end of each epoch. (If you instead use validation_split, a fraction of the training data is set apart as validation data, selected from the last samples of the x and y data provided, before shuffling.)

Batch size is either an integer or None; it is the number of image samples per gradient update and defaults to 32 if unspecified. Larger GPUs can accommodate larger batch sizes. I recommend starting with 32 or 64 and going up from there.

Epochs is an integer: the number of epochs to train the model, where an epoch is one iteration over the entire x and y data provided.
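A sketch of the fit call with the arguments discussed above; the returned History object records the loss and metrics per epoch, which we will plot in the next step:

# train for 75 epochs, evaluating on the held-out test data after every epoch
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              epochs=EPOCHS, batch_size=32)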

Step 7 - Evaluate the model

Lines 3 and 11 - Once training is complete, we predict the results using the Keras predict function.

It’s important that we evaluate on our testing data so we can obtain an unbiased (or as close to unbiased as possible) representation of how well our model is performing with data it has never been trained on.

To visualize our model's predictions, we can use the model's .predict method along with the classification_report from scikit-learn.

Lines 15 to 26 - Plot the performance of the model for every epoch. The history lets us see at which point of training (which epoch) the loss and accuracy were decreasing or increasing. This plot is saved to the image "training_performance.png".
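A sketch of the evaluation and plotting code. The history keys assume an older Keras ("acc"/"val_acc"); newer versions report "accuracy"/"val_accuracy":

# evaluate the network on the test set with a per-class report
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1),
                            target_names=lb.classes_))

# plot the training/validation loss and accuracy for every epoch
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.plot(N, H.history["acc"], label="train_acc")
plt.plot(N, H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig("../output/training_performance.png")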

In the above image, the red line indicates the change in training loss per epoch. The training loss has decreased with every epoch, while the validation loss stopped decreasing after around epoch 30. The training accuracy after 75 epochs is 62%, and the validation accuracy is below 60%. Analyzing the training performance will help us train better.

Save your model so that you can use it next time to predict the results.
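A sketch, using the file names from the project structure:

# serialize the trained model and the label binarizer to the output folder
model.save("../output/simple_multiclass_classifcation_model.model")
with open("../output/simple_multiclass_classifcation_lb.pickle", "wb") as f:
    f.write(pickle.dumps(lb))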

Test the model performance:

You have trained the network and achieved some accuracy. Now, you wish to use the model to predict the class of any test image. What do you do?

Create a Python file named predict.py.

You will have to load three things: a test image (cat, dog, or panda), the trained model, and the binarized label file. You can find the test images in the test_images folder.

Lines 9, 11, and 12 - The paths of the test image, the trained model, and the label binarizer file are given.

Lines 14 to 16 - The test image is converted from its original shape to height = 32, width = 32, channels = 3.

Lines 19 to 23 - The 32x32x3 image is converted to a numpy array of shape (1, 3072).

Lines 27 and 28 - The trained model and the label binarizer are loaded.

Line 32 - Generates the output prediction for the input test image.

Lines 36 to 48 - Help you visualize the prediction result as text and on the image.
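Putting these lines together, here is a rough sketch of what predict.py can look like; the paths are assumptions based on the project structure:

# predict.py - a minimal sketch
from keras.models import load_model
import pickle
import cv2

# paths to the test image, the trained model and the label binarizer
image_path = "../test_images/cats.jpg"
model_path = "../output/simple_multiclass_classifcation_model.model"
lb_path = "../output/simple_multiclass_classifcation_lb.pickle"

# load the test image, keep a copy for display, and resize it to 32x32x3
image = cv2.imread(image_path)
output = image.copy()
image = cv2.resize(image, (32, 32))

# scale pixels to [0, 1] and flatten into a (1, 3072) array
image = image.astype("float") / 255.0
image = image.flatten().reshape((1, 3072))

# load the trained model and the label binarizer
model = load_model(model_path)
lb = pickle.loads(open(lb_path, "rb").read())

# generate the output prediction and pick the most likely class
preds = model.predict(image)
i = preds.argmax(axis=1)[0]
text = "{}: {:.2f}%".format(lb.classes_[i], preds[0][i] * 100)

# draw the result on the output image and show it (requires a display)
print(text)
cv2.putText(output, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
cv2.imshow("Prediction", output)
cv2.waitKey(0)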

When you run predict.py with a cat image as input, the output will look like this.

This is the result of the prediction for a test image of a cat.

The prediction confidence for the above image is 55.68%, so it is classified correctly but with low confidence.

Now I will show you a case of misclassification, which should convince you that we must tune our hyperparameters (size of the image, number of images, number of nodes, types of layers, learning rate, choice of loss function and optimizer) to increase the accuracy and avoid misclassification.

Correct classification and Misclassification

This is a cat image, but the model has misclassified it as a dog.

There could be multiple reasons for this. One obvious candidate is the similarity between the lower body physiques of dogs and cats, everything except their faces. The model may have overfitted on the lower body of the dog: the training images of cats mostly show cat faces, not lower bodies, so an image of a cat showing its lower body is predicted as a dog. You can test this on other images.

But images with a focused cat face are predicted as a cat.

Similarly, hyperparameters such as the size of the image, number of images, number of nodes, types of layers, learning rate, and choice of loss function and optimizer play a very important role in the performance of the model.

Cat test image classified as cat. Correct classification.
Dog test image classified as dog. Correct classification.
Panda test image classified as panda. Correct classification.

Conclusion:

In this blog, we have learned to build our first deep learning model using Keras. Most importantly, we have trained and tested a multi-class classifier from scratch in Keras to classify dogs, cats, and pandas.

Each part of the code is explained in detail. We have learned why there is a need to move from fully connected layers to convolution layers, and we also learned about hyperparameters. I hope you enjoyed this part of the AI Starter series. In the next blog, part 3, we will build a better model using convolution layers, which will be much more powerful than the current one. Please give your kind feedback on this article; it will encourage me and help me improve my work. Also, share it and follow to stay updated with such easy and detailed articles in the fields of machine learning, deep learning, computer vision, and image processing.

I am extremely happy to collaborate with Shubham Shrey, a dexterous graphic designer. Special thanks for creating this thoughtful logo for my AI Starter series.
