Saturday, July 13, 2019

How to create a model for Keras

Creating Models in Keras

In this post, I will explain how to create a model in Keras. First, let's look at the MNIST example from the Keras GitHub repository.
The code is like this:
from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
In that code, the model is created like this:
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

Dense function

Actually, the "Dropout" layer is used to add some randomness (it randomly drops units during training) to prevent the network from memorizing the training data. So this code without the dropout layers works too:
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(512, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
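
To get an intuition for what Dropout does, here is a minimal NumPy sketch (an illustration of the idea, not Keras's actual implementation); during training, each value is zeroed with probability "rate", and the survivors are scaled up so the expected sum stays the same (so-called inverted dropout):
import numpy as np

def dropout(x, rate=0.2):
    # Keep each unit with probability (1 - rate); zero out the rest.
    mask = np.random.rand(*x.shape) >= rate
    # Scale the kept values so the expected output stays the same (inverted dropout).
    return x * mask / (1.0 - rate)

print(dropout(np.ones(10)))  # roughly 8 of 10 values survive, scaled up to 1.25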
So what we care about most is the "Dense" function. According to the Keras documentation, the arguments of the Dense function are as follows:
keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
The first argument "512" of the Dense function is the "units" of the layer. You can see this Stack Overflow question to learn what "units" means. It is just the "output shape" of the layer. The first layer outputs a shape of 512 neurons because 512 is given as the units.

So if we give 512 as the units of the layer, the layer's output is carried over to the next layer through 512 neurons.
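
To check this, here is a minimal sketch that builds just the first layer and prints its output shape (None stands for the batch dimension):
from keras.models import Sequential
from keras.layers import Dense

m = Sequential()
m.add(Dense(512, activation='relu', input_shape=(784,)))
print(m.output_shape)  # (None, 512): the layer outputs 512 values per sample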

But why 512? Honestly, we don't know why 512 is used. Maybe 512 neurons handle the activations most efficiently? It is a somewhat arbitrary number, which we must find by trial and error.

If you look carefully, you will notice that only the last layer has "num_classes" (= 10) as its units. The last layer has 10 as its units (or output shape) because the neural network is expected to pick one number out of 10 (namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9) at the end. So the last layer must have 10 as its output shape.
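
For example, here is a quick sketch that reuses the trained model and x_test from the full example above; the softmax layer outputs 10 probabilities, and the predicted digit is the index of the highest one:
import numpy as np

probs = model.predict(x_test[:1])  # shape (1, 10): one probability per digit
print(np.argmax(probs, axis=1))    # the predicted digit for the first test image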

"input_shape" of the dense function

Only the first layer has the argument "input_shape". Why? Because each layer after the first can infer its input shape from the previous layer's output shape. All we must do is tell the first layer what shape it will be given.

The samples of MNIST are images of handwritten digits. Each image has 28 * 28 (= 784) grayscale pixels, like this:


(image: a handwritten digit of 28 * 28 pixels)
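
Here is the flattening step from the example again, in isolation, so you can see how the 28 * 28 images become 784-dimensional vectors matching input_shape=(784,):
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)                   # (60000, 28, 28): 60000 images of 28 * 28 pixels
x_train = x_train.reshape(60000, 784)  # flatten each image into a 784-dimensional vector
print(x_train.shape)                   # (60000, 784)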

"Relu" activation

What is activation in the first place? According to this page:
It’s just a thing function that you use to get the output of node. It is used to determine the output of neural network like yes or no. It maps the resulting values in between 0 to 1 or -1 to 1 etc. (depending upon the function).
- SAGAR SHARMA, Towards Data Science
In the model used for MNIST, the relu and softmax activations are used.

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

Relu means "Rectified Linear Unit". It is the most commonly used activation function, as it usually gives better results than other activation functions. ReLU's advantages are "sparsity and a reduced likelihood of vanishing gradient", according to StackExchange. The activation function is applied to each layer's output, so it shapes how the model learns during training.
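
ReLU itself is very simple: it just replaces negative values with zero, f(x) = max(0, x). A minimal NumPy sketch:
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0. 0. 0. 1.5 3.]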

Softmax is used to transform arbitrary real values into probabilities, so it is placed on the last layer to turn the previous layer's output into class probabilities. In fact, this is the layer that makes the prediction.
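
Here is a minimal NumPy sketch of softmax; it exponentiates the values and normalizes them so they sum to 1:
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))         # roughly [0.659 0.242 0.099]
print(softmax(scores).sum())   # 1.0: the probabilities sum to one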

You can see the Keras documentation for the other available activation functions.

Conclusion

As we have seen above, we can create the Keras model like this:

model = Sequential()
# 28 * 28 pixels = 784 pixels
# 512 for the "output shape"
model.add(Dense(512, activation='relu', input_shape=(784,)))
# 512 for the "output shape"
model.add(Dense(512, activation='relu'))
# One of 10 numbers (0, 1, 2, ..., 9) must be chosen at the last layer
model.add(Dense(10, activation='softmax'))

But we can make the model like this too:

model = Sequential()
model.add(Dense(300, activation='relu', input_shape=(784,)))
# Three more hidden layers with 300 neurons!
# Why 300? I don't know why, but it might work!
model.add(Dense(300, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(10, activation='softmax'))

And, though it is meaningless in practice, you can give each layer just 1 neuron if you want:

model = Sequential()
model.add(Dense(1, activation='relu', input_shape=(784,)))
model.add(Dense(1, activation='relu'))
model.add(Dense(1, activation='relu'))
model.add(Dense(1, activation='relu'))
model.add(Dense(1, activation='relu'))  # five layers with just 1 neuron each!
model.add(Dense(10, activation='softmax'))

But the first layer's input shape and the last layer's output shape cannot be changed in any case. They must always be consistent with the data: 784 input pixels and 10 output classes.

Also, you can change "relu" to other functions such as "selu", but "softmax" cannot be exchanged for another function here, as it is the function used to get probabilities over multiple classes. According to StackExchange, "the sigmoid function is used for the two-class logistic regression, whereas the softmax function is used for the multiclass logistic regression". In the example above there are 10 classes (0, 1, 2, ..., 9), so "sigmoid" cannot be used; you must use "softmax".
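
For instance, here is a sketch of the same model with "selu" in the hidden layers (a variation for illustration; the output layer keeps "softmax" because we still need 10 class probabilities):
model = Sequential()
model.add(Dense(512, activation='selu', input_shape=(784,)))
model.add(Dense(512, activation='selu'))
model.add(Dense(10, activation='softmax'))  # still softmax: we need 10 class probabilities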