Published By Dr. Mahsa Hassankashi | 50/03/2019
## What is Deep Learning?

## Deep Learning vs Machine Learning

## Feature Engineering Importance

## Deep Learning and human brain

## Requirements:

##### 1. Machine Learning - Linear Regression Gradient Descent

## How Deep Learning - Convolutional Neural Network Works?

### One-Dimensional CNN

### Two-Dimensional CNN

## Deep Learning Code Sample by Digit Recognition

#### Download training and test data set

## Increase deep learning performance with hardware by GPU

#### GPU Activation

##### GPU Test Code

## Increase deep learning performance with software libraries

## Feedback

A complete and easy-to-understand guide to deep learning convolutional neural networks with TensorFlow and Python.

Deep learning is, in fact, a branch of machine learning. Machine learning comprises different types of algorithms, each of which takes a few thousand data points and tries to learn from them in order to predict future events. Deep learning, however, applies a neural network in extended or variant shapes, and it has the capacity to handle millions of data points.

The most fundamental strength of deep learning is its ability to pick the best features. Indeed, deep learning summarizes the data and computes the result based on that compressed representation. This is exactly what is needed in artificial intelligence, especially when we have a huge database requiring dramatic amounts of computation.

Deep learning has sequential layers, which are inspired by the biological neural network. These layers apply a nonlinear function with the duty of feature selection, and each layer's output is used as input for the next layer. Deep learning applications include computer vision (such as face or object recognition), speech recognition, natural language processing (NLP), and cyber threat detection.

The major difference between machine learning and deep learning is that in ML we need **manual human intervention for feature** extraction, while in DL it is done by the **intuitive knowledge** embedded inside the architecture itself. This difference has a dramatic influence on performance, in both precision and speed: because there is always **human error in manual feature selection**, DL can be the best option for gigantic data computation.

The common factor between DL and ML is that both of them work in supervised and unsupervised modes. DL is based on neural networks, varying their shape and operation in CNNs, RNNs, and so on, while ML has a range of different algorithms based on statistics and mathematics. This does not mean that DL is merely neural networks, though: DL can also use various ML algorithms to increase performance by building hybrid functions. For instance, DL can apply a Support Vector Machine (SVM) as its activation function instead of softmax. [1]

In artificial intelligence, we try to make the machine an independent tool that thinks with less programmer intervention. The most important characteristic of an automated machine is the way it thinks: the more its way of thinking resembles the human brain, the more it will win the race to be the best machine. So let's see what the pillar attribute of making an accurate decision is. Remember our childhood, when we saw objects but had no idea about their properties such as name, exact size, or weight. Still, we could categorize them quickly by noticing one important thing. For example, looking at one animal, we recognized it as a "dog" as soon as we heard its sound, "barking", or as a "cat" when we heard it "meowing". Here the animal's sound has more influence than its size: from experience, when we see an animal similar in size to another, our brain starts to pay attention to the most distinguishing feature, which is sound. On the other hand, when we see the tallest animal in the zoo, we ignore all of the other features and say, "Yes, it is a giraffe."

It is a miracle of the brain that it can infer the situation and, under different conditions in the same problem (such as "animal detection"), pick one feature as the final key for making a decision; the result given by this attitude is both accurate and quick. Another story that makes the importance of feature engineering clear is the "Twenty Questions" game; if you have not played it yet, please look here.

The player wins if he has the ability to ask proper questions and to build and improve the next question according to the recent answers. The questions are sequential, and the next question depends 100% on the previous answer. Previous answers have the duty of filtering and clarifying the possibilities so the player can reach the goal. Each question is like a hidden layer in a neural network: the layers are connected, and each layer's output is used as input for the next. Our first question always starts with "Is it alive?", and with this question we remove half of the possibilities. This omitting and dropping leads us to ask a better question in a new category; obviously, we cannot ask the next question without the previous answer, which made a clarification and filtration in our brain. Something similar happens in a deep learning convolutional neural network.

Deep learning is an imitation of the human brain, approaching it in both precision and speed. The Convolutional Neural Network (CNN) is inspired by the brain's cortex. As you see in the picture below, the visual cortex layers cover the entire visual field. These sensitive cells play the role of the kernel or filter matrix, which we will pay attention to later in this article. God created these cells to extract the important data coming from the eyes.

Assume students have an exam and are preparing themselves: they read the book while picking out its important parts, writing them in notes or highlighting them. Either way, they reduce the volume of the book, summarizing 100 pages into two pages that are easy to use as a reference and to review. A similar scenario happens in a DL CNN; this time we use a smaller matrix to filter and reduce the data.

I strongly recommend that you carefully read the first and second articles below, because their concepts will be needed; I assume you know everything about linear regression and neural networks.

Deep learning is a neural network with more than two hidden layers. If you are new to neural networks, please study this link. More layers mean more parameters, which can cause overfitting. Overfitting happens when the model matches the training set so completely that there is always exactly one memorized answer inside the model. One of the good characteristics of a model is to be generalized, not to coincide completely with the training data.

We cannot, and even if we could it would be wrong to, make a completely fitted model. Let's see what happens when we want to assign a "Y" inside our model. We must avoid being too idealistic when building the model and aim to make it general rather than specific; to reach this point, we can apply cross-validation. Cross-validation is a model evaluation method. A good approach is K-fold cross-validation, which divides the training set into k parts; in each iteration, one fold belongs to the test set and the remaining k-1 folds form the training set, so the chance of over-matching decreases. In a convolutional neural network, there are also some specific solutions besides K-fold cross-validation to avoid overfitting, such as dropout and regularization.
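The K-fold procedure described above can be sketched with scikit-learn; the tiny array here is just a toy data set for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features (toy data)
y = np.arange(10)                 # toy labels

# 5 folds: each iteration holds out 2 samples for testing,
# and trains on the remaining 8.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print("fold", fold, "test indices:", test_idx)
```

Every sample ends up in the test fold exactly once, which is why the evaluation is less likely to be flattered by a lucky split.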

Fully connected in DL means that each neuron in one hidden layer has a connection to every neuron in the next layer. When dropout is applied at training time, some of the neurons are turned off; after training finishes, at prediction time, all neurons are turned on again. So DL tries to omit redundant data, obscuring its role, while enhancing and emphasizing the role of important features. In the picture below, the left picture has high resolution, but over time the DL CNN keeps only the important pixels and makes it smaller.
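A minimal NumPy sketch of (inverted) dropout makes the training-time versus prediction-time difference concrete; the `keep_prob=0.8` here mirrors the `dropout(network, 0.8)` calls used later in the article's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.8, training=True):
    """Inverted dropout: randomly zero units at training time and
    scale the survivors so the expected activation stays the same;
    at prediction time, all units stay on."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.ones(10)
print(dropout(a, keep_prob=0.8, training=True))   # some units zeroed, rest scaled
print(dropout(a, keep_prob=0.8, training=False))  # all units unchanged
```

Scaling by `1 / keep_prob` during training is what lets prediction run with every neuron active and no extra correction step.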


With the aid of a smaller matrix moving all over the original, primitive matrix, we can transform the data into smaller data that is easier to rely on for making decisions. We do some mathematical calculation at each step as the filter matrix moves around the primitive matrix. For example, in the picture below, 12 data points are reduced to just 3 data points by moving one matrix 3 times over the primitive matrix. This computation can take the maximum or the average of the data.
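The "12 data points down to 3" reduction can be sketched as a simple pooling operation; the numbers below are made up for illustration and need not match the picture:

```python
import numpy as np

def pool1d(x, size, mode="max"):
    """Slide a window of `size` over x (stride = size) and keep
    either the maximum or the average of each window."""
    windows = x[: len(x) // size * size].reshape(-1, size)
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)

x = np.array([1, 3, 2, 0, 5, 4, 7, 1, 2, 6, 0, 3])  # 12 data points
print(pool1d(x, 4, "max"))   # -> [3 7 6] : just 3 data points remain
print(pool1d(x, 4, "mean"))  # -> [1.5 4.25 2.75]
```

Max pooling keeps the strongest response in each window, while average pooling smooths the window into one representative value.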

There is no such thing as a one-dimensional matrix in the real world, but for the sake of presentation, I prefer to start with a 1D matrix. I want to perform a dimensionality reduction with the aid of the red matrix on the blue matrix: the blue matrix is the real data set and the red one is the filter matrix. I want to transform the blue matrix with 5 elements into 3 elements. I push the red matrix from left to right (in each step, I push it by just one element). Wherever elements coincide, I multiply the two related elements, and when more than one element matches, I sum the products. Notice that the red matrix was [2 -1 1], and after the flip it (the kernel) becomes [1 -1 2].

To reduce the matrix, I am looking for the valid results, which happen when all of the red (filter) elements are covered by blue ones. I just pick up [3 5]:

```
import numpy as np
x = np.array([0, 1, 2, 3])   # blue (data) matrix
w = np.array([2, -1, 1])     # red (filter) matrix
result = np.convolve(x, w)                 # full convolution
result_Valid = np.convolve(x, w, "valid")  # only fully overlapping positions
print(result)        # [ 0  2  3  5 -1  3]
print(result_Valid)  # [3 5]
```

A similar story holds for two-dimensional matrices. The kernel matrix [[-1, 0], [2, 1]] changes to [[1, 2], [0, -1]] after flipping. Because in every step in the pictures below the filter matrix lies inside the original training matrix, all of the computed elements are valid.

```
from scipy import signal as sg
print(sg.convolve([[2, 1, 3],
                   [5, -2, 1],
                   [0, 2, -4]], [[-1, 0], [2, 1]]))
print(sg.convolve([[2, 1, 3],
                   [5, -2, 1],
                   [0, 2, -4]], [[-1, 0], [2, 1]], "valid"))
```

I want to introduce you to the best competition community, __KAGGLE__, which is famous among data scientists. There are many competitions that are worth practicing your machine learning and deep learning abilities on, and there are awards for whoever can deliver code for the current challenges. There are kernels written by other authors which you can contribute to; they are good sources for learning artificial intelligence in R and Python. Moreover, you can use its data sets as a reference and test your code against prepared data.

If you want to practice convolutional neural networks, please click here.

Please go to this link to get the training and test data sets. Obviously, you must sign up on the Kaggle site and then join this competition.

```
# -*- coding: utf-8 -*-
"""
Created on Sun Nov 19 05:59:50 2017
author: Mahsa
"""
import numpy as np
from numpy.random import permutation
import pandas as pd
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected, flatten
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
from sklearn.model_selection import train_test_split

train_Path = r'D:\digit\train.csv'
test_Path = r'D:\digit\test.csv'

# Split arrays or matrices into random train and test subsets
# http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
def split_matrices_into_random_train_test_subsets(train_Path):
    train = pd.read_csv(train_Path)
    train = np.array(train)
    train = permutation(train)
    X = train[:, 1:785].astype(np.float32)  # features (28x28 = 784 pixels)
    y = train[:, 0].astype(np.float32)      # labels
    return train_test_split(X, y, test_size=0.33, random_state=42)

def reshape_data(Data, Labels):
    Data = Data.reshape(-1, 28, 28, 1).astype(np.float32)
    Labels = (np.arange(10) == Labels[:, None]).astype(np.float32)  # one-hot encode
    return Data, Labels

X_train, X_test, y_train, y_test = split_matrices_into_random_train_test_subsets(train_Path)
X_train, y_train = reshape_data(X_train, y_train)
X_test, y_test = reshape_data(X_test, y_test)

test_x = np.array(pd.read_csv(test_Path))
test_x = test_x.reshape(-1, 28, 28, 1)

def Convolutional_neural_network():
    network = input_data(shape=[None, 28, 28, 1], name='input_layer')
    network = conv_2d(network, nb_filter=6, filter_size=6, strides=1, activation='relu', regularizer='L2')
    network = local_response_normalization(network)
    network = conv_2d(network, nb_filter=12, filter_size=5, strides=2, activation='relu', regularizer='L2')
    network = local_response_normalization(network)
    network = conv_2d(network, nb_filter=24, filter_size=4, strides=2, activation='relu', regularizer='L2')
    network = local_response_normalization(network)
    network = fully_connected(network, 128, activation='tanh')
    network = dropout(network, 0.8)
    network = fully_connected(network, 256, activation='tanh')
    network = dropout(network, 0.8)
    network = fully_connected(network, 10, activation='softmax')
    sgd = tflearn.SGD(learning_rate=0.1, lr_decay=0.096, decay_step=100)
    top_k = tflearn.metrics.top_k(3)  # Top-k mean accuracy: number of top elements to look at for computing precision
    network = regression(network, optimizer=sgd, metric=top_k, loss='categorical_crossentropy')
    return tflearn.DNN(network, tensorboard_dir='tf_CNN_board', tensorboard_verbose=3)

model = Convolutional_neural_network()
model.fit(X_train, y_train, batch_size=128, validation_set=(X_test, y_test), n_epoch=1, show_metric=True)

P = model.predict(test_x)
index = [i for i in range(1, len(P) + 1)]
result = []
for i in range(len(P)):
    result.append(int(np.argmax(P[i])))

res = pd.DataFrame({'ImageId': index, 'Label': result})
res.to_csv("sample_submission.csv", index=False)
```

One important common factor among game developers, graphic designers, and data scientists is matrices. Every data point, whether in images, video, or complex data, is a value in a matrix element. Whatever we do includes some mathematical operation transforming matrices.

For usual processing, the Central Processing Unit (CPU) is a good answer, but for advanced mathematical and statistical operations on huge data the CPU cannot keep up, and we have to use the Graphics Processing Unit (GPU), which was designed for difficult mathematical functions. Deep learning includes functions that need complex computation, such as the convolutional layers, activation functions (sigmoid, softmax), and the Fourier transform; these are processed on the GPU, while the remaining roughly 95%, which is mostly I/O procedures, stays on the CPU.

- Open the Start menu and launch the **Windows command prompt (cmd)**.
- Type **dxdiag**.
- In the window that opens, look at the **Display** tab.
- If the name is **NVIDIA** or another vendor's GPU (NVIDIA GPU, AMD GPU, Intel Xeon Phi), there is a GPU card on the board.
- Let's set the configuration file `.theanorc` at `C:\users\<yourname>\.theanorc`.
- Set { device = **gpu** or **cuda0**, floatX = **float32** } in the **[global]** section, and preallocate = 1 in the **[gpuarray]** section.
- If you want to know more about it, please look here.

```
import os
import shutil

destfile = "/home/ubuntu/.theanorc"
open(destfile, 'a').close()
shutil.copyfile("/mnt/.theanorc", destfile)  # make the .theanorc file in the project directory

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())

t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()

print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```

In order to enhance CNN performance, and because it is not possible to overwhelm the CPU or even the GPU with gigantic data of more than a terabyte, we must use some strategies to break the data down manually into chunks for processing. I have used DASK to prevent out-of-RAM memory crashes; it is also responsible for task scheduling.

```
import dask.array as da
X = da.from_array(np.asarray(X), chunks=(1000, 1000, 1000, 1000))
Y = da.from_array(np.asarray(Y), chunks=(1000, 1000, 1000, 1000))
X_test = da.from_array(np.asarray(X_test), chunks=(1000, 1000, 1000, 1000))
Y_test = da.from_array(np.asarray(Y_test), chunks=(1000, 1000, 1000, 1000))
```
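Since the snippet above assumes `X`, `Y`, and the test arrays already exist, here is a self-contained sketch of the same idea on toy data: dask splits the array into chunks and only materializes what each step of the computation needs.

```python
import numpy as np
import dask.array as da

# Toy data: in practice this would be an array too large to fit in RAM.
data = np.arange(10_000, dtype=np.float64).reshape(100, 100)
d = da.from_array(data, chunks=(25, 25))  # 16 chunks of 25x25

total = d.sum()          # builds a lazy task graph; nothing is computed yet
print(total.compute())   # dask schedules chunk-wise sums, then combines them
```

Because every operation on `d` stays lazy until `.compute()`, dask can process the chunks one at a time (or in parallel) instead of loading the whole array at once.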

Feel free to leave any feedback on this article; it is a pleasure to see your opinions and **votes** about this code. If you have any questions, please do not hesitate to ask me here.

Join Us

Subscribe and follow our daily tutorials about technology. You can request your difficult topic and see it covered, with your name, in the subscriber section.