The Perceptron Algorithm

September 3, 2018

The Perceptron Algorithm is one of the oldest Machine Learning algorithms, developed in the 1950s.

 

The motivation behind this algorithm comes from the study of the biological neuron, in an attempt to replicate the intelligence of the brain and so develop Artificial Intelligence.

 

This is how a neuron works -

A neuron takes input signals from its surroundings through its Dendrites and sends them to the cell body. Once the received signal crosses a threshold, the neuron gets fired/activated and passes the signal along the Axon to the Synaptic terminals.

 

A basic Perceptron works on the same principle. It is used in binary classification problems.

Fig. 1 Neuron

As we can see in the figure of a perceptron, we pass the inputs through a set of weights and then sum the products of the weights and the inputs. The weights are what we learn over time from the data given to us; that part is the "Learning" in Machine Learning. This whole summation is passed to an activation function, which generates the predicted class (or, with a sigmoid activation, the probability of a class) in binary classification.
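As a rough sketch of this forward pass (the function name, the use of NumPy, and the step activation with threshold 0 are my own assumptions, chosen to match the sign-based loss below):

    import numpy as np

    def predict(x, w):
        """Forward pass of a basic perceptron (sketch)."""
        # Sum of the products of inputs and weights (x and w are
        # 1-D arrays of the same length; a bias can be folded in by
        # appending a constant 1 to x and a bias weight to w).
        summation = np.dot(x, w)
        # Step activation: output one class if the summation crosses
        # the threshold (0 here), the other class otherwise.
        return 1 if summation >= 0 else -1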

 

 

How we learn weights

A Perceptron seeks to minimize the loss function, which is defined as -

L = - Σ y_i (x_i^T w)

where the sum runs over the misclassified points, i.e. those where y_i does not match the sign of x_i^T w. Each such term y_i (x_i^T w) is negative, so the leading minus sign makes L non-negative; L reaches zero only when every point is classified correctly.
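A minimal sketch of this loss in NumPy (labels are assumed to be in {-1, +1}; the function and variable names are mine):

    import numpy as np

    def perceptron_loss(X, y, w):
        """L = -sum of y_i * (x_i^T w) over misclassified points.
        X is an (n, d) array of inputs, y an (n,) array of labels
        in {-1, +1}, and w a (d,) weight vector."""
        scores = X @ w                   # x_i^T w for every point
        # Misclassified: the sign of the score disagrees with the label.
        mis = np.sign(scores) != y
        # Each misclassified term y_i * score_i is negative, so the
        # leading minus sign makes L non-negative.
        return -np.sum(y[mis] * scores[mis])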

 

By minimizing L, we try to predict the correct label.

 

After calculating the loss using the above formula, we update the weights using the equation below.

W(new) = W(old) - ( η * ∇L )

 

Here, ∇L is the gradient of the loss with respect to W; for this loss, ∇L = - Σ y_i x_i over the misclassified points. Setting ∇L = 0 cannot be solved for W analytically, but ∇L tells us the direction in which L increases in w, so we step the opposite way. η is a constant called the learning rate; it sets the size of each step, i.e. the speed at which we learn.

 

The loss calculated using W(new) is lower than the loss calculated using W(old). This is one step of optimizing the objective function. After repeating it a certain number of times, we arrive at a value of W that fits the data well and produces a low loss. This procedure of updating the weights using the gradient of the loss function is called Gradient Descent.
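Since ∇L for this loss is - Σ y_i x_i over the misclassified points, one step of Gradient Descent can be sketched as follows (the names and the default learning rate are my own choices):

    import numpy as np

    def gradient_step(X, y, w, lr=0.1):
        """One Gradient Descent step: W(new) = W(old) - lr * grad."""
        scores = X @ w
        mis = np.sign(scores) != y       # misclassified points
        # grad = -sum over misclassified i of y_i * x_i
        grad = -np.sum(y[mis][:, None] * X[mis], axis=0)
        return w - lr * grad

Each call takes one step against the gradient; with a small enough learning rate this typically lowers the loss.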

 

And after multiple iterations of Gradient Descent, we have an optimized value of W, which is then used to predict the class of incoming data.
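Putting the pieces together on a toy, linearly separable dataset (a hypothetical end-to-end example that reuses the gradient_step sketch above; the data and hyperparameters are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Two well-separated Gaussian blobs with labels in {-1, +1}.
    X = np.vstack([rng.normal(+2.0, 0.5, size=(50, 2)),
                   rng.normal(-2.0, 0.5, size=(50, 2))])
    y = np.concatenate([np.ones(50), -np.ones(50)])

    # Repeated Gradient Descent steps to learn W.
    w = np.zeros(2)
    for _ in range(100):
        w = gradient_step(X, y, w, lr=0.1)

    # Predict classes for the training data with the learned W.
    predictions = np.sign(X @ w)
    print("training accuracy:", (predictions == y).mean())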

 

One thing to notice about the Perceptron is that if the data is not linearly separable, Gradient Descent cannot converge to a value of W that classifies every point correctly, which leads to a bad fit and wrong predictions.

 

 

Kush.

 
