
## Perceptron and Backpropagation

A short essay I prepared for the course “Machine Learning at Scale” at York University. This text scores over 55 on the Flesch Reading Ease scale, which is pretty impressive for such technical material.

## The Working of a Perceptron

A perceptron is a simple algorithm that can be trained as a binary classifier using supervised learning. It was invented in 1958 by Frank Rosenblatt at the Cornell Aeronautical Laboratory.

A very simple example of a perceptron contains three layers: an input layer, a hidden layer, and an output layer. (Strictly speaking, a network with a hidden layer is a multilayer perceptron.) Each layer contains a number of nodes, and each node passes its value to every node in the next layer. When only a single hidden layer exists, the perceptron can be called a shallow neural network.

Each input value is multiplied by a unique weight as it is passed to each node in the hidden layer. These weights are held in a matrix with one row for each node in the input layer and one column for each node in the hidden layer. Additionally, a bias factor is passed to the hidden layer. The bias allows the output curve to be moved with respect to the origin without affecting its shape. The values from the nodes of the hidden layer are then passed along to the output layer for summation. Finally, an activation function is usually applied to map the summed value onto the required output values, though for simplicity, not in the example being considered here.

If the inputs are y1 and y2, the weights are w[1,1], w[1,2], w[1,3], w[2,1], w[2,2], and w[2,3], and the bias value is b, the perceptron in the simple diagram above would calculate the output (ŷ) as:

ŷ = y1*w[1,1] + y1*w[1,2] + y1*w[1,3] + y2*w[2,1] + y2*w[2,2] + y2*w[2,3] + b
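
As a rough sketch of the calculation above, the following Python snippet multiplies each input by its weights, sums the products, and adds the bias once. The numeric values and the function name `perceptron_output` are assumptions made up for illustration; they do not come from the original example.

```python
# Hypothetical values, chosen only for illustration.
inputs = [1.0, 2.0]                 # y1, y2
weights = [
    [0.5, -1.0, 0.25],              # w[1,1], w[1,2], w[1,3]
    [1.5, 0.5, -0.75],              # w[2,1], w[2,2], w[2,3]
]
bias = 0.5                          # b


def perceptron_output(inputs, weights, bias):
    """Sum every input-times-weight term, then add the bias once.

    No activation function is applied, matching the simplified example.
    """
    total = 0.0
    for y_i, row in zip(inputs, weights):   # one row of weights per input
        for w_ij in row:                     # one weight per hidden node
            total += y_i * w_ij
    return total + bias


y_hat = perceptron_output(inputs, weights, bias)
print(y_hat)
```

With these illustrative values the script prints 2.75, the same result as writing out all six products by hand and adding the bias.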

In any perceptron or neural network larger than that, writing out all the terms would be cumbersome, to say the least, and so this is usually done with summation notation: