For this project I implemented 4 different neural networks in MATLAB:
- Single-layer linear perceptron
- Single-layer perceptron
- Multi-layer perceptron
- Convolutional neural network
While all 4 are neural networks, they function differently and therefore reach different accuracies. I trained all of them with stochastic gradient descent, but they differ as follows:
- My single-layer linear perceptron calculated the loss derivative from the Euclidean distance between the prediction and the ground truth (see the first sketch after this list).
- My single-layer perceptron calculated the loss derivative using a softmax cross-entropy loss, which computes the cross-entropy between the softmax of the predictions and the ground-truth labels (also shown in the first sketch after this list).
- My multi-layer perceptron was similar to the single-layer perceptron in that it used the softmax cross-entropy loss. However, it added an additional hidden layer before the loss calculation. This layer was a ReLU, which maps each incoming value to max(0, value) (see the ReLU sketch below). This nonlinearity allows faster learning and helps keep the gradient from vanishing during the calculations.
- Finally, the convolutional neural network adds a convolution layer and a pooling step that pulls the maximum value out of each small grid of pixels (see the convolution/pooling sketch below). The big benefit of the convolution layer is that the model can take into account the spatial relationships between pixels, i.e. their local spatial coherence, which helps it find patterns. The other networks ignore the arrangement of pixels within an image entirely and thus cannot take advantage of those spatial relationships.
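
To make the two loss derivatives concrete, here is a minimal MATLAB sketch (not the project code; the score vector `z`, the one-hot label `y`, and their values are hypothetical). It assumes the Euclidean loss is taken as half the squared distance, so both gradients reduce to a simple difference at the output:

```matlab
% Minimal sketch: gradients of the two losses with respect to the
% network's output scores z, given a one-hot label y (values are hypothetical).

z = [2.0; -1.0; 0.5];          % example output scores
y = [0; 1; 0];                 % one-hot ground-truth label

% Squared Euclidean error loss: L = 0.5 * ||z - y||^2
dL_dz_euclidean = z - y;

% Softmax cross-entropy loss: L = -sum(y .* log(softmax(z)))
p = exp(z - max(z));           % subtract max(z) for numerical stability
p = p / sum(p);                % softmax probabilities
dL_dz_softmax = p - y;         % combined softmax + cross-entropy gradient
```

In both cases the gradient at the output is a difference between a prediction and the target, which is what then gets back-propagated into the earlier layers.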
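The ReLU layer itself is only a couple of lines. Here is a hedged sketch (variable names and values are assumed, not taken from the project) of its forward pass and of how the gradient passes back through it:

```matlab
% Minimal sketch of a ReLU layer (assumed names and values).

a = [-1.2; 0.0; 3.4];          % incoming values to the ReLU
h = max(0, a);                 % forward pass: element-wise max(0, value)

% Backward pass: the gradient flows through only where the input was positive.
dL_dh = [0.1; -0.5; 0.2];      % gradient arriving from the layer above
dL_da = dL_dh .* (a > 0);      % zero out the gradient where the ReLU was inactive
```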
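And here is a minimal sketch of the convolution-plus-max-pooling idea, using MATLAB's built-in conv2 on an assumed 28x28 grayscale image with a single 5x5 filter and non-overlapping 2x2 pooling windows (sizes and names are illustrative, not the project's actual configuration):

```matlab
% Minimal sketch of a convolution + 2x2 max-pooling step (illustrative only).

img = rand(28, 28);                    % a single grayscale image (assumed size)
k   = rand(5, 5);                      % one 5x5 convolution filter

fmap = conv2(img, k, 'valid');         % 'valid' convolution -> 24x24 feature map
fmap = max(0, fmap);                   % ReLU nonlinearity on the feature map

% 2x2 max pooling: keep the largest activation in each non-overlapping 2x2 block.
[h, w] = size(fmap);
pooled = zeros(h/2, w/2);
for i = 1:2:h
    for j = 1:2:w
        block = fmap(i:i+1, j:j+1);
        pooled((i+1)/2, (j+1)/2) = max(block(:));
    end
end
```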
Through all of this, by far the most difficult aspect for me was back-propagation. After computing a prediction, performing gradient descent requires propagating the loss derivative backwards through all the layers toward the input, so that the weights in every layer can be updated. This was extremely complex, and very rewarding once I finally figured it out.
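As an illustration of what that chain looks like, here is a hedged sketch of one back-propagation and stochastic gradient descent step for a one-hidden-layer MLP with ReLU and softmax cross-entropy; all sizes, names, and the learning rate are assumptions for the example, not the project's actual values:

```matlab
% Minimal sketch: one backpropagation + SGD step for a one-hidden-layer MLP
% with ReLU and softmax cross-entropy (assumed shapes, names, and learning rate).

x  = rand(784, 1);             % one input image as a column vector
y  = zeros(10, 1); y(3) = 1;   % one-hot label
W1 = 0.01 * randn(100, 784);  b1 = zeros(100, 1);
W2 = 0.01 * randn(10, 100);   b2 = zeros(10, 1);
lr = 0.1;                      % learning rate

% Forward pass
a1 = W1 * x + b1;
h1 = max(0, a1);               % ReLU hidden layer
z  = W2 * h1 + b2;             % output scores
p  = exp(z - max(z)); p = p / sum(p);   % softmax probabilities

% Backward pass: push the loss gradient from the output back toward the input.
dz  = p - y;                   % gradient at the output scores
dW2 = dz * h1';    db2 = dz;
dh1 = W2' * dz;                % propagate through the output weights
da1 = dh1 .* (a1 > 0);         % propagate through the ReLU
dW1 = da1 * x';    db1 = da1;

% Stochastic gradient descent update (one example at a time)
W2 = W2 - lr * dW2;   b2 = b2 - lr * db2;
W1 = W1 - lr * dW1;   b1 = b1 - lr * db1;
```

Each d-prefixed quantity is the derivative of the loss with respect to the corresponding forward quantity, obtained by repeatedly applying the chain rule from the output back toward the input.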
Accuracy:
Source code: Here