Part 1 of the series is here.
Many different techniques have been used for machine learning over the years, but since all of the most exciting recent advances have come from neural networks, that's what we will focus on in this series.
So let’s start from the basics. A neural network is a network of neurons. OK, but…
What is a Neuron?
Your brain has billions of neurons, which enable you to think and learn. In machine learning we try to model this with “neurons” on a computer. A computer neuron is much simpler than the cells in your brain, but we can think of it as a very simple computer that receives some inputs and produces an output. That output might in turn become an input to another neuron.
As a really simplified case, consider a neuron with only 3 inputs called Input1, Input2 and Input3. Let’s say the inputs have numeric values which range from -100 to 100. The neuron will base its output on those inputs using some formula. We might imagine a very simple neuron that just adds up the value of all its inputs and outputs the total.
In a real machine learning system, the function of the neurons is more complicated. We probably don’t want to treat all the inputs the same, so we will take the value of each Input and multiply it by a scale factor or weight. An Input that is important to this neuron might have a weight of 1 or even greater than 1 while something unimportant to this neuron may have a very small or 0 weighting factor.
So far our very simple neuron is defined by a formula like this:
Output = Input1 * Weight1 + Input2 * Weight2 + Input3 * Weight3
We can think of the weights as the way this tiny neuron expresses its preferences about its inputs.
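This weighted-sum neuron can be sketched in a few lines of Python. The specific inputs and weights below are made-up numbers, just for illustration:

```python
# A toy neuron: the weighted sum of its inputs.
def neuron(inputs, weights):
    return sum(i * w for i, w in zip(inputs, weights))

# Input2 matters most to this neuron (weight 1.5),
# while Input3 is ignored entirely (weight 0).
print(neuron([10, 20, 30], [0.5, 1.5, 0.0]))  # 35.0
```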
There’s one more key thing to consider: sometimes we’d really like the neuron to behave more like a switch that is on or off, rather than just summing all the inputs. There are lots of ways these systems try to do this, but one of them uses a function called ReLU. We don’t need to understand its name, or really any of the math for why it’s chosen. It’s enough to know that whenever the inputs sum to less than 0, the ReLU function outputs 0; otherwise it just outputs the value.
The key idea is that the neuron is making a kind of choice: below a certain threshold it outputs nothing, and above it, it does. We could say the neuron triggers on this threshold. We might even say the neuron is “detecting” something.
We may also add an offset to the total of the inputs so that the threshold could be something other than 0. If we want the neuron to trigger on a value of 5 instead of 0 we can just add an offset of -5 to the sum of the inputs.
So now our simple neuron’s math looks something like:
Output = ReLU(Input1 * Weight1 + Input2 * Weight2 + Input3 * Weight3 + Offset)
Or in English: we look at all the inputs, add them up weighted by how important we think each one is, and if the result is more than 0 we output that number; otherwise we just output 0.
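Putting ReLU, the weights, and the offset together, the full neuron might look like this in Python (again, all the numbers are made up for illustration):

```python
def relu(x):
    # Output 0 for anything below 0, otherwise pass the value through.
    return max(0, x)

def neuron(inputs, weights, offset):
    total = sum(i * w for i, w in zip(inputs, weights)) + offset
    return relu(total)

# With an offset of -5, the neuron only "triggers" once the
# weighted sum of its inputs exceeds 5.
print(neuron([2, 1, 1], [1, 1, 1], -5))  # sum is 4, below threshold -> 0
print(neuron([4, 2, 2], [1, 1, 1], -5))  # sum is 8, above threshold -> 3
```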
The weights and the offset are parameters for this neuron.
What is a Neural Network?
A single neuron can’t do very much but if we connect them together in large enough networks they can do some pretty impressive things. Cat detection in photos is going to require 1000s of neurons or even more. In such a system, each neuron might have a lot more than just three inputs. How do we construct this?
First we have to decide on how we will connect up all the neurons. We could connect them all to each other but over time scientists learned that putting them together in a specific structure works better.
A common approach is to put them in layers. If we had 10,000 neurons there might be, say, 5-6 layers with anywhere from one to 1000s of neurons in each layer. The first layer of neurons receives the input that we want to make predictions about. For our cat detection problem, we could perhaps use the brightness value of every pixel in the image as inputs to the first layer of neurons, and we could have multiple neurons “watching” each pixel. The outputs of those neurons feed into the next layer, and so on, until the neuron(s) of the last layer produce(s) the output. In between the input and output layers, we have what we call hidden layers that we’ll talk about more later. It might look something like this, but this is only 34 neurons, not 1000s.
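To make the idea of layers concrete, here is a tiny hand-wired network in Python: three inputs feed a hidden layer of two neurons, whose outputs feed one output neuron. The layer sizes and parameter values are arbitrary placeholders, not a real trained model:

```python
def relu(x):
    return max(0, x)

def neuron(inputs, weights, offset):
    return relu(sum(i * w for i, w in zip(inputs, weights)) + offset)

# A layer is just a list of neurons, each with its own weights and offset.
def run_layer(inputs, neurons):
    return [neuron(inputs, w, b) for w, b in neurons]

# Feed the inputs through each layer in turn; each layer's
# outputs become the next layer's inputs.
def run_network(inputs, layers):
    for params in layers:
        inputs = run_layer(inputs, params)
    return inputs

# Toy network: 3 inputs -> hidden layer of 2 neurons -> 1 output neuron.
layers = [
    [([0.2, 0.8, -0.5], 0.0), ([1.0, -1.0, 0.3], -1.0)],  # hidden layer
    [([0.6, 0.4], 0.0)],                                   # output layer
]
print(run_network([1.0, 2.0, 3.0], layers))
```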
People designing the model make choices about the number of layers and how they connect together. They don’t, however, decide the weights for all the inputs to each neuron or the offset from the formula above. They have designed a kind of brain, but it hasn’t yet learned anything. During the training process the system will determine values for all of these parameters so that the system can make good predictions.
Training A Neural Network
The math for training a neural network to make predictions is pretty complicated. If you want to understand more about it you can read about terms such as gradient descent and backpropagation. Here we will just present a simple intuitive version.
Let’s begin by setting all the parameters to random numbers. This isn’t going to give a good final result, but we have to start somewhere. Next we take an item from the training set, such as an image for our cat detector, and present it to the model. The values from the image are the inputs to the first layer of neurons. Those neurons compute their outputs based on the inputs and their (currently random) weights and offsets. This continues through the layers until we reach the output layer. For cat detection, the output layer may have only one neuron. If that neuron’s output is 0, we say it predicted that there is no cat in the image. If it is greater than 0, it thinks there is a cat. We might even say that the larger the number, the more confident the model is that there is a cat.
If the model is correct we go on to the next image; if the model is wrong, we change some of the values for the weights and other parameters and try again with more images. Real systems do this in very sophisticated ways: based on the difference between the expected answer and what the network predicted, they can determine which changes to the parameters are most likely to drive the result in the direction we want. There’s a lot of research on this process, and the computing resources to do this work are very expensive. The fancy math alluded to above is what guides these changes, and a lot of care goes into which samples to show, and when.
What we care about right now is understanding that after huge amounts of data are sent through the system, it is able to adjust the values until they give good results. The final quality of the results will depend on the amount of data we had and how well we designed our little brain. Is it big enough? Does it have the right connections? A system with only 20 neurons will never be great at detecting cats no matter how much data we give it. For a system with 100 million neurons, it’s going to take a lot of cat images before it reaches its full potential.
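As a very loose sketch of this idea — not how real training works, which uses gradient descent rather than random guessing — here is a toy “training loop” for a single neuron. It starts from random parameters and keeps any small random change that reduces the error on a made-up training set:

```python
import random

def relu(x):
    return max(0, x)

def neuron(inputs, weights, offset):
    return relu(sum(i * w for i, w in zip(inputs, weights)) + offset)

# Made-up training data: output should be near 1 when the
# first input dominates, and near 0 otherwise.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
        ([0.8, 0.2], 1.0), ([0.1, 0.9], 0.0)]

def error(weights, offset):
    # Total squared difference between predictions and expected answers.
    return sum((neuron(x, weights, offset) - target) ** 2
               for x, target in data)

# Start from random parameters, then repeatedly try a small random
# change and keep it only if the error goes down.
random.seed(0)
weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
offset = random.uniform(-1, 1)

for _ in range(5000):
    i = random.randrange(3)
    candidate_w, candidate_b = list(weights), offset
    if i < 2:
        candidate_w[i] += random.uniform(-0.1, 0.1)
    else:
        candidate_b += random.uniform(-0.1, 0.1)
    if error(candidate_w, candidate_b) < error(weights, offset):
        weights, offset = candidate_w, candidate_b

print(error(weights, offset))  # much smaller than where we started
```

Real systems replace the random guessing with math that computes the best direction to nudge every parameter at once, which is what makes training millions of parameters feasible.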
Understanding a Neural Network
So what is going on with the structure of the neurons? What is up with these layers?
We previously talked about the challenge of a human trying to program the computer to recognize a cat. We don’t really have a clear idea of what to ask the computer to do. What are the key features that define what a cat is?
The great thing about machine learning with neural networks is that we often don’t have to decide what those features are. We just let the network learn them for itself. This is essentially what is going on in the hidden layers of the network. If we tried to analyze what each neuron in the system is “learning” we would find that the neurons are learning to trigger on different low level concepts like finding edges in the image, or looking for specific orientations of edges that seem cat shaped. Sometimes these concepts may make sense to us but other times they won’t. The neural network learned its own concepts in the pursuit of its goal of cat detection but explaining itself to us is not one of its goals.
This difficulty in understanding what is going on inside the network gets way more challenging as the networks get very large.
Key Points
- Neural Networks are a kind of machine learning model that uses a large number of simple mathematical neurons which are roughly modeled on how our brains work.
- A neural network’s size and shape is designed ahead of time by people, and then a training process determines the parameters for the network based on data.
- Training a large neural network to get good results requires both a lot of data and a lot of computer resources.
- Some parts of the network can be thought of as learning to detect different concepts or features as part of the goal of getting the final result.
The next post will focus on what happens when these networks get really huge.