An activation function is what forms the output of a neuron. It is what adds non-linearity to your predictions and makes a Neural Network based predictor so much better than linear models.
The question we usually ask ourselves is: which activation function should I use?
The answer is there is no one-size-fits-all answer to this question. It depends.
Let me walk you through the most commonly used activation functions and their pros and cons to help you make a better decision.
While we can define our own activation functions to best fit our needs, the most commonly used ones are:
1. Sigmoid Activation
2. Tanh (Hyperbolic Tangent) Activation
3. ReLU (Rectified Linear Unit)
4. Leaky ReLU
This is what each of them looks like:
Photo Source: DeepLearning.ai Specialization
1. Sigmoid Activation
The sigmoid activation's output ranges between 0 and 1. It looks like the common “S-shaped” curve we see in many fields of study.
Pros:
Simple – both logically and arithmetically
Offers good non-linearity
Natural probability output – between 0 and 1 – for classification problems.
Cons:
The network stops learning when values are pushed towards the extremes of the sigmoid, where the slope flattens to nearly zero – this is called the problem of vanishing gradients.
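To make the vanishing-gradient point concrete, here is a minimal NumPy sketch of the sigmoid and its derivative (the function names here are just for illustration), showing how the slope collapses at the extremes:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25 -- the maximum slope, at the center
print(sigmoid_grad(10.0))  # ~4.5e-05 -- almost no learning signal
```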
2. Tanh (Hyperbolic Tangent)
This is pretty much a sigmoid over an extended range (-1 to 1).
Pros:
It widens the steep non-linear range in the middle of the curve before the slope/gradient flattens out, and this wider range helps the network learn faster.
Cons:
This activation only limits the problem of vanishing gradients at the tail ends of the curve to a certain degree, and we have better options for learning faster.
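Here is the same kind of sketch for tanh (again, the names are illustrative). Note how the peak gradient is 1.0 instead of the sigmoid's 0.25, though the tails still flatten out:

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2

print(tanh_grad(0.0))   # 1.0 -- a stronger peak gradient than sigmoid's 0.25
print(tanh_grad(10.0))  # ~8.2e-09 -- the tails still vanish
```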
3. ReLU
Rectified – MAX(0,value)
Linear – for z > 0 (positive values)
ReLU is a fancy name for a positive-only linear function. For negative values of z the unit has a slope of 0. However, for positive activations the network can learn a lot faster with the constant linear slope.
Pros:
Learns faster.
Slope is 1 as long as z is positive.
Cons:
Hardly any 🙂 – though the zero slope for negative values means a neuron can get stuck outputting 0 and stop learning, which is exactly what Leaky ReLU below addresses.
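A minimal sketch of ReLU and its gradient, applied element-wise over a NumPy array (names are illustrative):

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Slope is 1 for z > 0 and 0 otherwise."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 1. 1.]
```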
4. Leaky ReLU
This provides a slight slope for negative values of z instead of a flat zero, so the neuron never fully stops learning. It's an improvement over ReLU.
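A sketch of Leaky ReLU, assuming a small negative-side slope (0.01 here is a common choice, not a fixed rule):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: z for z > 0, a small alpha * z otherwise."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))  # [-0.02  -0.005  0.5    2.   ]
```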
These non-linear activations lay the foundation for Neural Networks and how well they learn.
Until next time!