Home > Machine Learning > Rectifier Nonlinearities

## Rectifier Nonlinearities

There are multiple different choice of activation functions for a NN. Many work has shown that using Rectified linear unit (ReLU) helps improve discriminative performance.

The figure below shows few popular activation functions, including sigmoid, and tanh. sigmoid:       g(x) = 1 /(1+exp(-1)). The derivative of sigmoid function g'(x) = (1-g(x))g(x).

tanh :              g(x) = sinh(x)/cosh(x) = ( exp(x)- exp(-x) ) / ( exp(x) + exp(-x) )

Rectifier (hard ReLU) is really a max function

g(x)=max(0,x)

Another version is Noise ReLU max(0, x+N(0, σ(x)). ReLU can be approximated by a so called softplus function (for which the derivative is the logistic functions):

g(x) = log(1+exp(x))

The derivative of hard ReLU is constant over two ranges x<0 and x>=0, for x>0, g’=1, and x<0, g’=0.

This recent icml paper has discussed the possible reasons that why ReLU sometimes outperform sigmoid function:

• Hard ReLU is naturally enforcing sparsity.
• The derivative of ReLU is constant, as compared to sigmoid function, for which the derivative dies out if we either increase x or decrease x.
Categories: Machine Learning