
Rectifier Nonlinearities

November 6, 2013

There are many possible choices of activation function for a neural network (NN). Much work has shown that using the rectified linear unit (ReLU) helps improve discriminative performance.

The figure below shows a few popular activation functions, including the sigmoid and tanh.

[Figure: activation functions]

sigmoid: g(x) = 1 / (1 + exp(-x)). The derivative of the sigmoid is g'(x) = g(x)(1 - g(x)).

tanh: g(x) = sinh(x)/cosh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
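
As a quick sanity check, here is a minimal NumPy sketch of the sigmoid and its derivative (the function names are my own, not from any particular library); tanh is available directly in NumPy:

import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # g'(x) = g(x) * (1 - g(x))
    g = sigmoid(x)
    return g * (1.0 - g)

x = np.linspace(-5.0, 5.0, 11)
print(sigmoid(x))        # saturates toward 0 and 1 at the ends
print(sigmoid_grad(x))   # peaks at 0.25 around x = 0 and vanishes for large |x|
print(np.tanh(x))        # tanh, which saturates toward -1 and 1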

The rectifier (hard ReLU) is really just a max function:

g(x) = max(0, x)

Another version is the noisy ReLU, max(0, x + N(0, σ(x))). The ReLU can be approximated by the so-called softplus function (whose derivative is the logistic sigmoid):

g(x) = log(1+exp(x))
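
A small NumPy sketch of these variants (the names and the fixed noise scale sigma are my own choices for illustration; in the noisy ReLU above the standard deviation σ(x) may itself depend on x). It also checks numerically that the derivative of softplus is the logistic sigmoid:

import numpy as np

def relu(x):
    # hard ReLU: g(x) = max(0, x)
    return np.maximum(0.0, x)

def noisy_relu(x, sigma=1.0, rng=np.random.default_rng(0)):
    # noisy ReLU: max(0, x + N(0, sigma)); a fixed sigma is assumed here for illustration
    return np.maximum(0.0, x + rng.normal(0.0, sigma, size=np.shape(x)))

def softplus(x):
    # softplus: g(x) = log(1 + exp(x)), a smooth approximation of the hard ReLU
    return np.log1p(np.exp(x))

x = np.linspace(-3.0, 3.0, 7)
eps = 1e-5
numerical_grad = (softplus(x + eps) - softplus(x - eps)) / (2.0 * eps)
print(np.allclose(numerical_grad, 1.0 / (1.0 + np.exp(-x))))  # True: softplus' = sigmoid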

The derivative of the hard ReLU is constant over each of the two ranges x < 0 and x > 0: g'(x) = 1 for x > 0 and g'(x) = 0 for x < 0.
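
In code this is just an indicator on the sign of x (again a NumPy sketch; using 0 at x = 0 is a common convention, since the function is not differentiable there):

import numpy as np

def relu_grad(x):
    # (sub)gradient of max(0, x): 1 where x > 0, 0 where x < 0 (0 used at x = 0 by convention)
    return (np.asarray(x) > 0).astype(float)

print(relu_grad([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [0. 0. 0. 1. 1.]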

This recent ICML paper discusses possible reasons why the ReLU sometimes outperforms the sigmoid:

  • The hard ReLU naturally enforces sparsity in the activations.
  • The derivative of the ReLU is constant (equal to 1 for x > 0), whereas the sigmoid's derivative dies out as x grows large in either direction; a quick numerical sketch follows this list.
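
Both points are easy to see numerically. The toy setup below (random Gaussian pre-activations, my own choice purely for illustration) shows that roughly half of the ReLU outputs are exactly zero, while the sigmoid's derivative never exceeds 0.25 and is tiny for inputs far from zero:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)   # hypothetical pre-activations, just for illustration

relu_out = np.maximum(0.0, x)
s = 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = s * (1.0 - s)

print("fraction of exact zeros (ReLU):", np.mean(relu_out == 0.0))  # ~0.5: sparse activations
print("max sigmoid derivative:", sigmoid_grad.max())                # <= 0.25
s5 = 1.0 / (1.0 + np.exp(-5.0))
print("sigmoid derivative at x = 5:", s5 * (1.0 - s5))               # ~0.0066, nearly vanished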