These notes may help you get Nitish’s deepnet working on your Mac. The code is very clean; the most important thing is to follow the instructions here: https://github.com/nitishsrivastava/deepnet/blob/master/INSTALL.txt
(b) CUDA Toolkit and SDK.
Download the NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads) and follow the installation instructions (CUDA 5.5): http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-mac-os-x/
I followed both that guide and the deepnet instructions to set the system paths. On a Mac, the file to edit is ‘~/.profile’; add the path settings from the deepnet instructions to it.
First, make sure CUDA is installed correctly:
install the samples: cuda-install-samples-5.5.sh <dir>
then go to /Developer/NVIDIA/CUDA-5.5/samples, pick any simple example subfolder, go into it and run ‘make’. Once the build completes, you can run the example as a simple test.
(c) Protocol Buffers.
Download it from: http://code.google.com/p/protobuf/
Follow the instructions to compile and install it. It will be installed (generally as /usr/local/bin/protoc). You only need to include the directory that contains ‘protoc’, so add that directory to your path.
(2) COMPILING CUDAMAT AND CUDAMAT_CONV
To get the CUDA code to build, first change every ‘uint’ to ‘unsigned’ in the file cudamat_conv_kernels.cuh,
or add a #define uint unsigned
Then run ‘make’ in the cudamat folder.
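If you would rather not do the replacement by hand, here is a small sketch of how it could be scripted. The function names are mine, and the file path in the usage note is an assumption about where the file sits in your checkout. The regular expression only touches the standalone token, so identifiers such as `uint32_t` stay as they are.

```python
import re

# Match the bare token `uint` only; identifiers like `uint32_t` are not touched.
UINT_TOKEN = re.compile(r"\buint\b")


def replace_uint(source):
    """Return `source` with every standalone `uint` replaced by `unsigned`."""
    return UINT_TOKEN.sub("unsigned", source)


def patch_file(path):
    """Rewrite the file at `path` in place; return the number of replacements."""
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read()
    patched, count = UINT_TOKEN.subn("unsigned", text)
    if count:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(patched)
    return count
```

Something like `patch_file("cudamat/cudamat_conv_kernels.cuh")` would then do the edit (adjust the path to your tree, and keep a backup of the original file).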
(3,4) STEPS 3 AND 4
Continue with steps 3 and 4 in https://github.com/nitishsrivastava/deepnet/blob/master/INSTALL.txt, and you will get there.
Note (1): I did not separately install the cudamat library by Vlad Mnih or the cuda-convnet library by Alex Krizhevsky.
Note (2): If you do NOT have an NVIDIA GPU: most recent Macs come with an NVIDIA 650, but some older models use an Intel graphics card. In that case you can still do the deep learning part without the GPU by using eigenmat. The drawback is that it will be very slow.
Install eigen from here: http://eigen.tuxfamily.org/index.php?title=Main_Page
If you get an error saying that <Eigen/..> cannot be found, change the include to “Eigen/…”.
You also need to change the Python path so that it includes the directory where ‘libeigenmat.dylib’ is located. If it still fails to find libeigenmat.dylib, it may not hurt to give it a direct path by editing the file <eigenmat/eigenmat.py>:
_eigenmat = ct.cdll.LoadLibrary('the-path-to/libeigenmat.dylib')
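A slightly more forgiving variant is to try a few candidate locations and load the first one that exists. This is only a sketch: `first_existing` and `load_eigenmat` are hypothetical helpers of mine, not part of eigenmat, and the candidate paths are placeholders you would fill in yourself.

```python
import ctypes as ct
import os


def first_existing(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def load_eigenmat(candidates):
    """Load the shared library from the first candidate path that exists."""
    path = first_existing(candidates)
    if path is None:
        raise OSError("libeigenmat.dylib not found in: " + ", ".join(candidates))
    return ct.cdll.LoadLibrary(path)
```

In eigenmat.py you could then write `_eigenmat = load_eigenmat([...])` with your own list of paths in place of the single hard-coded one.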
There are several choices of activation function for a NN. Many studies have shown that using the rectified linear unit (ReLU) helps improve discriminative performance.
The figure below shows a few popular activation functions, including sigmoid and tanh.
sigmoid: g(x) = 1 / (1 + exp(-x)). The derivative of the sigmoid function is g'(x) = (1 - g(x)) g(x).
tanh : g(x) = sinh(x)/cosh(x) = ( exp(x)- exp(-x) ) / ( exp(x) + exp(-x) )
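To make the two formulas concrete, here is a minimal sketch in plain Python that evaluates them and checks the sigmoid derivative against a finite-difference estimate. The function names are mine. The tanh derivative 1 - tanh(x)^2 is the standard identity and is not stated above. The sigmoid is written in a two-branch form only to avoid overflow for very negative x.

```python
import math


def sigmoid(x):
    """g(x) = 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """g'(x) = (1 - g(x)) * g(x)."""
    g = sigmoid(x)
    return (1.0 - g) * g


def tanh_via_exp(x):
    """tanh written out as (exp(x) - exp(-x)) / (exp(x) + exp(-x))."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_grad(x):
    """Standard identity: d/dx tanh(x) = 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)
```

Comparing `numeric_grad(sigmoid, x)` with `sigmoid_grad(x)` at a few points is a cheap sanity check before using these derivatives in backpropagation.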
Rectifier (hard ReLU) is really a max function: g(x) = max(0, x)
Another version is the noisy ReLU, max(0, x + N(0, σ(x))). ReLU can be approximated by the so-called softplus function (whose derivative is the logistic function):
g(x) = log(1+exp(x))
The derivative of hard ReLU is piecewise constant: g’ = 1 for x > 0 and g’ = 0 for x < 0.
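Here is a small sketch of hard ReLU and softplus in plain Python. The function names are mine, and taking the derivative at exactly x = 0 to be 0 is a convention I am assuming. Softplus is computed as max(x, 0) + log(1 + exp(-|x|)), which is algebraically the same as log(1 + exp(x)) but does not overflow for large x.

```python
import math


def relu(x):
    """Hard ReLU: g(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """1 for x > 0, 0 for x < 0; the value at x == 0 is a convention (0 here)."""
    return 1.0 if x > 0 else 0.0


def softplus(x):
    """g(x) = log(1 + exp(x)), rewritten so exp() never sees a large argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic(x):
    """Logistic (sigmoid) function, which is the derivative of softplus."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

A finite-difference check such as `(softplus(x + h) - softplus(x - h)) / (2 * h)` against `logistic(x)` confirms the derivative relation numerically.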
This recent ICML paper discusses possible reasons why ReLU sometimes outperforms the sigmoid function:
- Hard ReLU naturally enforces sparsity.
- The derivative of ReLU is constant (1 for every positive input), whereas the derivative of the sigmoid function dies out as x becomes either very large or very small.
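Both points can be seen numerically with a toy experiment. This is just an illustration under my own assumptions (zero-mean Gaussian pre-activations, hypothetical function names), not an experiment from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    g = sigmoid(x)
    return g * (1.0 - g)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def zero_fraction(pre_activations):
    """Share of units whose ReLU output is exactly zero."""
    outputs = [relu(z) for z in pre_activations]
    return sum(1 for a in outputs if a == 0.0) / len(outputs)


if __name__ == "__main__":
    # The sigmoid gradient shrinks quickly as |x| grows; the ReLU gradient stays at 1.
    for x in (0.0, 2.0, 5.0, 10.0):
        print(x, sigmoid_grad(x), relu_grad(x))
    # Assumed input distribution: standard normal pre-activations.
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    print("fraction of zero ReLU outputs:", zero_fraction(sample))
```

With zero-mean inputs, roughly half of the ReLU outputs are exactly zero, while a sigmoid unit never outputs exactly zero for moderate inputs.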
Deep learning has recently become a hot topic in both academia and industry. I guess the best way to learn something is to implement it, so I checked the recent tutorial posted at
ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)
There are two main parts to an autoencoder: feedforward and backpropagation. The essential quantity to calculate is the “error term”, because it determines the partial derivatives with respect to the parameters, both the weights W and the bias term b.
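As a rough sketch of those two parts, here is a one-hidden-layer autoencoder for a single input vector in plain Python. I am assuming sigmoid units in both layers and a squared reconstruction error, and I leave out any sparsity or weight-decay terms. All names (`forward`, `backward`, `gradients`, `cost`) are mine and not taken from the tutorial.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(x, W1, b1, W2, b2):
    """Feedforward pass: input -> hidden activations a2 -> reconstruction a3."""
    a2 = [sigmoid(z + b) for z, b in zip(matvec(W1, x), b1)]
    a3 = [sigmoid(z + b) for z, b in zip(matvec(W2, a2), b2)]
    return a2, a3


def backward(x, a2, a3, W2):
    """Error terms for the cost 0.5 * ||a3 - x||^2 (the target is the input itself)."""
    # Output layer: delta3 = (a3 - x) * g'(z3), where g' = a * (1 - a) for sigmoid.
    delta3 = [(a - t) * a * (1.0 - a) for a, t in zip(a3, x)]
    # Hidden layer: delta2 = (W2^T delta3) * g'(z2).
    delta2 = [
        sum(W2[k][j] * delta3[k] for k in range(len(delta3))) * a2[j] * (1.0 - a2[j])
        for j in range(len(a2))
    ]
    return delta2, delta3


def gradients(x, a2, delta2, delta3):
    """Partial derivatives of the cost with respect to W1, b1, W2, b2."""
    grad_W1 = [[d * xi for xi in x] for d in delta2]
    grad_W2 = [[d * aj for aj in a2] for d in delta3]
    # The bias gradients are the error terms themselves.
    return grad_W1, list(delta2), grad_W2, list(delta3)


def cost(x, W1, b1, W2, b2):
    """Squared reconstruction error for one sample."""
    _, a3 = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((a - t) ** 2 for a, t in zip(a3, x))
```

A useful habit is to compare one entry of `grad_W1` with a finite-difference estimate of `cost` before trusting the backpropagation code.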
You can think of an autoencoder as an unsupervised learning algorithm that sets the target values equal to the inputs. But why bother to reconstruct the signal? The trick is in the hidden layer, which uses a small number of nodes (fewer than the dimension of the input data; this is the sparsity enforced on the hidden layer). This is why an autoencoder has this ‘vase’ shape.
Thus, the network is forced to learn a compressed representation of the input. You can think of it as learning some concise intrinsic structure of the data, analogous to the PCA representation, where data can be represented by a few axes. To enforce such sparsity, the average activation (averaged across all training samples) of each node in the hidden layer is pushed toward a small value close to zero (called the sparsity parameter). For every node, a KL divergence between the ‘expected value of activation’ and the ‘activation from training data’ is computed and added to both the cost function and the derivatives, which helps update the parameters (W & b).
After learning is complete, the weights represent the signals (think of certain abstractions or atoms) learned from the data without supervision, like below:
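Here is a minimal sketch of that sparsity term, assuming the usual Bernoulli form of the KL divergence, KL(rho || rho_hat) = rho log(rho / rho_hat) + (1 - rho) log((1 - rho) / (1 - rho_hat)). In it, `rho` is the sparsity parameter, `rho_hat` is a node's average activation, and `beta` is a penalty weight that I am introducing for illustration. The function names are mine.

```python
import math


def mean_activations(hidden_activations):
    """Average each hidden node's activation over all training samples.

    `hidden_activations` holds one list of hidden-layer activations per sample.
    """
    n = len(hidden_activations)
    return [sum(column) / n for column in zip(*hidden_activations)]


def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparsity_penalty(rho, rho_hats, beta):
    """Term added to the cost: beta * sum over hidden nodes of the KL divergence."""
    return beta * sum(kl_bernoulli(rho, r) for r in rho_hats)


def sparsity_delta_term(rho, rho_hats, beta):
    """Derivative of the penalty with respect to each node's mean activation.

    In backpropagation this is added to the hidden layer's back-propagated
    error before multiplying by the activation derivative g'(z).
    """
    return [beta * (-rho / r + (1.0 - rho) / (1.0 - r)) for r in rho_hats]
```

The penalty is zero when a node's average activation equals `rho` and grows as the two move apart, which is what pushes the hidden activations toward the small target value.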