This may help you get Nitish’s deepnet working on your Mac. The code is very clean; the most important thing is to follow the instructions here: https://github.com/nitishsrivastava/deepnet/blob/master/INSTALL.txt
(b) CUDA Toolkit and SDK.
Download the NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads) and follow the instructions for CUDA 5.5: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-mac-os-x/
I followed both that guide and the deepnet instructions for setting the system paths. On a Mac the file to edit is ‘~/.profile’; edit it or add to it:
First, make sure CUDA is installed correctly:
install the examples: cuda-install-samples-5.5.sh <dir>
then go to /Developer/NVIDIA/CUDA-5.5/samples, pick any simple example subfolder, go into it and run ‘make’. Once make completes, you can run the example as a simple test.
(c) Protocol Buffers.
Download the file: http://code.google.com/p/protobuf/
Follow the instructions to compile and install it. It will be installed (generally in /usr/local/bin/protoc). Reportedly you only need to include the directory that contains ‘protoc’, so add it to your path:
(2) COMPILING CUDAMAT AND CUDAMAT_CONV
To get cudamat to build, change every ‘uint’ to ‘unsigned’ in the file cudamat_conv_kernels.cuh,
or add a #define uint unsigned.
Then run ‘make’ in the cudamat folder.
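If you would rather not edit the file by hand, the substitution can be scripted. This is only a minimal sketch, assuming a whole-word replacement is what you want; the helper names `replace_uint` and `patch_file` and the path in the usage comment are my own, not part of deepnet.

```python
import re

# Match 'uint' only as a whole word, so identifiers such as 'uint8_t'
# or 'my_uint' are left untouched.
UINT_WORD = re.compile(r"\buint\b")

def replace_uint(source):
    """Return the source text with every standalone 'uint' replaced by 'unsigned'."""
    return UINT_WORD.sub("unsigned", source)

def patch_file(path):
    """Rewrite the file at `path` in place (hypothetical helper; back up the file first)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(replace_uint(text))

# Example (the path is an assumption about where you unpacked deepnet):
# patch_file("cudamat/cudamat_conv_kernels.cuh")
```

The `#define` route is less invasive, since it leaves the original source untouched.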
(3,4) STEP 3,4
Continue with steps 3 and 4 of https://github.com/nitishsrivastava/deepnet/blob/master/INSTALL.txt, and you will get there.
Note (1): I did not separately install the cudamat library by Vlad Mnih or the cuda-convnet library by Alex Krizhevsky.
Note (2): If you do NOT have a GPU: an alternative is to not use the GPU at all. Most recent Macs come with an NVIDIA 650, but some older models use an Intel graphics card. In that case you can still do the deep learning part using eigenmat. The drawback is that it will be very slow.
Install eigen from here: http://eigen.tuxfamily.org/index.php?title=Main_Page
If you get an error that <Eigen/..> cannot be found, change it to “Eigen/…”.
You also need to change the Python path to include the directory where ‘libeigenmat.dylib’ is located. If it still fails to find libeigenmat.dylib, it may not hurt to give it a direct path by editing the file <eigenmat/eigenmat.py>:
_eigenmat = ct.cdll.LoadLibrary('the-path-to/libeigenmat.dylib')
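Instead of hard-coding a single path, you could try a few candidate locations in order. This is just a sketch of that idea: `first_existing` and `load_eigenmat` are hypothetical helpers of mine, not part of eigenmat, and the candidate paths are placeholders you would fill in yourself.

```python
import ctypes as ct
import os

def first_existing(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

def load_eigenmat(candidates):
    """Load the shared library from the first candidate path that exists."""
    path = first_existing(candidates)
    if path is None:
        raise OSError("libeigenmat.dylib not found in: %s" % ", ".join(candidates))
    return ct.cdll.LoadLibrary(path)

# Example usage (placeholder paths, adjust to your own layout):
# _eigenmat = load_eigenmat(["./libeigenmat.dylib",
#                            "the-path-to/libeigenmat.dylib"])
```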
There are several choices of activation function for a neural network. Many studies have shown that using the rectified linear unit (ReLU) helps improve discriminative performance.
The figure below shows a few popular activation functions, including sigmoid and tanh.
sigmoid: g(x) = 1 / (1 + exp(-x)). The derivative of the sigmoid function is g’(x) = (1 - g(x)) g(x).
tanh : g(x) = sinh(x)/cosh(x) = ( exp(x)- exp(-x) ) / ( exp(x) + exp(-x) )
Rectifier (hard ReLU) is really a max function: g(x) = max(0, x)
Another version is the noisy ReLU, max(0, x + N(0, σ(x))). ReLU can be approximated by the so-called softplus function (whose derivative is the logistic function):
g(x) = log(1+exp(x))
The derivative of hard ReLU is piecewise constant: g’(x) = 1 for x > 0 and g’(x) = 0 for x < 0.
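To make these formulas concrete, here is a small sketch of the functions and their derivatives in plain Python. The function names are my own; the tanh derivative (1 - tanh²) is the standard one and is not taken from the text above, and the value of the ReLU derivative at exactly x = 0 is an arbitrary choice in this sketch. It also checks numerically that the derivative of softplus is the logistic function.

```python
import math

def sigmoid(x):
    # Fine for moderate x; math.exp overflows for very negative x.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    g = sigmoid(x)
    return (1.0 - g) * g

def tanh_grad(x):
    # Standard identity: d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Not defined at x == 0; this sketch picks 0 there.
    return 1.0 if x > 0 else 0.0

def softplus(x):
    return math.log1p(math.exp(x))

def numeric_grad(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The derivative of softplus should match the logistic (sigmoid) function.
for x in (-2.0, 0.0, 0.7, 3.0):
    assert abs(numeric_grad(softplus, x) - sigmoid(x)) < 1e-6
```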
This recent ICML paper discusses possible reasons why ReLU sometimes outperforms the sigmoid function:
- Hard ReLU is naturally enforcing sparsity.
- The derivative of ReLU is constant (for x > 0), whereas the derivative of the sigmoid function dies out as x grows large in either direction.
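Both points are easy to see numerically. The sketch below is only an illustration under my own assumptions (standard-normal inputs, an arbitrary seed, 1000 samples): it counts how many activations are exactly zero under ReLU versus sigmoid, and prints the sigmoid derivative as x grows.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    g = sigmoid(x)
    return g * (1.0 - g)

def relu(x):
    return max(0.0, x)

def zero_fraction(values):
    """Fraction of activations that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)

random.seed(0)  # arbitrary seed, only for repeatability
inputs = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Sparsity: ReLU zeroes out every negative input, sigmoid never outputs 0.
relu_zero_frac = zero_fraction([relu(x) for x in inputs])
sigmoid_zero_frac = zero_fraction([sigmoid(x) for x in inputs])
print("exact zeros: relu=%.2f sigmoid=%.2f" % (relu_zero_frac, sigmoid_zero_frac))

# Saturation: the sigmoid derivative shrinks quickly away from 0,
# while the ReLU derivative stays at 1 for any positive x.
for x in (0.0, 2.0, 5.0, 10.0):
    print("x=%5.1f  sigmoid'(x)=%.6f" % (x, sigmoid_grad(x)))
```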
Deep learning has recently become a hot topic in both academia and industry. I guess the best way to learn something is to implement it, so I checked the recent tutorial posted at
ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)
There are two main parts to an autoencoder: feedforward and backpropagation. The essential quantity to compute is the “error term”, because it determines the partial derivatives for the parameters, both W and the bias term b.
You can think of an autoencoder as an unsupervised learning algorithm that sets the target values equal to the inputs. But then why bother reconstructing the signal? The trick is in the hidden layer, where a small number of nodes is used (smaller than the dimension of the input data; this is the sparsity enforced on the hidden layer). So you may see that an autoencoder has this ‘vase’ shape.
Thus, the network is forced to learn a compressed representation of the input. You can think of it as learning some concise intrinsic structure of the data, analogous to a PCA representation, where the data can be represented by a few axes. To enforce such sparsity, the average activation value (averaged across all training samples) of each node in the hidden layer is pushed toward a small value close to zero (called the sparsity parameter). For every node, a KL divergence between the ‘expected value of activation’ and the ‘activation from training data’ is computed and added to both the cost function and the derivatives, which helps to update the parameters (W & b).
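Here is a rough sketch of that sparsity term in plain Python, assuming the usual Bernoulli form of the KL divergence. The names `rho` (sparsity parameter), `rho_hat` (average activation of a hidden node) and `beta` (penalty weight), as well as the default values 0.05 and 3.0, are my own illustrative choices and not taken from the tutorial.

```python
import math

def mean_activations(hidden):
    """hidden: one list of hidden-unit activations per training sample.
    Returns the average activation of each hidden unit."""
    n = len(hidden)
    return [sum(column) / n for column in zip(*hidden)]

def kl_penalty(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def kl_penalty_grad(rho, rho_hat):
    """Derivative of kl_penalty with respect to rho_hat; this is the extra
    term that gets added to the hidden layer's error term."""
    return -rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat)

def sparsity_cost(hidden, rho=0.05, beta=3.0):
    """Total sparsity penalty, added on top of the reconstruction cost."""
    return beta * sum(kl_penalty(rho, r) for r in mean_activations(hidden))

# Two samples, two hidden units (sigmoid outputs, so strictly inside (0, 1)).
activations = [[0.05, 0.60], [0.05, 0.40]]
print(sparsity_cost(activations))
```

The penalty is zero for a unit whose average activation already equals `rho` and grows as the average drifts away from it.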
After learning is complete, the weights represent the signals (think of certain abstractions or atoms) learned from the data without supervision, like below:
Here’s the video recording of Andrew Ng’s talk at TechXploration
Abstract: How “Deep Learning” is Helping Machines Learn Faster
What deep learning is and how its algorithms are shaping the future of machine learning; computational challenges in working with these algorithms; Google’s “artificial neural network,” which learns by loosely simulating computations of the brain.
Stephen J. Wright gave a brief, high-level, but very nice and clear keynote today at KDD’2013. Here’s the link to his talk slides. Several links he posted are also very interesting:
The panel discussion was about why and how data scientists should and can contribute to the startup industry. People asked various questions, such as:
- What are the tips for choosing a startup? Is a real ‘start’-type startup better, or a startup that is already somewhat established?
- As an investor, what are the key things to evaluate when judging whether a startup is worth investing in?
CEO, team, finding a business co-founder, idea, execution, hiring, patents.
- In this (big) data analysis and mining area, do people (data scientists) still need, or benefit from, a PhD?
- When you choose to start a startup and leave a big company, a stable job and life, how do you evaluate, perceive and think about the risk behind this? Is it very rewarding (financially)? Is financial gain your objective function?
You can find a more complete note here.
Interesting work on “language memorability” by Jon Kleinberg
* You had me at hello: How phrasing affects memorability, Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, Lillian Lee
Its counterpart, “memory on visual data”:
* Making Personas Memorable, CHI, 2007, extended abstracts on Human Factors in Computing Systems.
* Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010). Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. Journal of Experimental Psychology: General, 139(3), 558-78. <Data http://cvcl.mit.edu/MM/objectCategories.html >
Another dataset: <http://visualrecall.org/datasets.html >
More publications: <http://visualrecall.org/publications.html >
* Understanding the intrinsic memorability of images, NIPS 2011, Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva
Here’s the previous example on Logistic Regression using Mahout.
Here’s my recent tryout of Mahout k-means. There are some key points that I think need clarifying first. Mahout k-means is mainly aimed at text processing; if you need to process numerical data, you have to write some utility functions that write the numerical data into sequence-vector format. For the standard “Reuters” example, the first few Mahout steps are actually data preprocessing.
To be explicit, for the Reuters example the originally downloaded files are in SGML format, which is similar to XML. So we first need to parse (preprocess) those files into document-id and document-text pairs. After that we can convert the files into SequenceFiles. A SequenceFile is a kind of key-value format: the key is the document id and the value is the document content. This step is done using ‘seqdirectory’. Then use ‘seq2sparse’ to apply tf-idf and convert the id-text data to vectors (Vector Space Model: VSM).
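To see what the id-text to vector step amounts to, here is a toy sketch in plain Python. It uses the textbook weighting tf × log(N / df) with whitespace tokenization, which is my own simplification and not necessarily the exact formula or analyzer that ‘seq2sparse’ applies; the function name `tfidf` and the sample documents are made up.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: dict mapping document id -> document text (the key-value pairs).
    Returns a dict mapping document id -> {term: tf-idf weight}."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)  # raw term counts within this document
        vectors[doc_id] = {term: count * math.log(n_docs / df[term])
                           for term, count in tf.items()}
    return vectors

docs = {"doc-1": "oil price rises", "doc-2": "oil supply falls"}
vectors = tfidf(docs)
print(vectors["doc-1"])
```

A term that occurs in every document (here “oil”) gets weight zero, so only the distinguishing terms contribute to the distance between documents.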
For the first preprocessing job, a much quicker way is to reuse the Reuters parser provided in the Lucene benchmark JAR file.
Because it’s bundled along with Mahout, all you need to do is change to the examples/ directory under the Mahout source tree and run the org.apache.lucene.benchmark.utils.ExtractReuters class. For details, see chapter 8 of the book Mahout in Action (http://manning.com/owen/MiA_SampleCh08.pdf).
The generated vectors dir should contain the following items:
We will then use tfidf-vectors to run k-means. You can give a ‘fake’ initial-centers path: when the argument -k is given, Mahout will automatically select k random points to initialize the clustering.
mahout-0.5-cdh3u5:$ ./bin/mahout kmeans -i reuters-vectors/tfidf-vectors/ -o mahout-clusters -c mahout-initial-centers -cd 0.1 -k 20 -x 10 -ow
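For intuition about what the -k and -x options control, here is a tiny in-memory k-means sketch in plain Python. It is not Mahout's implementation: it picks k random input points as initial centers, uses squared Euclidean distance, and stops when the centers no longer change or after `max_iter` passes (Mahout instead compares the center movement against the convergence delta). All names are my own.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=10, seed=0):
    """points: list of equal-length tuples. Returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k random input points as initial centers
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```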
The clustering results will look like this