Install Deepnet on Mac

November 15, 2013 3 comments

This post may help you get Nitish’s deepnet working on your Mac. The code is very clean; the most important thing is to follow the instructions here


a) You will need NumPy and SciPy installed first, because the toolkit is largely Python. The simplest way is to use ‘brew‘. For example, follow the instructions here.
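As a minimal sketch of the Homebrew route (package names are assumptions; adjust to your system):

```shell
# Install Python via Homebrew, then NumPy and SciPy via pip
brew install python
pip install numpy scipy
```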

b) CUDA Toolkit and SDK.
Follow the instructions (CUDA 5.5):
NVIDIA CUDA Toolkit (available at

I followed both the instructions on
and the instructions from deepnet to set the system paths:

export PATH=/Developer/NVIDIA/CUDA-5.5/bin:$PATH

Follow the deepnet instructions: on a Mac, the file is ‘~/.profile’; edit/add to the file:

export CUDA_BIN=/usr/local/cuda-5.0/bin
export CUDA_LIB=/usr/local/cuda-5.0/lib

(adjust the version number in these paths to match your installed CUDA, e.g. 5.5)

First, make sure CUDA is installed correctly.
Install the examples: <dir>

Then go to /Developer/NVIDIA/CUDA-5.5/samples, choose any simple example subfolder, go into it and run ‘make’. After make completes, you can do a simple test.
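As a concrete check (the path assumes the default CUDA 5.5 install location; deviceQuery is one of the simplest samples):

```shell
cd /Developer/NVIDIA/CUDA-5.5/samples/1_Utilities/deviceQuery
make
./deviceQuery    # on success this prints your GPU model and "Result = PASS"
```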

(c) Protocol Buffers.

Download the file:

Follow the instructions to compile and install it. It will generally be installed as /usr/local/bin/protoc. You only need to include the directory that contains ‘protoc’, so add it to your path:
export PATH=$PATH:/usr/local/bin


To make the CUDA part work, you need to build cudamat. First change every ‘uint’ to ‘unsigned’ in the file cudamat_conv_kernels.cuh (or add a #define uint unsigned at the top).
Then run ‘make’ in the cudamat folder.
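The uint-to-unsigned rewrite can also be scripted. Below is a self-contained sketch of the same sed substitution applied to a throwaway file; in the real build you would run it over cudamat_conv_kernels.cuh (note that a plain s/uint/unsigned/ would also touch identifiers like uint32, which is why the #define route is the safer one):

```shell
# Demonstrate the uint -> unsigned rewrite on a sample line
echo "uint numThreads = blockDim.x;" > /tmp/conv_sample.cuh
sed 's/uint/unsigned/g' /tmp/conv_sample.cuh
# prints: unsigned numThreads = blockDim.x;
```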

(3,4) Steps 3 and 4

Continue to follow steps 3 and 4 on
and you will get there.

Note (1): I did not separately install the cudamat library by Vlad Mnih or the cuda-convnet library by Alex Krizhevsky.

Note (2): If you do NOT have a GPU: most recent Macs come with an NVIDIA GeForce GT 650M, but some older models use an Intel graphics card. In that case you can still do the deep learning part, but using eigenmat instead of cudamat. The drawback is that it will be very slow.

Install Eigen from here:
If you get an error that <Eigen/..> cannot be found, change the includes to “Eigen/…”.
You also need to change the Python path to include the directory where ‘libeigenmat.dylib’ is located. If it still fails to find libeigenmat.dylib, it may not hurt to give it a direct path; edit the file <eigenmat/>:
_eigenmat = ct.cdll.LoadLibrary(‘the-path-to/libeigenmat.dylib’)

Rectifier Nonlinearities

November 6, 2013 Leave a comment

There are multiple choices of activation function for a neural network. Much work has shown that using rectified linear units (ReLU) helps improve discriminative performance.

The figure below shows a few popular activation functions, including sigmoid and tanh.


sigmoid:       g(x) = 1 / (1+exp(-x)). The derivative of the sigmoid function is g’(x) = (1-g(x))g(x).

tanh :              g(x) = sinh(x)/cosh(x) = ( exp(x)- exp(-x) ) / ( exp(x) + exp(-x) )

The rectifier (hard ReLU) is really just a max function: g(x) = max(0, x).


Another version is the noisy ReLU, max(0, x + N(0, σ(x))). ReLU can be approximated by the so-called softplus function (whose derivative is the logistic function):

g(x) = log(1+exp(x))

The derivative of the hard ReLU is constant over the two ranges x < 0 and x > 0: for x > 0, g’ = 1, and for x < 0, g’ = 0.
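These identities are easy to check numerically. Here is a small self-contained sketch (plain Python; the function names are mine) that implements sigmoid, ReLU, and softplus and verifies the derivative relations with finite differences:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def softplus(x):
    # Smooth approximation to ReLU; its derivative is the logistic (sigmoid) function
    return math.log1p(math.exp(x))

eps = 1e-6
for x in [-2.0, -0.5, 0.5, 2.0]:
    # Finite-difference check: sigmoid'(x) == sigmoid(x) * (1 - sigmoid(x))
    fd = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    assert abs(fd - sigmoid(x) * (1 - sigmoid(x))) < 1e-5
    # Finite-difference check: softplus'(x) == sigmoid(x)
    fd_sp = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)
    assert abs(fd_sp - sigmoid(x)) < 1e-5

# Hard ReLU passes positive inputs through and zeros out negatives
assert relu(3.0) == 3.0 and relu(-3.0) == 0.0
```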

This recent ICML paper discusses possible reasons why ReLU sometimes outperforms the sigmoid function:

  • Hard ReLU naturally enforces sparsity.
  • The derivative of ReLU is constant, whereas the derivative of the sigmoid function dies out as we either increase or decrease x.
Categories: Machine Learning

Exercising Sparse Autoencoder

November 5, 2013 Leave a comment

Deep learning has recently become a very hot topic across both academia and industry. I guess the best way to learn something is to implement it. So I checked the recent tutorial posted at

ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)

and they have a nice ‘assignment‘ for whoever wants to learn about sparse autoencoders. So I got my hands on it, and the final code is here.

There are two main parts to an autoencoder: feedforward and backpropagation. The essential thing to calculate is the “error term”, because it determines the partial derivatives for the parameters, both the weights W and the bias term b.

You can think of an autoencoder as an unsupervised learning algorithm that sets the target values equal to the inputs. But why do so; that is, why bother to reconstruct the signal? The trick is actually in the hidden layer, where a small number of nodes is used (smaller than the dimension of the input data — this is the sparsity enforced on the hidden layer). So you may see that an autoencoder has this ‘vase’ shape.




Thus, the network is forced to learn a compressed representation of the input. You can think of it as learning some intrinsic structure of the data that is concise, analogous to a PCA representation, where the data can be represented by a few axes. To enforce such sparsity, the average activation value (averaged across all training samples) of each node in the hidden layer is forced to equal a small value close to zero (the sparsity parameter). For every node, a KL divergence between the ‘expected activation’ and the ‘average activation on the training data’ is computed and added to both the cost function and the derivatives, which helps to update the parameters (W and b).
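The KL-divergence penalty described above can be sketched in a few lines of plain Python (the function names, the beta weight, and the example activations are hypothetical; rho is the sparsity parameter, rho_hat the average activation of a hidden node over the training samples):

```python
import math

def kl_divergence(rho, rho_hat):
    # KL divergence between a Bernoulli with mean rho and one with mean rho_hat
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(activations, rho=0.05, beta=3.0):
    # activations: one inner list of hidden-node activations per training sample
    n_samples = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        # Average activation of hidden node j across all training samples
        rho_hat = sum(a[j] for a in activations) / n_samples
        penalty += kl_divergence(rho, rho_hat)
    return beta * penalty  # added to the reconstruction cost

# Hypothetical activations for 3 samples and 2 hidden nodes: node 0 stays
# near rho, node 1 is far from it, so node 1 dominates the penalty
acts = [[0.04, 0.9], [0.06, 0.8], [0.05, 0.7]]
print(sparsity_penalty(acts))
```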

After learning completes, the weights represent the signals (think of certain abstractions, or atoms) learned from the data without supervision, like below:







Andrew Ng’s talk @ TechXploration

September 4, 2013 Leave a comment

Here’s the video recording of Andrew Ng’s talk @ TechXploration

Abstract: How “Deep Learning” is Helping Machines Learn Faster

What deep learning is and how its algorithms are shaping the future of machine learning; computational challenges in working with these algorithms; Google’s “artificial neural network,” which learns by loosely simulating computations of the brain.








Categories: Computer Vision

KDD 13, another day

August 13, 2013 Leave a comment

Stephen J. Wright gave a brief, high-level, but very nice and clear keynote today at KDD 2013. Here’s the link to his talk slides. Several links he posted are also very interesting:

The panel discussion was about why and how data scientists should and can contribute to the start-up industry. People asked various questions, such as:

- What are the tips for choosing a start-up? Is a real ‘start’-type start-up better, or a start-up that is already somewhat established?

- As an investor, what are the key things to evaluate in judging whether a start-up is worth investing in?

CEO, team, finding a business co-founder, idea, execution, hiring, patents,

- In this (big) data analysis and mining area, do people (data scientists) still need, or is it better, to get a PhD?

- When do you choose to found a start-up, and when do you choose to leave a big company, a stable job and life? How do you evaluate, perceive, and think about the risk behind this? Is it very rewarding (financially)? Is financial gain your objective function?

You can find a more complete note here.

Categories: Data Mining

Memory on X

August 11, 2013 Leave a comment

Interesting work on “language memorability” by  Jon Kleinberg

You had me at hello: How phrasing affects memorability,  Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, Lillian Lee

Its counterpart, “memory on visual data“:

* Making Personas Memorable ,  CHI, 2007, extended abstracts on Human Factors in Computing Systems.

* Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010). Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. Journal of Experimental Psychology: General, 139(3), 558-78.  <Data >
Another dataset: < >
More publications:  < >

* Understanding the intrinsic memorability of images, NIPS 2011, Phillip Isola, Devi Parikh, Antonio Torralba, Aude Oliva.


Mahout k-means Example

July 27, 2013 Leave a comment

Here’s the previous example on Logistic Regression using mahout.

Here‘s my recent try-out of Mahout k-means. There are some key points I think it is necessary to clarify first. Mahout k-means is mainly for text processing; if you need to process numerical data, you need to write some utility functions to convert the numerical data into the sequence-vector format. For the general “Reuters” example, the first few Mahout steps actually do some data preprocessing.

To be explicit: for the Reuters example, the original downloaded files are in SGML format, which is similar to XML. So we first need to parse (preprocess) those files into document-id and document-text pairs. After that we can convert them into SequenceFiles, which are a key-value format: the key is the document id and the value is the document content. This step is done with ‘seqdirectory’. Then ‘seq2sparse’ does the tf-idf conversion from the id-text data to vectors (Vector Space Model: VSM).
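The two conversion steps described above correspond roughly to the following commands (the directory names here are illustrative):

```shell
# Plain-text documents (one file per document) -> SequenceFiles of (doc-id, text)
./bin/mahout seqdirectory -i reuters-extracted/ -o reuters-seqfiles -c UTF-8
# SequenceFiles -> sparse tf-idf vectors (the VSM representation)
./bin/mahout seq2sparse -i reuters-seqfiles/ -o reuters-vectors
```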

For the first preprocessing job, a much quicker way is to reuse the Reuters parser provided in the Lucene benchmark JAR file.
Because it is bundled along with Mahout, all you need to do is change to the examples/ directory under the Mahout source tree and run the org.apache.lucene.benchmark.utils.ExtractReuters class. For details, see chapter 8 of the book Mahout in Action.

The generated vectors dir should contain the following items:

  • reuters-vectors/df-count
  • reuters-vectors/dictionary.file-0
  • reuters-vectors/frequency.file-0
  • reuters-vectors/tf-vectors
  • reuters-vectors/tfidf-vectors
  • reuters-vectors/tokenized-documents
  • reuters-vectors/wordcount

We will then use the tfidf-vectors to run k-means. You can give a ‘fake’ initial-centers path; given the argument k, Mahout will automatically select k random points to initialize the clustering.

mahout-0.5-cdh3u5:$ ./bin/mahout kmeans -i reuters-vectors/tfidf-vectors/ -o mahout-clusters -c mahout-initial-centers -cd 0.1 -k 20 -x 10 -ow

The clustering results will look like this

Categories: Hadoop, Mahout
