Posts Tagged ‘Deep learning’

A Few Python-based Deep Learning Libraries

June 23, 2015

Lasagne: a lightweight extension of Theano; Theano can still be used explicitly.

Keras: a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.

Pylearn2: a wrapper around Theano, configured with YAML, experiment-oriented.

Caffe: a CNN-oriented deep learning framework written in C++, with a Python wrapper; easy model definitions using prototxt.

Theano: general-purpose GPU math.

nolearn: probably an even simpler option.

You can find more here.

Lasagne and nolearn are still in rapid development, so they change a lot. Be careful with the installed versions; they need to match each other. If you run into problems such as “cost must be a scalar”, you can refer to the link here and solve it by uninstalling and reinstalling them:

pip uninstall Lasagne
pip uninstall nolearn
pip install -r

Forward to the past

June 19, 2015

I was listening to Hinton’s interview on CBC Radio. He mentioned multiple times a possible breakthrough in natural language understanding using deep learning technology. It is definitely true that human reasoning is a difficult thing to model, as it is too complex to abstract easily. As I watch my little boy grow, I am amazed every time he shows a new ability: the ability to do something, and the ability to understand or perceive something. When training my own models (on images instead), I started to gain more understanding of them. Structure determines function. In most cases, training is more a process of “trial and error”; it is a big black box with complex structures and connections. One of the biggest advantages of such a learning network is its ability to automatically learn representations, or in other words, to abstract things. With abstraction in our logical system, we are able to organize things, dissect things, compose things, and possibly create new things. Given what networks can already see and imagine, it is likely that a few years from now, a network trained on human language could help us translate languages that went extinct thousands of years ago, simply by seeing those scripts over and over. This would be so wonderful, because so many ancient civilizations would start to shine again. Maybe I should call this “Forward to the Past”.


Deep learning on visual recognition task

May 13, 2014

The current benchmarks on visual recognition tasks:


Exercising Sparse Autoencoder

November 5, 2013

Deep learning has recently become such a hot topic in both academia and industry. I guess the best way to learn something is to implement it. So I checked the recent tutorial posted at

ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)

and they have a nice ‘assignment’ for whoever wants to learn about sparse autoencoders. So I got my hands on it, and the final code is here.

There are two main parts to an autoencoder: feedforward and backpropagation. The essential quantity to calculate is the “error term”, because it determines the partial derivatives for the parameters, including both the weights W and the bias term b.
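As a rough sketch of these two steps, here is a pure-Python toy version of the feedforward pass and the backpropagated error terms for a single-hidden-layer autoencoder with sigmoid units and squared-error cost (the function and variable names are my own, not from the assignment code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Feedforward: input -> hidden activations a2 -> reconstruction a3."""
    a2 = [sigmoid(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
          for j in range(len(b1))]
    a3 = [sigmoid(sum(W2[k][j] * a2[j] for j in range(len(a2))) + b2[k])
          for k in range(len(b2))]
    return a2, a3

def error_terms(x, a2, a3, W2):
    """Backpropagation "error terms" (deltas) for the squared-error cost."""
    # Output layer: -(target - output) times sigmoid'(z3) = a3 * (1 - a3);
    # for an autoencoder the target IS the input x.
    delta3 = [-(x[k] - a3[k]) * a3[k] * (1 - a3[k]) for k in range(len(x))]
    # Hidden layer: propagate delta3 back through W2, times sigmoid'(z2).
    delta2 = [sum(W2[k][j] * delta3[k] for k in range(len(delta3)))
              * a2[j] * (1 - a2[j])
              for j in range(len(a2))]
    return delta3, delta2
```

The parameter derivatives then follow directly from the deltas: the gradient for W2[k][j] is delta3[k] * a2[j], for W1[j][i] it is delta2[j] * x[i], and the bias gradients are the deltas themselves.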

You can think of an autoencoder as an unsupervised learning algorithm that sets the target values equal to the inputs. But why do so; that is, why bother to reconstruct the signal? The trick is actually in the hidden layer, where a small number of nodes is used (smaller than the dimension of the input data, which is the sparsity enforced on the hidden layer). So you may see that the autoencoder has this ‘vase’ shape.




Thus, the network is forced to learn a compressed representation of the input. You can think of it as learning some intrinsic, concise structure of the data, analogous to a PCA representation, where the data can be represented by a few axes. To enforce such sparsity, the average activation value (averaged across all training samples) of each node in the hidden layer is forced to equal a small value close to zero (the sparsity parameter). For every node, a KL divergence between the ‘expected activation’ and the ‘average activation over the training data’ is computed and added to both the cost function and the derivatives, which helps update the parameters (W and b).
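This sparsity term can be sketched in a few lines of pure Python: the KL divergence here is between a Bernoulli variable with mean rho (the sparsity parameter) and one with mean rho_hat (the measured average activation), summed over hidden units. The names, and the penalty weight `beta`, are my own notation, not the assignment's:

```python
import math

def kl_divergence(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(activations, rho=0.05, beta=3.0):
    """Cost term and per-unit delta correction for the sparsity constraint.

    `activations` is a list of per-sample hidden-layer activation lists;
    returns (penalty added to the cost, extra term added to each hidden
    unit's delta during backpropagation).
    """
    n = len(activations)            # number of training samples
    m = len(activations[0])         # number of hidden units
    # average activation of each hidden unit across all samples
    rho_hat = [sum(a[j] for a in activations) / n for j in range(m)]
    cost = beta * sum(kl_divergence(rho, r) for r in rho_hat)
    # derivative of the KL term w.r.t. each unit's average activation
    grad = [beta * (-rho / r + (1 - rho) / (1 - r)) for r in rho_hat]
    return cost, grad
```

When every unit's average activation already equals rho, both the penalty and its gradient are zero; units that fire more often than rho are pushed toward silence.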

After learning completes, the weights represent the signals (think of certain abstractions or atoms) learned from the data without supervision, like below: