Lasagne: light weighted Theano extension, Theano can be used explicitly
Keras: is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
Pylean2: wrapper for Theano, yaml, experimental oriented.
Caffe: CNN oriented deep learning framework using c++, with python wrapper, easy model definitions using prototxt.
Theano: general gpu math
nolearn: a probably even simpler one
you can find more here.
For Lasagne and nolearn, they are still in the rapid develop stage, so they changes a lot. Be careful with the versions installed, they need to match each other. If you are having problems such as “cost must be a scalar”, you can refer link here to solve it by uninstall and reinstall them.
pip uninstall Lasagne pip uninstall nolearn pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
I was listening to Hinton’s interview (on CBC Radio: http://nvda.ly/OioP3). He mentioned multiple times of possible break through on natural language understanding by using deep learning technology. It is definitely true that human reasoning is such a difficult task to modeling as it is so complex to be abstracted easily. While I watch my little boy grows, I was amazed every time he shows a new ability, ability to do something, and ability to understand/perceive something. When training my own model (on image instead), I start to gain more understanding of the model. Structure determines the function. In most cases, the training is more like a process of “trial and error”. It’s a big black box with complex structures and connections. One of the biggest advantage of such learning network is its ability to automatically learn the representation, or say to abstract things. With abstraction in our logical system, we are able to organize things, dissect things, compose things, and possibly to create new things. Given what the network can already see/imaging (http://goo.gl/A1sL8N), it’s likely down the few years later, a network on human language could help us to translate the languages that went extinct thousands years by simply seeing over and over those scripts. This would be so wonderful cause so many ancient civilization will start shine again. Maybe I should call this “Forward to the Past”.
The current benchmark on visual recognition task:
Deep learning recently becomes such a hot topic across both academic and industry area. Guess the best way to learn some stuff is to implement them. So I checked the recent tutorial posted at
ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)
There are two main parts for an autoencoder: feedforward and backpropagation. The essential thing needs to be calculated is the “error term”, because it is going to decide the partial derivatives for parameters including both W and the bias term b.
You can think of autoencoder as an unsupervised learning algorithm, that sets the target value to be equal to the inputs. But why so, or that is, then why bother to reconstruct the signal? The trick is actually in the hidden layers, where small number of nodes are used (smaller than the dimension of the input data — the sparsity enforced to the hidden layer). So you may see autoencoder has this ‘vase’ shape.
Thus, the network will be forced to learn a compressed representation of the input. You can think it of learning some intrinsic structures of the data, that is concise, analog to the PCA representation, where data can be represented by few axis. To enforce such sparsity, the average activation value ( averaged across all training samples) for each node in the hidden layer is forced to equal to a small value close to zero (called sparsity parameters) . For every node, a KL divergence between the ‘expected value of activation’ and the ‘activation from training data’ is computed, and adding to both cost function and derivatives which helps to update the parameters (W & b).
After learning completed, the weights represent the signals ( think of certain abstraction or atoms) that unsupervised learned from the data, like below: