Archive for the ‘Uncategorized’ Category

Pig Error for string to long

October 19, 2016 Leave a comment

Gotten error message as:

Error: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long

The original pig script is as:


score = FOREACH score_account_kpi_avro
GENERATE FLATTEN(STRSPLIT(uid,’:’)) as (account_id:chararray,
date_sk:chararray, index:long), (double)predictedLabel, (double)predictedProb; — (xxxxxxxx,2016-09-30,221905,221905.0,221905.6822910905)

Up to this stage, if you dump some examples, it will be fine. But if proceed joining other data or computing something, you’ll get the error of “java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Long”, which might be hard to tell why it happens.

What is happening here is when you do the split, you can’t cast one of the split entry into long directly (index:long). The right way  is to just get it as chararray type, and cast it in the downstrem process, for example:

score = FOREACH score GENERATE account_id AS account_id, (double)index as index,
(double)predictedLabel as predictedLabel, (double)predictedProb as predictedProb;

Categories: Uncategorized

Installing mxnet

December 23, 2015 Leave a comment

I wanted to install the newly released deep learning package “mxnet” on my mac. Here’s the instruction site:

It mostly comes fine, but I did have few problems including some linking error.

One is with ‘libtbb.dylib’, it keep complaining that it couldn’t find the lib, but when I check it it is in the right folder `/usr/local/lib` — which is actually a soft link to “/usr/local/Cellar/tbb/4.4-20150728/lib/”. The problem is actually because of the false configuration in opencv.pc. So what I did was to open “/usr/local/lib/pkgconfig/opencv.pc” (which provides the meta-information for pkg-config) and change -llibtbb.dylib to -ltbb.

I also got other few linking errors for libJPEG.dylib, libtiff.dylib and libpng.dylib. What I found is that they points to few libs like “/usr/local/Cellar/jpeg/8d/lib/libjpeg.dylib” or “/usr/local/Cellar/libtiff/4.0.6/lib/libtiff.dylib” but it seems that they are not the ones expected.

Screen Shot 2015-12-23 at 10.56.47 AM

Screen Shot 2015-12-23 at 10.57.30 AM

To fix this:

# creates the locate database if it does not exist, this may take a longer time, so be patient
sudo launchctl load -w /System/Library/LaunchDaemons/

#do locate to locate the actual lib, for example
locate libJPEG.dylib

# suppose you got the path from the above command as abspath_to_lib, if the lib already exist in /usr/local/lib, you can remove it first.
ln -s abspath_to_lib /usr/local/libJPEG.dylib

Now, you can run one mnist example by `python example/image-classification/`. It should display the following results:

Screen Shot 2015-12-23 at 11.20.01 AM.png


Forward to the past

June 19, 2015 Leave a comment

I was listening to Hinton’s interview (on CBC Radio: He mentioned multiple times of possible break through on natural language understanding by using deep learning technology. It is definitely true that human reasoning is such a difficult task to modeling as it is so complex to be abstracted easily. While I watch my little boy grows, I was amazed every time he shows a new ability, ability to do something, and ability to understand/perceive something. When training my own model (on image instead), I start to gain more understanding of the model. Structure determines the function. In most cases, the training is more like a process of “trial and error”. It’s a big black box with complex structures and connections. One of the biggest advantage of such learning network is its ability to automatically learn the representation, or say to abstract things. With abstraction in our logical system, we are able to organize things, dissect things, compose things, and possibly to create new things. Given what the network can already see/imaging (, it’s likely down the few years later, a network on human language could help us to translate the languages that went extinct thousands years by simply seeing over and over those scripts. This would be so wonderful cause so many ancient civilization will start shine again. Maybe I should call this “Forward to the Past”.

Categories: Uncategorized Tags: ,

Remote access ipython notebooks

February 18, 2015 1 comment

Original post:

remote$ipython notebook --no-browser --port=8889

local$ssh -N -f -L localhost:8888:localhost:8889 remote_user@remote_host

To close the SSH tunnel on the local machine, look for the process and kill it manually:

local_user@local_host$ ps aux | grep localhost:8889
local_user 18418  0.0  0.0  41488   684 ?        Ss   17:27   0:00 ssh -N -f -L localhost:8888:localhost:8889 remote_user@remote_host
local_user 18424  0.0  0.0  11572   932 pts/6    S+   17:27   0:00 grep localhost:8889

local_user@local_host$ kill -15 18418

Alternatively, you can start the tunnel without the -f option. The process will then remain in the foreground and can be killed with ctrl-c.

On the remote machine, kill the IPython server with ctrl-c ctrl-c.

Note: If you are running GPU & Theano on your remote machine, you can launch the notebook by:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 ipython notebook –no-browser –port=8889

Another simple way is to do the following (adding ip=*):

# In the remote server

$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 ipython notebook –no-browser –ip=* –port=7777

then you can reach the notebook from http:// the-ip-address-of-your-remote-server:7777/

Categories: Uncategorized Tags:

Deep learning on visual recognition task

May 13, 2014 Leave a comment

The current benchmark on visual recognition task:

Categories: Uncategorized Tags:

Tools for Large Scale Learning

January 31, 2013 Leave a comment
Some tools for large scale learning, mostly running on hadoop. Please recommend in your comments, I’ll put it into this listing:
Vowpal Wabbit (VW) fast learning:
Mahout: provides a scalable machine learning library
R: integration of R in Hadoop, might be slow on single machine
Categories: Uncategorized Tags:

Large-scale machine learning course & streaming organism classification

January 10, 2013 Leave a comment

Follow the Data

The NYU Large Scale Machine Learning course looks like it will be very worthwhile to follow. The instructors, John Langford and Yann Le Cun, are both key figures in the machine learning field – for instance, the former developed Vowpal Wabbit and the latter has done pioneering work in deep learning. It is not an online course like those at Coursera et al., but they have promised to put lecture videos and slides online. I’ll certainly try to follow along as best I can.


There is an interesting Innocentive challenge going on, called “Identify Organisms from A Stream of DNA Sequences.” This is interesting to me both because of the subject matter (classification based on DNA sequences) and also because the winner is explicitly required to submit an efficient, scalable solution (not just a good classifier.) Also, the prize sum is one million US dollars! It’s…

View original post 32 more words

Categories: Uncategorized