Archive for January, 2013

Tools for Large Scale Learning

January 31, 2013
Some tools for large-scale learning, mostly running on Hadoop. Please recommend others in the comments and I’ll add them to this list:
Vowpal Wabbit (VW): fast learning
Mahout: provides a scalable machine learning library
R: can be integrated with Hadoop, though it might be slow on a single machine
Categories: Uncategorized

Large-scale machine learning course & streaming organism classification

January 10, 2013

Follow the Data

The NYU Large Scale Machine Learning course looks like it will be very worthwhile to follow. The instructors, John Langford and Yann LeCun, are both key figures in the machine learning field – for instance, the former developed Vowpal Wabbit and the latter has done pioneering work in deep learning. It is not an online course like those at Coursera et al., but they have promised to put lecture videos and slides online. I’ll certainly try to follow along as best I can.


There is an interesting InnoCentive challenge going on, called “Identify Organisms from A Stream of DNA Sequences.” This is interesting to me both because of the subject matter (classification based on DNA sequences) and because the winner is explicitly required to submit an efficient, scalable solution (not just a good classifier). Also, the prize sum is one million US dollars! It’s…


Categories: Uncategorized

bashrc and his friends

January 10, 2013

For some time, .bashrc was not being run automatically whenever I logged in to my Linux system. Anyway, I fixed it this morning. The reason is that Bash executes ~/.bashrc when you start “an interactive shell that is not a login shell.” When the shell is a login shell, it reads ~/.bash_profile or ~/.profile instead. A terminal window is not a login shell, but apparently the shell started by an SSH login is. So basically, simply adding source $HOME/.bashrc to ~/.bash_profile or ~/.profile will work.
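
A minimal sketch of what that addition to ~/.bash_profile could look like. The helper name load_bashrc is made up for illustration (Bash provides nothing by that name), and the readability check is an extra safeguard on top of the one-line fix:

```shell
# Lines for ~/.bash_profile (or ~/.profile).
# load_bashrc is a hypothetical helper name, not a Bash builtin.
load_bashrc() {
    # Source the given file only if it exists and is readable,
    # so a missing file does not produce an error at login.
    if [ -r "$1" ]; then
        . "$1"
    fi
}

# Login shells (e.g. SSH sessions) now pick up the same settings
# as ordinary terminal windows.
load_bashrc "$HOME/.bashrc"
```

Using . rather than source keeps the snippet usable from ~/.profile, which may be read by shells other than Bash.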

Categories: Linux

The better business – advertising via social network

January 5, 2013

I guess one of the modern ways of advertising is to use social networks. My friend says that every company is an “advertising company”: they get you to register, study you, and promote goods to you. I understand a company’s need to survive, but there are sometimes great differences in how companies leverage the social element. I think it comes down to the spirit, i.e. the company’s motto, and the mechanism, i.e. the functionality of its product (either its original goal or something later developed or discovered by users on the fly). When the company’s interest meets the interest of its consumers, the result is win-win. When it does not, it is somewhat like a butterfly emerging from a cocoon: something is certainly lost, and the only question is whether the butterfly turns out to be beautiful. Nothing is absolutely good or bad. Personally, I don’t like a sidebar that displays advertisements all the time, or finding out that a beautiful picture I simply wanted to enjoy is part of a commercial ad. Like me, many consumers may prefer a service to finding everything connected to a commercial product. Recognizing how much advertising customers will tolerate is a subtle matter. Here are some links showing how people are trying to promote their businesses on public social networks.

How to: Promote your business on Pinterest – and why it is important!

How To: A good start with Twitter marketing

Be a Better Facebook Blogger — Be Original (And Brief)


Categories: Interesting Stuff

HBase Shell

January 4, 2013

# Useful commands: help 'command', status, list, describe '<tablename>'

The status command shows basic status about the cluster, such as whether there are dead nodes. status 'simple' and status 'detailed' show additional information.

To create a table, pass the table name followed by its column families:
hbase> create 't1', {NAME => 'fam1'}, {NAME => 'fam2'}
hbase> create 't1', 'fam1', 'fam2' # Shorthand

To scan the rows of a table, use scan 'tablename'. Several options can be used to restrict what data is returned:
* COLUMNS: retrieve only certain columns. For all the columns in a column family, leave the qualifier empty (e.g., 'fam1:').
* STARTROW/STOPROW: the row key to start or stop scanning from.
* TIMESTAMP: a specific timestamp to search for (this is a long type).
* LIMIT: the number of rows to return.
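
The scan options can be sketched as follows, wrapped in a small shell function so the commands can be piped into the HBase shell. The table 't1', family 'fam1', qualifier 'c1', row keys, and timestamp are made-up placeholders, and scan_examples is a hypothetical helper name. Option spellings can differ between HBase versions, so check help 'scan' on your own cluster:

```shell
# Print a few example scan commands.
# Table, column, row keys, and timestamp are placeholders, not real data.
scan_examples() {
    cat <<'EOF'
# All columns in family fam1 (empty qualifier)
scan 't1', {COLUMNS => ['fam1:']}
# One column, restricted to a range of row keys
scan 't1', {COLUMNS => ['fam1:c1'], STARTROW => 'r1', STOPROW => 'r5'}
# Only cells written at one specific timestamp (a long value)
scan 't1', {TIMESTAMP => 1357257600000}
# At most 10 rows
scan 't1', {LIMIT => 10}
EOF
}

# Feed the commands to the HBase shell if it is installed;
# otherwise just print them so they can be pasted in by hand.
if command -v hbase >/dev/null 2>&1; then
    scan_examples | hbase shell
else
    scan_examples
fi
```

The options can also be combined in a single scan, for example COLUMNS together with LIMIT.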

To count the rows of a table and report progress every 5000 rows (this can be very slow for large tables):
hbase> count 'tablename', 5000

The delete command can be used to delete certain columns. To delete an entire row, including all of its columns, use deleteall. To delete all the rows from a table, use truncate 'tablename'. Under the hood, HBase will disable, drop, and re-create the table.

* delete a column in a row:
hbase> delete '<tablename>', 'rowkey', 'col'
hbase> delete 't1', 'r1', 'fam1:c1'
* delete an entire row:
hbase> deleteall '<tablename>', 'rowkey'
hbase> deleteall 't1', 'r1'
* delete all the rows:
hbase> truncate '<tablename>'

To remove a table completely, use the drop command. However, a table cannot be dropped unless it is first disabled:
hbase> disable '<tablename>'
hbase> drop '<tablename>'
If the table had more than 1 region, it is recommended to compact the META table:
hbase> major_compact '.META.'

To change or add column families, the table must first be disabled: disable 'tablename'. While a table is disabled, clients cannot access it. After the alteration, re-enable the table with enable 'tablename'.
The alter command uses the same syntax to add or change a column family. The only difference is whether that column family name already exists.
To remove a column family, include the option METHOD => 'delete'.

* must disable table first
* Add or change column families
hbase> alter '<tablename>', {NAME => '<colfam>' [,<options>]}

* Remove column families
hbase> alter '<tablename>', {NAME => '<colfam>', METHOD => 'delete'}


* Store files are stored as HFiles in HDFS

– sorted key/value pairs and an index of keys

– /hbase/tablename/region/column-family



Categories: Hbase