
New to Hbase (Note 1)

September 10, 2012

* TIME OUT PROBLEM

Very new to Hbase. I just hit the scanner time-out problem: I am looping through a 300×300 image inside the mapper, so the exception below means that too much time passed between successive 'next' calls on the scanner.

org.apache.hadoop.hbase.client.ScannerTimeoutException: 556054ms passed since the last invocation, timeout is currently set to 300000

One solution would be to increase the timeout limit <hbase.rpc.timeout>:

conf.setLong("hbase.rpc.timeout", 6000000);
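Another knob for this particular exception: instead of (or in addition to) raising the timeout, you can lower the scanner caching so each 'next' call fetches fewer rows and the client reports back to the region server sooner. A small sketch, where 50 is just an example value:

Scan scan = new Scan();
scan.setCaching(50);  // fewer rows per batch, so shorter gaps between 'next' invocations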

* Understand HADOOP_CLASSPATH

In the very beginning, I thought it took a folder path. However, it turns out to need the exact jar file paths. What I ended up doing is putting a few bash commands in my bash profile to automatically add each jar:

for jf in /home/username/mylib/*.jar; do
  export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$jf"
done
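One thing to note: these exports have to be in effect in the shell before you run 'hadoop jar', since the hadoop launcher script reads HADOOP_CLASSPATH when it assembles the JVM classpath.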

* DIFFERENCE of a typical HADOOP job vs. HBASE-HADOOP job

For a plain hadoop job, set the mapper and reducer classes (input from hdfs, output to hdfs):

job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

The input/output hdfs paths also need to be set, e.g. 'FileOutputFormat.setOutputPath'.

The Mapper and Reducer classes extend 'org.apache.hadoop.mapreduce.Mapper' and 'org.apache.hadoop.mapreduce.Reducer'.
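Putting the plain-hadoop pieces together, here is a minimal driver sketch. MyMapper, MyReducer, and the Text/IntWritable output types are placeholders for whatever your own job uses:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PlainJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "plain-hdfs-job");
    job.setJarByClass(PlainJobDriver.class);

    job.setMapperClass(MyMapper.class);      // extends org.apache.hadoop.mapreduce.Mapper
    job.setReducerClass(MyReducer.class);    // extends org.apache.hadoop.mapreduce.Reducer
    job.setOutputKeyClass(Text.class);       // placeholder output key type
    job.setOutputValueClass(IntWritable.class);

    // input and output both live on hdfs
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}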

For an hbase-hadoop job, set the mapper and reducer classes (input from an htable, output to an htable):

TableMapReduceUtil.initTableMapperJob("hbase-input-table-name", scan,
    hMapper.class, OneKindWritable.class, OneKindWritable.class, job);
TableMapReduceUtil.initTableReducerJob("hbase-output-table-name", hReducer.class, job);

The Mapper and Reducer classes extend 'org.apache.hadoop.hbase.mapreduce.TableMapper' and 'org.apache.hadoop.hbase.mapreduce.TableReducer'.
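A sketch of what a TableMapper subclass looks like: the input key/value types are fixed by TableMapper to (ImmutableBytesWritable, Result), and you only choose the two output types. The HMapper name and the emitted (rowkey, 1) pairs are just for illustration:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class HMapper extends TableMapper<Text, IntWritable> {
  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // emit (rowkey, 1) as a trivial example
    context.write(new Text(row.get()), new IntWritable(1));
  }
}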

The output table should be created before launching the job, with the column family and qualifier names matching the ones you use in your code. On the input side, you can set filters on the 'scan' and add input columns by family and qualifier, as in the sketch below.
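Here 'cf', 'q', and 'row-prefix' are made-up names; Bytes is org.apache.hadoop.hbase.util.Bytes and PrefixFilter comes from org.apache.hadoop.hbase.filter:

Scan scan = new Scan();
scan.setCaching(50);          // rows fetched per 'next' call; see the timeout note above
scan.setCacheBlocks(false);   // a full MR scan shouldn't fill the region server block cache
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));        // family + qualifier
scan.setFilter(new PrefixFilter(Bytes.toBytes("row-prefix")));  // example filter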

A nice thing is that you can mix these settings, so you can read data from hdfs and output to hbase, or read data from hbase and output to hdfs. A sketch of the first combination follows.
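This driver sketch reads from hdfs and writes to an htable. HdfsMapper is a placeholder for an ordinary Mapper that emits (ImmutableBytesWritable, Put) pairs; IdentityTableReducer from org.apache.hadoop.hbase.mapreduce then writes those Puts to the table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class HdfsToHbaseDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hdfs-to-hbase");
    job.setJarByClass(HdfsToHbaseDriver.class);

    // ordinary mapper reading hdfs input, emitting Puts (placeholder class)
    job.setMapperClass(HdfsMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // IdentityTableReducer just writes the mapper's Puts to the output table
    TableMapReduceUtil.initTableReducerJob(
        "hbase-output-table-name", IdentityTableReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}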

Nice tips:

efficient hadoop : http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/

http://hbase.apache.org/book/mapreduce.example.html
