Archive

Archive for the ‘Hbase’ Category

Hbase Shell

January 4, 2013

# Useful commands: help 'command', status, list, describe '<tablename>'

The status command shows the basic status of the cluster, such as whether there are dead nodes. status 'simple' and status 'detailed' show additional information.
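For example:

hbase> status              # one-line summary: servers, dead nodes, average load
hbase> status 'detailed'   # adds per-server and per-region information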

# CREATE TABLES
hbase> create 't1', {NAME => 'fam1'}, {NAME => 'fam2'}
hbase> create 't1', 'fam1', 'fam2' # Shorthand

# SCAN
To scan the rows of a table, use scan 'tablename'. Several options can be used to restrict what data is returned:

* COLUMNS: retrieve only certain columns. For all the columns in a column family, leave the qualifier empty (e.g., 'fam1:').
* STARTROW/STOPROW: the row key to start or stop scanning at.
* TIMESTAMP: a specific timestamp to search for (this is a long type).
* LIMIT: the number of rows to return.
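For example, combining several of these options (the table, row keys, and timestamp below are placeholders):

hbase> scan 't1', {COLUMNS => ['fam1:'], STARTROW => 'r1', STOPROW => 'r9', LIMIT => 10}
hbase> scan 't1', {TIMESTAMP => 1356998400000}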

# COUNT
hbase> count 'tablename', 5000
This counts the rows of a table, reporting progress every 5000 rows. It can be very slow for large tables.

# DELETE
The delete command can be used to delete certain columns. To delete an entire row including all of its columns, use deleteall. To delete all the rows from a table, use truncate 'tablename'. Under the hood, HBase will disable, drop, and re-create the table.

* delete a column in a row:
hbase> delete '<tablename>', '<rowkey>', '<column>'
hbase> delete 't1', 'r1', 'fam1:c1'
* delete an entire row
hbase> deleteall '<tablename>', '<rowkey>'
hbase> deleteall 't1', 'r1'
* delete all the rows
hbase> truncate '<tablename>'

# REMOVE TABLE
To remove a table completely, use the drop command. However, a table cannot be dropped unless it is first disabled:
hbase> disable '<tablename>'
hbase> drop '<tablename>'
If the table had more than one region, it is recommended to compact the META table:
hbase> major_compact '.META.'

# CHANGE COLUMN FAMILY
To change or add column families, the table must be disabled: disable 'tablename'. While a table is disabled, clients will not be able to access it. After the alteration, re-enable the table with the command enable 'tablename'.
To add or to change column families, the alter command uses the same syntax; the only difference is whether the column family name already exists.
To remove a column family, this option must be included: METHOD => 'delete'

* must disable the table first
* Add or change column families
hbase> alter '<tablename>', {NAME => '<colfam>' [, <options>]}

* Remove column families
hbase> alter '<tablename>', {NAME => '<colfam>', METHOD => 'delete'}
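Putting the whole workflow together (table and family names below are placeholders):

hbase> disable 't1'
hbase> alter 't1', {NAME => 'fam3'}                      # add a new family
hbase> alter 't1', {NAME => 'fam2', METHOD => 'delete'}  # remove an existing family
hbase> enable 't1'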

# HBASE and HDFS

* Store files are stored as HFiles in HDFS

– sorted key/value pairs and an index of keys

– /hbase/tablename/region/column-family
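This layout can be inspected with the HDFS shell (a sketch, assuming the default /hbase root directory; region directory names are generated identifiers):

hadoop fs -ls /hbase/t1                  # one directory per region of table 't1'
hadoop fs -ls /hbase/t1/<region>/fam1    # HFiles for column family 'fam1'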

Categories: Hbase

New to Hbase (Note 1)

September 10, 2012

* TIMEOUT PROBLEM

Very new to HBase. I just hit the timeout exception below, since I am looping through a 300×300 image in the mapper. The exception means that too much time passed between successive 'next' calls on the scanner.

org.apache.hadoop.hbase.client.ScannerTimeoutException: 556054ms passed since the last invocation, timeout is currently set to 300000

One solution is to increase the timeout limit (hbase.rpc.timeout):

conf.setLong("hbase.rpc.timeout", 6000000);
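For a bit more context, a sketch of where that setting lives (HBaseConfiguration.create() is the standard way to build the conf object; lowering the scanner caching, shown below, is a complementary mitigation so that fewer rows are fetched per 'next' call and the lease is renewed more often):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.rpc.timeout", 6000000);  // 100 minutes, as above

// Complementary fix: cache fewer rows per scanner RPC, so slow per-row work
// in the mapper does not exceed the lease between 'next' calls.
Scan scan = new Scan();
scan.setCaching(1);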

* Understand HADOOP_CLASSPATH

In the very beginning, I thought it took a folder path. However, it turns out to need the exact jar file paths. What I ended up doing is putting a small bash loop in my .bash_profile to add each jar automatically:

# add every jar under mylib to HADOOP_CLASSPATH
for jf in /home/username/mylib/*.jar; do
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:"$jf"
done

* DIFFERENCE of a typical HADOOP job vs. HBASE-HADOOP job

For a Hadoop job, you set the mapper and reducer classes (input from HDFS, output to HDFS):

job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

The input/output HDFS paths also need to be set, e.g. FileInputFormat.addInputPath and FileOutputFormat.setOutputPath.

The Mapper and Reducer classes extend org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer.
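Putting those pieces into a minimal driver (a sketch using the identity Mapper/Reducer and placeholder paths; older Hadoop versions use new Job(conf) instead of Job.getInstance):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PlainJobDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "plain-hadoop-job");
    job.setJarByClass(PlainJobDriver.class);
    job.setMapperClass(Mapper.class);    // identity mapper, for illustration only
    job.setReducerClass(Reducer.class);  // identity reducer, for illustration only
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input from HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output to HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}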

For an HBase-Hadoop job, you set the mapper and reducer classes like this (input from an HTable, output to an HTable):

TableMapReduceUtil.initTableMapperJob("hbase-input-table-name", scan,
    hMapper.class, OneKindWritable.class, OneKindWritable.class, job);
TableMapReduceUtil.initTableReducerJob("hbase-output-table-name", hReducer.class, job);

The Mapper and Reducer classes extend org.apache.hadoop.hbase.mapreduce.TableMapper and org.apache.hadoop.hbase.mapreduce.TableReducer.

The output table should be created before launching the job, with the column family and qualifier names your code writes to. On the input side, you can set up certain filters on the 'scan' object and add input columns by family name and qualifier, as sketched below.
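A small sketch of that scan setup, along the lines of the HBase book's MapReduce example (family/qualifier names and the caching value are placeholders):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("fam1"), Bytes.toBytes("qual1")); // restrict input columns
scan.setCaching(500);        // fetch more rows per RPC for MapReduce scans
scan.setCacheBlocks(false);  // don't fill the block cache during a full scan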

A nice thing is that you can mix these settings, so you can read data from HDFS and output to HBase, or read from HBase and output to HDFS, as in the sketch below.
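For example, a sketch of the HBase-in, HDFS-out combination, reusing the scan, mapper, and job objects from above (the output path is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// read from an HBase table with a TableMapper...
TableMapReduceUtil.initTableMapperJob("hbase-input-table-name", scan,
    hMapper.class, OneKindWritable.class, OneKindWritable.class, job);
// ...but write the job output to plain HDFS files instead of a table
FileOutputFormat.setOutputPath(job, new Path("/user/username/output"));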

Nice tips:

Efficient Hadoop: http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/

http://hbase.apache.org/book/mapreduce.example.html

Categories: Hadoop, Hbase