
Mahout Logistic Regression

Both tools used below are in Mahout's org.apache.mahout.classifier.sgd package.

# Print the help for each tool

$mahout org.apache.mahout.classifier.sgd.TrainLogistic --help
$mahout org.apache.mahout.classifier.sgd.RunLogistic --help

# Example of Training

# Train the model. The model is saved to donut.model, a JSON file; to read it more easily, try a JSON viewer such as http://jsonviewer.stack.hu/

$mahout org.apache.mahout.classifier.sgd.TrainLogistic \
--passes 100 \
--rate 50 --lambda 0.001 \
--input /mahout_examples/donut.csv \
--features 21 \
--output /mahout_examples/donut.model \
--target color \
--categories 2 \
--predictors x y xx xy yy a b c --types n n
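Because donut.model is a JSON file, Python's standard json module can pretty-print it locally instead of pasting it into a web viewer. This is a minimal sketch: the helper names pretty_print_json and pretty_print_model are illustrative, and it assumes you have a local copy of the model file that is plain JSON as described above.

```python
import json


def pretty_print_json(text):
    """Re-indent a JSON document with sorted keys so it is easier to read."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


def pretty_print_model(path):
    """Read a JSON model file (e.g. a local copy of donut.model) and pretty-print it."""
    with open(path, encoding="utf-8") as f:
        return pretty_print_json(f.read())
```

For example, print(pretty_print_model("donut.model")). From a shell, python -m json.tool donut.model does the same job.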

Running the training command prints output similar to the following:
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
No HADOOP_CONF_DIR set, using /usr/lib/hadoop/conf
11/11/01 17:58:21 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath, will use command-line arguments only
21
color ~ 5.048*Intercept Term + 3.747*x + 4.530*y + -3.986*xx + 2.191*xy + -4.723*yy + 0.562*a + -0.580*b + -22.188*c
Intercept Term 5.04769
a 0.56192
b -0.57986
c -22.18806
x 3.74697
xx -3.98555
xy 2.19129
y 4.52954
yy -4.72268
-3.985546155 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 -0.579859597 4.529541855 0.000000000 0.000000000 0.000000000 -4.722678608 5.047685107 0.000000000 0.000000000 2.191286892 0.561916360 -22.188056574 0.000000000 0.000000000 3.746971894
11/11/01 17:58:22 INFO driver.MahoutDriver: Program took 858 ms
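The "color ~ ..." line is the model's linear predictor; logistic regression turns it into a probability with the sigmoid function 1 / (1 + e^-z). The long 21-element vector appears to hold the same nine coefficients spread across the 21 hashed feature slots requested with --features 21, with the unused slots left at zero. The sketch below scores one point by hand with the rounded coefficients. It assumes the resulting probability refers to the category RunLogistic reports as target 1; the feature values and the score helper are hypothetical, for illustration only, not part of Mahout's API.

```python
import math

# Coefficients copied from the "color ~ ..." line printed by TrainLogistic above.
COEFFS = {
    "x": 3.747, "y": 4.530, "xx": -3.986, "xy": 2.191, "yy": -4.723,
    "a": 0.562, "b": -0.580, "c": -22.188,
}
INTERCEPT = 5.048


def score(features):
    """Sigmoid of the linear predictor: estimated probability that color is target 1."""
    z = INTERCEPT + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature values for one point, for illustration only.
EXAMPLE = {"x": 0.5, "y": 0.5, "xx": 0.25, "xy": 0.25, "yy": 0.25,
           "a": 0.3, "b": 0.4, "c": 0.2}
print(f"P(color = 1) = {score(EXAMPLE):.3f}")
```

The large negative weight on c means that, with everything else fixed, increasing c pushes the prediction sharply toward target 0.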

# To test the model
/usr/lib/mahout/bin/mahout org.apache.mahout.classifier.sgd.RunLogistic --help

$mahout org.apache.mahout.classifier.sgd.RunLogistic \
--input /mahout_examples/donut-test.csv \
--model /mahout_examples/donut.model --auc \
--scores --confusion

Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
No HADOOP_CONF_DIR set, using /usr/lib/hadoop/src/conf
11/11/02 10:35:27 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.RunLogistic.props found on classpath, will use command-line arguments only
"target","model-output","log-likelihood"
0,0.004,-0.003696
0,0.003,-0.002722
1,0.959,-0.042384
1,0.977,-0.023617
0,0.000,-0.000166
1,0.922,-0.081457
1,0.678,-0.388569
0,0.160,-0.174764
0,0.019,-0.019335
0,0.740,-1.348002
0,0.040,-0.040603
1,0.873,-0.135365
1,0.106,-2.242013
1,0.933,-0.069273
1,0.997,-0.003449
0,0.106,-0.112158
1,0.971,-0.029869
0,0.001,-0.001182
1,0.898,-0.107512
0,0.000,-0.000007
0,0.103,-0.108486
0,0.033,-0.034022
0,0.003,-0.003357
0,0.722,-1.281526
0,0.002,-0.002285
1,0.997,-0.002749
1,0.968,-0.032817
0,0.013,-0.013217
0,0.458,-0.613088
0,0.020,-0.019809
0,0.563,-0.827950
0,0.178,-0.195591
0,0.340,-0.416144
0,0.043,-0.043604
0,0.020,-0.020153
0,0.088,-0.091683
1,0.649,-0.432606
0,0.832,-1.786718
0,0.007,-0.006844
0,0.014,-0.014132
AUC = 0.96
confusion: [[23.0, 1.0], [4.0, 12.0]]
entropy: [[-0.2, -2.3], [-4.2, -0.2]]
11/11/02 10:35:28 INFO driver.MahoutDriver: Program took 312 ms
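In the --scores output, model-output is the predicted probability p of target 1, and the log-likelihood column appears to be the natural log of the probability given to the actual target: log(p) when the target is 1 and log(1 - p) when it is 0. The sketch below checks that reading on five rows copied from the output above; because model-output is rounded to three decimals, it recovers p from the log-likelihood and compares within a small tolerance. The helper names are illustrative.

```python
import math

# (target, model-output, log-likelihood) rows copied from the RunLogistic output above
ROWS = [
    (0, 0.004, -0.003696),
    (1, 0.959, -0.042384),
    (0, 0.740, -1.348002),
    (1, 0.106, -2.242013),
    (1, 0.649, -0.432606),
]


def log_likelihood(target, p):
    """Log of the probability assigned to the actual target, where p = P(target = 1)."""
    return math.log(p) if target == 1 else math.log(1.0 - p)


def prob_from_log_likelihood(target, ll):
    """Invert log_likelihood to recover p = P(target = 1)."""
    return math.exp(ll) if target == 1 else 1.0 - math.exp(ll)


for target, shown, ll in ROWS:
    recovered = prob_from_log_likelihood(target, ll)
    # model-output is printed with 3 decimals, so expect agreement to about 5e-4
    print(f"target={target} shown={shown:.3f} recovered={recovered:.6f}")
```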

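AUC: area under the ROC (receiver operating characteristic) curve. An AUC of 1.0 means every positive example is scored above every negative one; 0.5 is no better than a random ranking.

http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_Under_Curve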
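AUC also equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting one half. The sketch below applies that pairwise (Mann-Whitney) formulation to the 40 rounded scores printed above; this is a swapped-in technique for illustration and not necessarily how Mahout computes it. It also rebuilds the confusion line with a 0.5 threshold. Counting the printed rows shows the confusion line reads as one row per predicted class and one column per actual class: 23 true negatives and 1 missed positive in the first row, 4 false positives and 12 true positives in the second.

```python
# (target, model-output) pairs copied, in order, from the RunLogistic output above
SCORES = [
    (0, 0.004), (0, 0.003), (1, 0.959), (1, 0.977), (0, 0.000),
    (1, 0.922), (1, 0.678), (0, 0.160), (0, 0.019), (0, 0.740),
    (0, 0.040), (1, 0.873), (1, 0.106), (1, 0.933), (1, 0.997),
    (0, 0.106), (1, 0.971), (0, 0.001), (1, 0.898), (0, 0.000),
    (0, 0.103), (0, 0.033), (0, 0.003), (0, 0.722), (0, 0.002),
    (1, 0.997), (1, 0.968), (0, 0.013), (0, 0.458), (0, 0.020),
    (0, 0.563), (0, 0.178), (0, 0.340), (0, 0.043), (0, 0.020),
    (0, 0.088), (1, 0.649), (0, 0.832), (0, 0.007), (0, 0.014),
]


def auc(pairs):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(pairs, threshold=0.5):
    """Counts indexed [predicted][actual], matching the layout of the confusion line above."""
    m = [[0, 0], [0, 0]]
    for target, score in pairs:
        predicted = 1 if score > threshold else 0
        m[predicted][target] += 1
    return m


print(round(auc(SCORES), 2))  # 0.96 (336.5 of 351 pairs ranked correctly)
print(confusion(SCORES))      # [[23, 1], [4, 12]]
```

Since the printed scores are rounded to three decimals, ties such as the two 0.106 scores may not exist in the unrounded values, so the pairwise AUC here is only an approximation of the one Mahout reports.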

  1. ryan
    January 3, 2013 at 6:33 PM | #1

    Could you please share the CSV file’s format (/mahout_examples/donut.csv)? I’m puzzled how Mahout can understand the column name you specified on the command line. Thank you!

  2. Wei
    January 4, 2013 at 8:50 AM | #2

    Hi, if you check the source code, you will find it at ./mahout-0.5-cdh3u5/examples/src/main/resources/donut.csv

  3. ryan
    January 4, 2013 at 4:33 PM | #3

    I have tested the example successfully. It looks like the job ran locally instead of on the Hadoop cluster. How can I run it on the cluster? Thank you!
