Mahout Logistic Regression
classifier.sgd
# Check the built-in help
$mahout org.apache.mahout.classifier.sgd.TrainLogistic --help
$mahout org.apache.mahout.classifier.sgd.RunLogistic --help
# Example of Training
# Train the model. The model is stored in donut.model, a JSON file; to read it more easily, try a viewer such as http://jsonviewer.stack.hu/
$mahout org.apache.mahout.classifier.sgd.TrainLogistic \
--passes 100 \
--rate 50 --lambda 0.001 \
--input /mahout_examples/donut.csv \
--features 21 \
--output /mahout_examples/donut.model \
--target color \
--categories 2 \
--predictors x y xx xy yy a b c --types n n
The training command should then print output like this in the terminal:
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
No HADOOP_CONF_DIR set, using /usr/lib/hadoop/conf
11/11/01 17:58:21 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath, will use command-line arguments only
21
color ~ 5.048*Intercept Term + 3.747*x + 4.530*y + -3.986*xx + 2.191*xy + -4.723*yy + 0.562*a + -0.580*b + -22.188*c
Intercept Term 5.04769
a 0.56192
b -0.57986
c -22.18806
x 3.74697
xx -3.98555
xy 2.19129
y 4.52954
yy -4.72268
-3.985546155 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 -0.579859597 4.529541855 0.000000000 0.000000000 0.000000000 -4.722678608 5.047685107 0.000000000 0.000000000 2.191286892 0.561916360 -22.188056574 0.000000000 0.000000000 3.746971894
11/11/01 17:58:22 INFO driver.MahoutDriver: Program took 858 ms
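To make the printed formula concrete, here is a minimal sketch of how those coefficients could be turned into a score for one row: sum the weighted predictors, then apply the logistic function 1 / (1 + exp(-z)). The function name `logistic_score` and the input values are hypothetical, and which value of color the resulting probability refers to depends on how Mahout encodes the target, so treat this as an illustration rather than a replacement for RunLogistic.

```shell
# Sketch only: evaluate the printed linear model for one row and squash it
# through the logistic function. All input values below are hypothetical.
logistic_score() {
  # Arguments, in order: x y xx xy yy a b c
  awk -v x="$1" -v y="$2" -v xx="$3" -v xy="$4" -v yy="$5" \
      -v a="$6" -v b="$7" -v c="$8" 'BEGIN {
    # Coefficients copied from the training output above.
    z = 5.04769 + 3.74697*x + 4.52954*y - 3.98555*xx + 2.19129*xy
    z = z - 4.72268*yy + 0.56192*a - 0.57986*b - 22.18806*c
    printf "%.4f\n", 1 / (1 + exp(-z))
  }'
}

# With every predictor at zero, only the intercept contributes.
logistic_score 0 0 0 0 0 0 0 0
```

The large negative weight on c means that a row with c close to 1 is pushed strongly toward a score of 0, whatever the other predictors are.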
# To test the model
/usr/lib/mahout/bin/mahout org.apache.mahout.classifier.sgd.RunLogistic --help
$mahout org.apache.mahout.classifier.sgd.RunLogistic \
--input /mahout_examples/donut-test.csv \
--model /mahout_examples/donut.model --auc \
--scores --confusion
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
AUC: area under the ROC curve
http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_Under_Curve
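For intuition about what the AUC figure means, the sketch below computes it by brute force: over all (positive, negative) pairs, it counts how often the positive row gets the higher score, with ties counting as half. The `label,score` input format and the function name `auc_from_scores` are assumptions made for this illustration; they are not the output format of RunLogistic, and the sample rows are made up.

```shell
# Sketch only: pairwise AUC from "label,score" lines read on stdin.
# label is 1 (positive) or 0 (negative); this format is an assumption.
auc_from_scores() {
  awk -F, '
    $1 == 1 { pos[np++] = $2 + 0 }   # scores of positive rows
    $1 == 0 { neg[nn++] = $2 + 0 }   # scores of negative rows
    END {
      if (np == 0 || nn == 0) { print "NaN"; exit }
      for (i = 0; i < np; i++)
        for (j = 0; j < nn; j++) {
          if (pos[i] > neg[j]) wins += 1          # correctly ranked pair
          else if (pos[i] == neg[j]) wins += 0.5  # tie counts as half
        }
      printf "%.3f\n", wins / (np * nn)
    }'
}

# Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
printf '1,0.9\n1,0.4\n0,0.5\n0,0.1\n' | auc_from_scores
```

An AUC of 1.0 means every positive row outranks every negative row, and 0.5 is what random scores would give. The double loop is quadratic, so it only suits small files.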
Could you please share the format of the CSV file (/mahout_examples/donut.csv)? I’m puzzled about how Mahout understands the column names you specified on the command line. Thank you!
Hi, if you check the source code, you will find it there: ./mahout-0.5-cdh3u5/examples/src/main/resources/donut.csv
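On the column-name question: as far as I can tell, the trainer takes the variable names from the first (header) row of the CSV, so the names given to --target and --predictors need to appear there. A small sketch for checking that before training is below. The helper name `csv_has_columns` and the sample file are made up for illustration, and the sample is not the real donut.csv.

```shell
# Sketch only: confirm that every column name passed to --target and
# --predictors appears in the header (first) row of the CSV.
csv_has_columns() {
  file="$1"; shift
  # Split the header on commas and strip optional double quotes.
  header=$(head -n 1 "$file" | tr -d '"\r' | tr ',' '\n')
  for col in "$@"; do
    if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
      echo "missing column: $col" >&2
      return 1
    fi
  done
  echo "all columns found"
}

# Hypothetical two-line file; not the real donut.csv.
sample=$(mktemp)
printf '"x","y","color"\n0.9,0.5,1\n' > "$sample"
csv_has_columns "$sample" color x y
rm -f "$sample"
```

Against the real data this would be called as `csv_has_columns /mahout_examples/donut.csv color x y xx xy yy a b c`.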
I have tested the example successfully. It looks like the job ran locally instead of on the Hadoop cluster. How can I run it on the cluster? Thank you!