A note on randomForest in R

November 9, 2011

Using the importance value to select features.

Link: http://www.statmethods.net/advstats/cart.html


Random forests improve predictive accuracy by growing a large number of trees, each on a bootstrap sample of the cases and with a random subset of the variables considered at each split, classifying a case using each tree in this new “forest”, and deciding a final predicted outcome by combining the results across all of the trees (an average in regression, a majority vote in classification). Breiman and Cutler’s random forest approach is implemented via the randomForest package.
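To make the bootstrap-and-vote idea concrete, here is a rough sketch that builds a tiny "forest" by hand with rpart. It is not how randomForest is implemented: randomForest samples candidate variables at every split, while this simplified version picks a random subset of variables once per tree. The number of trees, the subset size, and the seed are arbitrary values chosen for illustration.

```r
library(rpart)
data(kyphosis, package = "rpart")

set.seed(1)                                  # arbitrary seed, for reproducibility
n_trees <- 25                                # illustrative; odd, so a two-class vote cannot tie
predictors <- c("Age", "Number", "Start")

# Grow each tree on a bootstrap sample of cases and a random subset of variables
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(kyphosis), replace = TRUE)   # bootstrap sample of the cases
  vars <- sample(predictors, 2)                    # simplification: sampled per tree, not per split
  rpart(reformulate(vars, response = "Kyphosis"),
        data = kyphosis[rows, ], method = "class")
})

# Classify every case with every tree: one row per case, one column per tree
votes <- sapply(trees, function(tr) {
  as.character(predict(tr, kyphosis, type = "class"))
})

# Final prediction is the majority vote across trees
majority <- apply(votes, 1, function(v) names(which.max(table(v))))
table(predicted = majority, actual = kyphosis$Kyphosis)
```

The resulting table is computed on the same cases used for training, so it will look optimistic compared with the out-of-bag error that randomForest reports.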

Here is an example.

# Random Forest prediction of Kyphosis data
library(randomForest)
data(kyphosis, package="rpart") # the kyphosis data set ships with rpart
fit <- randomForest(Kyphosis ~ Age + Number + Start, data=kyphosis)
print(fit) # view results
importance(fit) # importance of each predictor
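
One possible way to use the importance values for feature selection is to rank the predictors and refit with only the top few. The sketch below assumes permutation importance (importance=TRUE at fit time, then type=1) and keeps the top two predictors. Both choices, and the seed, are arbitrary and only for illustration.

```r
library(randomForest)
data(kyphosis, package = "rpart")

set.seed(42)  # arbitrary seed, for reproducibility
fit <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    importance = TRUE)  # also compute permutation importance

# type = 1 requests the mean decrease in accuracy for each predictor
imp <- importance(fit, type = 1)
ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]
print(ranked)

# Keep the two highest-ranked predictors (an arbitrary cutoff) and refit
top <- ranked[1:2]
fit_top <- randomForest(reformulate(top, response = "Kyphosis"), data = kyphosis)
print(fit_top)
```

Comparing the out-of-bag error printed for fit and fit_top gives a rough idea of whether dropping the weakest predictor costs any accuracy. With only three predictors the selection matters little here, but the same pattern applies to wider data sets.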

For more details see the comprehensive Random Forest website.

