library(ggplot2)
library(rpart)
library(rpart.plot)
library(tree)
url = "https://liangfgithub.github.io/Data/HousingData.csv"
Housing = read.csv(url)
There are two R packages for tree models, tree and rpart. We will mainly use rpart. The package tree is loaded for its function partition.tree, which we use to generate the first figure.
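Because the rest of the notes rely mainly on rpart, here is a minimal sketch of the same kind of two-predictor regression tree fitted with rpart. It does not use the Housing data: the data frame toy, its step-shaped response, and the name rfit are hypothetical, chosen only so the example runs without a download. method = "anova" asks rpart for a regression tree.

```r
library(rpart)   # recommended package included with standard R installations

# Hypothetical toy data standing in for Housing: the response drops
# east of lon = -71.05, while lat is pure noise.
set.seed(2)
toy <- data.frame(lon = runif(300, -71.15, -70.95),
                  lat = runif(300, 42.10, 42.40))
toy$Y <- ifelse(toy$lon < -71.05, 3.3, 2.8) + rnorm(300, sd = 0.1)

# Same formula interface as tree(); "anova" = regression (squared-error) tree
rfit <- rpart(Y ~ lon + lat, data = toy, method = "anova")
print(rfit)                          # node listing, similar in spirit to print(small.tree)
as.character(rfit$frame$var[1])      # variable used at the root split
```

With a jump this large relative to the noise, the root split should be on lon, just as in the Housing tree.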
The syntax for fitting a regression tree with tree is similar to that used for linear regression models with lm: a model formula plus a data frame. In this example, we use just two predictors from the Housing data, longitude (lon) and latitude (lat), to predict the response Y.
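Each split in the printed tree below (for example lon < -71.0667) is chosen greedily: among all candidate cut points, the tree keeps the one that most reduces the node's deviance, which for a regression tree is the residual sum of squares around the node mean. The following is a simplified sketch of that search for one numeric predictor, not the tree package's actual implementation; best_split and the toy data are hypothetical.

```r
# Simplified sketch: best single split of y on one numeric predictor x,
# trying midpoints between sorted unique x values as thresholds.
best_split <- function(x, y) {
  rss <- function(v) sum((v - mean(v))^2)      # deviance of a node
  ux <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2     # candidate thresholds
  # total deviance of the two child nodes for each threshold
  total <- sapply(cuts, function(thr) rss(y[x < thr]) + rss(y[x >= thr]))
  i <- which.min(total)
  list(cut = cuts[i], rss_parent = rss(y), rss_children = total[i])
}

# Hypothetical toy data: the response jumps at x = 0.5
set.seed(1)
x <- runif(200)
y <- ifelse(x < 0.5, 3.3, 2.8) + rnorm(200, sd = 0.05)
s <- best_split(x, y)
s$cut                              # close to 0.5
s$rss_parent - s$rss_children      # deviance reduction achieved by the split
```

A full tree repeats this search over every predictor, splits on the winner, and recurses into each child node.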
trfit = tree(Y ~ lon + lat, data = Housing)     # grow a regression tree
small.tree = prune.tree(trfit, best = 7)         # prune to 7 terminal nodes
small.tree
node), split, n, deviance, yval
      * denotes terminal node

  1) root 506 84.1800 3.035
    2) lon < -71.0667 202 16.8800 3.297 *
    3) lon > -71.0667 304 44.1400 2.860
      6) lon < -71.0155 185 32.8300 2.752
        12) lat < 42.241 147 27.4400 2.671
          24) lat < 42.1698 18 0.8337 3.104 *
          25) lat > 42.1698 129 22.7700 2.611
            50) lon < -71.0332 102 15.4200 2.707
              100) lat < 42.2011 51 3.5830 2.525 *
              101) lat > 42.2011 51 8.4700 2.888 *
            51) lon > -71.0332 27 2.8910 2.250 *
        13) lat > 42.241 38 0.6923 3.066 *
      7) lon > -71.0155 119 5.8160 3.028 *
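To make the printed output concrete, here is an illustrative hand translation of small.tree into nested rules: every terminal node (*) is a rectangle in the (lon, lat) plane, and its prediction is that node's yval. The function name predict_small_tree is hypothetical, and sending values equal to a cut point to the right branch is an assumption based on the "<" labels of the left branches. The last lines check that yval is a node mean: the size-weighted average of the leaf values recovers the root's 3.035 up to rounding.

```r
# Illustrative hand translation of the printed small.tree (7 leaves)
predict_small_tree <- function(lon, lat) {
  if (lon < -71.0667) {
    3.297                                   # node 2
  } else if (lon >= -71.0155) {
    3.028                                   # node 7
  } else if (lat >= 42.241) {
    3.066                                   # node 13
  } else if (lat < 42.1698) {
    3.104                                   # node 24
  } else if (lon >= -71.0332) {
    2.250                                   # node 51
  } else if (lat < 42.2011) {
    2.525                                   # node 100
  } else {
    2.888                                   # node 101
  }
}

# Leaf sizes and means copied from the printed output
leaves <- data.frame(
  node = c(2, 24, 100, 101, 51, 13, 7),
  n    = c(202, 18, 51, 51, 27, 38, 119),
  yval = c(3.297, 3.104, 2.525, 2.888, 2.250, 3.066, 3.028)
)
sum(leaves$n)                                       # 506, the root count
root_mean <- weighted.mean(leaves$yval, leaves$n)   # about 3.0346
```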
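par(mfrow = c(1, 2))                     # two panels side by side
plot(small.tree)                         # left: the tree diagram
text(small.tree, cex = .75)              # label splits and leaf means
# right: observations shaded by price quantile (darker = higher Y)
price.quantiles = cut(Housing$Y, quantile(Housing$Y, 0:20 / 20),
                      include.lowest = TRUE)
plot(Housing$lat, Housing$lon, col = grey(20:1 / 21)[price.quantiles],
     pch = 20, ylab = "Longitude", xlab = "Latitude")
# overlay the rectangular regions defined by the tree's splits
partition.tree(small.tree, ordvars = c("lat", "lon"), add = TRUE)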
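The shading in the right panel comes from two steps: cut with the 0%, 5%, ..., 100% quantiles as breaks puts every observation into one of 20 roughly equal-sized price bins, and indexing grey(20:1 / 21) by the resulting factor uses its integer codes, so the lowest bin gets the lightest grey and the highest bin the darkest. A minimal sketch on hypothetical skewed "prices" (the vector price is made up for illustration):

```r
set.seed(42)
price <- rlnorm(500)                             # hypothetical skewed prices
bins <- cut(price, quantile(price, 0:20 / 20), include.lowest = TRUE)
nlevels(bins)                                    # 20
table(bins)                                      # about 25 observations per bin

# A factor used as an index selects by its integer level codes
shades <- grey(20:1 / 21)[bins]
shades[which.min(price)] == grey(20 / 21)        # lowest price: lightest grey
shades[which.max(price)] == grey(1 / 21)         # highest price: darkest grey
```

include.lowest = TRUE matters here: without it, the minimum value equals the first break and would fall outside the left-open first interval, producing an NA colour.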