library(ggplot2)
library(rpart)
library(rpart.plot)
library(tree)
url = "https://liangfgithub.github.io/Data/HousingData.csv"
Housing = read.csv(url)

Two R packages fit tree models: tree and rpart. We will mainly use rpart; tree is loaded here for its function partition.tree, which we use to draw the first figure.

The tree Package

Fitting a regression tree with tree uses the same formula syntax as fitting a linear regression model. In this example, we use just two predictors from the Housing data, longitude (lon) and latitude (lat), to predict the response Y. We then prune the fitted tree back to a subtree with 7 terminal nodes and print it.

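# Fit a regression tree for Y on longitude and latitude
trfit = tree(Y ~ lon + lat, data = Housing)
# Prune back to the subtree with 7 terminal nodes
small.tree = prune.tree(trfit, best = 7)
small.tree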
node), split, n, deviance, yval
      * denotes terminal node

  1) root 506 84.1800 3.035  
    2) lon < -71.0667 202 16.8800 3.297 *
    3) lon > -71.0667 304 44.1400 2.860  
      6) lon < -71.0155 185 32.8300 2.752  
       12) lat < 42.241 147 27.4400 2.671  
         24) lat < 42.1698 18  0.8337 3.104 *
         25) lat > 42.1698 129 22.7700 2.611  
           50) lon < -71.0332 102 15.4200 2.707  
            100) lat < 42.2011 51  3.5830 2.525 *
            101) lat > 42.2011 51  8.4700 2.888 *
           51) lon > -71.0332 27  2.8910 2.250 *
       13) lat > 42.241 38  0.6923 3.066 *
      7) lon > -71.0155 119  5.8160 3.028 *
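
To read the printout above: each row lists the node number, the split that defines the node, the number of observations n, the deviance (for a regression tree, the within-node sum of squared deviations from the node mean), and the fitted value yval (the node mean). The children of node k are numbered 2k and 2k + 1. As a rough sanity check, the sketch below uses only the rounded numbers printed for nodes 1, 2, and 3 (the variable names are ours, chosen for illustration) to confirm that the children partition the parent and to show how much the first split reduces the deviance.

```r
# Values copied from the printed tree (rounded as shown there)
n_root  <- 506; dev_root  <- 84.18                    # node 1 (root)
n_left  <- 202; dev_left  <- 16.88; y_left  <- 3.297  # node 2: lon < -71.0667
n_right <- 304; dev_right <- 44.14; y_right <- 2.860  # node 3: lon > -71.0667

# The two children split the parent's observations between them
n_left + n_right == n_root

# The parent's mean is the size-weighted average of the child means;
# it should agree with the printed root yval up to rounding
root_mean <- (n_left * y_left + n_right * y_right) / n_root
root_mean

# Drop in deviance (sum of squares) achieved by the first split
dev_drop <- dev_root - (dev_left + dev_right)
dev_drop
```

On these rounded figures, the first split on lon alone removes a bit over a quarter of the root deviance.
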
# Two panels side by side: the tree diagram, then the partition of the map
par(mfrow = c(1, 2))
plot(small.tree)
text(small.tree, cex = .75)

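# Bin Y into 20 quantile groups; lighter grey = lower value
price.quantiles = cut(Housing$Y, quantile(Housing$Y, 0:20 / 20),
  include.lowest = TRUE)
# Scatter plot of locations (latitude on the x-axis), shaded by group
plot(Housing$lat, Housing$lon, col = grey(20:1 / 21)[price.quantiles],
  pch = 20, ylab = "Longitude", xlab = "Latitude")
# Overlay the rectangular partition defined by the pruned tree
partition.tree(small.tree, ordvars = c("lat", "lon"), add = TRUE)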
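
The shading in the scatter plot relies on cut() with quantile breakpoints: the response is binned into 20 groups of roughly equal size, and grey(20:1 / 21) assigns lighter shades to the lower bins and darker shades to the higher ones. A minimal sketch of the same idea on simulated data (the vector y is a made-up stand-in for Housing$Y, and the names breaks, bins, shades, and point.col are ours):

```r
set.seed(1)
y <- rnorm(100)                      # hypothetical stand-in for Housing$Y

# Breakpoints at the 0%, 5%, ..., 100% quantiles give 20 bins
breaks <- quantile(y, 0:20 / 20)
bins <- cut(y, breaks, include.lowest = TRUE)  # keep the minimum in bin 1

nlevels(bins)                        # number of bins
table(bins)                          # counts per bin, roughly equal

# Grey levels from light (20/21) to dark (1/21); indexing by a factor
# uses its integer codes, so bin 1 is lightest and bin 20 is darkest
shades <- grey(20:1 / 21)
point.col <- shades[bins]
```

Without include.lowest = TRUE, the smallest observation would fall outside the first interval and receive NA as its bin, so it would have no colour assigned.
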