The code for analyzing the spam data uses the
tree package, which differs from the rpart
package we used for [regression
trees]. Note that rpart can also be
used for classification, and I encourage you to explore
it on your own.
First, we fit a large classification tree to the spam data. The
summary function reports basic information about the fitted tree:
the variables used, the number of terminal nodes, the residual mean
deviance, and the misclassification error rate.
library(tree)
spam.tr = tree(Y ~ ., spam[-test.id, ], mindev = 0.005, minsize = 2)  # grow a big tree
summary(spam.tr)
## 
## Classification tree:
## tree(formula = Y ~ ., data = spam[-test.id, ], mindev = 0.005, 
##     minsize = 2)
## Variables actually used in tree construction:
##  [1] "Cdollar"    "Wremove"    "Cexclam"    "Whp"        "CAPlongest"
##  [6] "Wfree"      "Wour"       "W1999"      "Wedu"       "W650"      
## [11] "CAPave"     "Wbusiness"  "W85"        "Wpm"        "Wgeorge"   
## [16] "Wmeeting"   "Wemail"    
## Number of terminal nodes:  24 
## Residual mean deviance:  0.3669 = 1342 / 3657 
## Misclassification error rate: 0.06031 = 222 / 3681
For example, let’s look at how the ‘df’ and ‘deviance’ are computed.
leaf.nodes has 6 columns and one row per leaf node,
so it has 24 rows for the tree summarized above. The column ‘n’ gives the number of training
samples in that node.
As a check, the ‘n’ column should sum to the size of the training dataset.
The number of leaf nodes is the same as the number of parameters. Therefore, the degrees of freedom (‘df’) equal the number of training samples minus the number of leaf nodes.
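As a quick arithmetic check, the figures printed by summary(spam.tr) can be reproduced by hand. The numbers below are copied from that output and the variable names are illustrative only. The total deviance 1342 is the rounded value that summary prints, so the ratio matches the reported mean deviance only approximately.

```r
# Values copied from the summary(spam.tr) output shown earlier
n_train   <- 3681   # denominator of the misclassification error rate
n_leaves  <- 24     # "Number of terminal nodes"
dev_total <- 1342   # rounded total deviance, as printed by summary()
n_wrong   <- 222    # misclassified training samples

# Degrees of freedom: training size minus number of leaf nodes
resid_df <- n_train - n_leaves

# Residual mean deviance and misclassification error rate
mean_dev <- dev_total / resid_df
err_rate <- n_wrong / n_train

print(c(df = resid_df, mean_dev = mean_dev, err_rate = err_rate))
```

If leaf.nodes holds one row per leaf of spam.tr$frame (an assumption about how it was built), the same degrees of freedom would be sum(leaf.nodes$n) - nrow(leaf.nodes).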
Next, let’s look at how the ‘deviance’ is computed.
The quantity mydev, obtained by summing the deviance over the leaf nodes, should agree with the
residual deviance reported by summary(spam.tr).
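One way such a quantity might be computed is sketched here on simulated two-class data, so that the example runs on its own. The data frame toy, the fitted tree toy.tr, and the simulation settings are all hypothetical stand-ins for the spam objects. The sketch assumes the usual multinomial deviance, -2 times the sum over leaves and classes of n_ik * log(p_ik), with 0 * log(0) taken as 0.

```r
library(tree)

# Simulated stand-in for the spam training data (hypothetical)
set.seed(100)
n <- 500
toy <- data.frame(x1 = runif(n), x2 = runif(n))
p_spam <- plogis(6 * (toy$x1 - 0.5) + 3 * (toy$x2 - 0.5))
toy$Y <- factor(ifelse(runif(n) < p_spam, "spam", "email"))

toy.tr <- tree(Y ~ ., data = toy)

# Keep only the rows of the node table that correspond to leaves
leaf.nodes <- toy.tr$frame[toy.tr$frame$var == "<leaf>", ]

n_leaf <- leaf.nodes$n       # training samples per leaf
p_leaf <- leaf.nodes$yprob   # class proportions: one row per leaf

# n_ik * log(p_ik), treating 0 * log(0) as 0
terms <- ifelse(p_leaf > 0, n_leaf * p_leaf * log(p_leaf), 0)
mydev <- -2 * sum(terms)

# Degrees of freedom and residual mean deviance
resid_df <- sum(n_leaf) - nrow(leaf.nodes)
print(c(mydev = mydev, tree_dev = deviance(toy.tr), mean_dev = mydev / resid_df))
```

Replacing toy.tr with spam.tr should give values comparable to the total deviance and residual mean deviance in the summary output.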
Cross-validation based on deviance.
cv.spam1 = cv.tree(spam.tr)   # default pruning criterion: deviance
names(cv.spam1)
cv.spam1
plot(cv.spam1$size, cv.spam1$dev, type = "b")
cv.spam1$size[which.min(cv.spam1$dev)]   # tree size with the smallest CV deviance
Cross-validation based on misclassification rate.
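A minimal sketch of cross-validation driven by misclassification rate is to pass prune.misclass as the FUN argument of cv.tree. The example uses simulated data so that it runs on its own. The names toy, big.tr and cv.mis are hypothetical, and this is not necessarily the code originally used for the spam data.

```r
library(tree)

# Simulated stand-in for the spam training data (hypothetical)
set.seed(1)
n <- 400
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
score <- toy$x1 + 0.5 * toy$x2 + rnorm(n, sd = 0.2)
toy$Y <- factor(ifelse(score > 0.8, "spam", "email"))

# Grow a large tree, using the same control settings as for spam.tr
big.tr <- tree(Y ~ ., data = toy, mindev = 0.005, minsize = 2)

# With FUN = prune.misclass, the dev component holds the
# cross-validated number of misclassified samples for each size
cv.mis <- cv.tree(big.tr, FUN = prune.misclass)

best_size <- cv.mis$size[which.min(cv.mis$dev)]
print(cbind(size = cv.mis$size, misclass = cv.mis$dev))
print(best_size)
```

The fold assignment is random, so the selected size can change from run to run unless a seed is set.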
Prune the tree to the desired size using two different criteria: deviance and misclassification rate. The resulting trees may differ slightly.
bestsize = 14
spam.tr1 = prune.tree(spam.tr, best = bestsize)       # prune by deviance (the default)
spam.tr2 = prune.misclass(spam.tr, best = bestsize)   # prune by misclassification rate
# The line above should be the same as:
# spam.tr2 = prune.tree(spam.tr, method = "misclass", best = bestsize)
par(mfrow = c(1, 2))
plot(spam.tr1)
text(spam.tr1, cex = 0.5)
plot(spam.tr2)
text(spam.tr2, cex = 0.5)
Note that in the left tree (pruned based on deviance), there is a split whose two leaf nodes have the same prediction. Such a branch might be pruned if we use the misclassification rate as the pruning criterion, since cutting it leaves the misclassification rate unchanged. However, such a branch can still reduce the deviance substantially.
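The point about a split whose two leaves share a prediction can be illustrated with a small numeric sketch. The class counts below are made up for illustration, and the helper functions node_misclass and node_deviance are hypothetical names rather than part of the tree package.

```r
# Number of misclassified samples when a node predicts its majority class
node_misclass <- function(counts) sum(counts) - max(counts)

# Node deviance: -2 * sum_k n_k * log(n_k / n), skipping empty classes
node_deviance <- function(counts) {
  counts <- counts[counts > 0]
  -2 * sum(counts * log(counts / sum(counts)))
}

# Made-up counts: both children still predict "email"
parent <- c(email = 80, spam = 40)
left   <- c(email = 40, spam = 5)
right  <- c(email = 40, spam = 35)

mis_parent   <- node_misclass(parent)
mis_children <- node_misclass(left) + node_misclass(right)
dev_parent   <- node_deviance(parent)
dev_children <- node_deviance(left) + node_deviance(right)

print(c(mis_parent = mis_parent, mis_children = mis_children))
print(c(dev_parent = dev_parent, dev_children = dev_children))
```

In this toy case the split leaves the number of misclassified samples unchanged while lowering the deviance, because the left child is much purer than the parent.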
Next, we obtain predictions from the fitted classification trees using predict with type="class".
The training and test misclassification results are given below.
It seems that prune.misclass performs better when performance
is measured by 0/1 loss. But as mentioned before, it is better to use
Gini or deviance to grow a tree.
# Training set: predicted class vs. true class
train.pred1 = predict(spam.tr1, spam[-test.id, ], type = "class")
table(train.pred1, spam$Y[-test.id])
train.pred2 = predict(spam.tr2, spam[-test.id, ], type = "class")
table(train.pred2, spam$Y[-test.id])
# Test set: predicted class vs. true class
test.pred1 = predict(spam.tr1, spam[test.id, ], type = "class")
table(test.pred1, spam$Y[test.id])
test.pred2 = predict(spam.tr2, spam[test.id, ], type = "class")
table(test.pred2, spam$Y[test.id])
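To turn a confusion table like the ones above into a single misclassification rate, one option is sketched here. The label vectors are made up for illustration, and the names truth, pred and tab are hypothetical.

```r
# Made-up true and predicted labels (hypothetical)
truth <- factor(c("email", "email", "spam", "spam", "email", "spam"))
pred  <- factor(c("email", "spam", "spam", "spam", "email", "email"),
                levels = levels(truth))

# Confusion table: rows are predictions, columns are true classes
tab <- table(pred, truth)

# Misclassification rate from the table: 1 minus the diagonal share
rate_from_table <- 1 - sum(diag(tab)) / sum(tab)

# The same rate computed directly from the label vectors
rate_direct <- mean(as.character(pred) != as.character(truth))

print(tab)
print(c(from_table = rate_from_table, direct = rate_direct))
```

With the objects from this section, the analogous test error for the first tree would be mean(test.pred1 != spam$Y[test.id]), assuming both are factors with the same levels.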