library(MASS)
#help(lda)
#help(qda)
We illustrate Discriminant Analysis using the digits dataset. I've created my own dataset with 40 samples from each class, for a total of 400 training samples and 400 test samples. Each data point is a 16x16 image, giving a feature vector of length 256. Specifically, there are 40 images for each of the 10 digit classes in both the training and test sets.
= "https://liangfgithub.github.io/Data/digits.rdata"
githubURL load(url(githubURL))
ls()
dim(X)
dim(Xtest)
table(Y)
table(Ytest)
par(mfrow=c(2,5), mai = c(0.2, 0.2, 0.2, 0.2))
for(i in 0:9){
  x = matrix(X[40*i+1,], 16, 16)
  image(x[,16:1], axes=FALSE)
}
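Because LDA classifies by comparing (suitably whitened) distances to the class means, it can also be instructive to look at the average image of each digit. A minimal sketch, reusing the plotting layout above and assuming Y holds the numeric labels 0-9:

par(mfrow=c(2,5), mai = c(0.2, 0.2, 0.2, 0.2))
for(i in 0:9){
  m = matrix(colMeans(X[Y == i,]), 16, 16)   # class-mean image for digit i
  image(m[,16:1], axes=FALSE)
}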
First, we run Linear Discriminant Analysis (LDA) and tabulate its predictions on the test set. The accuracy achieved is approximately 75%.
dig.lda = lda(X, Y)
Ytest.pred = predict(dig.lda, Xtest)$class
table(Ytest, Ytest.pred)
1 - sum(Ytest != Ytest.pred) / length(Ytest)
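To read the confusion table more easily, here is a small sketch (using the same Ytest and Ytest.pred objects as above) that extracts the per-digit accuracy:

conf = table(Ytest, Ytest.pred)
round(diag(conf) / rowSums(conf), 2)   # fraction of each digit classified correctly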
Attempting Quadratic Discriminant Analysis (QDA) produces an error because the covariance matrix estimated within each group is rank deficient; the qda function in the MASS package cannot handle this situation.
dig.qda = qda(X, Y)  # error message
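The error is consistent with a simple rank check: each class has only 40 training images but 256 features, so the within-class sample covariance has rank at most 39. A quick sketch for one class (assuming Y is the numeric label vector loaded above):

qr(cov(X[Y == 0,]))$rank   # far below 256, so the class covariance matrix is singular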
Next, we move on to Fisher's Discriminant Analysis (FDA). We project the data onto the FDA directions returned by the lda function; with 10 classes there are at most 10 - 1 = 9 such directions.
FDA.dir = dig.lda$scaling
dim(FDA.dir)  # at most 10-1 = 9 directions
## [1] 256 9
F = X %*% FDA.dir
par(mfrow=c(1,2))
plot(F[,1],F[,2], type="n", xlab="", ylab="")
text(F[,1], F[,2], Y, col=Y+1)
Ftest = Xtest %*% dig.lda$scaling
plot(Ftest[,1], Ftest[,2], type="n", xlab="", ylab="")
text(Ftest[,1], Ftest[,2], Ytest, col=Ytest+1)
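Since the fitted lda object stores all nine discriminant directions, the dimen argument of predict.lda can be used to classify with only the leading FDA directions. The following sketch (not part of the original analysis) computes the test accuracy as the number of retained directions varies:

acc = sapply(1:9, function(k) {
  pred = predict(dig.lda, Xtest, dimen = k)$class   # use only the first k discriminants
  mean(pred == Ytest)
})
round(acc, 3)   # test accuracy for k = 1, ..., 9 FDA directions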