Load Libraries

The primary R function for fitting smoothing splines is smooth.spline, which lives in the stats package and is available in every R session. We load the splines library for the ns function, which we will use later to build a regression-spline projection matrix.

library(splines)
help(smooth.spline)
options(digits = 4)

A Simulated Example

This is a simulated example from ESL Chapter 5.5.2. The true function is

\[ f(x) = \frac{\sin (12(x + 0.2))}{x+0.2}, \quad x \in [0, 1]. \] A graphical representation shows the true function as a grey curve, with observed data points shown as red dots.

set.seed(1234)
n = 30 
err = 1
x = sort(runif(n))
y = sin(12*(x+0.2))/(x+0.2) + rnorm(n, 0, err);
plot(x, y, col="red");

fx = 1:50/50;
fy = sin(12*(fx+0.2))/(fx+0.2)
lines(fx, fy, col=8, lwd=2);

Fit a Smoothing Spline Model

By using the smooth.spline function, different smoothing splines can be fit by adjusting the degrees of freedom (df). Using the predict function, one can then evaluate the fitted spline on a fine grid of points.

A lower df corresponds to a larger lambda, i.e., a heavier roughness penalty, so the fitted function looks flatter. As df increases (and lambda decreases), the fitted spline becomes more flexible, or “wiggly”.

par(mfrow=c(2,2));      # 2x2 grid, 4 subplots in total
plot(x,y, xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=5),fx),  lty=2, col='blue', lwd=1.5);
title('df=5');

plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=9),fx),  lty=2, col='blue', lwd=1.5);
title('df=9');

plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=15),fx),  lty=2, col='blue', lwd=1.5);
title('df=15');

plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=20),fx),  lty=2, col='blue', lwd=1.5);
title('df=20')
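
As a quick numerical check (a sketch, not part of the original script), we can pull the chosen \(\lambda\) out of each fit via the lambda component; it should shrink as df grows.

# illustration: the fitted lambda decreases monotonically as df increases
for (d in c(5, 9, 15, 20)) {
  fit = smooth.spline(x, y, df=d)
  cat("df =", d, "  lambda =", fit$lambda, "\n")
}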

DR Basis Functions

The DR (Demmler & Reinsch) basis functions are derived from the eigenvectors of the smoother matrix \(S_{\lambda}\). However, this matrix is not directly returned by the smooth.spline function.

The following script uses a clever way to “reverse engineer” the smoother matrix \(S_{\lambda}\) from a smoothing spline when using the smooth.spline function. By feeding in specific \(\mathbf{y}\) vectors that have a single “1” and otherwise consist of “0”s, you’re effectively pulling out each column of \(S_{\lambda}\) as the prediction of the spline. At the end, to ensure that the matrix is symmetric (which it should be by the nature of the smoother matrix), we average it with its transpose.

It’s important to note that this approach works because the smoother matrix is determined by x alone, and not influenced by y.
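
In symbols: the fitted values of a smoothing spline are \(\hat{\mathbf{y}} = S_{\lambda} \mathbf{y}\), so feeding in the \(i\)-th standard basis vector \(\mathbf{e}_i\) returns

\[ \hat{\mathbf{y}} = S_{\lambda} \mathbf{e}_i = \text{the } i\text{-th column of } S_{\lambda}. \]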

smooth.matrix = function(x, df){
  # Return the smoother matrix with knots x and degrees of freedom = df.
  # This function assumes x has unique values.
  n = length(x)
  A = matrix(0, n, n)
  for(i in 1:n){
    y = rep(0, n); y[i] = 1              # the i-th standard basis vector e_i
    yi = smooth.spline(x, y, df=df)$y    # fitted values = i-th column of S
    A[,i] = yi
  }
  return((A + t(A))/2)                   # symmetrize to remove numerical asymmetry
}
fx = 1:50/50
S4 = smooth.matrix(fx, df=4)
S9 = smooth.matrix(fx, df=9)
tmp = ns(fx, df=9, intercept=TRUE)          # natural cubic spline basis matrix
H9 = tmp%*%solve(t(tmp)%*%tmp)%*%t(tmp)     # projection (hat) matrix of the basis

We now have three matrices (each 50-by-50, since the grid fx has 50 points):

  • S4, the smoother matrix for df equal to four
  • S9, the smoother matrix for df equal to nine
  • H9, the projection matrix derived from a regression spline model with df equal to nine.

Next we will perform an eigen-decomposition on these three matrices.

The eigenvectors of the smoother matrix do not depend on the \(\lambda\) value. Consequently, the eigenvectors derived from S4 and S9 are identical, up to possible sign differences; plotting them gives a visual representation of the DR (Demmler & Reinsch) basis functions.

The basis functions are ordered by increasing penalty eigenvalues \(d_i\) (equivalently, by decreasing eigenvalues of \(S_{\lambda}\)). The first two values, \(d_1\) and \(d_2\), are zero: they correspond to linear functions, which we intentionally do not penalize. As the \(d_i\) increase, the associated basis functions exhibit more oscillations, or ‘wiggles.’

Recall that the shrinkage factor is determined by \(1/(1 + \lambda d_i)\). As \(d_i\) grows, this factor diminishes, leading to increased shrinkage. This is intuitive because larger \(d_i\) values signify more oscillatory basis functions. To mitigate the roughness penalty, it’s logical to enforce more stringent shrinkage on coefficients tied to these oscillatory functions.

eigen.S4 = eigen(S4)
eigen.S9 = eigen(S9)
eigen.H9 = eigen(H9)

v4 = eigen.S4$vectors
v9 = eigen.S9$vectors
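
As a sanity check (a sketch, not in the original script), we can verify that the leading eigenvectors of S4 are also, numerically, eigenvectors of S9: for each column v, the matching eigenvalue is the Rayleigh quotient \(v^{\top} S_9 v\), and the residual of the eigenvector equation should be near zero.

# residual of the eigenvector equation for the 15 leading eigenvectors of S4
resid = sapply(1:15, function(i){
  v = v4[, i]
  rho = as.numeric(t(v) %*% S9 %*% v)   # Rayleigh quotient (v has unit norm)
  max(abs(S9 %*% v - rho * v))
})
max(resid)   # near zero, up to the numerical error in the reconstructed matrices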

par(mfrow = c(3, 5))
for (i in 1:15) {
  plot(range(fx), range(v4, v9),
    xlab = "x", ylab = "", type = "n")
  lines(fx, v4[, i], col = 2, lty = 1, lwd = 2.5)
  lines(fx, v9[, i], col = 3, lty = 2, lwd = 2.5)
}

Below we display the eigenvalues of the three matrices, shown from large to small (the order returned by eigen).

plot(eigen.H9$values, pch=5, col="black", cex=1,
     xlab='', ylab='eigenvalues')
points(eigen.S4$values, pch=1, col="red", cex=1.5)
points(eigen.S9$values, pch=4, col="blue", cex=1)
lines(c(0, length(fx)), c(1, 1), col=8, lty=2, lwd=1)
legend("topright", pch=c(1,4,5), col=c("red", "blue", "black"),
       legend=c("SS with df=4", "SS with df=9", "NCS with df=9"))

Check the effective degrees of freedom: the trace of each matrix should match its nominal df (approximately, for the smoother matrices).

sum(diag(S4))
sum(diag(S9))
sum(diag(H9))

Choice of Lambda

Many R packages come with a built-in option for determining lambda, primarily based on leave-one-out cross-validation (LOOCV) or generalized cross-validation (GCV).

There’s no need to specify lambda directly. Instead, we indicate the desired degrees of freedom (df), anywhere above 1 and up to n, and R determines the corresponding lambda. We can then consult the CV and GCV curves to decide on the best lambda or df.
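
For instance (an illustration, not part of the original script), calling smooth.spline without specifying df lets R pick lambda automatically: by GCV by default, or by LOOCV when cv=TRUE.

fit.gcv = smooth.spline(x, y)            # lambda chosen by GCV (the default)
fit.cv  = smooth.spline(x, y, cv=TRUE)   # lambda chosen by LOOCV
c(fit.gcv$df, fit.cv$df)                 # effective df implied by each choice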

fit = smooth.spline(x, y, df=9);

When calling smooth.spline with df set to 9, the returned fit$df may deviate slightly from 9 due to numerical rounding.

The leverage output is equivalent to the diagonal entries of the smoother matrix.

fit$df
fit$lev  # leverage = diagonal entries of the smoother matrix
sum(fit$lev)
diag(smooth.matrix(x, df=9))

By default, the function returns a GCV score in cv.crit, but you can switch to LOOCV for comparison.
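
For reference, the two criteria computed by hand below are

\[ \mathrm{GCV}(\lambda) = \frac{1}{n} \sum_{i=1}^n \left( \frac{y_i - \hat{f}_{\lambda}(x_i)}{1 - \mathrm{tr}(S_{\lambda})/n} \right)^2, \qquad \mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^n \left( \frac{y_i - \hat{f}_{\lambda}(x_i)}{1 - \{S_{\lambda}\}_{ii}} \right)^2, \]

where \(\mathrm{tr}(S_{\lambda})\) is the effective df and \(\{S_{\lambda}\}_{ii}\) is the \(i\)-th leverage value.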

fit$cv.crit  # default criterion: GCV
sum((y-fit$y)^2)/(1-fit$df/n)^2/n            # GCV computed by hand
fit = smooth.spline(x, y, df=9, cv=TRUE)     # set 'cv=TRUE' to return LOOCV
fit$cv.crit
sum(((y-fit$y)/(1-fit$lev))^2)/n             # LOOCV computed by hand

We now employ LOOCV and GCV to select df. Just as with ridge regression, smoothing splines can have fractional dfs.

df = 2+(1:40)/2
m = length(df)
mycv = rep(0,m)
mygcv = rep(0,m)
for (i in 1:m){
    fit = smooth.spline(x, y, df=df[i]);
    mygcv[i] = fit$cv.crit;
    fit = smooth.spline(x, y, df=df[i], cv=T);
    mycv[i] = fit$cv.crit
    }

plot(range(df), range(mycv, mygcv)+c(-0.5,0.5), xlab='df', 
     ylab='CV scores', type='n')
points(df, mycv, col="red", cex=2);
points(df, mygcv, col="blue", cex=2);

optdf = df[mygcv==min(mygcv)]
optdf
## [1] 8.5
fitopt = smooth.spline(x, y, df=optdf);
plot(x, y, xlab="x", ylab="y")
lines(predict(fitopt, (1:100)/100),col="red", lwd=2)
lines(fx, fy, col="gray", lwd=2)