The primary R function for fitting smoothing splines is smooth.spline, which ships with base R (in the stats package). We also load the splines package, whose ns function is used later to construct a natural cubic spline basis.
library(splines)
help(smooth.spline)
options(digits = 4)
This is a simulated example from ESL Chapter 5.5.2. The true function is
\[ f(x) = \frac{\sin (12(x + 0.2))}{x+0.2}, \quad x \in [0, 1]. \] A graphical representation shows the true function as a grey curve, with observed data points shown as red dots.
set.seed(1234)
n = 30
err = 1
x = sort(runif(n))
y = sin(12*(x+0.2))/(x+0.2) + rnorm(n, 0, err);
plot(x, y, col="red");
fx = 1:50/50;
fy = sin(12*(fx+0.2))/(fx+0.2)
lines(fx, fy, col=8, lwd=2);
Different smoothing splines can be fit with the smooth.spline function by adjusting the degrees of freedom (df); the predict function then evaluates the fitted spline on a fine grid of points. A lower df corresponds to a higher lambda value, so roughness is penalized more heavily and the fitted function appears flatter. As df increases, the fitted spline becomes more flexible, or “wiggly”.
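The df-lambda correspondence can be checked numerically via the lambda component of the fitted object (a quick sketch using the simulated data above; the exact values will vary with the data):
# smaller df should correspond to larger lambda (heavier penalty)
sapply(c(5, 9, 15, 20), function(d) smooth.spline(x, y, df = d)$lambda)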
par(mfrow=c(2,2));  # 2x2 grid, i.e., 4 subplots
plot(x,y, xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=5),fx), lty=2, col='blue', lwd=1.5);
title('df=5');
plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=9),fx), lty=2, col='blue', lwd=1.5);
title('df=9');
plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=15),fx), lty=2, col='blue', lwd=1.5);
title('df=15');
plot(x,y,xlab='x', ylab='y');
lines(fx, fy, col=8, lwd=1.5);
lines(predict(smooth.spline(x, y, df=20),fx), lty=2, col='blue', lwd=1.5);
title('df=20')
The DR (Demmler & Reinsch) basis functions are derived from the eigenvectors of the smoother matrix \(S_{\lambda}\). Yet, this matrix isn't directly returned by the smooth.spline function.
The following script uses a clever way to “reverse engineer” the
smoother matrix \(S_{\lambda}\) from a
smoothing spline when using the smooth.spline
function. By
feeding in specific \(\mathbf{y}\)
vectors that have a single “1” and otherwise consist of “0”s, you’re
effectively pulling out each column of \(S_{\lambda}\) as the prediction of the
spline. At the end, to ensure that the matrix is symmetric (which it
should be by the nature of the smoother matrix), we average it with its
transpose.
It’s important to note that this approach works because the smoother matrix is determined by x alone, and not influenced by y.
smooth.matrix = function(x, df){
  # return the smoother matrix with knots x and degrees of freedom df
  # this function assumes the values of x are unique
  n = length(x)
  A = matrix(0, n, n)
  for(i in 1:n){
    y = rep(0, n); y[i] = 1
    yi = smooth.spline(x, y, df=df)$y
    A[,i] = yi
  }
  return((A + t(A))/2)
}
fx = 1:50/50
S4 = smooth.matrix(fx, df=4)
S9 = smooth.matrix(fx, df=9)
tmp = ns(fx, df=9, intercept=TRUE)
H9 = tmp %*% solve(t(tmp) %*% tmp) %*% t(tmp)
We now have three matrices: S4, S9, and H9.
Next we will perform an eigen-decomposition on these three matrices. The eigenvectors of the smoother matrix stay the same regardless of the \(\lambda\) value. Consequently, the eigenvectors derived from S4 and S9 are identical, up to possible sign flips. This leads us to the visual representation of the DR (Demmler & Reinsch) basis functions.
These functions are systematically ordered based on increasing eigenvalues. The initial two values, \(d_1\) and \(d_2\), are zero. This implies that they correspond to linear functions, which we intentionally do not penalize. As eigenvalues increase, the affiliated basis functions exhibit more oscillations or ‘wiggles.’
Recall that the shrinkage factor is determined by \(1/(1 + \lambda d_i)\). As \(d_i\) grows, this factor diminishes, leading to increased shrinkage. This is intuitive because larger \(d_i\) values signify more oscillatory basis functions. To mitigate the roughness penalty, it’s logical to enforce more stringent shrinkage on coefficients tied to these oscillatory functions.
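In matrix form (following ESL's notation), the smoother matrix admits the eigen-expansion
\[ S_{\lambda} = \sum_{i=1}^{n} \frac{1}{1 + \lambda d_i}\, \mathbf{u}_i \mathbf{u}_i^{\top}, \qquad \hat{\mathbf{f}} = S_{\lambda}\mathbf{y} = \sum_{i=1}^{n} \frac{\langle \mathbf{u}_i, \mathbf{y} \rangle}{1 + \lambda d_i}\, \mathbf{u}_i, \]
where the \(\mathbf{u}_i\) are the DR basis vectors and the \(d_i\) do not depend on \(\lambda\).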
eigen.S4 = eigen(S4)
eigen.S9 = eigen(S9)
eigen.H9 = eigen(H9)

v4 = eigen.S4$vectors
v9 = eigen.S9$vectors
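Since each eigenvalue of \(S_{\lambda}\) equals \(1/(1 + \lambda d_i)\), we can back out \(\lambda d_i\) from the decomposition (a quick check using the objects above); the first two values should be essentially zero, matching the unpenalized linear functions:
# recover lambda*d_i from the eigenvalues of S9; the first two should be ~0
round(1/eigen.S9$values - 1, 4)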
par(mfrow = c(3, 5))
for (i in 1:15) {
plot(range(fx), range(v4, v9),
     xlab = "x", ylab = "", type = "n")
lines(fx, v4[, i], col = 2, lty = 1, lwd = 2.5)
lines(fx, v9[, i], col = 3, lty = 2, lwd = 2.5)
}
Below we display the eigenvalues of the three matrices; R's eigen returns them sorted from large to small.
plot(eigen.H9$values, pch=5, col="black", cex=1, xlab='', ylab='eigenvalues')
points(eigen.S4$values, pch=1, col="red", cex=1.5)
points(eigen.S9$values, pch=4, col="blue", cex=1)
abline(h=1, col=8, lty=2, lwd=1)
legend("topright", pch=c(1,4,5), col=c("red", "blue", "black"),
       legend=c("SS with df=4", "SS with df=9", "NCS with df=9"))
Check the effective degrees of freedom: the trace of each smoother matrix should match the requested df (approximately 4, approximately 9, and exactly 9, respectively).
sum(diag(S4))
sum(diag(S9))
sum(diag(H9))
Many R packages come with a built-in option for selecting lambda, primarily based on leave-one-out cross-validation (LOOCV) or generalized cross-validation (GCV).
There is no need to specify lambda directly. Instead, we indicate the desired degrees of freedom (df), which for a smoothing spline ranges between 2 (a linear fit) and n (interpolation); R then determines the corresponding lambda. We can then consult the CV and GCV curves to decide on the best lambda or df.
fit = smooth.spline(x, y, df=9); fit
When calling the smooth.spline function with the degrees of freedom set to 9, the returned value fit$df may deviate slightly from 9 due to rounding errors. The leverage output equals the diagonal entries of the smoother matrix.
fit$df
fit$lev  # leverage = diagonal entries of the smoother matrix
sum(fit$lev)
diag(smooth.matrix(x, df=9))
By default, the function reports a GCV score, but you can switch to LOOCV for comparison.
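For reference, the two criteria computed below are (with \(\hat{f}\) the fitted spline and \(S_{\lambda,ii}\) the \(i\)-th leverage):
\[ \text{GCV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}(x_i)}{1 - \operatorname{tr}(S_{\lambda})/n} \right)^2, \qquad \text{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}(x_i)}{1 - S_{\lambda,ii}} \right)^2. \]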
fit$cv.crit  # default: GCV
sum((y-fit$y)^2)/(1-fit$df/n)^2/n
fit = smooth.spline(x, y, df=9, cv=TRUE)  # set 'cv=TRUE' to return LOOCV
fit$cv.crit
sum(((y-fit$y)/(1-fit$lev))^2)/n
Next, employ LOOCV and GCV to select df. Just as with ridge regression, smoothing splines can have fractional dfs.
df = 2 + (1:40)/2
m = length(df)
mycv = rep(0, m)
mygcv = rep(0, m)
for (i in 1:m){
  fit = smooth.spline(x, y, df=df[i]);
  mygcv[i] = fit$cv.crit;
  fit = smooth.spline(x, y, df=df[i], cv=TRUE);
  mycv[i] = fit$cv.crit
}
plot(range(df), range(mycv, mygcv)+c(-0.5,0.5), xlab='df',
     ylab='CV scores', type='n')
points(df, mycv, col="red", cex=2);
points(df, mygcv, col="blue", cex=2);
optdf = df[mygcv == min(mygcv)]
optdf
## [1] 8.5
fitopt = smooth.spline(x, y, df=optdf);
plot(x, y, xlab="x", ylab="y")
lines(predict(fitopt, (1:100)/100), col="red", lwd=2)
lines(fx, fy, col="gray", lwd=2)