Load Data

To illustrate local regression techniques, I’ve chosen three distinct data sets.

The first two are simulated.

  • In the first set, exa, the true curve is smooth: it is nearly flat on the left and fluctuates noticeably on the right.
  • In the second set, exb, the true curve is a simple straight line, but two outliers may distort the estimated curve.

The third data set is derived from observations of the Old Faithful Geyser. A staple of many statistics courses, it records two variables for each eruption: the duration of the eruption (plotted on the x-axis) and the waiting time between eruptions (plotted on the y-axis). There’s an evident positive correlation between these two variables. While it might be tempting to fit a linear model, in this session we will explore non-linear modeling to see what structure a straight line would miss.

# Show the three data sets side by side
par(mfrow=c(1,3))

# Example A: simulated data; column m holds the true curve
url = "https://liangfgithub.github.io/Data/Example_A.csv"
exa = read.csv(url)
plot(y ~ x, exa, main="Example A")
lines(m ~ x, exa)

# Example B: simulated straight line with two outliers
url = "https://liangfgithub.github.io/Data/Example_B.csv"
exb = read.csv(url)
plot(y ~ x, exb, main="Example B")
lines(m ~ x, exb)

# Old Faithful: waiting time versus eruption duration
url = "https://liangfgithub.github.io/Data/faithful.dat"
faithful = read.table(url, header=TRUE)
plot(waiting ~ eruptions, faithful, main="Old Faithful")