LOESS regression with R

The other day, I came across a small problem: I was investigating a dataset, and the variables clearly showed a non-linear relationship.
Consequently, I could not apply classical linear regression.
One approach to this kind of problem is LOESS regression, which stands for locally weighted scatterplot smoothing. More information can be found here.
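LOESS fits a low-degree polynomial regression inside a sliding window over the data, weighting nearby points more heavily. The `span` argument controls which fraction of the points each local fit uses. As a quick illustration of that knob (using R's built-in `cars` dataset here, an assumption of mine, not the data from this post):

```r
# span controls the fraction of points used in each local fit:
# larger span -> smoother curve, smaller span -> follows the data more closely
fit_smooth <- loess(dist ~ speed, data = cars, span = 1.0)
fit_rough  <- loess(dist ~ speed, data = cars, span = 0.5)

plot(dist ~ speed, data = cars)
ord <- order(cars$speed)
lines(cars$speed[ord], predict(fit_smooth)[ord], col = "blue")
lines(cars$speed[ord], predict(fit_rough)[ord], col = "red")
```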

In the following, we will work through an artificial example to make the concept clear and show how to apply it in R.

We first create a data frame with two variables, a and y, which are non-linearly related. The plot shows this relationship. We then perform two regressions, one linear and one LOESS.
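In code, the setup looks like this (the full script is listed at the end of the post):

```r
# y is a regular sequence; a is a cubic function of y plus Gaussian noise,
# so the relationship between a and y is clearly non-linear
y <- seq(from = 1, to = 10, length.out = 100)
a <- y^3 + y^2 + rnorm(100, mean = 0, sd = 30)
data <- data.frame(a = a, y = y)

linreg <- lm(y ~ a)        # classical linear regression
loess_fit <- loess(y ~ a)  # LOESS regression
```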

```
##
## Call:
## lm(formula = y ~ a)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -2.303 -0.635  0.294  0.792  1.635
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.100335   0.136374    22.7   <2e-16 ***
## a           0.007645   0.000305    25.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.973 on 98 degrees of freedom
## Multiple R-squared: 0.865,   Adjusted R-squared: 0.864
## F-statistic:  630 on 1 and 98 DF,  p-value: <2e-16
```
```
## Call:
## loess(formula = y ~ a)
##
## Number of Observations: 100
## Equivalent Number of Parameters: 4.56
## Residual Standard Error: 0.587
## Trace of smoother matrix: 4.98
##
## Control settings:
##   normalize:  TRUE
##   span       :  0.75
##   degree   :  2
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
```
The scatterplot clearly shows the better fit of the LOESS regression.
The summary output reveals one difference between linear and LOESS regression: the LOESS model does not report coefficients, since it is a local regression. The classical measure of goodness of fit is R².
The linear model has an R² of `0.865`. For the LOESS model we have to calculate R² ourselves, following this proposal.
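The idea is to use the squared correlation between the observed values and the LOESS fitted values as a pseudo-R². A self-contained sketch (I add a seed for reproducibility, which the post does not, so the exact value differs slightly):

```r
set.seed(1)  # assumption: the post sets no seed
y <- seq(from = 1, to = 10, length.out = 100)
a <- y^3 + y^2 + rnorm(100, mean = 0, sd = 30)
loess_fit <- loess(y ~ a)

# pseudo-R^2: squared correlation between observed and fitted values
hat <- predict(loess_fit)
(r_sq_loess <- cor(y, hat)^2)
```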

```
##  0.9528
```

The R² of the LOESS fit is `0.953`, which is very good and better than that of the linear regression.
The plot of residuals versus fitted values also indicates a much better fit for the LOESS regression than for the linear regression: the residuals of the linear regression show clear structure, which we do not want. The QQ-plot reveals some issues in the tails. But what can we do with the fitted model now? Most likely, we want to predict some values.
The predict function helps us with this task.
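`predict` takes a fitted model and a data frame of new predictor values. A sketch (again with a seed added by me, so the numbers will differ slightly from the table below):

```r
set.seed(1)  # assumption: the post sets no seed
y <- seq(from = 1, to = 10, length.out = 100)
a <- y^3 + y^2 + rnorm(100, mean = 0, sd = 30)
linreg <- lm(y ~ a)
loess_fit <- loess(y ~ a)

# predict both models at three new values of a
newdata <- data.frame(a = c(10, 400, 900))
newdata$linreg <- predict(linreg, newdata)
newdata$loess <- predict(loess_fit, newdata)
newdata
```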

```
##     a linreg loess
## 1  10  3.177 2.265
## 2 400  6.158 7.133
## 3 900  9.981 9.365
```

Alright, this was a quick introduction on how to apply LOESS regression.

And here is the full code:

```
## @knitr create_data
# y is a regular sequence; a is a cubic function of y plus Gaussian noise
y <- seq(from = 1, to = 10, length.out = 100)
a <- y^3 + y^2 + rnorm(100, mean = 0, sd = 30)
data <- data.frame(a = a, y = y)
plot(y = y, x = a)

## @knitr linreg
linreg <- lm(y ~ a)
summary(linreg)

# name the fit loess_fit to avoid masking stats::loess
loess_fit <- loess(y ~ a)
summary(loess_fit)

scatter.smooth(data)
abline(linreg, col = "blue")

## @knitr loess_fit
hat <- predict(loess_fit)
plot(y ~ a)
lines(a[order(a)], hat[order(a)], col = "red")  # sort fitted values by a
(r_sq_loess <- cor(y, hat)^2)

## @knitr fit_loess
par(mfrow = c(2, 2))
plot(linreg$fitted, linreg$residuals, main = "classical linear regression")
plot(loess_fit$fitted, loess_fit$residuals, main = "LOESS")
# normal probability plots
qqnorm(linreg$residuals, ylim = c(-2, 2))
qqline(linreg$residuals)
qqnorm(loess_fit$residuals, ylim = c(-2, 2))
qqline(loess_fit$residuals)

## @knitr predict
# name the data frame newdata to avoid masking stats::predict
newdata <- data.frame(a = c(10, 400, 900))
scatter.smooth(data)
abline(linreg, col = "blue")

newdata$linreg <- predict(linreg, newdata)
newdata$loess <- predict(loess_fit, newdata)

newdata

points(x = newdata$a, y = newdata$linreg, col = "blue", pch = 18, cex = 2)
points(x = newdata$a, y = newdata$loess, col = "red", pch = 18, cex = 2)
```
