## LOESS regression with R

November 4, 2012 Leave a comment

The other day, I came a small problem: I was investigating a dataset, and the different variables clearly showed a non-linear behaviour.

Consequently, I could not apply the classical linear regression.

An approach to solve this kind of problem is LOESS regression, which stands for *locally weighted scatterplot smoothing*. More information can be found here.

In the following, we will work through an artificial example to make the concept clear and show how to apply in R.

We first create a dataframe with two variables, a and y, which are non-linearily related. The plot shows this relationship.

We perform two regressions, one linear and one loess.

`## ## Call: ## lm(formula = y ~ a) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.303 -0.635 0.294 0.792 1.635 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.100335 0.136374 22.7 <2e-16 *** ## a 0.007645 0.000305 25.1 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.973 on 98 degrees of freedom ## Multiple R-squared: 0.865, Adjusted R-squared: 0.864 ## F-statistic: 630 on 1 and 98 DF, p-value: <2e-16`

`## Call: ## loess(formula = y ~ a) ## ## Number of Observations: 100 ## Equivalent Number of Parameters: 4.56 ## Residual Standard Error: 0.587 ## Trace of smoother matrix: 4.98 ## ## Control settings: ## normalize: TRUE ## span : 0.75 ## degree : 2 ## family : gaussian ## surface : interpolate cell = 0.2`

The scatterplot clearly shows the better fit from the loess regression.

The summary of the regression shows one difference between linear and loess: The loess regression does not give the parameters, since it is a local regression. The classical measure of goodness of fit is r2.

The linear model has a r2 of `0.865`

. For the LOESS we have to calculate the r2 and follow this proposal.

```
## [1] 0.9528
```

The r^{2} from the loess is `0.953`

and thus very good and better than the r^{2} from the linear regression.

Also the investigation of the plot of residuals vs fitted/predicted values indicates a much better fit of the LOSS regression compared to the linear regression (the residuals plot of the linear regression shows the structure – which we clearly do not want); the QQ-plot shows some issues on the tails.

But what to do now with it? Finally, we probably want to *predict* some values out of this exercise.

The *predict*-function helps us with this exercise.

```
## a linreg loess
## 1 10 3.177 2.265
## 2 400 6.158 7.133
## 3 900 9.981 9.365
```

Alright, this was a quick introduction on how to apply the LOESS regression.

And here is the full code:

## @knitr create_data y <- seq(from=1, to=10, length.out=100) a <- y^3 +y^2 + rnorm(100,mean=0, sd=30) data <- data.frame(a=a, y=y) plot(y=y, x=a) ## @knitr linreg linreg <- lm(y~a) summary(linreg) loess <- loess(y~a) summary(loess) scatter.smooth(data) abline(linreg, col="blue") ## @knitr loess_fit hat <- predict(loess) plot(y~a) lines(a[order(a)], hat[order(hat)], col="red") (r_sq_loess <- cor(y, hat)^2) ## @knitr fit_loess par(mfrow=c(2,2)) plot(linreg$fitted, linreg$residuals, main="classical linear regression") plot(loess$fitted, loess$residuals, main="LOESS") # normal probablility plot qqnorm(linreg$residuals, ylim=c(-2,2)) qqline(linreg$residuals) qqnorm(loess$residuals, ylim=c(-2,2)) qqline(loess$residuals) ## @knitr predict predict <- data.frame(a=c(10,400,900)) scatter.smooth(data) abline(linreg, col="blue") predict$linreg <- predict(linreg, predict) predict$loess <- predict(loess, predict) predict points(x=predict$a, y=predict(linreg, predict), col="blue", pch=18, cex=2) points(x=predict$a, y=predict(loess, predict), col="red", pch=18, cex=2)