Julian Faraway 21 September 2020
See the introduction for more about INLA. This tutorial paper is directed towards spatial problems but same methods can be profitably applied on one-dimensional smoothing problems. See also (Lindgren and Rue 2015) for details. The construction is detailed in our book.
Load the packages (you may need to install the brinla package):
library(ggplot2)
library(INLA)
library(brinla)
We use the fossil example from (Bralower et al. 1997) and used by (Chaudhuri and Marron 1999). We have the ratio of strontium isotopes found in fossil shells in the mid-Cretaceous period from about 90 to 125 million years ago. We rescale the response as in the SiZer paper.
data(fossil, package="brinla")
fossil$sr <- (fossil$sr-0.7)*100
Plot the data:
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
The default fit uses priors based on the SD of the response and the range of the predictor to motivate sensible priors.
gpmod = bri.gpr(fossil$age, fossil$sr)
We can plot the resulting fit and 95% credible bands
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
The default number of spline basis functions is 25. Let’s see what happens if you increase this to 100:
gpmod = bri.gpr(fossil$age, fossil$sr, nbasis = 100)
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
It makes very little difference although it does take longer to compute. This demonstrates that you just need enough splines for the flexibility you want. If you use more, you won’t get a rougher fit.
We can decrease the degree of the splines (default is 2 corresponding to cubic splines):
gpmod = bri.gpr(fossil$age, fossil$sr, degree=0)
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
We get a piecewise constant fit (the diagonal parts are just from interpolating the fit - could use a better grid to avoid this)
We can change the shape of the GP kernel where default is alpha = 2
gpmod = bri.gpr(fossil$age, fossil$sr, alpha = 1)
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
The kernel is less smooth resulting in a rougher fit.
We can set the prior mean on the error to be much larger than the default (which is the SD of the response - already quite large:
gpmod = bri.gpr(fossil$age, fossil$sr, sigma0 = 10*sd(fossil$sr))
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
Doesn’t have much affect on the outcome. Nice to have a robust choice of prior.
We can set the prior mean on the error to be much smaller than the default (which is the SD of the response)
gpmod = bri.gpr(fossil$age, fossil$sr, sigma0 = 0.1*sd(fossil$sr))
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
Again quite robust to this choice.
We can also experiment with the range. The default is one quarter of the range of the predictor. Let’s make it equal to the range:
gpmod = bri.gpr(fossil$age, fossil$sr, rho0 = max(fossil$age) - min(fossil$age))
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
Again makes very little difference.
We can also use penalized complexity priors. We set a high value for sigma where we judge only a 5% chance it is more than this and a high value for the range where we judge only a 5% chance that it is less than this.
highsig = 10*sd(fossil$sr)
lowrho = 0.05*(max(fossil$age) - min(fossil$age))
gpmod = bri.gpr(fossil$age, fossil$sr, pcprior = c(highsig, lowrho))
plot(sr ~ age, fossil, xlab="Age",ylab="Strontium Ratio")
lines(gpmod$xout, gpmod$mean)
lines(gpmod$xout, gpmod$lcb, lty=2)
lines(gpmod$xout, gpmod$ucb, lty=2)
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] brinla_0.1.0 INLA_20.03.17 foreach_1.5.0 sp_1.4-2 Matrix_1.2-18 ggplot2_3.3.2 knitr_1.29
loaded via a namespace (and not attached):
[1] pillar_1.4.6 compiler_4.0.2 iterators_1.0.12 tools_4.0.2 digest_0.6.25
[6] evaluate_0.14 lifecycle_0.2.0 tibble_3.0.3 gtable_0.3.0 lattice_0.20-41
[11] pkgconfig_2.0.3 rlang_0.4.7 yaml_2.2.1 xfun_0.16 withr_2.2.0
[16] dplyr_1.0.2 stringr_1.4.0 MatrixModels_0.4-1 generics_0.0.2 vctrs_0.3.4
[21] grid_4.0.2 tidyselect_1.1.0 glue_1.4.2 R6_2.4.1 rmarkdown_2.3
[26] purrr_0.3.4 magrittr_1.5 splines_4.0.2 scales_1.1.1 codetools_0.2-16
[31] ellipsis_0.3.1 htmltools_0.5.0.9000 colorspace_1.4-1 stringi_1.4.6 munsell_0.5.0
[36] crayon_1.3.4