Julian Faraway 22 September 2020
Load the libraries:
library(ggplot2)
library(INLA)
Load in and summarize the data:
data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
mjspr <- data.frame(rbind(jspr[,1:6],jspr[,1:6]),subject=factor(rep(c("english","math"),c(953,953))),score=c(jspr$english/100,jspr$math/40))
mjspr$craven <- mjspr$raven-mean(mjspr$raven)
Plot the data
ggplot(mjspr, aes(x=raven, y=score))+geom_jitter(alpha=0.25)+facet_grid(gender ~ subject)
Need to construct unique labels for nested factor levels of class and student:
mjspr$school <- factor(mjspr$school)
mjspr$classch <- factor(paste(mjspr$school,mjspr$class,sep="."))
mjspr$classchid <- factor(paste(mjspr$school,mjspr$class,mjspr$id,sep="."))
formula <- score ~ subject*gender+craven*subject+social + f(school, model="iid") + f(classch, model="iid") + f(classchid, model="iid")
result <- inla(formula, family="gaussian", data=mjspr)
result <- inla.hyperpar(result)
summary(result)
Call:
"inla(formula = formula, family = \"gaussian\", data = mjspr)"
Time used:
Pre = 2.83, Running = 80.9, Post = 0.217, Total = 84
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.442 0.026 0.390 0.442 0.494 0.442 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.064 0.010 0.044 0.064 0.084 0.064 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.014 0.027 -0.040 0.014 0.067 0.014 0
social3 -0.022 0.029 -0.078 -0.022 0.035 -0.022 0
social4 -0.072 0.026 -0.122 -0.072 -0.021 -0.072 0
social5 -0.051 0.029 -0.107 -0.051 0.006 -0.051 0
social6 -0.089 0.031 -0.149 -0.089 -0.029 -0.089 0
social7 -0.099 0.032 -0.161 -0.099 -0.038 -0.099 0
social8 -0.082 0.042 -0.165 -0.082 0.001 -0.082 0
social9 -0.048 0.027 -0.102 -0.048 0.006 -0.048 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Random effects:
Name Model
school IID model
classch IID model
classchid IID model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.56 3.38 67.07 73.50 80.33 73.40
Precision for school 498.97 238.25 242.25 448.35 1078.99 394.32
Precision for classch 16055.92 18365.70 773.69 9332.44 67433.35 1155.65
Precision for classchid 97.38 8.38 82.41 96.86 115.28 95.88
Expected number of effective parameters(stdev): 596.64(23.66)
Number of equivalent replicates : 3.19
Marginal log-Likelihood: 794.42
The class precision looks too high. Need to change the default prior
Now try more informative gamma priors for the precisions. Define it so
the mean value of gamma prior is set to the inverse of the variance of
the residuals of the fixed-effects only model. We expect the error
variances to be lower than this variance so this is an overestimate. The
variance of the gamma prior (for the precision) is controlled by the
apar
parameter.
apar <- 0.5
lmod <- lm(score ~ subject*gender+craven*subject+social,mjspr)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = score ~ subject*gender+craven*subject+social+f(school, model="iid", hyper = lgprior)+f(classch, model="iid", hyper = lgprior)+f(classchid, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=mjspr)
result <- inla.hyperpar(result)
summary(result)
Call:
"inla(formula = formula, family = \"gaussian\", data = mjspr)"
Time used:
Pre = 3.36, Running = 101, Post = 0.217, Total = 105
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.441 0.027 0.387 0.441 0.495 0.441 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.062 0.010 0.042 0.062 0.082 0.062 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.013 0.028 -0.041 0.013 0.067 0.013 0
social3 -0.020 0.029 -0.077 -0.020 0.038 -0.020 0
social4 -0.069 0.026 -0.120 -0.069 -0.018 -0.069 0
social5 -0.050 0.029 -0.108 -0.050 0.007 -0.050 0
social6 -0.085 0.031 -0.146 -0.085 -0.024 -0.085 0
social7 -0.098 0.032 -0.161 -0.098 -0.036 -0.098 0
social8 -0.080 0.043 -0.164 -0.080 0.004 -0.080 0
social9 -0.046 0.028 -0.101 -0.046 0.008 -0.046 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Random effects:
Name Model
school IID model
classch IID model
classchid IID model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.74 3.38 67.27 73.69 80.52 73.59
Precision for school 362.55 111.15 191.06 347.04 622.74 317.70
Precision for classch 473.81 132.18 265.96 456.62 779.72 423.33
Precision for classchid 99.65 8.61 84.28 99.13 118.00 98.12
Expected number of effective parameters(stdev): 601.76(23.25)
Number of equivalent replicates : 3.17
Marginal log-Likelihood: 788.63
Compute the transforms to an SD scale for the field and error. Make a table of summary statistics for the posteriors:
sigmaschool <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaclass <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaid <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[4]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaschool,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaclass,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaid,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c(names(lmod$coef),"school","class","id","epsilon")
data.frame(restab)
X.Intercept. subjectmath gendergirl craven social2 social3 social4 social5 social6
mean 0.44093 0.36656 0.062045 0.017351 0.013333 -0.01977 -0.069006 -0.050485 -0.085307
sd 0.027399 0.0077026 0.010288 0.00093454 0.027514 0.02932 0.026188 0.029134 0.031016
quant0.025 0.38714 0.35145 0.041855 0.015517 -0.040674 -0.077322 -0.12041 -0.10767 -0.14619
quant0.25 0.42238 0.36135 0.055077 0.016718 -0.0052959 -0.039621 -0.086737 -0.07021 -0.1063
quant0.5 0.44085 0.36654 0.062014 0.017348 0.013251 -0.019856 -0.069083 -0.050571 -0.085396
quant0.75 0.45931 0.37173 0.068951 0.017978 0.031799 -9.1579e-05 -0.051429 -0.030931 -0.064489
quant0.975 0.49455 0.38164 0.08218 0.01918 0.067182 0.037611 -0.017755 0.006533 -0.024612
social7 social8 social9 subjectmath.gendergirl subjectmath.craven school class id
mean -0.098481 -0.080234 -0.046183 -0.059194 -0.0037203 0.054327 0.047252 0.10045
sd 0.031991 0.042753 0.027736 0.010696 0.00092953 0.0081964 0.0064836 0.0042736
quant0.025 -0.16128 -0.16416 -0.10063 -0.080189 -0.0055449 0.040116 0.035848 0.092085
quant0.25 -0.12014 -0.10918 -0.064961 -0.066436 -0.0043496 0.048502 0.042644 0.097548
quant0.5 -0.098574 -0.080359 -0.046264 -0.059226 -0.003723 0.053666 0.046787 0.10043
quant0.75 -0.077009 -0.051539 -0.027567 -0.052015 -0.0030964 0.059418 0.051347 0.10333
quant0.975 -0.035876 0.0034334 0.0080975 -0.038262 -0.0019012 0.072236 0.061233 0.10888
epsilon
mean 0.11654
sd 0.0026583
quant0.025 0.11146
quant0.25 0.11471
quant0.5 0.11649
quant0.75 0.11831
quant0.975 0.1219
Also construct a plot of the SD posteriors:
ddf <- data.frame(rbind(sigmaschool,sigmaclass,sigmaid,sigmaepsilon),errterm=gl(4,nrow(sigmaepsilon),labels = c("school","class","id","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("score")+ylab("density")+xlim(0,0.15)
Posteriors look OK. More variation is assigned to the school and class than previous models.
In Simpson et al (2015), penalized complexity priors are proposed. This requires that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed effects only model (what might be called the base model in the paper) to provide this scaling.
lmod <- lm(score ~ subject*gender+craven*subject+social,mjspr)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula = score ~ subject*gender+craven*subject+social+f(school, model="iid", hyper = pcprior)+f(classch, model="iid", hyper = pcprior)+f(classchid, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=mjspr)
result <- inla.hyperpar(result)
summary(result)
Call:
"inla(formula = formula, family = \"gaussian\", data = mjspr)"
Time used:
Pre = 2.79, Running = 134, Post = 0.239, Total = 137
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.441 0.027 0.389 0.441 0.494 0.441 0
subjectmath 0.367 0.008 0.351 0.367 0.382 0.367 0
gendergirl 0.063 0.010 0.043 0.063 0.083 0.063 0
craven 0.017 0.001 0.016 0.017 0.019 0.017 0
social2 0.014 0.027 -0.040 0.014 0.067 0.014 0
social3 -0.020 0.029 -0.078 -0.020 0.037 -0.020 0
social4 -0.070 0.026 -0.121 -0.070 -0.019 -0.070 0
social5 -0.050 0.029 -0.107 -0.050 0.006 -0.050 0
social6 -0.088 0.031 -0.148 -0.088 -0.027 -0.088 0
social7 -0.099 0.032 -0.162 -0.099 -0.037 -0.099 0
social8 -0.081 0.042 -0.165 -0.081 0.002 -0.081 0
social9 -0.047 0.028 -0.101 -0.047 0.007 -0.047 0
subjectmath:gendergirl -0.059 0.011 -0.080 -0.059 -0.038 -0.059 0
subjectmath:craven -0.004 0.001 -0.006 -0.004 -0.002 -0.004 0
Random effects:
Name Model
school IID model
classch IID model
classchid IID model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 73.66 3.38 67.18 73.61 80.44 73.51
Precision for school 539.02 379.27 229.52 449.49 1426.11 375.96
Precision for classch 10606.93 34022.15 430.73 1828.48 100705.15 769.94
Precision for classchid 97.91 8.48 82.78 97.39 116.01 96.38
Expected number of effective parameters(stdev): 599.27(23.39)
Number of equivalent replicates : 3.18
Marginal log-Likelihood: 798.94
Compute the summaries as before:
sigmaschool <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmaclass <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaid <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[4]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaschool,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaclass,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaid,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c(names(lmod$coef),"school","class","id","epsilon")
data.frame(restab)
X.Intercept. subjectmath gendergirl craven social2 social3 social4 social5 social6 social7
mean 0.44142 0.36656 0.063207 0.017388 0.013833 -0.020454 -0.070332 -0.050344 -0.087549 -0.099405
sd 0.026591 0.00771 0.01028 0.0009268 0.027299 0.029055 0.025946 0.028889 0.030768 0.031703
quant0.025 0.38922 0.35143 0.043034 0.015569 -0.039751 -0.077484 -0.12126 -0.10705 -0.14795 -0.16164
quant0.25 0.42342 0.36134 0.056245 0.016761 -0.0046496 -0.040125 -0.087898 -0.069903 -0.10838 -0.12087
quant0.5 0.44134 0.36654 0.063176 0.017386 0.013752 -0.020539 -0.070408 -0.050429 -0.087638 -0.099497
quant0.75 0.45927 0.37174 0.070108 0.018011 0.032154 -0.0009531 -0.052918 -0.030955 -0.066897 -0.078126
quant0.975 0.49346 0.38165 0.083329 0.019202 0.067258 0.03641 -0.019555 0.0061942 -0.02734 -0.037365
social8 social9 subjectmath.gendergirl subjectmath.craven school class id epsilon
mean -0.08134 -0.047095 -0.059194 -0.0037203 0.046993 0.023611 0.10134 0.11661
sd 0.042462 0.027512 0.010706 0.00093041 0.0097219 0.011976 0.0043231 0.0026644
quant0.025 -0.16469 -0.1011 -0.080209 -0.0055466 0.026517 0.0030768 0.092875 0.11152
quant0.25 -0.11009 -0.065721 -0.066443 -0.0043502 0.040925 0.014485 0.098405 0.11477
quant0.5 -0.081463 -0.047175 -0.059226 -0.003723 0.047148 0.023346 0.10132 0.11655
quant0.75 -0.05284 -0.02863 -0.052008 -0.0030958 0.053297 0.031898 0.10425 0.11838
quant0.975 0.0017572 0.0067447 -0.038242 -0.0018994 0.065879 0.048019 0.10986 0.12198
Make the plots:
ddf <- data.frame(rbind(sigmaschool,sigmaclass,sigmaid,sigmaepsilon),errterm=gl(4,nrow(sigmaepsilon),labels = c("school","class","id","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("score")+ylab("density")+xlim(0,0.15)
Class variation is quite small compared to the other sources.
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] gdtools_0.2.2 INLA_20.03.17 foreach_1.5.0 sp_1.4-2 Matrix_1.2-18 ggplot2_3.3.2 knitr_1.29
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 cpp11_0.2.1 pillar_1.4.6 compiler_4.0.2 iterators_1.0.12
[6] tools_4.0.2 digest_0.6.25 evaluate_0.14 lifecycle_0.2.0 tibble_3.0.3
[11] gtable_0.3.0 lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.7 yaml_2.2.1
[16] xfun_0.16 withr_2.2.0 dplyr_1.0.2 stringr_1.4.0 MatrixModels_0.4-1
[21] systemfonts_0.3.1 generics_0.0.2 vctrs_0.3.4 grid_4.0.2 tidyselect_1.1.0
[26] svglite_1.2.3.2 glue_1.4.2 R6_2.4.1 rmarkdown_2.3 farver_2.0.3
[31] purrr_0.3.4 magrittr_1.5 splines_4.0.2 scales_1.1.1 codetools_0.2-16
[36] ellipsis_0.3.1 htmltools_0.5.0.9000 colorspace_1.4-1 Deriv_4.0.1 labeling_0.3
[41] stringi_1.4.6 munsell_0.5.0 crayon_1.3.4