Plotting Regression Datasets

Consider a regression dataset with a response and several predictors. You want a single plot showing the response plotted against each of the predictors. You could use pairs(), but that also plots the predictors against each other. With more than a few predictors, there are too many panels to see any one of them clearly. Here’s a simple solution.

Take the swiss dataset, which ships with R, as an example:

head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6

Now reorganise the data using the tidyr package so that there is one (x,y) pair on each line:

library(tidyr)
rdf <- gather(swiss, variable, value, -Fertility)
head(rdf)
  Fertility    variable value
1      80.2 Agriculture  17.0
2      83.1 Agriculture  45.1
3      92.5 Agriculture  39.7
4      85.8 Agriculture  36.5
5      76.9 Agriculture  43.5
6      76.1 Agriculture  35.3
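
In recent versions of tidyr, gather() is superseded by pivot_longer(). As a rough sketch, assuming tidyr 1.0.0 or later, the same reshaping could be written as below. The name rdf_long is only for illustration. The rows come out grouped by observation rather than by predictor, which makes no difference to the plot.

```r
library(tidyr)

# Reshape to long form: keep Fertility as is, stack every other column.
# rdf_long is an illustrative name, not one used in the original post.
rdf_long <- pivot_longer(
  swiss,
  cols = -Fertility,        # every column except the response
  names_to = "variable",    # predictor name goes here
  values_to = "value"       # predictor value goes here
)
head(rdf_long)
```

The result is a tibble with the same three columns as rdf, so it can be passed to ggplot() in exactly the same way.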

Use ggplot2 to plot the response against each of the predictors. The predictors are on different scales, so each panel needs its own x-axis:

library(ggplot2)
ggplot(rdf, aes(x=value, y=Fertility)) + geom_point() + facet_wrap(~ variable, scales="free_x")
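
If you do this often, the two steps could be wrapped in a small function. The sketch below is one way to do it, not part of the original recipe: plot_response() is a hypothetical name, it uses pivot_longer() for the reshaping step, and it assumes every column other than the response is a numeric predictor.

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper: plot `response` (a column name given as a string)
# against every other column of `data`, one panel per predictor.
# Assumes all other columns are numeric.
plot_response <- function(data, response) {
  long <- pivot_longer(
    data,
    cols = -all_of(response),
    names_to = "variable",
    values_to = "value"
  )
  ggplot(long, aes(x = value, y = .data[[response]])) +
    geom_point() +
    facet_wrap(~ variable, scales = "free_x") +
    labs(y = response)
}

p_swiss <- plot_response(swiss, "Fertility")
p_swiss
```

Passing the response as a string keeps the function simple to call with other data frames, at the cost of not supporting unquoted column names.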

We can elaborate as needed.
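
For instance, one possible elaboration is to add a least-squares line to each panel. This is only a sketch of the idea: the choice of geom_smooth() with method = "lm" is an assumption about what might be useful, not something the recipe requires.

```r
library(tidyr)
library(ggplot2)

# Same long-format data as above
rdf <- gather(swiss, variable, value, -Fertility)

# Scatterplots with a fitted straight line in each panel
p <- ggplot(rdf, aes(x = value, y = Fertility)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ variable, scales = "free_x")
p
```

Because the line is fitted separately within each panel, it shows the marginal relationship with each predictor, not the fit from a multiple regression.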

Julian Faraway

Professor of Statistics at the University of Bath