There are many books on regression and analysis of variance. These books expect different levels of preparedness and place different emphases on the material. This book is not introductory. It presumes some knowledge of basic statistical theory and practice. Readers are expected to know the essentials of statistical inference such as estimation, hypothesis testing and confidence intervals. A basic knowledge of data analysis is presumed. Some linear algebra and calculus are also required.
The emphasis of this text is on the practice of regression and analysis of variance. The objective is to learn what methods are available and, more importantly, when they should be applied. Many examples are presented to clarify the use of the techniques and to demonstrate what conclusions can be drawn. There is relatively less emphasis on mathematical theory, partly because some prior knowledge is assumed and partly because the issues are better tackled elsewhere. Theory is nonetheless important because it guides the approach we take. I take a wider view of statistical theory: it is not just the formal theorems. Qualitative statistical concepts are just as important because they enable us to actually do statistics rather than just talk about it. These qualitative principles are harder to learn because they are difficult to state precisely, but they guide the successful experienced statistician.
Data analysis cannot be learned without actually doing it. This means using a statistical computing package. There is a wide choice of such packages. They are designed for different audiences and have different strengths and weaknesses. I have chosen to use R. Why have I used R? There are several reasons.
Versatility. R is also a programming language, so I am not limited by the procedures that are preprogrammed by a package. It is relatively easy to program new methods in R.
Interactivity. Data analysis is inherently interactive. Some older statistical packages were designed when computing was more expensive and batch processing of computations was the norm. Despite improvements in hardware, the old batch processing paradigm lives on in their use. R does one thing at a time, allowing us to make changes on the basis of what we see during the analysis.
Freedom. R is based on S, from which the commercial package Splus is derived. R itself is open-source software and may be obtained free of charge by anyone. Linux, Macintosh, Windows and other UNIX versions are maintained and can be obtained from the R-project at www.r-project.org. R is mostly compatible with Splus, meaning that Splus could easily be used for most of the examples provided in this book.
Popularity. SAS is the most common statistics package in general use, but R or S is most popular with researchers in statistics. A look at common statistical journals confirms this popularity. R is also popular for quantitative applications in finance.