

STAT 330 Lecture 22

Reading for Today's Lecture: 10.1.

Goals of Today's Lecture:

Learn how to do residual analysis to assess the model assumptions.
Introduce confidence intervals for contrasts.

Today's notes

Residual Analysis

Plot histogram or dot-plot of residuals
Plot residual versus time (order of collection of data)
Plot dot plot or histogram for residuals in each sample.
Plot residual versus group mean (called in general the fitted value)
Make a Q-Q plot.

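Details: The residuals are

$$\hat\epsilon_{ij} = X_{ij} - \bar X_{i\cdot}$$

where $X_{ij}$ is the $j$th observation in group $i$ and $\bar X_{i\cdot}$ is the sample mean of group $i$.

The fitted values are

$$\hat\mu_{ij} = \bar X_{i\cdot}$$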
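As a small illustration of these definitions, the Python sketch below computes fitted values and residuals for a one-way layout. The data are made up and are not the coagulation data. The function name `fit_one_way` is hypothetical. Note that the residuals within each group sum to zero, because each group is centred at its own mean.

```python
# Residuals and fitted values in a one-way layout (a minimal sketch).
# The data below are hypothetical, chosen only to keep the arithmetic easy.
from statistics import mean

def fit_one_way(groups):
    """Return (fitted, residuals) dicts keyed by group label.

    Every observation in group i gets the group mean as its fitted value;
    its residual is the observation minus that group mean.
    """
    fitted, residuals = {}, {}
    for label, xs in groups.items():
        m = mean(xs)
        fitted[label] = [m] * len(xs)
        residuals[label] = [x - m for x in xs]
    return fitted, residuals

groups = {"A": [5.0, 7.0, 6.0], "B": [10.0, 12.0, 11.0, 15.0]}
fitted, residuals = fit_one_way(groups)
print(fitted["A"])     # [6.0, 6.0, 6.0]
print(residuals["B"])  # [-2.0, 0.0, -1.0, 3.0]
```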

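Making a Q-Q plot: Plot sorted residuals against "normal quantiles".

Sort the residuals (a total of $n$ of these) to get numbers $\hat\epsilon_{(1)} \le \hat\epsilon_{(2)} \le \cdots \le \hat\epsilon_{(n)}$.
Compute the normal quantiles $z_1 < z_2 < \cdots < z_n$, defined by
$$P(Z \le z_k) = \frac{k}{n+1}, \qquad k = 1, \ldots, n,$$
where $Z$ has a standard normal distribution.

These are the points which split the area under the normal curve into $n+1$ equal pieces.
Plot $\hat\epsilon_{(k)}$ against $z_k$.
If the data are really normal, the points should lie close to a straight line.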
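The steps above can be sketched in Python using only the standard library. Here `statistics.NormalDist().inv_cdf` supplies the normal quantiles. The function name `qq_points` and the residual values are hypothetical. The sketch only computes the plotting coordinates; any plotting tool could then draw them.

```python
# Coordinates for a normal Q-Q plot (a minimal sketch, standard library only).
from statistics import NormalDist

def qq_points(residuals):
    """Pair the k-th smallest residual with the normal quantile at k/(n+1)."""
    n = len(residuals)
    ordered = sorted(residuals)
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    quantiles = [z.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
    return list(zip(quantiles, ordered))

pts = qq_points([0.3, -1.2, 0.8, -0.1])  # hypothetical residuals
for q, r in pts:
    print(f"{q:6.2f}  {r:5.1f}")
```

With $n = 4$ the quantiles sit at areas 0.2, 0.4, 0.6 and 0.8, so they are symmetric about zero.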

For the coagulation data here is a dot plot of the residuals. Each point is labelled according to the corresponding diet. There are too few points for a histogram to really work and also too few to warrant separate plots for each of the 4 groups.

I am looking for signs of non-normality or for outlying residuals. I see no sign of any problems here. I am also looking for evidence that the assumption of homoscedasticity (constant variance) is wrong; I don't see such evidence.

We do not have time order information for the coagulation data. Here is a plot of residual versus fitted value. Again each point is labelled by its group.

I see no problem here. I am looking for a trend in the variation with the mean; in more sophisticated models I would also be looking for evidence that for certain ranges of fitted values the residuals were either predominantly negative or predominantly positive, indicating a failure of the model equation.
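A rough numerical companion to the residual versus fitted value plot is to list the spread of the residuals in each group next to that group's fitted value. The sketch below does this. The function name `spread_by_fitted` is hypothetical, and the data are made up so that the residual standard deviation grows with the group mean (1.0, about 2.16, then 3.0). That is the kind of trend in the variation with the mean which would cast doubt on constant variance.

```python
# Residual spread per group, ordered by fitted value (a minimal sketch).
from statistics import mean, stdev

def spread_by_fitted(groups):
    """Return (fitted value, residual SD, label) tuples sorted by fitted value."""
    rows = []
    for label, xs in groups.items():
        m = mean(xs)                   # fitted value for this group
        resid = [x - m for x in xs]    # residuals for this group
        rows.append((m, stdev(resid), label))
    return sorted(rows)

# Hypothetical data in which the spread increases with the mean.
groups = {"A": [5.0, 7.0, 6.0], "B": [10.0, 12.0, 11.0, 15.0], "C": [20.0, 26.0, 23.0]}
for fitted, sd, label in spread_by_fitted(groups):
    print(f"{label}: fitted {fitted:5.1f}, residual SD {sd:4.2f}")
```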

Finally here is a Q-Q plot. There are 24 points so n+1=25. The normal quantiles are the points on the normal curve such that the area to the left of them is 1/25, 2/25, ..., 24/25. For instance the first normal quantile is -1.75 because the table shows that the area to the left of -1.75 is approximately 0.04 = 1/25.
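The table lookup for the coagulation Q-Q plot can be checked with Python's standard library (a minimal sketch; `statistics.NormalDist().inv_cdf` plays the role of the normal table). By symmetry the last quantile is +1.75.

```python
# Normal quantiles for a Q-Q plot with n = 24 points.
from statistics import NormalDist

n = 24
z = NormalDist()  # standard normal
quantiles = [z.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

print(round(1 / (n + 1), 2))   # 0.04
print(round(quantiles[0], 2))  # -1.75
print(round(quantiles[-1], 2)) # 1.75
```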

The plot is acceptably straight; there does not seem to be a major problem with assuming the population distributions are normal.

In practice you will make the Q-Q plot not by hand but with software; a SAS example is here

What would I do if I saw problems?

For non-normality, non-constant variance, or a trend in variability with the fitted value I might entertain a transformation of the data such as taking square roots or logs or trying the so-called Box-Cox transformation.
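For reference, the usual Box-Cox family is $(x^\lambda - 1)/\lambda$ for $\lambda \ne 0$ and $\log x$ for $\lambda = 0$, defined for positive data. It contains both transformations mentioned above: $\lambda = 1/2$ is a rescaled square root and $\lambda = 0$ is the log. The sketch below applies it for a given $\lambda$. The function name `box_cox` is hypothetical, and choosing $\lambda$ from the data (usually by maximum likelihood) is not shown.

```python
# The Box-Cox family of power transformations (a minimal sketch).
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam.

    lam = 1 just shifts the data, lam = 0.5 is 2*sqrt(x) - 2 (a rescaled
    square root), and lam = 0 is the natural log, the limit as lam -> 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

data = [1.0, 4.0, 9.0]
print([box_cox(x, 0.5) for x in data])  # [0.0, 2.0, 4.0]
print([box_cox(x, 0) for x in data])    # natural logs
```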

For non-normality with non-constant variance I would consider using a generalized linear model as in STAT 402.

For non-normality, outliers and heteroscedasticity I might try a robust (non-parametric) analysis using trimmed means, medians or ... . See STAT 430.

Confidence Intervals

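Usually we are interested in confidence intervals for differences between group means, that is, for things like $\mu_i - \mu_j$. Let $\bar X_{i\cdot}$ be the sample mean of the $J_i$ observations in group $i$, and let MSE be the mean square for error from the ANOVA table.

Then it is a fact that

$$\frac{(\bar X_{i\cdot} - \bar X_{j\cdot}) - (\mu_i - \mu_j)}{\sqrt{\text{MSE}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}} \sim t_{\nu}$$

where $\nu = n - I$ is the degrees of freedom used in computing the MSE ($n$ observations in $I$ groups). This is usually more than the $J_i + J_j - 2$ degrees of freedom that a two sample comparison of groups $i$ and $j$ alone would use.

Thus

$$\bar X_{i\cdot} - \bar X_{j\cdot} \pm t_{\alpha/2,\nu}\sqrt{\text{MSE}\left(\frac{1}{J_i} + \frac{1}{J_j}\right)}$$

is a level $1-\alpha$ confidence interval for $\mu_i - \mu_j$.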
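The interval for a difference of two group means can be sketched in Python as follows. The MSE is pooled over all groups, with $n - I$ degrees of freedom. To stay within the standard library, the $t$ critical value is passed in as an argument rather than computed. It would come from a $t$ table (or, for example, `scipy.stats.t.ppf` if SciPy is available). For the coagulation data $n = 24$ and $I = 4$, so the MSE has 20 degrees of freedom and $t_{0.025,20} \approx 2.086$ for a 95% interval. The function names and the data below are hypothetical. With 3 made-up groups and 10 observations there are 7 degrees of freedom, and $t_{0.025,7} \approx 2.365$.

```python
# Confidence interval for mu_i - mu_j using the pooled ANOVA MSE (a minimal sketch).
import math
from statistics import mean

def mse_and_df(groups):
    """Pooled within-group mean square for error and its degrees of freedom n - I."""
    sse, n = 0.0, 0
    for xs in groups.values():
        m = mean(xs)
        sse += sum((x - m) ** 2 for x in xs)
        n += len(xs)
    df = n - len(groups)
    return sse / df, df

def diff_ci(groups, i, j, t_crit):
    """CI for mu_i - mu_j; t_crit is the upper alpha/2 point of t on n - I df."""
    mse, _ = mse_and_df(groups)
    xi, xj = groups[i], groups[j]
    est = mean(xi) - mean(xj)
    se = math.sqrt(mse * (1 / len(xi) + 1 / len(xj)))
    return est - t_crit * se, est + t_crit * se

# Hypothetical data: 3 groups, 10 observations, so the MSE has 7 df.
groups = {"A": [5.0, 7.0, 6.0], "B": [10.0, 12.0, 11.0, 15.0], "C": [20.0, 26.0, 23.0]}
mse, df = mse_and_df(groups)
lo, hi = diff_ci(groups, "B", "A", t_crit=2.365)  # approx. 95% interval
print(f"MSE = {mse:.3f} on {df} df; 95% CI for mu_B - mu_A: ({lo:.2f}, {hi:.2f})")
```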


Richard Lockhart
Mon Feb 9 16:29:04 PST 1998