
STAT 330 Lecture 22

Reading for Today's Lecture: 10.1.

Goals of Today's Lecture:

Today's notes

Residual Analysis

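Details: Residuals are

$$\hat\epsilon_{ij} = X_{ij} - \bar X_{i\cdot}$$

where $X_{ij}$ is the $j$th observation in group $i$ and $\bar X_{i\cdot}$ is the sample mean of group $i$.

Fitted values are

$$\hat X_{ij} = \hat\mu_i = \bar X_{i\cdot}$$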
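Here is a minimal sketch. It assumes the one-way layout is stored as a Python dict that maps each group label to its list of observations. The data layout, the function name `fitted_and_residuals` and the toy numbers are all hypothetical; they are not the coagulation data. The fitted value for $X_{ij}$ (observation $j$ in group $i$) is the group mean $\bar X_{i\cdot}$, and the residual is $X_{ij} - \bar X_{i\cdot}$.

```python
from statistics import fmean

def fitted_and_residuals(groups):
    """For a one-way layout {label: [observations]}, return a list of
    (label, observation, fitted value, residual) tuples.
    Fitted value = group mean; residual = observation - group mean."""
    rows = []
    for label, xs in groups.items():
        xbar = fmean(xs)  # the fitted value shared by every member of the group
        for x in xs:
            rows.append((label, x, xbar, x - xbar))
    return rows

# Hypothetical toy data, made up for illustration only
toy = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 16.0, 15.0], "C": [9.0, 11.0]}
for label, x, fit, res in fitted_and_residuals(toy):
    print(label, x, fit, res)
```

Within each group the residuals sum to zero, which makes a quick sanity check on the computation.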

Making a Q-Q plot: Plot the sorted residuals against "normal quantiles".
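The pairing step can be sketched as follows, assuming the residuals are already in a Python list. Sort them, then pair the $k$th smallest with the standard normal quantile $\Phi^{-1}(k/(n+1))$, which is the $k/(n+1)$ convention used in this lecture. The helper name `qq_points` is hypothetical. `statistics.NormalDist` comes from the Python standard library (3.8+). Other packages use slightly different plotting positions, such as $(k - 0.5)/n$, and this usually changes the picture only slightly.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair the sorted residuals with standard normal quantiles
    z_k = Phi^{-1}(k / (n + 1)), k = 1, ..., n.
    Plotting the pairs (z_k, r_(k)) gives a normal Q-Q plot."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
    return list(zip(quantiles, sorted(residuals)))

# Hypothetical residuals, for illustration only
for z, r in qq_points([0.5, -1.0, 2.0]):
    print(round(z, 3), r)
```

You can pass the pairs to any plotting routine. A roughly straight pattern is what we hope to see.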

For the coagulation data, here is a dot plot of the residuals, with each point labelled by its diet. There are too few points for a histogram to work well, and too few to justify separate plots for each of the 4 groups.

I am looking for signs of non-normality or for outlying residuals, and I see no sign of either problem here. I am also looking for evidence that the assumption of homoscedasticity (constant variance) is wrong; I don't see any such evidence.

We do not have time-order information for the coagulation data, so we cannot plot the residuals against time. Here is a plot of residuals versus fitted values, again labelled by group.

I see no problem here. I am looking for a trend in the spread of the residuals as the fitted value (the group mean) changes. In more sophisticated models I would also look for ranges of fitted values where the residuals are predominantly negative or predominantly positive, which would indicate a failure of the model equation.
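In the one-way layout every observation in group $i$ has the same fitted value $\bar X_{i\cdot}$, so the residual-versus-fitted plot is a set of vertical strips. The sketch below is a rough numerical companion to that plot, not a formal test of equal variances. It lists each group's residual standard deviation in order of fitted value, which makes a steady increase or decrease in spread easy to spot. The helper name `spread_by_fitted` and the toy data are hypothetical.

```python
from statistics import fmean, stdev

def spread_by_fitted(groups):
    """Return (fitted value, residual SD, label) for each group,
    sorted by fitted value. Each group needs at least 2 observations."""
    out = []
    for label, xs in groups.items():
        xbar = fmean(xs)                    # fitted value for the group
        residuals = [x - xbar for x in xs]  # residuals within the group
        out.append((xbar, stdev(residuals), label))
    return sorted(out)

# Hypothetical toy data, made up for illustration only
toy = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 16.0, 15.0], "C": [9.0, 11.0]}
for fit, sd, label in spread_by_fitted(toy):
    print(label, fit, round(sd, 3))
```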

Finally, here is a Q-Q plot. There are $n = 24$ points, so $n + 1 = 25$. The normal quantiles are the points on the standard normal curve such that the area to the left of them is $1/25, 2/25, \ldots, 24/25$. For instance, the first normal quantile is $-1.75$ because the table shows that the area to the left of $-1.75$ is $0.04 = 1/25$.
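You can check the quantile arithmetic above with the Python standard library. This is a sketch in which `statistics.NormalDist` stands in for the printed normal table:

```python
from statistics import NormalDist

n = 24                      # number of residuals in the coagulation example
z = NormalDist()            # standard normal distribution
quantiles = [z.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

print(round(quantiles[0], 2))   # -1.75, the first normal quantile
print(round(z.cdf(-1.75), 4))   # 0.0401, the area to the left of -1.75
```

By symmetry, the last quantile is the negative of the first.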

The plot is acceptably straight; there does not seem to be a major problem with assuming the population distributions are normal.

In practice you will make the Q-Q plot with software rather than by hand; a SAS example is here.

What would I do if I saw problems?

For non-normality, non-constant variance, or a trend in variability with the fitted value, I might entertain a transformation of the data, such as taking square roots or logs, or trying the so-called Box-Cox transformation.
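For reference, the Box-Cox family with power $\lambda$ maps a positive value $x$ to $(x^\lambda - 1)/\lambda$ when $\lambda \neq 0$, and to $\log x$ when $\lambda = 0$. Logs and (shifted, rescaled) square roots are therefore members of the family. The sketch below only applies the transformation for a given $\lambda$. It does not show how to choose $\lambda$, which is commonly done by maximizing the likelihood of the fitted model over a grid of $\lambda$ values. The function name `box_cox` is hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with power lam:
    (x**lam - 1) / lam if lam != 0, and log(x) if lam == 0
    (the limit of the first formula as lam -> 0)."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

# lam = 0.5: 2 * (sqrt(x) - 1), a shifted and rescaled square root
print(box_cox(4.0, 0.5))
```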

For non-normality with non-constant variance I would consider using a generalized linear model as in STAT 402.

For non-normality, outliers and heteroscedasticity I might try a robust (non-parametric) analysis using trimmed means, medians or ... . See STAT 430.
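Here is a sketch of one of the robust estimates mentioned above. A trimmed mean sorts the data and averages what remains after cutting a fixed proportion from each end. The name `trimmed_mean` and the rule of rounding the number cut down to an integer are assumptions; packages differ in how they handle non-integer cuts.

```python
def trimmed_mean(xs, prop=0.1):
    """Mean after discarding the floor(prop * n) smallest and the
    floor(prop * n) largest values. prop = 0 gives the ordinary mean;
    prop close to 0.5 leaves only the middle value(s), i.e. the median."""
    if not 0 <= prop < 0.5:
        raise ValueError("prop must be in [0, 0.5)")
    ys = sorted(xs)
    k = int(prop * len(ys))          # number of values cut from each end
    kept = ys[k:len(ys) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one wild value: trimming removes its influence
print(trimmed_mean([1, 2, 3, 4, 100], prop=0.2))
```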

Confidence Intervals

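Usually we are interested in confidence intervals for differences between group means, that is, for quantities like $\mu_i - \mu_j$. Let

$$SE_{ij} = \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

be the estimated standard error of $\bar X_{i\cdot} - \bar X_{j\cdot}$, where $n_i$ is the number of observations in group $i$. Then it is a fact that

$$\frac{(\bar X_{i\cdot} - \bar X_{j\cdot}) - (\mu_i - \mu_j)}{SE_{ij}} \sim t_\nu$$

where $\nu$ is the degrees of freedom used in computing the MSE. In the one-way layout with $I$ groups and $N$ observations in total, $\nu = N - I$. This will usually be more than the $n_i + n_j - 2$ degrees of freedom of a two sample comparison.

Thus

$$\bar X_{i\cdot} - \bar X_{j\cdot} \pm t_{\nu,\alpha/2}\, SE_{ij}$$

is a level $1 - \alpha$ confidence interval for $\mu_i - \mu_j$, where $t_{\nu,\alpha/2}$ is the point with area $\alpha/2$ to its right under the $t_\nu$ density.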
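The sketch below computes the interval. It assumes the data are a dict mapping each group label to its list of observations; the helper name `diff_ci`, the toy data and the table value are illustrative. The critical value $t_{\nu,\alpha/2}$ is passed in rather than computed, because the Python standard library has no $t$ quantile function. With SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, nu)` would supply it. For the toy data below, $\nu = 9 - 3 = 6$, and a $t$ table gives $t_{6,0.025} \approx 2.447$. For the coagulation data, with 24 observations in 4 groups, $\nu = 24 - 4 = 20$.

```python
import math
from statistics import fmean

def diff_ci(groups, i, j, t_crit):
    """Confidence interval for mu_i - mu_j in a one-way layout.

    groups : dict mapping group label -> list of observations
    t_crit : upper alpha/2 point of t on nu = N - I df (from a t table)
    Returns (lower, upper, nu)."""
    means = {g: fmean(xs) for g, xs in groups.items()}
    N = sum(len(xs) for xs in groups.values())   # total observations
    I = len(groups)                               # number of groups
    nu = N - I                                    # df for the MSE
    # The MSE pools within-group sums of squares from ALL groups, not just i and j
    sse = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
    mse = sse / nu
    se = math.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
    est = means[i] - means[j]
    return est - t_crit * se, est + t_crit * se, nu

# Hypothetical toy data, made up for illustration only
toy = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 16.0, 15.0], "C": [9.0, 11.0]}
print(diff_ci(toy, "A", "B", t_crit=2.447))  # 95% interval for mu_A - mu_B
```

Because the MSE pools every group, the interval uses $\nu = N - I$ degrees of freedom even though it compares only two groups.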





Richard Lockhart
Mon Feb 9 16:29:04 PST 1998