STAT 350: Lecture 11 Example
Multiple Regression
In the data set below, the hardness of plaster is measured for each of 9 combinations of sand content and fibre content. Sand content and fibre content were each set at one of 3 levels, and every possible combination was tried on two batches of plaster, giving 18 observations.
Here is an excerpt of the data:
Sand | Fibre | Hardness | Strength |
0 | 0 | 61 | 34 |
0 | 0 | 63 | 16 |
15 | 0 | 67 | 36 |
15 | 0 | 69 | 19 |
30 | 0 | 65 | 28 |
...
The complete data set is here.
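I fit submodels of the following "Full" model, a second degree polynomial in sand content $S$ and fibre content $F$ with hardness $Y$ as the response:
$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 S_i^2 + \beta_4 F_i^2 + \beta_5 S_i F_i + \epsilon_i$$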
I adopt the idea that the interaction term is probably negligible unless both S and F have some effect, and that quadratic terms will probably not be present unless linear terms are present. This limits the set of potential reasonable models. I fit each of them and report the error sum of squares in the following table.
Model | Error Sum of Squares | Error df |
Full | 81.264 | 12 |
Full without the interaction term | 82.389 | 13 |
 | 104.167 | 14 |
 | 169.500 | 15 |
 | 174.194 | 16 |
 | 87.083 | 14 |
 | 189.167 | 15 |
 | 210.944 | 16 |
Linear terms in S and F only | 108.861 | 15 |
I begin by asking whether the 2nd degree polynomial terms, that is, those involving $S^2$, $F^2$ and $SF$, need be included. To do so I compare the top line with the model containing only the linear terms in S and F (the last line of the table). The extra SS is 108.861-81.264 = 27.597 on 3 degrees of freedom, which gives a mean square of (108.861-81.264)/3 = 9.199. The MSE is 81.264/12 = 6.772. This gives an F-statistic of 9.199/6.772 = 1.358 on 3 numerator and 12 denominator degrees of freedom. The P-value is 0.30, which is not significant.
We would then delete the quadratic terms and consider the coefficients of S and F. One option is to pretend that the last line in the table is now the "Full" model and form the F-statistics (210.944-108.861)/(108.861/15) = 14.066 and (174.194-108.861)/(108.861/15) = 9.002. Each compares the linear model with one of the two single-variable models (the lines with 16 error df), so one tests that the coefficient of S is zero and the other that the coefficient of F is zero. Each is on 1 and 15 degrees of freedom. The corresponding P-values are 0.002 and 0.009. These are both highly significant and we conclude that both Sand content and Fibre content have an impact on hardness and that there is little reason to look for non-linear impacts of the two factors.
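The extra sum of squares arithmetic above can be checked with a short script. This is a minimal sketch in Python, not the SAS analysis used in the lecture. The helper name `extra_ss_f` is hypothetical, and its inputs are the error sums of squares and error df copied from the table. P-values are not computed here; they could be read from an F distribution (for instance with `scipy.stats.f.sf`, assuming SciPy is available).

```python
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F statistic for a reduced model nested in a full model.

    Hypothetical helper: arguments are error sums of squares and error df.
    """
    # Mean square for the terms dropped from the full model.
    extra_ms = (sse_reduced - sse_full) / (df_reduced - df_full)
    # Mean squared error of the (current) full model.
    mse_full = sse_full / df_full
    return extra_ms / mse_full


# Second degree terms: linear model (108.861, 15 df) against Full (81.264, 12 df).
f_second_degree = extra_ss_f(108.861, 15, 81.264, 12)

# Linear coefficients: each single-variable model (16 df) against the
# linear model, which is treated as the new "Full" model.
f_first = extra_ss_f(210.944, 16, 108.861, 15)
f_second = extra_ss_f(174.194, 16, 108.861, 15)

print(f"second degree terms: F = {f_second_degree:.3f} on 3 and 12 df")
print(f"first linear test:   F = {f_first:.3f} on 1 and 15 df")
print(f"second linear test:  F = {f_second:.3f} on 1 and 15 df")
```

The numerator df is taken as the difference in error df, so the same helper covers both the 3 df test and the 1 df tests.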
An alternative starting point would be to check first whether the interaction term could be eliminated, that is, to test the hypothesis that the coefficient of $SF$ is zero. This hypothesis can be tested either using the F statistic [(82.389-81.264)/1]/[81.264/12] = 0.166 or using the t-statistic, which is the estimated coefficient of $SF$ divided by its estimated standard error and which SAS calculates to be -0.41. Note that $(-0.41)^2 = 0.168 \approx 0.166$, so $t^2 = F$ to within round-off error. Algebraically, $t^2 = F$ exactly. Note, too, that the t test can be made one-sided while the F-test cannot.
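The agreement between the two statistics can be checked numerically. The sketch below, in Python with hypothetical variable names, recomputes the F statistic from the table entries and compares it with the square of the t value as reported to two decimals. Since only the rounded value -0.41 is available, the comparison is made against the range of squares consistent with that rounding.

```python
# Error sums of squares and error df copied from the table.
sse_full, df_full = 81.264, 12                     # Full model
sse_no_interaction, df_no_interaction = 82.389, 13  # Full without the interaction term

# F statistic for dropping the single interaction term (1 numerator df).
f_interaction = ((sse_no_interaction - sse_full) / 1) / (sse_full / df_full)

# The t statistic as reported, rounded to two decimals.
t_reported = -0.41

# Any t that rounds to -0.41 lies between -0.415 and -0.405,
# so its square lies between these two bounds.
t_sq_low, t_sq_high = 0.405 ** 2, 0.415 ** 2

print(f"F = {f_interaction:.3f}")
print(f"t^2 from rounded t = {t_reported ** 2:.3f}")
print(f"t^2 range allowed by rounding: [{t_sq_low:.3f}, {t_sq_high:.3f}]")
```

The computed F falls inside the range allowed by rounding, which is what "to within round-off error" means here.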