STAT 350: Lecture 25
Interaction effects
Examples
Two-way analysis of variance
DESIGN MATRIX
This design matrix corresponds to a model equation
where the are interaction effects satisfying .
Analysis of Covariance
This is the name given to the analysis of models in which there are both categorical factors and continuous covariates. In the car example we had the categorical factor VEHICLE and the continuous covariate MILEAGE. Earlier I gave the design matrix for the model in which the two cars have different intercepts but one common slope; thus this model is two parallel lines. If we use corner-point coding and fit a model in which VEHICLE and MILEAGE interact, then the design matrix for the small data set above is
where the last column is the product of columns 2 and 3. This design matrix corresponds to a model equation with one slope for the first vehicle and a possibly different slope for the second vehicle: the coefficient of the last column of the design matrix is the difference in slopes between the two vehicles. Using instead the alternative coding based on an average intercept leads to the design matrix
Again the last column is the product of columns 2 and 3. The coefficient of column 3 is now the average of the two slopes, while the coefficient of the last column is the difference between the slope for vehicle 1 and this average slope.
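Because the interaction model gives each vehicle its own intercept and its own slope, its least-squares fit is the same as fitting a separate line to each vehicle; the two codings only change how those two lines are described by the coefficients. A minimal Python sketch, assuming made-up numbers (the car data are not reproduced here, so `vehicle1` and `vehicle2` are hypothetical), vehicle 2 as the corner-point baseline (indicator = 1 for vehicle 1), and the ±1 coding with +1 for vehicle 1:

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one group."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical (mileage, response) data for each vehicle.
vehicle1 = ([10.0, 20.0, 30.0, 40.0], [5.1, 6.9, 9.2, 10.8])
vehicle2 = ([15.0, 25.0, 35.0], [4.0, 4.9, 6.1])

a1, s1 = fit_line(*vehicle1)
a2, s2 = fit_line(*vehicle2)

# Corner-point coding: y = b0 + b1*D + b2*x + b3*D*x, D = 1 (vehicle 1), 0 (vehicle 2).
corner = (a2, a1 - a2, s2, s1 - s2)          # b3 = difference in slopes

# +/-1 coding: D = +1 (vehicle 1), -1 (vehicle 2).
avg_int, avg_slope = (a1 + a2) / 2, (s1 + s2) / 2
effect = (avg_int, a1 - avg_int, avg_slope, s1 - avg_slope)  # b2 = average slope

def predict(coefs, d, x):
    b0, b1, b2, b3 = coefs
    return b0 + b1 * d + b2 * x + b3 * d * x

# Both parametrizations reproduce each vehicle's own fitted line.
for x in (12.0, 33.0):
    assert abs(predict(corner, 1, x) - (a1 + s1 * x)) < 1e-9
    assert abs(predict(corner, 0, x) - (a2 + s2 * x)) < 1e-9
    assert abs(predict(effect, 1, x) - (a1 + s1 * x)) < 1e-9
    assert abs(predict(effect, -1, x) - (a2 + s2 * x)) < 1e-9

print("corner-point:", corner)
print("+/-1 coding: ", effect)
```

In the ±1 coding the last coefficient, slope 1 minus the average slope, is half the slope difference that the corner-point coding reports.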
You saw, in assignment 3, how to test the hypothesis of no interaction in this model.
Analysis of Models with Interaction Terms
Examples
Two-way ANOVA: influence of SCHOOL and REGION on STAY
options pagesize=60 linesize=80;
data scenic;
  infile 'scenic.dat' firstobs=2;
  input Stay Age Risk Culture Chest Beds School Region Census Nurses Facil;
run;
proc glm data=scenic;
  class school region;
  model Stay = School | Region / E SOLUTION SS1 SS2 SS3 SS4 XPX INVERSE;
  output out=scout P=Fitted PRESS=PRESS H=HAT RSTUDENT=EXTST R=RESID
         DFFITS=DFFITS COOKD=COOKD;
run;
proc means data=scout;
  var stay;
  class school region;
run;
proc print data=scout;
run;
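With the XPX option PROC GLM prints X'X for its over-parametrized coding: an intercept column, one 0/1 column for each level of SCHOOL and of REGION, and one 0/1 column (DUMMY001 to DUMMY008) for each SCHOOL x REGION cell. Every entry of the X'X part is therefore a count of observations. The sketch below is a hypothetical reconstruction, not SAS's own code; it rebuilds those counts from the cell sizes in the PROC MEANS output. With 15 columns but only eight distinct rows (one per cell), X'X has rank 8, which is why SAS reports it as singular and solves the normal equations with a generalized inverse.

```python
# Cell counts n[(school, region)] from the PROC MEANS output.
counts = {(1, 1): 5, (1, 2): 7, (1, 3): 3, (1, 4): 2,
          (2, 1): 23, (2, 2): 25, (2, 3): 34, (2, 4): 14}

# Column labels in the order SAS prints them.
columns = (["INTERCEPT"]
           + [f"SCHOOL {i}" for i in (1, 2)]
           + [f"REGION {j}" for j in (1, 2, 3, 4)]
           + [f"DUMMY{k:03d}" for k in range(1, 9)])

def design_row(i, j):
    """Over-parametrized 0/1 design row for one observation in cell (i, j)."""
    school = [1 if i == s else 0 for s in (1, 2)]
    region = [1 if j == r else 0 for r in (1, 2, 3, 4)]
    cell = [1 if (i, j) == (s, r) else 0 for s in (1, 2) for r in (1, 2, 3, 4)]
    return [1] + school + region + cell

p = len(columns)
xtx = [[0] * p for _ in range(p)]
for (i, j), n in counts.items():
    x = design_row(i, j)
    for a in range(p):
        for b in range(p):
            # Every observation in a cell has the same row, so it adds n copies.
            xtx[a][b] += n * x[a] * x[b]

print(xtx[0][:5])  # [113, 17, 96, 28, 32], the INTERCEPT row printed by SAS
```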
EDITED SAS OUTPUT (Complete output)
The X'X Matrix (only the first five columns are shown)

             INTERCEPT   SCHOOL 1   SCHOOL 2   REGION 1   REGION 2
INTERCEPT          113         17         96         28         32
SCHOOL 1            17         17          0          5          7
SCHOOL 2            96          0         96         23         25
REGION 1            28          5         23         28          0
REGION 2            32          7         25          0         32
REGION 3            37          3         34          0          0
REGION 4            16          2         14          0          0
DUMMY001             5          5          0          5          0
DUMMY002             7          7          0          0          7
DUMMY003             3          3          0          0          0
DUMMY004             2          2          0          0          0
DUMMY005            23          0         23         23          0
DUMMY006            25          0         25          0         25
DUMMY007            34          0         34          0          0
DUMMY008            14          0         14          0          0
STAY           1090.26     186.85     903.41     310.49     309.87
X'X Generalized Inverse (g2) (only the first five columns are shown)

                 INTERCEPT       SCHOOL 1   SCHOOL 2       REGION 1       REGION 2
INTERCEPT     0.0714285714   -0.071428571          0   -0.071428571   -0.071428571
SCHOOL 1     -0.071428571    0.5714285714          0   0.0714285714   0.0714285714
SCHOOL 2                0               0          0              0              0
REGION 1     -0.071428571    0.0714285714          0   0.1149068323   0.0714285714
REGION 2     -0.071428571    0.0714285714          0   0.0714285714   0.1114285714
REGION 3     -0.071428571    0.0714285714          0   0.0714285714   0.0714285714
REGION 4                0               0          0              0              0
DUMMY001      0.0714285714   -0.571428571          0   -0.114906832   -0.071428571
DUMMY002      0.0714285714   -0.571428571          0   -0.071428571   -0.111428571
DUMMY003      0.0714285714   -0.571428571          0   -0.071428571   -0.071428571
DUMMY004                0               0          0              0              0
DUMMY005                0               0          0              0              0
DUMMY006                0               0          0              0              0
DUMMY007                0               0          0              0              0
DUMMY008                0               0          0              0              0
STAY                  7.89            1.79          0   2.9304347826         1.5372
Dependent Variable: STAY

                                 Sum of            Mean
Source                DF        Squares          Square   F Value   Pr > F
Model                  7   132.06558693     18.86651242      7.15   0.0001
Error                105   277.14479360      2.63947422
Corrected Total      112   409.21038053

R-Square        C.V.     Root MSE    STAY Mean
0.322733    16.83864    1.6246459    9.6483186

Source           DF     Type I SS   Mean Square   F Value   Pr > F
SCHOOL            1   36.08413010   36.08413010     13.67   0.0003
REGION            3   95.36410217   31.78803406     12.04   0.0001
SCHOOL*REGION     3    0.61735466    0.20578489      0.08   0.9718

Source           DF    Type II SS   Mean Square   F Value   Pr > F
SCHOOL            1   27.89404890   27.89404890     10.57   0.0015
REGION            3   95.36410217   31.78803406     12.04   0.0001
SCHOOL*REGION     3    0.61735466    0.20578489      0.08   0.9718

Source           DF   Type III SS   Mean Square   F Value   Pr > F
SCHOOL            1   26.05955792   26.05955792      9.87   0.0022
REGION            3   47.01938029   15.67312676      5.94   0.0009
SCHOOL*REGION     3    0.61735466    0.20578489      0.08   0.9718

Source           DF    Type IV SS   Mean Square   F Value   Pr > F
SCHOOL            1   26.05955792   26.05955792      9.87   0.0022
REGION            3   47.01938029   15.67312676      5.94   0.0009
SCHOOL*REGION     3    0.61735466    0.20578489      0.08   0.9718

                                     T for H0:                Std Error of
Parameter                Estimate   Parameter=0   Pr > |T|        Estimate
INTERCEPT           7.890000000 B         18.17     0.0001      0.43420487
SCHOOL        1     1.790000000 B          1.46     0.1480      1.22811685
              2     0.000000000 B           .         .              .
REGION        1     2.930434783 B          5.32     0.0001      0.55072100
              2     1.537200000 B          2.83     0.0055      0.54232171
              3     1.180588235 B          2.29     0.0241      0.51591227
              4     0.000000000 B           .         .              .
SCHOOL*REGION 1 1  -0.286434783 B         -0.20     0.8455      1.46660342
              1 2  -0.618628571 B         -0.44     0.6620      1.41099883
              1 3  -0.300588235 B         -0.19     0.8486      1.57026346
              1 4   0.000000000 B           .         .              .
              2 1   0.000000000 B           .         .              .
              2 2   0.000000000 B           .         .              .
              2 3   0.000000000 B           .         .              .
              2 4   0.000000000 B           .         .              .

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.
SCHOOL  REGION  N Obs    N          Mean       Std Dev       Minimum
--------------------------------------------------------------------
1       1           5    5    12.3240000     3.3527198     9.7800000
        2           7    7    10.5985714     1.1317454     8.2800000
        3           3    3    10.5600000     0.7362744    10.1200000
        4           2    2     9.6800000     0.6788225     9.2000000
2       1          23   23    10.8204348     2.5061460     8.0300000
        2          25   25     9.4272000     1.0978635     7.3900000
        3          34   34     9.0705882     1.1911516     7.0800000
        4          14   14     7.8900000     0.8332420     6.7000000
--------------------------------------------------------------------

OBS   STAY   AGE  RISK  CULTURE  CHEST  BEDS  SCHOOL  REGION  CENSUS  NURSES  FACIL
 23   9.78  52.3   5.0     17.6   95.9   270       1       1     240     198   57.1
 25   9.20  52.2   4.0     17.5   71.1   298       1       4     244     236   57.1
 26   8.28  49.5   3.9     12.0  113.1   546       1       2     413     436   57.1
 44  10.12  51.7   5.6     14.9   79.1   362       1       3     313     264   54.3
 46  10.16  54.2   4.6      8.4   51.5   831       1       4     581     629   74.3
 47  19.56  59.9   6.5     17.2  113.7   306       2       1     273     172   51.4
 74  10.05  52.0   4.5     36.7   87.5   184       1       1     144     151   68.6
 90  11.41  50.4   5.8     23.8   73.0   424       1       3     359     335   45.7
100  10.15  51.9   6.2     16.4   59.2   568       1       3     452     371   62.9
112  17.94  56.2   5.9     26.4   91.8   835       1       1     791     407   62.9

OBS   FITTED     PRESS      HAT     EXTST     RESID    DFFITS     COOKD
 23  12.3240  -3.18000  0.20000  -1.76835  -2.54400  -0.88418   0.09578
 25   9.6800  -0.96000  0.50000  -0.41618  -0.48000  -0.41618   0.02182
 26  10.5986  -2.70500  0.14286  -1.55177  -2.31857  -0.63351   0.04950
 44  10.5600  -0.66000  0.33333  -0.33029  -0.44000  -0.23355   0.00688
 46   9.6800   0.96000  0.50000   0.41618   0.48000   0.41618   0.02182
 47  10.8204   9.13682  0.04348   6.48789   8.73957   1.38322   0.17189
 74  12.3240  -2.84250  0.20000  -1.57592  -2.27400  -0.78796   0.07653
 90  10.5600   1.27500  0.33333   0.63897   0.85000   0.45182   0.02566
100  10.5600  -0.61500  0.33333  -0.30774  -0.41000  -0.21761   0.00597
112  12.3240   7.02000  0.20000   4.15303   5.61600   2.07652   0.46676
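Because `School | Region` fits a separate mean to each of the eight cells, the (non-unique) SOLUTION estimates must add up to the cell means printed by PROC MEANS. In SAS's coding the last level of each factor (SCHOOL 2, REGION 4) acts as the baseline, and every estimate involving a baseline level is 0. A small Python check, with the estimates copied from the output (the names `cell_mean`, `inter` and so on are illustrative only), which also confirms that the Type I sums of squares add up to the model sum of squares:

```python
# SOLUTION estimates (corner-point coding; baseline levels SCHOOL 2, REGION 4).
intercept = 7.89
school = {1: 1.79, 2: 0.0}
region = {1: 2.930434783, 2: 1.5372, 3: 1.180588235, 4: 0.0}
inter = {(1, 1): -0.286434783, (1, 2): -0.618628571, (1, 3): -0.300588235}

def cell_mean(i, j):
    """Fitted value in cell (i, j); interaction terms not listed are 0."""
    return intercept + school[i] + region[j] + inter.get((i, j), 0.0)

# Cell means from the PROC MEANS output.
observed = {(1, 1): 12.3240000, (1, 2): 10.5985714, (1, 3): 10.5600000,
            (1, 4): 9.6800000, (2, 1): 10.8204348, (2, 2): 9.4272000,
            (2, 3): 9.0705882, (2, 4): 7.8900000}

for cell, mean in observed.items():
    assert abs(cell_mean(*cell) - mean) < 1e-6, cell

# Sequential (Type I) sums of squares add up to the model SS.
type1 = [36.08413010, 95.36410217, 0.61735466]
assert abs(sum(type1) - 132.06558693) < 1e-7
print("estimates reproduce all 8 cell means")
```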
Comments on code and results
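In this model the leverage of a case depends only on the size of its cell: the fitted value is the cell mean, so HAT = 1/n_ij (0.2 for the five hospitals in SCHOOL 1, REGION 1). The other diagnostic columns follow from the usual single-case deletion formulas, with n = 113, p = 8 (the rank of X) and MSE = 2.63947422. A sketch reproducing the printed row for observation 23; the function name `diagnostics` is just for illustration:

```python
from math import sqrt

# Values from the ANOVA table of the SCHOOL | REGION fit.
N_OBS = 113        # number of hospitals
N_PARAMS = 8       # rank of X: one mean per SCHOOL x REGION cell
MSE = 2.63947422   # error mean square (105 df)


def diagnostics(resid, cell_size):
    """Case diagnostics in the cell-means model; leverage is 1 / (cell size)."""
    h = 1.0 / cell_size
    press = resid / (1.0 - h)                       # PRESS (deleted) residual
    r = resid / sqrt(MSE * (1.0 - h))               # internally studentized
    df_err = N_OBS - N_PARAMS
    t = r * sqrt((df_err - 1) / (df_err - r * r))   # externally studentized (RSTUDENT)
    dffits = t * sqrt(h / (1.0 - h))
    cook = r * r * h / (N_PARAMS * (1.0 - h))
    return {"HAT": h, "PRESS": press, "EXTST": t,
            "DFFITS": dffits, "COOKD": cook}


# Observation 23: SCHOOL 1, REGION 1 (5 hospitals), residual -2.544.
for name, value in diagnostics(-2.544, 5).items():
    print(f"{name:7s}{value: .5f}")
```

Rounded to five decimals these agree with the SAS row for observation 23 (0.20000, -3.18000, -1.76835, -0.88418, 0.09578).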
Analysis of covariance example
Here I regress STAY on SCHOOL, REGION and FACIL. I begin by putting in all possible interaction effects.
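SAS's bar operator is what puts in all the possible interactions: the SAS documentation describes A|B|C as shorthand for A B A*B C A*C B*C A*B*C. The sketch below (helper names `expand_bar` and `term_df` are hypothetical) mimics that expansion and counts degrees of freedom, assuming every SCHOOL x REGION cell is non-empty, as here: a CLASS variable with k levels contributes k-1 and the continuous FACIL contributes 1. SAS may list the factors within a term in a different order (it prints FACIL*SCHOOL, for example).

```python
from math import prod

LEVELS = {"School": 2, "Region": 4}   # CLASS variables and their number of levels


def expand_bar(factors):
    """Expand A|B|C... into main effects and all interactions, in SAS order."""
    terms = []
    for f in factors:
        # The new factor enters alone, then crossed with every earlier term.
        terms = terms + [[f]] + [t + [f] for t in terms]
    return terms


def term_df(term):
    # A CLASS variable contributes (levels - 1); a continuous covariate contributes 1.
    return prod(LEVELS[v] - 1 if v in LEVELS else 1 for v in term)


full = expand_bar(["School", "Region", "Facil"])         # School | Region | Facil
reduced = expand_bar(["School", "Region"]) + [["Facil"]]  # School | Region Facil

for t in full:
    print("*".join(t), term_df(t))
print("model DF:", sum(map(term_df, full)), sum(map(term_df, reduced)))
```

The totals, 15 and 8, are the Model degrees of freedom in the two PROC GLM runs.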
options pagesize=60 linesize=80;
data scenic;
  infile 'scenic.dat' firstobs=2;
  input Stay Age Risk Culture Chest Beds School Region Census Nurses Facil;
run;
/* Full model: all interactions among SCHOOL, REGION and FACIL */
proc glm data=scenic;
  class school region;
  model Stay = School | Region | Facil / SS1 SS2 SS3;
  output out=scout P=Fitted PRESS=PRESS H=HAT RSTUDENT=EXTST R=RESID
         DFFITS=DFFITS COOKD=COOKD;
run;
proc print data=scout;
run;
/* Reduced model: SCHOOL*REGION cell effects plus a common FACIL slope */
proc glm data=scenic;
  class school region;
  model Stay = School | Region Facil / SS1 SS2 SS3;
run;

EDITED SAS OUTPUT (Complete output)
Dependent Variable: STAY

                                 Sum of            Mean
Source                DF        Squares          Square   F Value   Pr > F
Model                 15   173.90201568     11.59346771      4.78   0.0001
Error                 97   235.30836485      2.42585943
Corrected Total      112   409.21038053

R-Square        C.V.     Root MSE    STAY Mean
0.424970    16.14289    1.5575171    9.6483186

Source                 DF     Type I SS   Mean Square   F Value   Pr > F
SCHOOL                  1   36.08413010   36.08413010     14.87   0.0002
REGION                  3   95.36410217   31.78803406     13.10   0.0001
SCHOOL*REGION           3    0.61735466    0.20578489      0.08   0.9682
FACIL                   1    9.52496125    9.52496125      3.93   0.0504
FACIL*SCHOOL            1    1.32686372    1.32686372      0.55   0.4613
FACIL*REGION            3   21.28634656    7.09544885      2.92   0.0377
FACIL*SCHOOL*REGION     3    9.69825722    3.23275241      1.33   0.2683

Source                 DF    Type II SS   Mean Square   F Value   Pr > F
SCHOOL                  1    4.73069924    4.73069924      1.95   0.1658
REGION                  3    8.16560072    2.72186691      1.12   0.3441
SCHOOL*REGION           3    7.04260265    2.34753422      0.97   0.4113
FACIL                   1    9.52496125    9.52496125      3.93   0.0504
FACIL*SCHOOL            1    3.76491803    3.76491803      1.55   0.2158
FACIL*REGION            3   21.28634656    7.09544885      2.92   0.0377
FACIL*SCHOOL*REGION     3    9.69825722    3.23275241      1.33   0.2683

Source                 DF   Type III SS   Mean Square   F Value   Pr > F
SCHOOL                  1    2.34679006    2.34679006      0.97   0.3278
REGION                  3    2.46002453    0.82000818      0.34   0.7979
SCHOOL*REGION           3    7.04260265    2.34753422      0.97   0.4113
FACIL                   1    0.70390965    0.70390965      0.29   0.5913
FACIL*SCHOOL            1    1.50831325    1.50831325      0.62   0.4323
FACIL*REGION            3    1.92051520    0.64017173      0.26   0.8513
FACIL*SCHOOL*REGION     3    9.69825722    3.23275241      1.33   0.2683

OBS   STAY   AGE  RISK  CULTURE  CHEST  BEDS  SCHOOL  REGION  CENSUS  NURSES  FACIL
 25   9.20  52.2   4.0     17.5   71.1   298       1       4     244     236   57.1
 46  10.16  54.2   4.6      8.4   51.5   831       1       4     581     629   74.3
 47  19.56  59.9   6.5     17.2  113.7   306       2       1     273     172   51.4

OBS   FITTED     PRESS      HAT     EXTST     RESID    DFFITS     COOKD
 25   9.2000         .  1.00000         .  -0.00000         .         .
 46  10.1600         .  1.00000         .   0.00000         .         .
 47  11.8970   8.29701  0.07641   5.96177   7.66301   1.71483   0.13553

COMMENTS
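Observations 25 and 46 are the only hospitals in SCHOOL 1, REGION 4. The full model gives that cell its own intercept and its own FACIL slope, and two points determine a line exactly, so both residuals are 0 and both leverages are 1; the PRESS residual e/(1-h) is then 0/0, which is why SAS prints missing values for PRESS, EXTST, DFFITS and COOKD. Since the cells share no parameters, the leverages equal those of a simple regression within the cell, 1/n + (x - xbar)^2/Sxx. A minimal sketch using the two printed observations:

```python
# Observations 25 and 46: the only hospitals with SCHOOL=1, REGION=4.
facil = [57.1, 74.3]
stay = [9.20, 10.16]

# Line through the two points (the within-cell least-squares fit).
slope = (stay[1] - stay[0]) / (facil[1] - facil[0])
intercept = stay[0] - slope * facil[0]
fitted = [intercept + slope * x for x in facil]
resid = [y - f for y, f in zip(stay, fitted)]

# Within-cell simple-regression leverages: 1/n + (x - xbar)^2 / Sxx.
xbar = sum(facil) / len(facil)
sxx = sum((x - xbar) ** 2 for x in facil)
hat = [1 / len(facil) + (x - xbar) ** 2 / sxx for x in facil]

print("fitted:", fitted)
print("resid: ", resid)
print("hat:   ", hat)
```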
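In the full model every SCHOOL x REGION cell has its own intercept and its own slope on FACIL, and these intercepts and slopes have been decomposed in the same way that the means in a two-way layout are decomposed into main effects and interactions: the intercepts into SCHOOL, REGION and SCHOOL*REGION effects, and the slopes into FACIL, FACIL*SCHOOL, FACIL*REGION and FACIL*SCHOOL*REGION effects. Normally we might begin by looking for any interaction of FACIL with anything else, by comparing the full model to a model in which FACIL has a single common slope (no interactions involving FACIL) while the SCHOOL*REGION effects are kept. This is what the second PROC GLM run does. More of the output follows.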
Dependent Variable: STAY

                                 Sum of            Mean
Source                DF        Squares          Square   F Value   Pr > F
Model                  8   141.59054818     17.69881852      6.88   0.0001
Error                104   267.61983235      2.57326762
Corrected Total      112   409.21038053

R-Square        C.V.     Root MSE    STAY Mean
0.346009    16.62612    1.6041408    9.6483186

Source           DF     Type I SS   Mean Square   F Value   Pr > F
SCHOOL            1   36.08413010   36.08413010     14.02   0.0003
REGION            3   95.36410217   31.78803406     12.35   0.0001
SCHOOL*REGION     3    0.61735466    0.20578489      0.08   0.9708
FACIL             1    9.52496125    9.52496125      3.70   0.0571

Source           DF    Type II SS   Mean Square   F Value   Pr > F
SCHOOL            1    8.66242211    8.66242211      3.37   0.0694
REGION            3   82.48995156   27.49665052     10.69   0.0001
SCHOOL*REGION     3    0.48049197    0.16016399      0.06   0.9796
FACIL             1    9.52496125    9.52496125      3.70   0.0571

Source           DF   Type III SS   Mean Square   F Value   Pr > F
SCHOOL            1    8.45264294    8.45264294      3.28   0.0728
REGION            3   42.65719728   14.21906576      5.53   0.0015
SCHOOL*REGION     3    0.48049197    0.16016399      0.06   0.9796
FACIL             1    9.52496125    9.52496125      3.70   0.0571
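The comparison of the full model with the common-slope model is the extra-sum-of-squares F test, F = [(SSE_reduced - SSE_full)/(104 - 97)] / (SSE_full/97), using the Error lines of the two runs. The extra sum of squares equals the sum of the Type I SS for FACIL*SCHOOL, FACIL*REGION and FACIL*SCHOOL*REGION in the full model, because those terms are entered after SCHOOL, REGION, SCHOOL*REGION and FACIL, which make up the reduced model. A sketch of the arithmetic:

```python
# Error sums of squares and degrees of freedom from the two PROC GLM runs.
sse_full, df_full = 235.30836485, 97         # School | Region | Facil
sse_reduced, df_reduced = 267.61983235, 104  # School | Region Facil

extra_ss = sse_reduced - sse_full
extra_df = df_reduced - df_full
F = (extra_ss / extra_df) / (sse_full / df_full)

# The extra SS is the Type I SS of the three FACIL interaction terms.
facil_interactions = [1.32686372, 21.28634656, 9.69825722]
assert abs(extra_ss - sum(facil_interactions)) < 1e-7

print(f"F = {F:.4f} on ({extra_df}, {df_full}) df")
```

F comes out at about 1.90 on (7, 97) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(F, extra_df, df_full)` gives the corresponding p-value.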