SAS example: Multiple Regression
The data consist of casting hardnesses for 18 samples prepared under 3 levels of sand added and 3 levels of carbon fibre added. See Q 15 in Chapter 11. I use proc glm to regress hardness on sand content and fibre content but now treat them as continuous variables.
I ran the following SAS code:
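    options pagesize=60 linesize=80;
    /* read the four columns of the raw data file */
    data plaster;
      infile 'plaster.dat';
      input sand fibre hardness strength;
    /* no class statement, so sand and fibre are continuous covariates */
    proc glm data=plaster;
      model hardness = sand fibre;
      output out=plasfit p=yhat r=resid;   /* save fitted values and residuals */
    /* normality checks on the residuals */
    proc univariate data=plasfit plot normal;
      var resid;
    /* residuals against each covariate and against the fitted values */
    proc plot;
      plot resid*sand;
      plot resid*fibre;
      plot resid*yhat;
    run;

The model statement says that I am interested in the effects of sand and fibre; the absence of a class statement makes proc glm treat both as continuous covariates, that is, do multiple regression.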
The abridged output from proc glm is:
    General Linear Models Procedure
    Number of observations in data set = 18

    Dependent Variable: HARDNESS
                                Sum of           Mean
    Source            DF       Squares         Square  F Value  Pr > F
    Model              2  167.41666667    83.70833333    11.53  0.0009
    Error             15  108.86111111     7.25740741
    Corrected Total   17  276.27777778

    R-Square        C.V.    Root MSE  HARDNESS Mean
    0.605972    3.870011   2.6939576      69.611111

    Source            DF     Type I SS    Mean Square  F Value  Pr > F
    SAND               1  102.08333333   102.08333333    14.07  0.0019
    FIBRE              1   65.33333333    65.33333333     9.00  0.0090

                                 T for H0:              Std Error of
    Parameter        Estimate  Parameter=0  Pr > |T|      Estimate
    INTERCEPT     64.36111111        50.68    0.0001    1.26994378
    SAND           0.19444444         3.75    0.0019    0.05184524
    FIBRE          0.09333333         3.00    0.0090    0.03110714
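The conclusion is that both sand and fibre have a significant effect on hardness (P = 0.0019 and P = 0.0090 respectively). The estimates and standard errors in the last table permit confidence intervals for the slopes to be computed.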
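As a sketch of how such intervals could be obtained, the data step below applies estimate ± t × (standard error) to the two slopes, using the 15 error degrees of freedom from the analysis of variance table. The data set name slopeci and its variable names are made up for this illustration; the estimates and standard errors are copied from the output above.

```sas
/* Hypothetical helper data set: 95% confidence intervals for the slopes. */
data slopeci;
  input parameter $ estimate stderr;
  df    = 15;                        /* error degrees of freedom */
  tcrit = tinv(0.975, df);           /* two-sided 95% critical value of t */
  lower = estimate - tcrit*stderr;
  upper = estimate + tcrit*stderr;
  datalines;
SAND  0.19444444 0.05184524
FIBRE 0.09333333 0.03110714
;
run;

proc print data=slopeci;
run;
```

With a critical value of about 2.13, this gives intervals of roughly (0.08, 0.30) for the sand slope and (0.03, 0.16) for the fibre slope. Adding the clparm option to the model statement (model hardness = sand fibre / clparm;) should make proc glm print such limits directly.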
Diagnostic statistics and plots:
    Univariate Procedure
    Variable=RESID

                     Moments
    N                18  Sum Wgts         18
    Mean              0  Sum               0
    Std Dev    2.530533  Variance   6.403595
    Skewness    -0.1431  Kurtosis   -0.29863
    USS        108.8611  CSS        108.8611
    CV                .  Std Mean   0.596452
    T:Mean=0          0  Pr>|T|       1.0000
    Num ^= 0         18  Num > 0           7
    M(Sign)          -2  Pr>=|M|      0.4807
    Sgn Rank        0.5  Pr>=|S|      0.9915
    W:Normal   0.976631  Pr<W         0.8888

               Quantiles(Def=5)
    100% Max  4.388889   99%   4.388889
     75% Q3   2.055556   95%   4.388889
     50% Med  -0.40278   90%   3.805556
     25% Q1   -1.36111   10%   -3.36111
      0% Min  -5.19444    5%   -5.19444
                          1%   -5.19444
    Range     9.583333
    Q3-Q1     3.416667
    Mode      -0.86111

                  Extremes
      Lowest    Obs     Highest    Obs
    -5.19444(     5)   2.055556(    16)
    -3.36111(     1)   2.305556(     7)
    -2.94444(    15)   2.305556(     8)
    -2.02778(    13)   3.805556(     6)
    -1.36111(     2)   4.388889(    10)

    Stem Leaf        #  Boxplot
       4 4           1     |
       2 1338        4  +-----+
       0 57          2  |  +  |
      -0 4996530     7  *-----*
      -2 490         3     |
      -4 2           1     |
         ----+----+----+----+

    [Normal probability plot of RESID: residuals from -5 to 5 against normal quantiles from -2 to +2]

    [Plot of RESID*SAND. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis RESID, -6 to 6; horizontal axis SAND at 0, 15, 30]

    [Plot of RESID*FIBRE. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis RESID, -6 to 6; horizontal axis FIBRE at 0, 25, 50]

    [Plot of RESID*YHAT. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis RESID, -6 to 6; horizontal axis YHAT, 64 to 76]
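The diagnostics seem fine to me: the Shapiro-Wilk test (W = 0.977, P = 0.89) gives no evidence against normality of the residuals, and I see no obvious pattern in the plots of the residuals against sand, fibre or the fitted values.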
The model can be run with an interaction term:
    options pagesize=60 linesize=80;
    data plaster;
      infile 'plaster.dat';
      input sand fibre hardness strength;
    /* sand|fibre expands to sand, fibre and sand*fibre */
    proc glm data=plaster;
      model hardness = sand|fibre;
    run;

which produces
    General Linear Models Procedure

    Dependent Variable: HARDNESS
                                Sum of           Mean
    Source            DF       Squares         Square  F Value  Pr > F
    Model              3  168.54166667    56.18055556     7.30  0.0035
    Error             14  107.73611111     7.69543651
    Corrected Total   17  276.27777778

    R-Square        C.V.    Root MSE  HARDNESS Mean
    0.610044    3.985089   2.7740650      69.611111

    Source            DF     Type I SS    Mean Square  F Value  Pr > F
    SAND               1  102.08333333   102.08333333    13.27  0.0027
    FIBRE              1   65.33333333    65.33333333     8.49  0.0113
    SAND*FIBRE         1    1.12500000     1.12500000     0.15  0.7079

                                 T for H0:              Std Error of
    Parameter        Estimate  Parameter=0  Pr > |T|      Estimate
    INTERCEPT     63.98611111        39.14    0.0001    1.63463347
    SAND           0.21944444         2.60    0.0210    0.08441211
    FIBRE          0.10833333         2.14    0.0505    0.05064727
    SAND*FIBRE    -0.00100000        -0.38    0.7079    0.00261541
There is no sign of a need for an interaction term (P = 0.7079 for
SAND*FIBRE), so the original model seems to be reasonable. Notice that the
resulting model with only 3 parameters is more parsimonious than the model
for the two-way layout, which has 5 parameters (or 9 with an interaction
term). The model asserts that
hardness actually increases linearly with sand content and also with fibre
content.
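
For comparison, the two-way layout referred to above could be fitted by adding a class statement, so that sand and fibre are treated as 3-level factors rather than as continuous covariates. This is only a sketch, reusing the plaster data set from the code above; its output is not shown here.

```sas
/* Two-way layout, additive: 1 + 2 + 2 = 5 parameters for the mean. */
proc glm data=plaster;
  class sand fibre;                /* treat both variables as categorical */
  model hardness = sand fibre;
run;

/* Two-way layout with interaction: 4 more parameters, 9 in all. */
proc glm data=plaster;
  class sand fibre;
  model hardness = sand|fibre;
run;
```

Since the regression model is a special case of the additive two-way layout, the difference between their error sums of squares could be used in an F test of whether the linear description loses anything.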