
STAT 330: 95-3

SAS example: Multiple Regression

The data consist of casting hardnesses for 18 samples prepared under 3 levels of sand added and 3 levels of carbon fibre added (see Q 15 in Chapter 11). I use proc glm to regress hardness on sand content and fibre content, this time treating them as continuous variables rather than as classification factors.

I ran the following SAS code:

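options pagesize=60 linesize=80;
  /* Read sand content, fibre content, hardness and strength */
  data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
  /* No CLASS statement, so sand and fibre enter as continuous covariates */
  proc glm data=plaster;
   model hardness = sand fibre;
   /* Save fitted values (yhat) and residuals (resid) for the diagnostics */
   output out=plasfit p=yhat r=resid;
  /* PLOT: stem-and-leaf, boxplot, normal plot; NORMAL: normality test */
  proc univariate data=plasfit plot normal;
   var resid;
  /* Residuals against each predictor and against the fitted values */
  proc plot data=plasfit;
   plot resid*sand;
   plot resid*fibre;
   plot resid*yhat;
  run;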

The MODEL statement says that I am interested in the effects of sand and fibre. Because there is no CLASS statement, glm treats both variables as continuous and fits a multiple regression.
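Written out, the multiple regression model behind this MODEL statement is the first line below; the Greek-letter notation is mine, not something SAS prints, and the normal-error assumption is the usual one behind the F and t tests in the listing. For comparison, the second line is the two-way layout that a CLASS statement for both variables would fit (k indexes replicates within a cell).

```latex
\text{hardness}_i = \beta_0 + \beta_1\,\text{sand}_i + \beta_2\,\text{fibre}_i + \varepsilon_i,
\qquad \varepsilon_i \stackrel{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, 18

\text{hardness}_{ijk} = \mu + \alpha_i + \tau_j + \varepsilon_{ijk}, \qquad i, j = 1, 2, 3
```

The regression model spends one degree of freedom on each variable, while the two-way layout spends two on each factor.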

The abridged output from proc glm is:

                        General Linear Models Procedure
                    Number of observations in data set = 18
Dependent Variable: HARDNESS   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    2     167.41666667     83.70833333     11.53     0.0009
Error                   15     108.86111111      7.25740741
Corrected Total         17     276.27777778

                  R-Square             C.V.        Root MSE        HARDNESS Mean
                  0.605972         3.870011       2.6939576            69.611111

Source                  DF        Type I SS     Mean Square   F Value     Pr > F
SAND                     1     102.08333333    102.08333333     14.07     0.0019
FIBRE                    1      65.33333333     65.33333333      9.00     0.0090

                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT               64.36111111          50.68     0.0001     1.26994378
SAND                     0.19444444           3.75     0.0019     0.05184524
FIBRE                    0.09333333           3.00     0.0090     0.03110714
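The summary lines of the ANOVA table are linked by simple arithmetic. The sketch below recomputes them in Python from the printed sums of squares; the variable names are my own, and the p-value is not recomputed because that would need an F distribution routine outside the standard library.

```python
import math

# Figures copied from the PROC GLM listing for the additive model.
ss_model, df_model = 167.41666667, 2
ss_error, df_error = 108.86111111, 15
ss_total = 276.27777778
mean_hardness = 69.611111

ms_model = ss_model / df_model          # Mean Square for Model
mse = ss_error / df_error               # Mean Square Error
f_value = ms_model / mse                # overall F statistic
r_square = ss_model / ss_total          # proportion of variation explained
root_mse = math.sqrt(mse)               # estimate of the error SD sigma
cv = 100 * root_mse / mean_hardness     # coefficient of variation, in percent

print(f"MS model = {ms_model:.4f}, MSE = {mse:.4f}")
print(f"F = {f_value:.2f}, R-square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.7f}, C.V. = {cv:.6f}")
```

The Model and Error sums of squares add up to the Corrected Total, which is the decomposition that the F test and R-square are built on.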

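Both sand (p = 0.0019) and fibre (p = 0.0090) have a significant effect on hardness, and both estimated slopes are positive, so hardness increases with each. The parameter-estimate table also gives the standard errors needed for confidence intervals for the slopes (estimate ± t × standard error, with 15 error degrees of freedom).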
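As a sketch of that calculation, the Python below forms 95% intervals from the printed estimates and standard errors. The critical value 2.131450 is the upper 2.5% point of the t distribution with 15 degrees of freedom, taken from a t table and hard-coded (an assumption of this sketch) so only the standard library is needed.

```python
# Upper 2.5% point of t with 15 df (table value, hard-coded).
T_975_15 = 2.131450

# (estimate, standard error) copied from the PROC GLM parameter table.
estimates = {
    "SAND": (0.19444444, 0.05184524),
    "FIBRE": (0.09333333, 0.03110714),
}

def slope_ci(est, se, t_crit=T_975_15):
    """Return (t ratio, lower, upper) for the interval est +/- t_crit * se."""
    half_width = t_crit * se
    return est / se, est - half_width, est + half_width

results = {}
for name, (est, se) in estimates.items():
    t_ratio, lower, upper = slope_ci(est, se)
    results[name] = (t_ratio, lower, upper)
    print(f"{name:6s} t = {t_ratio:5.2f}  95% CI = ({lower:.4f}, {upper:.4f})")
```

Both intervals exclude zero, matching the small p-values. Squaring the t ratios gives roughly the Type I F values (14.07 and 9.00); for FIBRE, entered last, that is automatic, and for SAND it suggests the sand and fibre levels are uncorrelated in this design, so the order of entry does not matter here. Adding the CLPARM option to the MODEL statement should make proc glm print these limits directly.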

Diagnostic statistics and plots:

                              Univariate Procedure
Variable=RESID
                                    Moments

                    N                18  Sum Wgts         18
                    Mean              0  Sum               0
                    Std Dev    2.530533  Variance   6.403595
                    Skewness    -0.1431  Kurtosis   -0.29863
                    USS        108.8611  CSS        108.8611
                    CV                .  Std Mean   0.596452
                    T:Mean=0          0  Pr>|T|       1.0000
                    Num ^= 0         18  Num > 0           7
                    M(Sign)          -2  Pr>=|M|      0.4807
                    Sgn Rank        0.5  Pr>=|S|      0.9915
                    W:Normal   0.976631  Pr<W         0.8888

                                Quantiles(Def=5)

                     100% Max  4.388889       99%  4.388889
                      75% Q3   2.055556       95%  4.388889
                      50% Med  -0.40278       90%  3.805556
                      25% Q1   -1.36111       10%  -3.36111
                       0% Min  -5.19444        5%  -5.19444
                                               1%  -5.19444
                     Range     9.583333                    
                     Q3-Q1     3.416667                    
                     Mode      -0.86111                    


                                    Extremes

                       Lowest    Obs     Highest    Obs
                     -5.19444(       5) 2.055556(      16)
                     -3.36111(       1) 2.305556(       7)
                     -2.94444(      15) 2.305556(       8)
                     -2.02778(      13) 3.805556(       6)
                     -1.36111(       2) 4.388889(      10)


                Stem Leaf                     #             Boxplot
                   4 4                        1                |   
                   2 1338                     4             +-----+
                   0 57                       2             |  +  |
                  -0 4996530                  7             *-----*
                  -2 490                      3                |   
                  -4 2                        1                |   
                     ----+----+----+----+              


                                 Normal Probability Plot              
                 5+                                         ++*+++++  
                  |                                **++++*++          
                  |                         ++++**++                  
                  |                  *+*++** **                       
                  |          ++*+*++*                                 
                -5+  +++++*++                                         
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2     



            Plot of RESID*SAND.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                    A
    4 +
      |                                                                        A
      |
      |
      |
      |
      |B
    2 +                                    A
      |                                    A
      |                                                                        A
      |
      |
      |
      |
    0 +A
      |                                    A
      |                                    A                                   A
      |                                                                        B
      |
      |A
      |
   -2 +A
      |
      |
      |                                    A
      |
      |A
      |
   -4 +
      |
      |
      |
      |                                                                        A
      |
      |
   -6 +
      |
      -+-----------------------------------+-----------------------------------+
       0                                  15                                  30

                                         SAND

                                                                                
            Plot of RESID*FIBRE.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                    A
    4 +
      |A
      |
      |
      |
      |
      |                                    B
    2 +                                                                        A
      |A
      |                                    A
      |
      |
      |
      |
    0 +                                                                        A
      |A
      |                                    B
      |                                                                        B
      |
      |A
      |
   -2 +                                                                        A
      |
      |
      |                                                                        A
      |
      |A
      |
   -4 +
      |
      |
      |
      |A
      |
      |
   -6 +
      |
      -+-----------------------------------+-----------------------------------+
       0                                  25                                  50

                                         FIBRE

            Plot of RESID*YHAT.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                  A
    4 +
      |                                     A
      |
      |
      |
      |
      |                B
    2 +                                                A
      |                    A
      |                                                   A
      |
      |
      |
      |
    0 +                              A
      |                    A
      |                                  A                A
      |                                                                 B
      |
      |  A
      |
   -2 +                              A
      |
      |
      |                                                A
      |
      |  A
      |
   -4 +
      |
      |
      |
      |                                     A
      |
      |
   -6 +
      |
      -+-----------+-----------+-----------+-----------+-----------+-----------+
      64          66          68          70          72          74          76

                                         YHAT

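The diagnostics seem fine to me: the Shapiro-Wilk statistic W = 0.976631 (Pr<W = 0.8888) gives no evidence against normal errors, and none of the residual plots shows an obvious trend or curvature.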
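Two details of the univariate output can be checked by hand. Least squares residuals from a model with an intercept sum to zero, so Mean is 0 and USS equals CSS, which equals the error sum of squares 108.8611. PROC UNIVARIATE's Std Dev divides that by n - 1 = 17 rather than by the 15 error degrees of freedom, so it is smaller than Root MSE. The sketch below (my own variable names) shows the two divisors side by side.

```python
import math

n = 18                # number of residuals
css = 108.8611        # corrected SS of the residuals (= SSE from PROC GLM)
n_params = 3          # intercept, sand slope, fibre slope

univariate_sd = math.sqrt(css / (n - 1))      # divisor 17, as PROC UNIVARIATE
root_mse = math.sqrt(css / (n - n_params))    # divisor 15 = error df, as PROC GLM
std_mean = univariate_sd / math.sqrt(n)       # "Std Mean" in the Moments table

print(f"Std Dev = {univariate_sd:.6f}, Root MSE = {root_mse:.6f}")
print(f"Std Mean = {std_mean:.6f}")
```

For estimating the error standard deviation, Root MSE is the appropriate figure because it accounts for the three fitted parameters.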

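The model can also be fitted with an interaction term:

  options pagesize=60 linesize=80;
  data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
  /* proc glm, not proc anova: the listing below comes from the General */
  /* Linear Models Procedure and includes parameter estimates           */
  proc glm data=plaster;
   /* sand|fibre expands to sand fibre sand*fibre */
   model hardness = sand|fibre;
  run;

This produces: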
                        General Linear Models Procedure

Dependent Variable: HARDNESS   
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F
Model                    3     168.54166667     56.18055556      7.30     0.0035
Error                   14     107.73611111      7.69543651
Corrected Total         17     276.27777778

                  R-Square             C.V.        Root MSE        HARDNESS Mean
                  0.610044         3.985089       2.7740650            69.611111


Source                  DF        Type I SS     Mean Square   F Value     Pr > F
SAND                     1     102.08333333    102.08333333     13.27     0.0027
FIBRE                    1      65.33333333     65.33333333      8.49     0.0113
SAND*FIBRE               1       1.12500000      1.12500000      0.15     0.7079


                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT               63.98611111          39.14     0.0001     1.63463347
SAND                     0.21944444           2.60     0.0210     0.08441211
FIBRE                    0.10833333           2.14     0.0505     0.05064727
SAND*FIBRE              -0.00100000          -0.38     0.7079     0.00261541

There is no sign that an interaction term is needed (F = 0.15, p = 0.7079 for SAND*FIBRE), so the original additive model seems reasonable. Notice that this model, with only 3 parameters, is more parsimonious than the model for the two-way layout, which has 5 parameters (or 9 with an interaction term). The model asserts that mean hardness increases linearly with sand content and also with fibre content.
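The interaction test can be reproduced as an extra-sum-of-squares F test comparing the two fits. The sketch below uses only numbers copied from the two PROC GLM listings; the variable names are my own.

```python
# Extra-sum-of-squares F test for adding SAND*FIBRE to the additive model.
sse_reduced, df_reduced = 108.86111111, 15   # Error line, additive model
sse_full, df_full = 107.73611111, 14         # Error line, model with SAND*FIBRE

extra_ss = sse_reduced - sse_full    # SS picked up by the interaction term
extra_df = df_reduced - df_full      # one extra parameter
mse_full = sse_full / df_full        # MSE of the larger model
f_stat = (extra_ss / extra_df) / mse_full

# With one numerator df, F equals the square of the t ratio
# (estimate / standard error) for the added coefficient.
t_ratio = -0.00100000 / 0.00261541

print(f"extra SS = {extra_ss:.4f} on {extra_df} df")
print(f"F = {f_stat:.4f}, t^2 = {t_ratio ** 2:.4f}")
```

The extra SS equals the Type I SS printed for SAND*FIBRE (1.125), and F of about 0.146 on 1 and 14 df is far below the 5% critical value of about 4.60.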





Richard Lockhart
Mon Oct 21 23:28:57 PDT 1996