
STAT 330 Lecture 34

Reading for Today's Lecture: Chapter 13.

Goals of Today's Lecture:

Today's notes

The General Linear Model

\[ Y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i \]

for $i = 1, \ldots, n$

Special Cases and Examples:

One Way Layout:

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \]

with parameters $\mu$, $\alpha_1, \ldots, \alpha_{I-1}$ and p=I or parameters $\mu_i = \mu + \alpha_i$, $i = 1, \ldots, I$.

Note: $\alpha_I$ is redundant because

\[ \sum_{i=1}^{I} \alpha_i = 0 \]

Special notes:

Two Way Layout without replicates:

\[ Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij} \]

with the restrictions

\[ \sum_{i=1}^{I} \alpha_i = 0 \]

and

\[ \sum_{j=1}^{J} \beta_j = 0 \]

Multiple Regression

In multiple regression we have an equation like the above but with the $x_{ij}$ filled in with the values of more than 1 independent variable:

\[ Y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i \]

(so that $x_{i1} = 1$ for every i).

Example: We now regress hardness on SAND and FIBRE content. Previously we had treated each of these variables as merely having 3 (unordered) categories. Now we use the numerical values of those categories as the $x_{i2}$ and $x_{i3}$.

All the models above can be written in the form

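\[ Y = X\beta + \epsilon \]

where Y is the vector of the n responses, X is the n by p matrix with entries $x_{ij}$, $\beta$ is the vector of the p parameters and $\epsilon$ is the vector of the n errors.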

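In the two way layout example, for instance, the parameter vector holds the overall mean together with the free row and column effects, and the row of X for observation (i, j) holds the coefficients of those parameters in the model equation for that observation.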
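As a rough illustration of what X looks like for the two way layout, here is a small Python sketch. It assumes the usual additive parametrisation (overall mean, row effects, column effects) with the sum-to-zero restrictions, so that the last row effect and the last column effect are minus the sum of the others. The function name `two_way_design` and the ordering of the observations are my own choices, not something taken from the notes.

```python
def two_way_design(I, J):
    """Design matrix for the additive two way layout without replicates.

    Columns: overall mean, then the first I-1 row effects, then the
    first J-1 column effects (sum-to-zero coding, an assumed convention).
    Rows are ordered (1,1), (1,2), ..., (1,J), (2,1), ..., (I,J).
    """
    rows = []
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            # Coefficient of row effect k: 1 if i == k, -1 if i == I, else 0.
            a = [(-1 if i == I else int(i == k)) for k in range(1, I)]
            # Coefficient of column effect k: 1 if j == k, -1 if j == J, else 0.
            b = [(-1 if j == J else int(j == k)) for k in range(1, J)]
            rows.append([1] + a + b)
    return rows


# A 3 by 3 layout, as in the sand and fibre example: 9 rows, 5 columns.
X = two_way_design(3, 3)
for row in X:
    print(row)
```

With 3 levels of each factor the matrix has 1 + 2 + 2 = 5 columns, one for each free parameter of the additive model.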

Analysis Principles

  1. Estimate $\beta$ by least squares, that is, minimize

    \[ \sum_{i=1}^{n} (Y_i - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2 = (Y - X\beta)^T (Y - X\beta) \]

  2. The solution $\hat\beta$ solves the normal equations

    \[ X^T X \hat\beta = X^T Y \]

    giving the matrix algebra solution

    \[ \hat\beta = (X^T X)^{-1} X^T Y \]

  3. The vectors $\hat Y = X\hat\beta$ (fitted values) and $\hat\epsilon = Y - \hat Y$ (residuals) are orthogonal and so we may form an ANOVA table as

    Source      SS                          df
    Regression  $\sum_i \hat Y_i^2$         p
    Error       $\sum_i \hat\epsilon_i^2$   n-p
    Total       $\sum_i Y_i^2$              n
    (not corrected)

  4. In most cases the first column of $X$ is a column of 1s (we say the model has an intercept) and we ``correct'' the top and bottom lines of this table by subtracting $n\bar Y^2$ from the sum of squares and 1 from the degrees of freedom:

    Source      SS                                                    df
    Regression  $\sum_i \hat Y_i^2 - n\bar Y^2$                       p-1
    Error       $\sum_i \hat\epsilon_i^2$                             n-p
    Total       $\sum_i Y_i^2 - n\bar Y^2 = \sum_i (Y_i - \bar Y)^2$  n-1
    (corrected)

  5. The ANOVA table can be used to give F tests of the hypothesis $\beta_1 = \cdots = \beta_p = 0$ (using the uncorrected table) or of $\beta_2 = \cdots = \beta_p = 0$ (using the corrected table).
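The steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the five data points are made up (they are not the plaster data), the helper names `transpose`, `matmul` and `solve` are hypothetical, and the normal equations are solved by plain Gauss-Jordan elimination rather than by whatever SAS does internally.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


# Made-up toy data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.0, 3.0, 2.0, 5.0, 4.0]
X = [[1.0, xi] for xi in x]          # first column of 1s: an intercept

# Normal equations: (X^T X) beta_hat = X^T Y
Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = [sum(a * b for a, b in zip(col, Y)) for col in Xt]
beta_hat = solve(XtX, XtY)

fitted = [sum(b * xij for b, xij in zip(beta_hat, row)) for row in X]
resid = [y - f for y, f in zip(Y, fitted)]

n, p = len(Y), len(beta_hat)
ss_reg = sum(f * f for f in fitted)   # uncorrected regression SS, df p
ss_err = sum(e * e for e in resid)    # error SS, df n-p
ss_tot = sum(y * y for y in Y)        # uncorrected total SS, df n
correction = n * (sum(Y) / n) ** 2    # n * Ybar^2

# F statistic from the corrected table (tests that the slope is 0).
f_stat = ((ss_reg - correction) / (p - 1)) / (ss_err / (n - p))

print("beta_hat:", beta_hat)
print("uncorrected SS:", ss_reg, ss_err, ss_tot)
print("corrected SS:", ss_reg - correction, ss_err, ss_tot - correction)
print("F:", f_stat)
```

The orthogonality of the fitted values and the residuals is what makes the regression and error sums of squares add up to the total.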

SAS example: Multiple Regression

The data consist of casting hardnesses for 18 samples prepared under 3 levels of sand added and 3 levels of carbon fibre added. See Q 15 in Chapter 11. I use proc glm to regress hardness on sand content and fibre content but now treat them as continuous variables.

I ran the following SAS code:

options pagesize=60 linesize=80;
  data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
  proc glm  data=plaster;
   model hardness = sand fibre;
   output out=plasfit p=yhat r=resid ;
  proc univariate data=plasfit plot normal;
   var resid;
  proc plot;
   plot resid*sand;
   plot resid*fibre;
   plot resid*yhat;
  run;

The line labelled model says that I am interested in the effects of sand and fibre; the lack of a class statement makes glm do multiple regression.

The abridged output from proc glm is:

                        General Linear Models Procedure
                    Number of observations in data set = 18
Dependent Variable: HARDNESS   
                           Sum of          Mean
Source           DF       Squares        Square   F Value     Pr > F

Model             2  167.41666667   83.70833333     11.53     0.0009
Error            15  108.86111111    7.25740741
Corrected Total  17  276.27777778

          R-Square           C.V.        Root MSE        HARDNESS Mean
          0.605972       3.870011       2.6939576            69.611111

Source          DF      Type I SS     Mean Square   F Value     Pr > F
SAND             1   102.08333333    102.08333333     14.07     0.0019
FIBRE            1    65.33333333     65.33333333      9.00     0.0090

                                  T for H0:    Pr > |T|   Std Error of
Parameter            Estimate    Parameter=0                Estimate

INTERCEPT         64.36111111          50.68     0.0001     1.26994378
SAND               0.19444444           3.75     0.0019     0.05184524
FIBRE              0.09333333           3.00     0.0090     0.03110714
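The summary figures in this output can be recomputed from the sums of squares and degrees of freedom. The following Python sketch does that arithmetic; the numbers are copied from the listing above and the variable names are my own.

```python
from math import sqrt

# Values copied from the proc glm output above.
ss_model, df_model = 167.41666667, 2
ss_error, df_error = 108.86111111, 15
ss_total = 276.27777778
ss_sand, ss_fibre = 102.08333333, 65.33333333   # Type I SS, 1 df each

ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_model = ms_model / ms_error      # overall F for the model
r_square = ss_model / ss_total     # R-Square
root_mse = sqrt(ms_error)          # Root MSE
f_sand = ss_sand / ms_error        # F for SAND
f_fibre = ss_fibre / ms_error      # F for FIBRE

print(round(f_model, 2), round(r_square, 6), round(root_mse, 5))
print(round(f_sand, 2), round(f_fibre, 2))
```

The two Type I sums of squares add up to the Model sum of squares, and the Model and Error lines add up to the Corrected Total.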

The conclusions are that both sand and fibre have an effect on hardness (I read the so-called Type I SS table and see P values of 0.0019 and 0.0090 and reject the two null hypotheses). The last table permits confidence intervals for the slopes. You can, for instance, predict that a SAND content of 10% and a FIBRE content of 20% would produce a hardness of

\[ 64.361 + 0.19444 \times 10 + 0.09333 \times 20 \approx 68.17 \]
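The prediction and the confidence intervals for the slopes can be worked out from the parameter table. Here is a short Python sketch; the estimates and standard errors are copied from the SAS output, while the critical value 2.131 (the upper 2.5% point of the t distribution on 15 degrees of freedom) is taken from a standard t table and is my addition, as are the function names.

```python
# Estimates and standard errors copied from the proc glm parameter table.
b_intercept, b_sand, b_fibre = 64.36111111, 0.19444444, 0.09333333
se_sand, se_fibre = 0.05184524, 0.03110714

# Upper 2.5% point of t on 15 error df, from a t table (assumed value).
T_CRIT = 2.131


def predict(sand, fibre):
    """Fitted hardness at the given sand and fibre percentages."""
    return b_intercept + b_sand * sand + b_fibre * fibre


def conf_int(estimate, std_err):
    """95% confidence interval: estimate plus or minus t * standard error."""
    half_width = T_CRIT * std_err
    return (estimate - half_width, estimate + half_width)


print("predicted hardness:", round(predict(10, 20), 2))
print("95% CI for SAND slope:", conf_int(b_sand, se_sand))
print("95% CI for FIBRE slope:", conf_int(b_fibre, se_fibre))
```

Neither interval contains 0, which agrees with the two small P values in the table.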

The model fit should be checked by examining various diagnostic statistics and plots:

                              Univariate Procedure
Variable=RESID
                                    Moments

                    N                18  Sum Wgts         18
                    Mean              0  Sum               0
                    Std Dev    2.530533  Variance   6.403595
                    Skewness    -0.1431  Kurtosis   -0.29863
                    USS        108.8611  CSS        108.8611
                    CV                .  Std Mean   0.596452
                    T:Mean=0          0  Pr>|T|       1.0000
                    Num ^= 0         18  Num > 0           7
                    M(Sign)          -2  Pr>=|M|      0.4807
                    Sgn Rank        0.5  Pr>=|S|      0.9915
                    W:Normal   0.976631  Pr<W         0.8888

                                Quantiles(Def=5)

                     100% Max  4.388889       99%  4.388889
                      75% Q3   2.055556       95%  4.388889
                      50% Med  -0.40278       90%  3.805556
                      25% Q1   -1.36111       10%  -3.36111
                       0% Min  -5.19444        5%  -5.19444
                                               1%  -5.19444
                     Range     9.583333                    
                     Q3-Q1     3.416667                    
                     Mode      -0.86111                    


                                    Extremes

                       Lowest    Obs     Highest    Obs
                     -5.19444(       5) 2.055556(      16)
                     -3.36111(       1) 2.305556(       7)
                     -2.94444(      15) 2.305556(       8)
                     -2.02778(      13) 3.805556(       6)
                     -1.36111(       2) 4.388889(      10)


                Stem Leaf                     #             Boxplot
                   4 4                        1                |   
                   2 1338                     4             +-----+
                   0 57                       2             |  +  |
                  -0 4996530                  7             *-----*
                  -2 490                      3                |   
                  -4 2                        1                |   
                     ----+----+----+----+              


                                 Normal Probability Plot              
                 5+                                         ++*+++++  
                  |                                **++++*++          
                  |                         ++++**++                  
                  |                  *+*++** **                       
                  |          ++*+*++*                                 
                -5+  +++++*++                                         
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2     



            Plot of RESID*SAND.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                    A
    4 +
      |                                                                        A
      |
      |
      |
      |
      |B
    2 +                                    A
      |                                    A
      |                                                                        A
      |
      |
      |
      |
    0 +A
      |                                    A
      |                                    A                                   A
      |                                                                        B
      |
      |A
      |
   -2 +A
      |
      |
      |                                    A
      |
      |A
      |
   -4 +
      |
      |
      |
      |                                                                        A
      |
      |
   -6 +
      |
      -+-----------------------------------+-----------------------------------+
       0                                  15                                  30

                                         SAND

                                                                                
            Plot of RESID*FIBRE.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                    A
    4 +
      |A
      |
      |
      |
      |
      |                                    B
    2 +                                                                        A
      |A
      |                                    A
      |
      |
      |
      |
    0 +                                                                        A
      |A
      |                                    B
      |                                                                        B
      |
      |A
      |
   -2 +                                                                        A
      |
      |
      |                                                                        A
      |
      |A
      |
   -4 +
      |
      |
      |
      |A
      |
      |
   -6 +
      |
      -+-----------------------------------+-----------------------------------+
       0                                  25                                  50

                                         FIBRE

            Plot of RESID*YHAT.  Legend: A = 1 obs, B = 2 obs, etc.

RESID |
      |
    6 +
      |
      |
      |
      |
      |
      |                                  A
    4 +
      |                                     A
      |
      |
      |
      |
      |                B
    2 +                                                A
      |                    A
      |                                                   A
      |
      |
      |
      |
    0 +                              A
      |                    A
      |                                  A                A
      |                                                                 B
      |
      |  A
      |
   -2 +                              A
      |
      |
      |                                                A
      |
      |  A
      |
   -4 +
      |
      |
      |
      |                                     A
      |
      |
   -6 +
      |
      -+-----------+-----------+-----------+-----------+-----------+-----------+
      64          66          68          70          72          74          76

                                         YHAT
The diagnostic plots seem fine to me.
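One point worth checking in the univariate output: the Std Dev of the residuals (2.5305) is smaller than the Root MSE from proc glm (2.6940). Both come from the same residual sum of squares, 108.8611; proc univariate divides by n-1 = 17 while the ANOVA table divides by the error degrees of freedom n-p = 15. A small Python sketch of that arithmetic, with variable names of my own choosing:

```python
from math import sqrt

n, p = 18, 3          # 18 observations; intercept plus two slopes
css = 108.8611        # CSS of the residuals = Error SS in the ANOVA table

variance = css / (n - 1)        # what proc univariate reports as Variance
std_dev = sqrt(variance)        # Std Dev
std_mean = std_dev / sqrt(n)    # Std Mean
root_mse = sqrt(css / (n - p))  # Root MSE reported by proc glm

print(round(variance, 4), round(std_dev, 4), round(std_mean, 4))
print(round(root_mse, 4))
```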

In the two way ANOVA model fit for these data we allowed the possibility that the effect of SAND depended on the level of FIBRE. We can do the same here and include an interaction term in the model. The model equation fitted by the previous run of SAS is

\[ Y_i = \beta_1 + \beta_2 u_i + \beta_3 v_i + \epsilon_i \]

for $i = 1, \ldots, 18$. Here Y is hardness, u is sand content (in %) and v is fibre content (in %). To include an interaction term we modify the model equation to

\[ Y_i = \beta_1 + \beta_2 u_i + \beta_3 v_i + \beta_4 u_i v_i + \epsilon_i \]

The coefficient $\beta_4$ is then the interaction.

 
  options pagesize=60 linesize=80;
  data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
  proc glm  data=plaster;
   model hardness = sand|fibre;
  run;
which produces
                        General Linear Models Procedure

Dependent Variable: HARDNESS   
                              Sum of            Mean
Source           DF          Squares          Square   F Value     Pr > F
Model             3     168.54166667     56.18055556      7.30     0.0035
Error            14     107.73611111      7.69543651
Corrected Total  17     276.27777778

           R-Square             C.V.        Root MSE        HARDNESS Mean
           0.610044         3.985089       2.7740650            69.611111


Source           DF        Type I SS     Mean Square   F Value     Pr > F
SAND              1     102.08333333    102.08333333     13.27     0.0027
FIBRE             1      65.33333333     65.33333333      8.49     0.0113
SAND*FIBRE        1       1.12500000      1.12500000      0.15     0.7079


                                 T for H0:    Pr > |T|   Std Error of
Parameter           Estimate    Parameter=0                Estimate

INTERCEPT        63.98611111          39.14     0.0001     1.63463347
SAND              0.21944444           2.60     0.0210     0.08441211
FIBRE             0.10833333           2.14     0.0505     0.05064727
SAND*FIBRE       -0.00100000          -0.38     0.7079     0.00261541
There is no sign of a need for an interaction term (P = 0.7079), so the original model seems to be reasonable. Notice that the resulting model, with only 3 parameters, is more parsimonious than the model for the two way layout, which has 5 parameters (or 9 with an interaction term). The model asserts that hardness actually increases linearly with sand content and also with fibre content.
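The F value for SAND*FIBRE can be recovered by comparing the error sums of squares of the two fits (the extra sum of squares idea). The Python sketch below uses the figures from the two SAS listings; the variable names are mine.

```python
# Error lines from the two proc glm runs above.
sse_additive, df_additive = 108.86111111, 15   # model: sand fibre
sse_inter, df_inter = 107.73611111, 14         # model: sand|fibre

# Extra sum of squares explained by adding the interaction term.
extra_ss = sse_additive - sse_inter
extra_df = df_additive - df_inter

f_interaction = (extra_ss / extra_df) / (sse_inter / df_inter)

# With 1 numerator df the F statistic is the square of the t statistic.
t_interaction = -0.00100000 / 0.00261541

print(round(extra_ss, 3), round(f_interaction, 2))
print(round(t_interaction, 2), round(t_interaction ** 2, 2))
```

The extra sum of squares, 1.125, is the Type I SS for SAND*FIBRE, and the F value of about 0.15 is the square of the t statistic -0.38 in the parameter table.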


Richard Lockhart
Wed Mar 18 11:16:06 PST 1998