
STAT 330: 95-3

SAS example: Simple Linear Regression

The data consist of 14 pairs of measurements on the independent variable Burner Area Liberation Rate (in million BTU per hr-ft^2) and the dependent variable Nitrogen Oxides (NOx) Emission Rate (in parts per million). See Q 9 in Chapter 12. I use proc glm to fit a simple linear regression model to assess the effect of Burner Area on NOx emissions.

I ran the following SAS code:

  options pagesize=60 linesize=80;
  data nox;
  infile 'ch12q9.dat';
  input area emission ;
  proc glm  data=nox;
   model emission = area;
   output out=noxfit p=yhat r=resid ;
  proc univariate data=noxfit plot normal;
   var resid;
  proc plot data=noxfit;
   plot resid*area;
   plot resid*yhat;
  run;

The line labelled model says that I am interested in the effects of area (my shorthand name for "Burner Area Liberation Rate") on emissions.
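
For comparison, the same straight-line model could also be fit with proc reg. What follows is only a sketch, not part of the run reported here: it assumes the data set nox created above, and the output data set name regfit is my own invention.

  /* Sketch: fit emission = intercept + slope*area with proc reg */
  proc reg data=nox;
   /* clb asks for confidence limits on the parameter estimates */
   model emission = area / clb;
   /* save fitted values and residuals, as in the proc glm run */
   output out=regfit p=yhat r=resid;
  run;
  quit;

The estimates and the F test should match those from proc glm, since both procedures fit the same least squares line.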

The output from proc glm is

                                 The SAS System                                1
                                                 10:00 Monday, November 20, 1995

                        General Linear Models Procedure

                    Number of observations in data set = 14

                                 The SAS System                                2
                                                 10:00 Monday, November 20, 1995

                        General Linear Models Procedure

Dependent Variable: EMISSION
                                     Sum of            Mean
Source                  DF          Squares          Square   F Value     Pr > F

Model                    1     398030.26093    398030.26093    294.74     0.0001

Error                   12      16205.45335      1350.45445

Corrected Total         13     414235.71429

                  R-Square             C.V.        Root MSE        EMISSION Mean

                  0.960879         10.26905       36.748530            357.85714


Source                  DF        Type I SS     Mean Square   F Value     Pr > F

AREA                     1     398030.26093    398030.26093    294.74     0.0001

Source                  DF      Type III SS     Mean Square   F Value     Pr > F

AREA                     1     398030.26093    398030.26093    294.74     0.0001


                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

INTERCEPT              -45.55190539          -1.79     0.0989    25.46779420
AREA                     1.71143233          17.17     0.0001     0.09968772




                                 The SAS System                                9
                                                 10:00 Monday, November 20, 1995

                              Univariate Procedure

Variable=RESID

                                    Moments

                    N                14  Sum Wgts         14
                    Mean              0  Sum               0
                    Std Dev    35.30685  Variance   1246.573
                    Skewness   -0.57524  Kurtosis    0.14238
                    USS        16205.45  CSS        16205.45
                    CV                .  Std Mean   9.436151
                    T:Mean=0          0  Pr>|T|       1.0000
                    Num ^= 0         14  Num > 0           7
                    M(Sign)           0  Pr>=|M|      1.0000
                    Sgn Rank        2.5  Pr>=|S|      0.9032
                    W:Normal   0.939768  Pr<W         0.3981


                                Quantiles(Def=5)

                     100% Max  47.69382       99%  47.69382
                      75% Q3   24.40867       95%  47.69382
                      50% Med  5.229961       90%  46.55059
                      25% Q1   -27.8778       10%   -29.021
                       0% Min  -77.8778        5%  -77.8778
                                               1%  -77.8778
                     Range     125.5716
                     Q3-Q1     52.28647
                     Mode      -77.8778


                                    Extremes

                       Lowest    Obs     Highest    Obs
                     -77.8778(      11) 23.26544(       6)
                      -29.021(      13) 24.40867(       1)
                     -28.3771(       2) 30.97898(      14)
                     -27.8778(      10) 46.55059(      12)
                     -21.1629(       5) 47.69382(       9)


                Stem Leaf                     #             Boxplot
                   4 78                       2                |
                   2 341                      3             +-----+
                   0 28                       2             *--+--*
                  -0 71                       2             |     |
                  -2 9881                     4             +-----+
                  -4                                           |
                  -6 8                        1                |
                     ----+----+----+----+
                 Multiply Stem.Leaf by 10**+1


                                 The SAS System                               10
                                                 10:00 Monday, November 20, 1995

                              Univariate Procedure

Variable=RESID

                                 Normal Probability Plot
                50+                                     *++++*
                  |                              *+*+*++
                  |                         +*+*++
               -10+                    ++*+*
                  |             *++*+*+*
                  |         +++++
               -70+   +++++*
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2

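The conclusions are that AREA has a very significant and strong effect on emissions (F = 294.74 on 1 and 12 degrees of freedom, P = 0.0001, R-Square = 0.96), that the intercept of the linear regression might be 0 (t = -1.79, P = 0.0989) and that the estimated slope is 1.71 ± 0.1 (estimate ± standard error). The diagnostic plots suggest no particularly obvious problems, and the test of normality for the residuals (W = 0.94, P = 0.3981) gives no evidence against normal errors.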
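
As a check on these conclusions, the key statistics can be recomputed by hand from the printed sums of squares and parameter estimates. The data step below is only a sketch: the variable names are mine, and the numbers are copied from the proc glm output above.

  /* Sketch: recompute F, R-Square, t and a 95% interval for the slope */
  data _null_;
   ss_model = 398030.26093;   /* Model sum of squares */
   ss_error = 16205.45335;    /* Error sum of squares */
   df_error = 12;             /* Error degrees of freedom */
   ms_error = ss_error / df_error;
   f   = ss_model / ms_error;               /* 1 model df, so MS = SS */
   rsq = ss_model / (ss_model + ss_error);  /* SS model / corrected total */
   slope = 1.71143233;
   se    = 0.09968772;
   t     = slope / se;                      /* t squared equals f here */
   tcrit = tinv(0.975, df_error);           /* upper 2.5% point of t on 12 df */
   lower = slope - tcrit*se;
   upper = slope + tcrit*se;
   put f= rsq= t= lower= upper=;
  run;

The values of f, rsq and t written to the log should agree with the proc glm output up to rounding; lower and upper give a 95% confidence interval for the slope under the usual normal error assumptions.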

            Plot of RESID*AREA.  Legend: A = 1 obs, B = 2 obs, etc.
RESID |
      |
   60 +
      |
      |
      |
      |                                    A                       A
      |
   40 +
      |
      |
      |                                                                        A
      |
      |A                       A
   20 +
      |                                    A
      |
      |      A
      |
      |
    0 +            A
      |
      |
      |
      |
      |                        A
  -20 +            A
      |
      |                                                A
      |      A                                                                 A
      |
      |
  -40 +
      |
      |
      |
      |
      |
  -60 +
      |
      |
      |
      |
      |                                                A
  -80 +
      |
      -+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      100   125   150   175   200   225   250   275   300   325   350   375  400
                                         AREA

            Plot of RESID*YHAT.  Legend: A = 1 obs, B = 2 obs, etc.
RESID |
      |
   60 +
      |
      |
      |
      |                                  A                   A
      |
   40 +
      |
      |
      |                                                                 A
      |
      |   A                    A
   20 +
      |                                  A
      |
      |        A
      |
      |
    0 +             A
      |
      |
      |
      |
      |                        A
  -20 +             A
      |
      |                                            A
      |        A                                                        A
      |
      |
  -40 +
      |
      |
      |
      |
      |
  -60 +
      |
      |
      |
      |
      |                                            A
  -80 +
      |
      -+-----------+-----------+-----------+-----------+-----------+-----------+
      100         200         300         400         500         600        700
                                         YHAT
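
The residual plots could be supplemented by a plot of the data with the fitted line superimposed. This is a sketch only, assuming the data set noxfit produced by the output statement above; the plotting symbols are an arbitrary choice.

  /* Sketch: observed emissions (o) and fitted values (*) against area */
  proc plot data=noxfit;
   plot emission*area='o' yhat*area='*' / overlay;
  run;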





Richard Lockhart
Thu Sep 5 16:06:36 PDT 1996