SAS example: Simple Linear Regression
The data consist of 14 pairs of measurements on the independent variable Burner Area Liberation Rate (in million BTU per hr-ft^2) and the dependent variable Nitrogen Oxides (NOx) Emission Rate (in parts per million). See Q 9 in Chapter 12. I use proc glm to fit a simple linear regression model to assess the effect of Burner Area on NOx emissions.
I ran the following SAS code:
    options pagesize=60 linesize=80;
    data nox;
      infile 'ch12q9.dat';
      input area emission;
    proc glm data=nox;
      model emission = area;
      output out=noxfit p=yhat r=resid;
    proc univariate data=noxfit plot normal;
      var resid;
    proc plot;
      plot resid*area;
      plot resid*yhat;
    run;
The line labelled model says that I am interested in the effect of area (my shorthand name for "Burner Area Liberation Rate") on emissions.
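As a sketch of how the model statement can be extended, the variant below asks proc glm for confidence limits on the intercept and slope in addition to the default estimates. It assumes the same data set nox and uses the model-statement options solution, clparm and alpha=. This variant was not part of the run shown here, so no output from it appears in these notes.

```sas
/* Hypothetical variant of the fit above: same model, plus 95% */
/* confidence limits for the parameter estimates.              */
proc glm data=nox;
  /* response = predictor; options follow the slash */
  model emission = area / solution clparm alpha=0.05;
run;
quit;
```

The general pattern is response on the left of the equals sign and predictors on the right, with any options after a slash.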
The output from proc glm and proc univariate is
    The SAS System
    10:00 Monday, November 20, 1995

    General Linear Models Procedure

    Number of observations in data set = 14

    Dependent Variable: EMISSION

                                    Sum of          Mean
    Source             DF          Squares        Square   F Value   Pr > F
    Model               1     398030.26093  398030.26093    294.74   0.0001
    Error              12      16205.45335    1350.45445
    Corrected Total    13     414235.71429

    R-Square        C.V.     Root MSE   EMISSION Mean
    0.960879    10.26905    36.748530       357.85714

    Source    DF      Type I SS   Mean Square   F Value   Pr > F
    AREA       1   398030.26093  398030.26093    294.74   0.0001

    Source    DF    Type III SS   Mean Square   F Value   Pr > F
    AREA       1   398030.26093  398030.26093    294.74   0.0001

                                 T for H0:    Pr > |T|   Std Error of
    Parameter       Estimate   Parameter=0                   Estimate
    INTERCEPT   -45.55190539         -1.79      0.0989    25.46779420
    AREA          1.71143233         17.17      0.0001     0.09968772


    Univariate Procedure

    Variable=RESID

    Moments

    N                14    Sum Wgts         14
    Mean              0    Sum               0
    Std Dev    35.30685    Variance   1246.573
    Skewness   -0.57524    Kurtosis    0.14238
    USS        16205.45    CSS        16205.45
    CV                .    Std Mean   9.436151
    T:Mean=0          0    Pr>|T|       1.0000
    Num ^= 0         14    Num > 0           7
    M(Sign)           0    Pr>=|M|      1.0000
    Sgn Rank        2.5    Pr>=|S|      0.9032
    W:Normal   0.939768    Pr<W         0.3981

    Quantiles(Def=5)

    100% Max   47.69382    99%    47.69382
     75% Q3    24.40867    95%    47.69382
     50% Med   5.229961    90%    46.55059
     25% Q1    -27.8778    10%     -29.021
      0% Min   -77.8778     5%    -77.8778
                            1%    -77.8778
    Range      125.5716
    Q3-Q1      52.28647
    Mode       -77.8778

    Extremes

      Lowest   Obs      Highest   Obs
    -77.8778(   11)    23.26544(    6)
     -29.021(   13)    24.40867(    1)
    -28.3771(    2)    30.97898(   14)
    -27.8778(   10)    46.55059(   12)
    -21.1629(    5)    47.69382(    9)

    Stem Leaf    #   Boxplot
       4 78      2      |
       2 341     3   +-----+
       0 28      2   *--+--*
      -0 71      2   |     |
      -2 9881    4   +-----+
      -4                |
      -6 8       1      |
         ----+----+----+----+
    Multiply Stem.Leaf by 10**+1

    [Normal Probability Plot of RESID. Vertical axis: RESID, from -70 to 50; horizontal axis: normal quantiles, from -2 to +2.]
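The conclusions are that AREA has a very significant and strong effect on emissions (F = 294.74 on 1 and 12 degrees of freedom, P = 0.0001, R-Square = 0.96), that the intercept of the linear regression might be 0 (P = 0.0989 for the test that the intercept is 0), and that the estimated slope is 1.71 ± 0.1 (estimate plus or minus one standard error). The diagnostic plots suggest no particularly obvious problems.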
[Plot of RESID*AREA. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: RESID, from -80 to 60; horizontal axis: AREA, from 100 to 400.]
[Plot of RESID*YHAT. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: RESID, from -80 to 60; horizontal axis: YHAT, from 100 to 700.]
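The summary figures behind the conclusions can be cross-checked by hand from the sums of squares and parameter estimates printed by proc glm. The data step below is a minimal sketch of that arithmetic. The data set name check and all variable names are my own choices for illustration, and the 95% interval for the slope is an addition that was not part of the original run.

```sas
/* Recompute the summary statistics from the printed proc glm values. */
data check;
  ss_model = 398030.26093;              /* Model sum of squares           */
  ss_error = 16205.45335;               /* Error sum of squares           */
  ss_total = 414235.71429;              /* Corrected Total sum of squares */
  df_error = 12;                        /* Error degrees of freedom       */

  rsquare  = ss_model / ss_total;       /* compare with R-Square          */
  mse      = ss_error / df_error;       /* compare with Error Mean Square */
  root_mse = sqrt(mse);                 /* compare with Root MSE          */
  f_value  = ss_model / mse;            /* Model DF is 1, so MS = SS      */

  slope    = 1.71143233;                /* AREA estimate                  */
  se_slope = 0.09968772;                /* its standard error             */
  t_slope  = slope / se_slope;          /* compare with T for H0          */

  /* 95% confidence limits for the slope, using the t quantile on 12 df */
  t_crit = tinv(0.975, df_error);
  lower  = slope - t_crit * se_slope;
  upper  = slope + t_crit * se_slope;

  /* Write the results to the SAS log */
  put rsquare= mse= root_mse= f_value= t_slope= lower= upper=;
run;
```

Because there is a single predictor, the square of t_slope should agree with f_value up to rounding, which is why the F test for the model and the t test for AREA give the same P value.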