The TRANSREG Procedure
In this example, an artificial data set is created with a variable Y that is a discontinuous function of X. See the first plot in Output 65.1.7. Notice that the function has four unconnected parts, each of which is a curve. Notice too that there is an overall quadratic trend: ignoring the shapes of the individual curves, the Y values at first tend to decrease as X increases and then tend to increase.
The first PROC TRANSREG analysis fits a linear regression model. The predicted values of Y given X are output and plotted to form the linear regression line. The R² for the linear regression is 0.10061, and it can be seen from the second plot in Output 65.1.7 that the linear regression model is not appropriate for these data. The following statements create the data set and perform the first PROC TRANSREG analysis. These statements produce Output 65.1.1.
   title 'An Illustration of Splines and Knots';

   * Create in Y a discontinuous function of X.
   *
   * Store copies of X in V1-V7 for use in PROC GPLOT.
   * These variables are only necessary so that each
   * plot can have its own x-axis label while putting
   * four plots on a page.;

   data A;
      array V[7] V1-V7;
      X=-0.000001;
      do I=0 to 199;
         if mod(I,50)=0 then do;
            C=((X/2)-5)**2;
            if I=150 then C=C+5;
            Y=C;
            end;
         X=X+0.1;
         Y=Y-sin(X-C);
         do J=1 to 7;
            V[J]=X;
            end;
         output;
         end;
   run;

   * Each of the PROC TRANSREG steps fits a
   * different spline model to the data set created
   * previously. The TRANSREG steps build up a data set with
   * various regression functions. All of the functions
   * are then plotted with the final PROC GPLOT step.
   *
   * The OUTPUT statements add new predicted values
   * variables to the data set, while the ID statements
   * save all of the previously created variables that
   * are needed for the plots.;

   proc transreg data=A;
      model identity(Y) = identity(X);
      title2 'A Linear Regression Function';
      output out=A pprefix=Linear;
      id V1-V7;
   run;

Output 65.1.1: Fitting a Linear Regression Model with PROC TRANSREG
   proc transreg data=A;
      model identity(Y)=spline(X / degree=2);
      title2 'A Quadratic Polynomial Regression Function';
      output out=A pprefix=Quad;
      id V1-V7 LinearY;
   run;

Output 65.1.2: Fitting a Quadratic Polynomial
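A spline of degree 2 with no knots is an ordinary quadratic polynomial, so the same fit can presumably be reproduced with a DATA step and PROC REG, in the style this example uses for the later spline models. The following is a minimal sketch and not part of the original example; the data set name QuadCheck and the variable names X1 and X2 are hypothetical.

```sas
* Sketch only: QuadCheck, X1, and X2 are illustrative names.;
data QuadCheck;
   set A(keep=X Y);
   X1=X;       /* X         */
   X2=X**2;    /* X squared */
run;

proc reg data=QuadCheck;
   model Y=X1 X2;
run; quit;
```

If the two approaches agree as expected, the R² that PROC REG reports should match the one that PROC TRANSREG reports for the quadratic model.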
   proc transreg data=A;
      model identity(Y) = spline(X / knots=5 10 15);
      title2 'A Cubic Spline Regression Function';
      title3 'The Third Derivative is Discontinuous at X=5, 10, 15';
      output out=A pprefix=Cub1;
      id V1-V7 LinearY QuadY;
   run;

Output 65.1.3: Fitting a Piecewise Cubic Polynomial
   data B;                      /* A is the data set used for transreg */
      set a(keep=X Y);
      X1=X;                     /* X                                   */
      X2=X**2;                  /* X squared                           */
      X3=X**3;                  /* X cubed                             */
      X4=(X> 5)*((X-5)**3);     /* change in X**3 after 5              */
      X5=(X>10)*((X-10)**3);    /* change in X**3 after 10             */
      X6=(X>15)*((X-15)**3);    /* change in X**3 after 15             */
   run;

   proc reg;
      model Y=X1-X6;
   run;
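Written out, the regression that this DATA step and PROC REG specify is a cubic polynomial plus one truncated cubic term per knot. The notation below is only an illustrative restatement of the code: the symbols β0 through β6 are assumed names for the intercept and the coefficients of X1 through X6, and (u)+ denotes u when u is positive and 0 otherwise.

```latex
\hat{Y} = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3
        + \beta_4 (X-5)_+^3 + \beta_5 (X-10)_+^3 + \beta_6 (X-15)_+^3,
\qquad (u)_+ = \max(u, 0).
```

Each truncated term, together with its first and second derivatives, is zero at its own knot, so only the third derivative of the fitted function can change abruptly at X=5, 10, and 15.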
In the next step each knot is repeated three times, so the first, second, and third derivatives are discontinuous at X=5, 10, and 15, but the spline is required to be continuous at the knots. The spline is a weighted sum of a constant, X, X², and X³, together with a linear, a quadratic, and a cubic change term at each of the three knots.
The spline is continuous since there is not a separate constant in the formula for the spline for each knot. Now the R² is 0.95542, and the spline closely follows the data, except at the knots. The following statements perform this analysis and produce Output 65.1.4:
   proc transreg data=A;
      model identity(y) = spline(x / knots=5 5 5 10 10 10 15 15 15);
      title3 'First - Third Derivatives Discontinuous at X=5, 10, 15';
      output out=A pprefix=Cub3;
      id V1-V7 LinearY QuadY Cub1Y;
   run;

Output 65.1.4: Piecewise Polynomial with Discontinuous Derivatives
   data B;
      set a(keep=X Y);
      X1=X;                       /* X                       */
      X2=X**2;                    /* X squared               */
      X3=X**3;                    /* X cubed                 */
      X4=(X>5)   * (X- 5);        /* change in X after 5     */
      X5=(X>10)  * (X-10);        /* change in X after 10    */
      X6=(X>15)  * (X-15);        /* change in X after 15    */
      X7=(X>5)   * ((X-5)**2);    /* change in X**2 after 5  */
      X8=(X>10)  * ((X-10)**2);   /* change in X**2 after 10 */
      X9=(X>15)  * ((X-15)**2);   /* change in X**2 after 15 */
      X10=(X>5)  * ((X-5)**3);    /* change in X**3 after 5  */
      X11=(X>10) * ((X-10)**3);   /* change in X**3 after 10 */
      X12=(X>15) * ((X-15)**3);   /* change in X**3 after 15 */
   run;

   proc reg;
      model Y=X1-X12;
   run;
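To see why a model built from these variables stays continuous while its derivatives do not, it may help to look at the change terms for a single knot k (k = 5, 10, or 15). The following is a sketch in assumed notation: a_k, b_k, and c_k are hypothetical names for the coefficients of the linear, quadratic, and cubic change terms at that knot, and (u)+ denotes u when u is positive and 0 otherwise.

```latex
\begin{align*}
g_k(X) &= a_k (X-k)_+ + b_k (X-k)_+^2 + c_k (X-k)_+^3,
        \qquad (u)_+ = \max(u, 0) \\
\lim_{X \to k^-} g_k(X) &= \lim_{X \to k^+} g_k(X) = 0 \\
g_k'(k^+) - g_k'(k^-) &= a_k \\
g_k''(k^+) - g_k''(k^-) &= 2 b_k \\
g_k'''(k^+) - g_k'''(k^-) &= 6 c_k
\end{align*}
```

Every change term is zero at its knot, so the fitted function cannot jump there, but each of the first three derivatives can jump by an amount governed by one coefficient.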
When the knots are repeated four times in the next step, the spline function is discontinuous at the knots and follows the data even more closely, with an R² of 0.99254. In this step, each separate curve is approximated by a cubic polynomial (with no knots within the separate polynomials). The following statements perform this analysis and produce Output 65.1.5:
   proc transreg data=A;
      model identity(Y) = spline(X / knots=5 5 5 5 10 10 10 10 15 15 15 15);
      title3 'Discontinuous Function and Derivatives';
      output out=A pprefix=Cub4;
      id V1-V7 LinearY QuadY Cub1Y Cub3Y;
   run;

Output 65.1.5: Discontinuous Function and Derivatives
      X13=(X > 5);     /* intercept change after 5  */
      X14=(X > 10);    /* intercept change after 10 */
      X15=(X > 15);    /* intercept change after 15 */
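These three statements appear to be additions to the DATA step that creates X1 through X12 for the model with each knot repeated three times. Under that assumption, the complete step would look something like the following sketch. The PROC REG call mirrors the earlier ones, with DATA=B written out explicitly; none of this is taken from output shown in the original. With an intercept and 15 regressors the model has 16 parameters, the same number as four separate cubic polynomials with four coefficients each.

```sas
* Sketch assuming X13-X15 are appended to the earlier DATA step.;
data B;
   set A(keep=X Y);
   X1=X;                       /* X                         */
   X2=X**2;                    /* X squared                 */
   X3=X**3;                    /* X cubed                   */
   X4=(X>5)   * (X-5);         /* change in X after 5       */
   X5=(X>10)  * (X-10);        /* change in X after 10      */
   X6=(X>15)  * (X-15);        /* change in X after 15      */
   X7=(X>5)   * ((X-5)**2);    /* change in X**2 after 5    */
   X8=(X>10)  * ((X-10)**2);   /* change in X**2 after 10   */
   X9=(X>15)  * ((X-15)**2);   /* change in X**2 after 15   */
   X10=(X>5)  * ((X-5)**3);    /* change in X**3 after 5    */
   X11=(X>10) * ((X-10)**3);   /* change in X**3 after 10   */
   X12=(X>15) * ((X-15)**3);   /* change in X**3 after 15   */
   X13=(X>5);                  /* intercept change after 5  */
   X14=(X>10);                 /* intercept change after 10 */
   X15=(X>15);                 /* intercept change after 15 */
run;

proc reg data=B;
   model Y=X1-X15;
run; quit;
```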
The last two steps use the NKNOTS= t-option to specify the number of knots but not their location. NKNOTS=4 places knots at the quintiles, while NKNOTS=9 places knots at the deciles. The spline and its first two derivatives are continuous. The R² values are 0.74450 and 0.95256, respectively. Even though the knots are not located at the discontinuities, the spline can closely follow the data with NKNOTS=9. The following statements produce Output 65.1.6.
   proc transreg data=A;
      model identity(Y) = spline(X / nknots=4);
      title3 'Four Knots';
      output out=A pprefix=Cub4k;
      id V1-V7 LinearY QuadY Cub1Y Cub3Y Cub4Y;
   run;

   proc transreg data=A;
      model identity(Y) = spline(X / nknots=9);
      title3 'Nine Knots';
      output out=A pprefix=Cub9k;
      id V1-V7 LinearY QuadY Cub1Y Cub3Y Cub4Y Cub4kY;
   run;

Output 65.1.6: Specifying Number of Knots instead of Knot Location
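To get a feel for where NKNOTS=4 puts the knots, the quintiles of X can be computed directly. The following is a minimal sketch using PROC UNIVARIATE; the data set name Knots and the prefix Knot are hypothetical, and the percentile definition that PROC UNIVARIATE uses by default may differ slightly from the one PROC TRANSREG uses, so the values should be treated as approximate.

```sas
* Sketch only: Knots and the Knot prefix are illustrative names.;
proc univariate data=A noprint;
   var X;
   output out=Knots pctlpts=20 40 60 80 pctlpre=Knot;
run;

proc print data=Knots;
run;
```

Because X is evenly spaced from about 0.1 to 20, the quintiles fall near 4, 8, 12, and 16, which is away from the breaks in the function just after X=5, 10, and 15.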
   goptions goutmode=replace nodisplay;
   %let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
   * Depending on your goptions, these plot options may work better:
   * %let opts = haxis=axis2 vaxis=axis1 frame;

   proc gplot data=A;
      title;
      axis1 minor=none label=(angle=90 rotate=0);
      axis2 minor=none;
      plot Y*X=1 / &opts name='tregdis1';
      plot Y*V1=1 linearY*X=2 /overlay &opts name='tregdis2';
      plot Y*V2=1 quadY  *X=2 /overlay &opts name='tregdis3';
      plot Y*V3=1 cub1Y  *X=2 /overlay &opts name='tregdis4';
      plot Y*V4=1 cub3Y  *X=2 /overlay &opts name='tregdis5';
      plot Y*V5=1 cub4Y  *X=2 /overlay &opts name='tregdis6';
      plot Y*V6=1 cub4kY *X=2 /overlay &opts name='tregdis7';
      plot Y*V7=1 cub9kY *X=2 /overlay &opts name='tregdis8';
      symbol1 color=blue   v=star i=none;
      symbol2 color=yellow v=dot  i=none;
      label V1      = 'Linear Regression'
            V2      = 'Quadratic Regression Function'
            V3      = '1 Discontinuous Derivative'
            V4      = '3 Discontinuous Derivatives'
            V5      = 'Discontinuous Function'
            V6      = '4 Knots'
            V7      = '9 Knots'
            Y       = 'Y'
            LinearY = 'Y'
            QuadY   = 'Y'
            Cub1Y   = 'Y'
            Cub3Y   = 'Y'
            Cub4Y   = 'Y'
            Cub4kY  = 'Y'
            Cub9kY  = 'Y';
   run; quit;

   goptions display;
   proc greplay nofs tc=sashelp.templt template=l2r2;
      igout gseg;
      treplay 1:tregdis1 2:tregdis3 3:tregdis2 4:tregdis4;
      treplay 1:tregdis5 2:tregdis7 3:tregdis6 4:tregdis8;
   run; quit;

Output 65.1.7: Plots Summarizing Analysis for Spline Example
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.