Chapter Contents |
Previous |
Next |
The TPSPLINE Procedure |
The following example demonstrates how you can use the TPSPLINE procedure to fit a semiparametric model.
Suppose that y is a continuous variable and x1 and x2 are two explanatory variables of interest. To fit a smoothing spline model, you can use a MODEL statement similar to that used in many regression procedures in the SAS System.
proc tpspline; model y = (x1 x2); run;
The TPSPLINE procedure can fit semiparametric models; the parentheses in the preceding MODEL statement separates the smoothing variables from the regression variables. The following statements illustrates this syntax.
proc tpspline; model y = x3 (x1 x2); run;
This model assumes a linear relation with x3 and an unknown functional relation with x1 and x2.
If you want to fit several responses using the same explanatory variables, you can save computation time by using the multiple responses feature in the MODEL statement. For example, if y1 and y2 are two response variables, the following MODEL statement can be used to fit two models. Separate analyses are then performed for each response variable.
proc tpspline; model y1 y2 = (x1 x2); run;
The following example illustrates the use of PROC TPSPLINE. The data are from Bates, Lindstrom, Wahba, and Yandell (1987).
data Measure; input x1 x2 y @@; datalines; -1.0 -1.0 15.54483570 -1.0 -1.0 15.76312613 -.5 -1.0 18.67397826 -.5 -1.0 18.49722167 .0 -1.0 19.66086310 .0 -1.0 19.80231311 .5 -1.0 18.59838649 .5 -1.0 18.51904737 1.0 -1.0 15.86842815 1.0 -1.0 16.03913832 -1.0 -.5 10.92383867 -1.0 -.5 11.14066546 -.5 -.5 14.81392847 -.5 -.5 14.82830425 .0 -.5 16.56449698 .0 -.5 16.44307297 .5 -.5 14.90792284 .5 -.5 15.05653924 1.0 -.5 10.91956264 1.0 -.5 10.94227538 -1.0 .0 9.61492010 -1.0 .0 9.64648093 -.5 .0 14.03133439 -.5 .0 14.03122345 .0 .0 15.77400253 .0 .0 16.00412514 .5 .0 13.99627680 .5 .0 14.02826553 1.0 .0 9.55700164 1.0 .0 9.58467047 -1.0 .5 11.20625177 -1.0 .5 11.08651907 -.5 .5 14.83723493 -.5 .5 14.99369172 .0 .5 16.55494349 .0 .5 16.51294369 .5 .5 14.98448603 .5 .5 14.71816070 1.0 .5 11.14575565 1.0 .5 11.17168689 -1.0 1.0 15.82595514 -1.0 1.0 15.96022497 -.5 1.0 18.64014953 -.5 1.0 18.56095997 .0 1.0 19.54375504 .0 1.0 19.80902641 .5 1.0 18.56884576 .5 1.0 18.61010439 1.0 1.0 15.86586951 1.0 1.0 15.90136745 ;
The data set Measure contains three variables x1, x2, and y. Suppose that you want to fit a surface by using the variables x1 and x2 to model the response y. The variables x1 and x2 are spaced evenly on a [-1×1]×[-1 ×1 ] square, and the response y is generated by adding a random error to a function f(x1,x2). The raw data are plotted using the G3D procedure. In order to plot those replicates, the data are jittered a little bit.
data Measure1; set Measure; run; proc sort data=Measure1; by x2 x1; run; data measure1; set measure1; by x1; if last.x1 then x1=x1+0.00001; run; proc g3d data=Measure1; scatter x2*x1=y /size=.5 zmin=9 zmax=21 zticknum=4; title "Raw Data"; run;
Figure 64.1 displays the raw data.
The following statements invoke the TPSPLINE procedure, using the Measure data set as input. In the MODEL statement, the x1 and x2 variables are listed as smoothing variables. The LOGNLAMBDA= option returns a list of GCV values with ranging from -4 to -2. The OUTPUT statement creates the data set estimate to contain the predicted values and the 95% upper and lower confidence limits.
proc tpspline data=Measure; model y=(x1 x2) /lognlambda=(-4 to -2 by 0.1); output out=estimate pred uclm lclm; run; proc print data=estimate; run;
The results of this analysis are displayed in the following figures. Figure 64.2 shows that the data set Measure contains 50 observations with 25 unique design points. The GCV values are listed along with the log10 of . The value of that minimizes the GCV function is around -3.5. The final thin-plate smoothing spline estimate is based on LOGNLAMBDA = -3.4762. The residual sum of squares is 0.246110, and the degrees of freedom is 24.593203. The standard deviation, defined as RSS/(Tr(I-A)), is 0.098421. The predictions and 95% confidence limits are displayed in Figure 64.3.
|
|
The fitted surface is plotted with PROC G3D as follows.
proc g3d data=estimate; plot x2*x1=p_y/grid zmin=9 zmax=21 zticknum=4; title 'Plot of Fitted Surface'; run;
The resulting plot is displayed in Figure 64.4.
Because the data in data set Measure are very sparse, the fitted surface is not smooth. To produce a smoother surface, the following statements generate the data set Pred in order to obtain a finer grid. The SCORE statement evaluates the fitted surface at those new design points.
data pred; do x1=-1 to 1 by 0.1; do x2=-1 to 1 by 0.1; output; end; end; run; proc tpspline data=measure; model y=(x1 x2)/lognlambda=(-4 to -2 by 0.1); score data=pred out=predy; run; proc g3d data=predy; plot x2*x1=p_y/grid zmin=9 zmax=21 zticknum=4; title 'Plot of Fitted Surface on a Fine Grid'; run;
The surface plot based on the finer grid is displayed in Figure 64.5. The plot shows that a parametric model with quadratic terms of x1 and x2 provides a reasonable fit to the data.
Chapter Contents |
Previous |
Next |
Top |
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.