Chapter Contents |
Previous |
Next |
The VARIOGRAM Procedure |
Recall that the goal of this example is spatial prediction. In particular, you would like to produce a contour map or surface plot on a regular grid of predicted values based on ordinary kriging. Ordinary kriging requires the complete specification of the spatial covariance or semivariogram.
You can use PROC VARIOGRAM, along with a DATA step and PROC GPLOT, to estimate visually a reasonable semivariogram model (both the form and associated parameters) for the thickness data.
Before proceeding with this estimation, consider the formula for the empirical or experimental semivariogram .Denote the coal seam thickness process by .You have measurements (Z(ri), i = 1, ... ,75). The standard formula for (isotropic case) is
For actual data, it is unlikely that any pair (i,j) would exactly satisfy | ri-rj | = h, so typically a range of pairwise distances, ,is used to group pairs (ri,rj) for a single term in the expression for . Using this range, N(h) is modified by
PROC VARIOGRAM performs this grouping with two required options for variogram computation: the LAGDISTANCE= and MAXLAGS= options.
The meaning of the required LAGDISTANCE= option is as follows. Classify all pairs of points into intervals according to their pairwise distance. The width of the distance interval is the LAGDISTANCE= value. The meaning of the required MAXLAGS= option is simply the number of intervals.
The problem is that a surface plot of the original data, or the scatter plot of the measurement locations, is not very helpful in determining the distribution of these pairwise distances; it is not clear what values to give to the LAGDISTANCE= and MAXLAGS= options.
You use PROC VARIOGRAM with the OUTDISTANCE= option to produce a modified histogram of the pairwise distances in order to find reasonable values for the LAGDISTANCE= and MAXLAGS= options. In the following analysis, you use the NOVARIOGRAM option in the COMPUTE statement and the OUTDISTANCE= option in the PROC VARIOGRAM statement. You need the NOVARIOGRAM option to keep an error message from being issued due to the absence of the LAGDISTANCE= and MAXLAGS= options.
The DATA step after the PROC VARIOGRAM statement computes the midpoint of each distance interval. This midpoint is then used in the GCHART procedure. Since the number of distance intervals is not specified by using the NHCLASSES= option in the COMPUTE statement, the default of 10 is used.
proc variogram data=thick outdistance=outd; compute novariogram; coordinates xc=east yc=north; var thick; run; title 'OUTDISTANCE= Data Set Showing Distance Intervals'; proc print data=outd; run; data outd; set outd; mdpt=round((lb+ub)/2,.1); label mdpt = 'Midpoint of Interval'; run; axis1 minor=none; axis2 minor=none label=(angle=90 rotate=0); title 'Distribution of Pairwise Distances'; proc gchart data=outd; vbar mdpt / type=sum sumvar=count discrete frame cframe=ligr gaxis=axis1 raxis=axis2 nolegend; run;
For plotting and estimations purposes, it is desirable to have as many points as possible for the plot of against h. This corresponds to having as many distance intervals as possible, that is, having a small value for the LAGDISTANCE= option.
However, a rule of thumb used in computing sample semivariograms is to use at least 30 point pairs in computing a single value of the empirical or experimental semivariogram.
If the LAGDISTANCE= value is set too small, there may be too few points in one or more of the intervals. On the other hand, if the LAGDISTANCE= value is set to a large value, the number of point pairs in the distance intervals may be much greater than that needed for estimation precision, thereby "wasting" point pairs at the expense of variogram points.
Hence, there is a tradeoff between the number of distance intervals and the number of point pairs within each interval.
As discussed in the section "OUTDIST=SAS-data-set " the first few distance intervals, corresponding to lag 0 and lag 1, are typically the limiting intervals. This is particularly true for lag 0 since it is half the width of the remaining intervals. For the default of NHCLASSES=10, the lag 0 class contains 45 points, which is reasonably close to 30, but the lag 1 class contains 263 points.
If you rerun PROC VARIOGRAM with NHCLASSES=20, these numbers become 8 and 83 for lags 0 and 1, respectively. Because of the asymmetrical nature of lag 0, you are willing to violate the rule of thumb for the 0th lag. You will, however, have sufficient numbers in lag 1 and above.
proc variogram data=thick outdistance=outd; compute nhc=20 novariogram; coordinates xc=east yc=north; var thick; run; title 'OUTDISTANCE= Data Set Showing Distance Intervals'; proc print data=outd; run; data outd; set outd; mdpt=round((lb+ub)/2,.1); label mdpt = 'Midpoint of Interval'; run; axis1 minor=none; axis2 minor=none label=(angle=90 rotate=0); title 'Distribution of Pairwise Distances'; proc gchart data=outd; vbar mdpt / type=sum sumvar=count discrete frame cframe=ligr gaxis=axis1 raxis=axis2 nolegend; run;
|
Figure 70.6: Distribution of Pairwise Distances
The length of the lag 1 class (3.484,10.453) is 6.969. You round off and use LAGDISTANCE=7.0 in the COMPUTE statement.
The use of the MAXLAGS= option is more difficult. From Figure 70.5, note that up to a pairwise distance of 101, you have a sufficient number of pairs. With your choice of LAGDISTANCE=7.0, this yields a maximum number of lags .
The problem with using the maximum lag value is that it includes pairs of points so far apart that they are likely to be independent. Using pairs of points that are independent adds nothing to the empirical semivariogram plot; they are essentially added noise.
If there is an estimate of correlation length, perhaps from a prior geologic study of a similar site, you can specify the MAXLAGS= value so that the maximum pairwise distance does not exceed two or three correlation lengths. If there is no estimate of correlation length, you can use the following rule of thumb: use (1/2) to (3/4) of the "diameter" of the region containing the data. A MAXLAGS= value of 10 is within this range.
You now rerun PROC VARIOGRAM with these values.
Chapter Contents |
Previous |
Next |
Top |
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.