The COMPARE Procedure
After making these comparisons, PROC COMPARE compares the values in the parts of the data sets that match. PROC COMPARE compares the data either by the position of observations or by the values of an ID variable.
A Comparison by Position of Observations
Figure: Comparison by the Positions of Observations
When you use PROC COMPARE to compare data set TWO with data set ONE, the procedure compares the first observation in data set ONE with the first observation in data set TWO, the second observation in data set ONE with the second observation in data set TWO, and so on. In each pair of observations, the procedure compares the values of IDNUM, NAME, GENDER, and GPA.
The procedure does not report on the values of the last two observations or the variable YEAR in data set TWO because there is nothing to compare them with in data set ONE.
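As a rough illustration of positional matching (written in Python rather than SAS, and not PROC COMPARE's actual implementation), the sketch below pairs observations by position and keeps only the variables common to both data sets. The variable names come from the text; the observation counts of 5 and 7 are assumptions, chosen only so that data set TWO has two extra observations.

```python
# Variable names are taken from the text above.
one_vars = ["IDNUM", "NAME", "GENDER", "GPA"]
two_vars = ["IDNUM", "NAME", "GENDER", "GPA", "YEAR"]

# Assumed observation counts: TWO has two more observations than ONE.
n_one, n_two = 5, 7

# Only variables present in both data sets are compared.
common_vars = [v for v in one_vars if v in two_vars]
unmatched_vars = [v for v in two_vars if v not in one_vars]

# Observation i of ONE is paired with observation i of TWO (1-based).
n_pairs = min(n_one, n_two)
pairs = [(i, i) for i in range(1, n_pairs + 1)]

# Trailing observations in TWO have no partner in ONE.
unmatched_obs_two = list(range(n_pairs + 1, n_two + 1))

print(common_vars)
print(pairs)
print(unmatched_vars, unmatched_obs_two)
```

Under these assumptions, YEAR and observations 6 and 7 of data set TWO are left out of the value comparison.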
A Comparison with an ID Variable
For the two data sets shown in Comparison by the Value of the ID Variable, assume that IDNUM is an ID variable and that IDNUM has the same type in both data sets. The procedure compares the observations that have the same value for IDNUM. The data inside the shaded boxes show the part of the data sets that the procedure compares.
Figure: Comparison by the Value of the ID Variable
The data sets contain three matching variables: NAME, GENDER, and GPA. They also contain five matching observations: the observations with values of 2998, 9866, 2118, 3847, and 2342 for IDNUM.
Data set TWO contains two observations (IDNUM=7565 and IDNUM=1755) for which data set ONE contains no matching observations. Similarly, no variable in data set ONE matches the variable YEAR in data set TWO.
See Comparing Observations with an ID Variable for an example that uses an ID variable.
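The matching step described above can be sketched in Python (an illustration only, not SAS code and not how PROC COMPARE is implemented). The IDNUM values are the ones listed in the text; the order of the observations and the helper name `match_by_id` are assumptions made for this example.

```python
# IDNUM values from the text; the row order is an assumption.
one_ids = [2998, 9866, 2118, 3847, 2342]
two_ids = [2998, 9866, 2118, 3847, 2342, 7565, 1755]


def match_by_id(base_ids, compare_ids):
    """Split ID values into matched, base-only, and comparison-only lists."""
    base, comp = set(base_ids), set(compare_ids)
    matched = [i for i in base_ids if i in comp]
    base_only = [i for i in base_ids if i not in comp]
    compare_only = [i for i in compare_ids if i not in base]
    return matched, base_only, compare_only


matched, base_only, compare_only = match_by_id(one_ids, two_ids)
print(matched)       # observations whose values would be compared
print(base_only)     # observations only in data set ONE
print(compare_only)  # observations only in data set TWO
```

With these inputs, five observations are matched, and the two observations with IDNUM 7565 and 1755 appear only in data set TWO.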
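The Equality Criterion
For a numeric variable compared, let x be its value in the base data set and let y be its value in the comparison data set. If both x and y are nonmissing, the values are judged unequal according to the value of METHOD= and the value of CRITERION= (γ) as follows:
If METHOD=EXACT, the values are unequal if y does not equal x.
If METHOD=ABSOLUTE, the values are unequal if ABS(y-x) > γ.
If METHOD=RELATIVE, the values are unequal if ABS(y-x) / ((ABS(x)+ABS(y))/2 + δ) > γ, where δ is the value added to the denominator (0 by default). The values are equal if x=y=0.
If METHOD=PERCENT, the values are unequal if 100(ABS(y-x)/ABS(x)) > γ for x≠0, or if y≠0 for x=0.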
If x or y is missing, then the comparison depends on the NOMISSING option. If NOMISSING is in effect, a missing value compares equal to any value. Otherwise, a missing value is judged equal only to a missing value of the same type (that is, .=., .^=.A, .A=.A, .A^=.B, and so on).
If the value specified for CRITERION= is negative, the actual criterion used is the absolute value of γ times a very small number ε (epsilon) that depends on the numerical precision of the computer. This number ε is defined as the smallest positive floating-point value such that, using machine arithmetic, 1-ε<1<1+ε. Round-off or truncation error in floating-point computations is typically a few orders of magnitude larger than ε. This means that CRITERION=-1000 often provides a reasonable test of the equality of computed results at the machine level of precision.
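To make the negative-criterion rule concrete, here is a small Python sketch (not SAS code). It assumes that ε can be approximated by Python's `sys.float_info.epsilon` for double-precision values, and that the relative error is the absolute difference divided by the mean of the absolute values; the function name `effective_criterion` is hypothetical.

```python
import sys


def effective_criterion(criterion):
    """Scale a negative criterion by machine epsilon; leave others as-is."""
    if criterion < 0:
        # Assumption: machine epsilon stands in for the document's epsilon.
        return abs(criterion) * sys.float_info.epsilon
    return criterion


# A computed result that differs from its exact counterpart by round-off.
x = 0.3
y = 0.1 + 0.2

# Assumed relative-error measure with nothing added to the denominator.
relative_error = abs(y - x) / ((abs(x) + abs(y)) / 2)
limit = effective_criterion(-1000)

print(x == y)                   # exact comparison fails
print(relative_error <= limit)  # tolerance of 1000 * epsilon accepts it
```

In this example the two values differ under an exact test but fall well within 1000 times machine epsilon.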
The value added to the denominator in the RELATIVE method is specified in parentheses after the method name: METHOD=RELATIVE(δ). If not specified in METHOD=, δ defaults to 0. The value of δ can be used to control the behavior of the error measure when both x and y are very close to 0. If δ is not given and x and y are very close to 0, any error produces a large relative error (in the limit, 2).
Specifying a value for δ avoids this extreme sensitivity of the RELATIVE method for small values. If you specify METHOD=RELATIVE(δ) CRITERION=γ when both x and y are much smaller than δ in absolute value, the denominator is approximately δ, so the comparison is as if you had specified METHOD=ABSOLUTE CRITERION=δγ. However, when either x or y is much larger than δ in absolute value, the comparison is like METHOD=RELATIVE CRITERION=γ. For moderate values of x and y, METHOD=RELATIVE(δ) CRITERION=γ is, in effect, a compromise between METHOD=ABSOLUTE CRITERION=δγ and METHOD=RELATIVE CRITERION=γ.
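The effect of the denominator term can be seen in a short Python sketch (illustrative only, not SAS code). It assumes the following measures for nonmissing values: absolute error ABS(y-x), percent error 100*ABS(y-x)/ABS(x), and relative error ABS(y-x) divided by the mean of ABS(x) and ABS(y) plus the added value. The function and parameter names (`values_unequal`, `gamma`, `delta`) are hypothetical.

```python
def values_unequal(x, y, method="EXACT", gamma=0.0, delta=0.0):
    """Return True if nonmissing numbers x (base) and y are judged unequal."""
    diff = abs(y - x)
    if method == "EXACT":
        return y != x
    if method == "ABSOLUTE":
        return diff > gamma
    if method == "PERCENT":
        # With a zero base value, only y == 0 counts as equal.
        return y != 0 if x == 0 else 100 * diff / abs(x) > gamma
    if method == "RELATIVE":
        if x == 0 and y == 0:
            return False
        # delta is the value added to the denominator.
        return diff / ((abs(x) + abs(y)) / 2 + delta) > gamma
    raise ValueError(f"unknown method: {method}")


# Near zero with no added value, a tiny error gives a relative error of 2.
print(values_unequal(0.0, 1e-12, "RELATIVE", gamma=0.01))
# Adding 1 to the denominator makes the same pair compare equal.
print(values_unequal(0.0, 1e-12, "RELATIVE", gamma=0.01, delta=1.0))
# For large values the added term barely matters.
print(values_unequal(1000.0, 1020.0, "RELATIVE", gamma=0.01, delta=1.0))
```

The first call reports the values as unequal, the second as equal, and the third as unequal, which matches the behavior described for small and large values.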
For character variables, if one value has a greater length than the
other, the shorter value is padded with blanks for the comparison. Nonblank
character values are judged equal only if they agree at each character. If
NOMISSING is in effect, blank character values compare equal to anything.
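The character rule can be sketched in Python as follows (an illustration, not SAS code). The function name `chars_equal` and the `nomissing` flag are hypothetical stand-ins for the behavior described above.

```python
def chars_equal(a, b, nomissing=False):
    """Compare two character values after blank-padding the shorter one."""
    # Under NOMISSING, a blank value compares equal to anything.
    if nomissing and (a.strip(" ") == "" or b.strip(" ") == ""):
        return True
    width = max(len(a), len(b))
    # Pad with trailing blanks, then require agreement at every character.
    return a.ljust(width) == b.ljust(width)


print(chars_equal("Smith", "Smith   "))       # trailing blanks are ignored
print(chars_equal("Smith", "Smyth"))          # differs at one character
print(chars_equal("", "Smith"))               # blank versus nonblank
print(chars_equal("", "Smith", nomissing=True))
```

Trailing blanks do not affect the result, while any difference at a character position, including case, makes the values unequal.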
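For nonmissing numeric values, with x the value in the base data set and y the value in the comparison data set:
Difference = y - x
Percent Difference = (y - x) / x * 100 for x≠0
Percent Difference = missing for x=0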
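A minimal Python sketch of these two quantities follows (not SAS code). It assumes that the difference is the comparison value minus the base value, that the percent difference divides by the base value, and that `None` stands in for a missing value; the function names are hypothetical.

```python
def difference(x, y):
    """Difference of comparison value y from base value x."""
    return y - x


def percent_difference(x, y):
    """Percent difference relative to the base value; None when x is 0."""
    if x == 0:
        return None  # stands in for a missing value
    return (y - x) / x * 100


print(difference(4.0, 5.0))
print(percent_difference(4.0, 5.0))
print(percent_difference(0.0, 5.0))
```

Using the base value as the reference means that swapping the two data sets changes both the sign of the difference and the size of the percent difference.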
Formatted Values
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.