The COMPARE Procedure |
SAS Log |
Macro Return Codes (SYSINFO) |
The Macro Return Codes table is a key for interpreting the SYSINFO return code from PROC COMPARE. For each condition listed, the associated value is added to the return code if the condition is true. Thus, the SYSINFO return code is the sum of the codes listed in the table for the applicable conditions:
Bit | Condition | Code | Hex | Description |
---|---|---|---|---|
1 | DSLABEL | 1 | 0001X | Data set labels differ |
2 | DSTYPE | 2 | 0002X | Data set types differ |
3 | INFORMAT | 4 | 0004X | Variable has different informat |
4 | FORMAT | 8 | 0008X | Variable has different format |
5 | LENGTH | 16 | 0010X | Variable has different length |
6 | LABEL | 32 | 0020X | Variable has different label |
7 | BASEOBS | 64 | 0040X | Base data set has observation not in comparison |
8 | COMPOBS | 128 | 0080X | Comparison data set has observation not in base |
9 | BASEBY | 256 | 0100X | Base data set has BY group not in comparison |
10 | COMPBY | 512 | 0200X | Comparison data set has BY group not in base |
11 | BASEVAR | 1024 | 0400X | Base data set has variable not in comparison |
12 | COMPVAR | 2048 | 0800X | Comparison data set has variable not in base |
13 | VALUE | 4096 | 1000X | A value comparison was unequal |
14 | TYPE | 8192 | 2000X | Conflicting variable types |
15 | BYVAR | 16384 | 4000X | BY variables do not match |
16 | ERROR | 32768 | 8000X | Fatal error: comparison not done |
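As a worked example of the sum, a return code of 4224 equals 4096 + 128, which means that the VALUE and COMPOBS conditions are both true. The following sketch decodes any return code into condition names. The macro name `decode_sysinfo` and its local variables are hypothetical and are not part of PROC COMPARE; the sketch assumes only the BAND function and the bit layout in the table.

```sas
/* Hypothetical helper: list the condition names whose bits are set in &rc. */
%macro decode_sysinfo(rc);
   %local i names result;
   /* Names in bit order 1 through 16, as in the Macro Return Codes table */
   %let names = DSLABEL DSTYPE INFORMAT FORMAT LENGTH LABEL BASEOBS COMPOBS;
   %let names = &names BASEBY COMPBY BASEVAR COMPVAR VALUE TYPE BYVAR ERROR;
   %do i = 1 %to 16;
      /* BAND is nonzero when bit i (code 2**(i-1)) is present in &rc */
      %if %sysfunc(band(&rc, %eval(2**(&i-1)))) %then
         %let result = &result %scan(&names, &i);
   %end;
   &result
%mend decode_sysinfo;

%put %decode_sysinfo(4224);   /* writes COMPOBS VALUE to the log */
```

Calling the macro with a value saved from SYSINFO, rather than a literal, gives a readable summary of what PROC COMPARE found.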
These codes are ordered and scaled to allow a simple check of the degree to which the data sets differ. For example, if you want to check that two data sets contain the same variables, observations, and values, but you do not care about differences in labels, formats, and so forth, use the following statements:
    proc compare base=SAS-data-set
                 compare=SAS-data-set;
    run;

    %if &sysinfo >= 64 %then
       %do;
          handle error;
       %end;
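The %IF test above is usually placed inside a macro definition. A minimal sketch of such a wrapper follows. The macro name `check_compare`, its parameters, and the log messages are hypothetical; the call assumes that the PROCLIB libref is assigned and contains the data sets ONE and TWO used elsewhere in this section.

```sas
/* Hypothetical wrapper: run the comparison, then test the return code. */
%macro check_compare(base=, compare=);
   %local rc;

   proc compare base=&base compare=&compare;
   run;

   /* Copy SYSINFO right away, before another step can reset it. */
   %let rc = &sysinfo;

   %if &rc >= 64 %then %do;
      /* Bit 7 (BASEOBS) or a higher bit is set. */
      %put ERROR: Observations, variables, or values differ (SYSINFO=&rc).;
   %end;
   %else %put NOTE: Only attribute differences, if any (SYSINFO=&rc).;
%mend check_compare;

%check_compare(base=proclib.one, compare=proclib.two)
```

Because the codes are ordered by severity, changing the threshold changes how strict the check is. For example, testing for a value of 1024 or greater would ignore unmatched observations and BY groups as well as attribute differences.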
You can examine individual bits in the SYSINFO value by using DATA step bit-testing features to check for specific conditions. For example, to check for the presence of observations in the base data set that are not in the comparison data set, use the following statements:
    proc compare base=SAS-data-set
                 compare=SAS-data-set;
    run;

    %let rc=&sysinfo;
    data _null_;
       if &rc='1......'b then
          put 'Observations in Base but not in Comparison Data Set';
    run;
PROC COMPARE must run before you check SYSINFO, and you must obtain the SYSINFO value before another SAS step starts, because every SAS step resets SYSINFO.
Procedure Output |
Partial Output shows the Data Set Summary.
    COMPARE Procedure
    Comparison of PROCLIB.ONE with PROCLIB.TWO
    (Method=EXACT)

    Data Set Summary

    Dataset                Created          Modified  NVar  NObs  Label

    PROCLIB.ONE   11SEP97:15:11:07  11SEP97:15:11:09     5     4  First Data Set
    PROCLIB.TWO   11SEP97:15:11:10  11SEP97:15:11:10     6     5  Second Data Set
The second part of the report lists matching variables with different attributes and shows how the attributes differ. (The COMPARE procedure omits variable labels if the line size is too small for them.)
Partial Output shows the Variables Summary.
    Variables Summary

    Number of Variables in Common: 5.
    Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
    Number of Variables with Conflicting Types: 1.
    Number of Variables with Differing Attributes: 3.

    Listing of Common Variables with Conflicting Types

    Variable  Dataset      Type  Length

    student   PROCLIB.ONE  Num        8
              PROCLIB.TWO  Char       8

    Listing of Common Variables with Differing Attributes

    Variable  Dataset      Type  Length  Format  Label

    year      PROCLIB.ONE  Char       8          Year of Birth
              PROCLIB.TWO  Char       8
    state     PROCLIB.ONE  Char       8
              PROCLIB.TWO  Char       8          Home State
    gr1       PROCLIB.ONE  Num        8  4.1
              PROCLIB.TWO  Num        8  5.2
Partial Output shows the Observation Summary.
    Observation Summary

    Observation      Base  Compare

    First Obs           1        1
    First Unequal       1        1
    Last Unequal        4        4
    Last Match          4        4
    Last Obs            .        5

    Number of Observations in Common: 4.
    Number of Observations in PROCLIB.TWO but not in PROCLIB.ONE: 1.
    Total Number of Observations Read from PROCLIB.ONE: 4.
    Total Number of Observations Read from PROCLIB.TWO: 5.

    Number of Observations with Some Compared Variables Unequal: 4.
    Number of Observations with All Compared Variables Equal: 0.
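In addition, for the variables for which some matching observations have unequal values, the report lists the variable name, type, length, and label, the number of values judged unequal (Ndif), and the maximum difference (MaxDif).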
Partial Output shows the Values Comparison Summary.
    Values Comparison Summary

    Number of Variables Compared with All Observations Equal: 1.
    Number of Variables Compared with Some Observations Unequal: 3.
    Total Number of Values which Compare Unequal: 6.
    Maximum Difference: 20.

    Variables with Unequal Values

    Variable  Type  Len  Compare Label  Ndif   MaxDif

    state     CHAR    8  Home State        2
    gr1       NUM     8                    2    1.000
    gr2       NUM     8                    2   20.000
    Value Comparison Results for Variables

    __________________________________________________________
               ||  Home State
               ||  Base Value  Compare Value
           Obs ||  state       state
     ________  ||  ________    ________
               ||
            2  ||  MD          MA
            4  ||  MA          MD
    __________________________________________________________

    __________________________________________________________
               ||       Base    Compare
           Obs ||        gr1        gr1      Diff.     % Diff
     ________  ||  _________  _________  _________  _________
               ||
            1  ||       85.0      84.00    -1.0000    -1.1765
            3  ||       78.0      79.00     1.0000     1.2821
    __________________________________________________________

    __________________________________________________________
               ||       Base    Compare
           Obs ||        gr2        gr2      Diff.     % Diff
     ________  ||  _________  _________  _________  _________
               ||
            3  ||    72.0000    73.0000     1.0000     1.3889
            4  ||    94.0000    74.0000   -20.0000   -21.2766
    __________________________________________________________
You can suppress the value comparison results with the NOVALUES option.
If you use both the NOVALUES and TRANSPOSE options, PROC COMPARE lists for
each observation the names of the variables with values judged unequal but
does not display the values and differences.
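The two forms described above can be requested as follows. This is a minimal sketch that assumes the PROCLIB libref is assigned and that PROCLIB.ONE and PROCLIB.TWO are the data sets shown in the output in this section.

```sas
/* Keep the summary reports but suppress the value comparison results. */
proc compare base=proclib.one compare=proclib.two novalues;
run;

/* For each observation, list only the names of the variables judged unequal. */
proc compare base=proclib.one compare=proclib.two novalues transpose;
run;
```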
Note: In all cases PROC COMPARE calculates the summary statistics based
on all matching observations that do not contain missing values, not just
on those containing unequal values.
Partial Output shows summary statistics for base data set values, comparison data set values, differences, and percent differences. The statistics reported are N, Mean, Std, Max, Min, StdErr, t, Prob>|t|, Ndif, DifMeans, and r, rsq.
Partial Output is from the ALLSTATS option using the two data sets shown in "Overview":
    Value Comparison Results for Variables

    __________________________________________________________
               ||       Base    Compare
           Obs ||        gr1        gr1      Diff.     % Diff
     ________  ||  _________  _________  _________  _________
               ||
            1  ||       85.0      84.00    -1.0000    -1.1765
            3  ||       78.0      79.00     1.0000     1.2821
     ________  ||  _________  _________  _________  _________
               ||
            N  ||          4          4          4          4
         Mean  ||    85.5000    85.5000          0     0.0264
          Std  ||     5.8023     5.4467     0.8165     1.0042
          Max  ||    92.0000    92.0000     1.0000     1.2821
          Min  ||    78.0000    79.0000    -1.0000    -1.1765
       StdErr  ||     2.9011     2.7234     0.4082     0.5021
            t  ||    29.4711    31.3951     0.0000     0.0526
     Prob>|t|  ||     <.0001     <.0001     1.0000     0.9614
               ||
         Ndif  ||          2    50.000%
     DifMeans  ||     0.000%     0.000%          0
       r, rsq  ||      0.991      0.983
    __________________________________________________________

    __________________________________________________________
               ||       Base    Compare
           Obs ||        gr2        gr2      Diff.     % Diff
     ________  ||  _________  _________  _________  _________
               ||
            3  ||    72.0000    73.0000     1.0000     1.3889
            4  ||    94.0000    74.0000   -20.0000   -21.2766
     ________  ||  _________  _________  _________  _________
               ||
            N  ||          4          4          4          4
         Mean  ||    86.2500    81.5000    -4.7500    -4.9719
          Std  ||     9.9457     9.4692    10.1776    10.8895
          Max  ||    94.0000    92.0000     1.0000     1.3889
          Min  ||    72.0000    73.0000   -20.0000   -21.2766
       StdErr  ||     4.9728     4.7346     5.0888     5.4447
            t  ||    17.3442    17.2136    -0.9334    -0.9132
     Prob>|t|  ||     0.0004     0.0004     0.4195     0.4285
               ||
         Ndif  ||          2    50.000%
     DifMeans  ||    -5.507%    -5.828%    -4.7500
       r, rsq  ||      0.451      0.204
    __________________________________________________________
Note: If you use a wide line size with PRINTALL, PROC COMPARE prints
the value comparison result for character variables next to the result for
numeric variables. In that case, PROC COMPARE calculates only NDIF for the
character variables.
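A request of the following kind produces statistics tables like the ones in the ALLSTATS output shown earlier. This is a sketch and not necessarily the exact program behind that output: the VAR statement restricting the comparison to gr1 and gr2 is an assumption made to keep the listing short, and the PROCLIB libref must already be assigned.

```sas
/* ALLSTATS prints summary statistics for the matching numeric variables. */
proc compare base=proclib.one compare=proclib.two allstats;
   var gr1 gr2;   /* assumed restriction: compare only these two variables */
run;
```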
    _OBS_1=number-1 _OBS_2=number-2

where number-1 is the number of the observation in the base data set for which the value of the variable is shown, and number-2 is the number of the observation in the comparison data set.
Partial Output shows the differences in PROCLIB.ONE and PROCLIB.TWO by observation instead of by variable.
    Comparison Results for Observations

    _OBS_1=1 _OBS_2=1:
      Variable  Base Value     Compare       Diff.      % Diff
      gr1             85.0       84.00   -1.000000   -1.176471

    _OBS_1=2 _OBS_2=2:
      Variable  Base Value     Compare
      state     MD             MA

    _OBS_1=3 _OBS_2=3:
      Variable  Base Value     Compare       Diff.      % Diff
      gr1             78.0       79.00    1.000000    1.282051
      gr2        72.000000   73.000000    1.000000    1.388889

    _OBS_1=4 _OBS_2=4:
      Variable  Base Value     Compare       Diff.      % Diff
      gr2        94.000000   74.000000  -20.000000  -21.276596
      state     MA             MD
If you use an ID statement, the identifying label has the following form:
    ID-1=ID-value-1 ... ID-n=ID-value-n

where ID is the name of an ID variable and ID-value is the value of the ID variable.
Note: When you use the TRANSPOSE option, PROC COMPARE prints only the
first 12 characters of the value.
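The following self-contained sketch shows the TRANSPOSE option combined with an ID statement. The data sets WORK.BASE and WORK.COMP and the variables id and score are hypothetical and were invented for this illustration. Both data sets are already in order by the ID variable, which the ID statement expects unless NOTSORTED is specified.

```sas
/* Hypothetical base data set */
data work.base;
   input id $ score;
   datalines;
A 85
B 78
C 92
;
run;

/* Hypothetical comparison data set; score differs for id=A */
data work.comp;
   input id $ score;
   datalines;
A 84
B 78
C 92
;
run;

/* Each differing observation is labeled by its ID value, not by _OBS_1 and _OBS_2. */
proc compare base=work.base compare=work.comp transpose;
   id id;
run;
```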
Output Data Set (OUT=) |
In addition, the data set contains two variables created by PROC COMPARE to identify the source of the values for the matching variables: _TYPE_ and _OBS_.
_TYPE_ has the label Type of Observation. The four possible values of this variable are BASE, COMPARE, DIF, and PERCENT.
For observations with _TYPE_ equal to BASE, _OBS_ is the number of the observation in the base data set from which the values of the VAR variables were copied. Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_ is the number of the observation in the comparison data set from which the values of the VAR variables were copied.
For observations with _TYPE_ equal to DIF or PERCENT, _OBS_ is a sequence number that counts the matching observations in the BY group.
_OBS_ has the label Observation Number.
The COMPARE procedure takes variable names and attributes for the OUT= data set from the base data set except for the lengths of ID and VAR variables, for which it uses the longer length regardless of which data set that length is from. This behavior has two important repercussions:
Observations with _TYPE_ equal to BASE contain the values of the VAR variables, while observations with _TYPE_ equal to COMPARE contain the values of the WITH variables.
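A short sketch of an OUT= request that uses VAR and WITH follows. It assumes that the PROCLIB libref is assigned and that both data sets contain gr1 and gr2, as the output in this section suggests. The output data set name WORK.RESULT is hypothetical.

```sas
/* Compare gr1 in the base data set with gr2 in the comparison data set. */
/* OUTBASE, OUTCOMP, OUTDIF, and OUTPERCENT request one observation of   */
/* each _TYPE_ for every pair of matching observations.                  */
proc compare base=proclib.one compare=proclib.two
             out=work.result outbase outcomp outdif outpercent noprint;
   var gr1;
   with gr2;
run;

/* The compared variable appears under its VAR name, gr1, in every row. */
proc print data=work.result;
run;
```

In the printed data set, rows with _TYPE_ equal to BASE hold gr1 values from PROCLIB.ONE, and rows with _TYPE_ equal to COMPARE hold gr2 values from PROCLIB.TWO, both stored in the variable named gr1.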
Output Statistics Data Set (OUTSTATS=) |
N, MEAN, STD, MIN, MAX, STDERR, T, PROBT, NDIF, DIFMEANS, and R, RSQ.
Note: For both types of output data sets, PROC COMPARE assigns one of the following data set labels:
    Comparison of base-SAS-data-set with comparison-SAS-data-set
    Comparison of variables in base-SAS-data-set
See Creating an Output Data Set of Statistics (OUTSTATS=) for an example of an OUTSTATS= data set.
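For orientation, a minimal OUTSTATS= request might look like the following sketch. It assumes that the PROCLIB libref is assigned. The output data set name WORK.DIFFSTAT and the choice of VAR variables are hypothetical.

```sas
/* Write the summary statistics to a data set instead of the listing. */
proc compare base=proclib.one compare=proclib.two
             outstats=work.diffstat noprint;
   var gr1 gr2;
run;

/* Inspect the statistics; each row is identified by a statistic name in _TYPE_. */
proc print data=work.diffstat noobs;
run;
```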
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.