SET |
Valid: | in a DATA step |
Category: | File-handling |
Type: | Executable |
Syntax |
SET <SAS-data-set(s) <(data-set-options(s))>> <options>; |
When you do not specify an argument, the SET statement reads an observation from the most recently created data set.
Arguments |
See Also: | See the "SAS Data Sets" chapter of SAS Language Reference: Concepts for a description of the levels of SAS data set names and when to use each level. |
See: | Refer to Data Set Options for a list of the data set options to use with input data sets. |
Options |
END=variable |
Restriction: | END= cannot be used with POINT=. When random access is used, the END= variable is never set to 1. |
Featured in: | Writing an Observation Only After All Observations Have Been Read |
KEY=index |
Range: | Specify the name of a simple or a composite index of the data set that is being read. |
Restriction: | KEY= cannot be used with POINT=. |
Tip: | Use the _IORC_ automatic variable together with the SYSRC autocall macro to obtain detailed error-handling information. When you use the SET statement with the KEY= option, SAS creates the automatic variable _IORC_ and sets it to a return code that shows the status of the most recent I/O operation performed on an observation in a SAS data set. If the KEY= value is not found, _IORC_ returns a value that corresponds to the SYSRC autocall macro's mnemonic _DSENOM, and the automatic variable _ERROR_ is set to 1. |
Featured in: | Performing a Table-Lookup and Performing a Table-Lookup When the Master File Contains Duplicate Observations. |
See Also: | UNIQUE option |
NOBS=variable |
Restriction: | For certain SAS views, SAS cannot determine the number of observations. In these cases, SAS sets the value of the NOBS= variable to the largest positive integer value that is available in your operating environment. |
Tip: | At compilation time, SAS reads the descriptor portion of each data set and assigns the value of the NOBS= variable automatically. Thus, you can refer to the NOBS= variable before the SET statement. The variable is available in the DATA step but is not added to any output data set. |
Interaction: | The NOBS= and POINT= options are independent of each other. |
Featured in: | Performing a Function Until the Last Observation Is Reached |
OPEN=(IMMEDIATE | DEFER) |
Restriction: | When you use the IMMEDIATE option, KEY=, POINT=, and BY statement processing are mutually exclusive. |
Tip: | If a variable on a subsequent data set is of a different type (character versus numeric, for example) than that of the same-named variable on the first data set, the DATA step will stop processing and produce an error message. |
Restriction: | When you specify the DEFER option, you cannot use the KEY= statement option, the POINT= statement option, or the BY statement. These constructs imply either random processing or interleaving of observations from the data sets, which is not possible unless all data sets are open. |
Requirement: | You can use the DROP=, KEEP=, or RENAME= data set options to process a set of variables, but the set of variables that are processed for each data set must be identical. In most cases, if the set of variables defined by any subsequent data set differs from that defined by the first data set, SAS prints a warning message to the log but does not stop execution. Exceptions to this behavior are |
Default: | IMMEDIATE |
POINT=variable |
Requirement: | a STOP statement |
Restriction: | You cannot use POINT= with a BY statement, a WHERE statement, or a WHERE= data set option. In addition, you cannot use it with transport format data sets, data sets in sequential format on tape or disk, SAS/ACCESS views, or SQL procedure views that read data from external files. |
Restriction: | You cannot use POINT= with KEY=. |
Tip: | You must supply the values of the POINT= variable. For example, you can use the POINT= variable as the index variable in some form of the DO statement. |
Tip: | The POINT= variable is available anywhere in the DATA step, but it is not added to any new SAS data set. |
Featured in: | Combining One Observation with Many and Reading a Subset by Using Direct Access |
UNIQUE |
Restriction: | UNIQUE can only appear with the KEY= argument. |
Explanation: | By default, SET begins searching at the top of the index only when the KEY= value changes. If the KEY= value does not change on successive executions of the SET statement, the search begins by following the most recently retrieved observation. In other words, when consecutive duplicate KEY= values appear, the SET statement attempts a one-to-one match with duplicate indexed values in the data set that is being read. If more consecutive duplicate KEY= values are specified than exist in the data set that is being read, the extra duplicates are treated as not found. |
Featured in: | Performing a Table-Lookup When the Master File Contains Duplicate Observations |
See Also: | For extensive examples, see "Examples" in Combining and Modifying SAS Data Sets: Examples. |
Details |
Each time the SET statement is executed, SAS reads one observation into the program data vector. SET reads all variables and all observations from the input data sets unless you tell SAS to do otherwise. A SET statement can contain multiple data sets; a DATA step can contain multiple SET statements. See Combining and Modifying SAS Data Sets: Examples.
The SET statement is flexible and has a variety of uses in SAS programming. These uses are determined by the options and statements that you use with the SET statement. They include
Only one BY statement can accompany each SET statement in a DATA step. The BY statement should immediately follow the SET statement to which it applies. The data sets that are listed in the SET statement must be sorted by the values of the variables that are listed in the BY statement, or they must have an appropriate index. When SET is used with a BY statement, it interleaves the data sets. The observations in the new data set are arranged by the values of the BY variable or variables, and within each BY group, by the order of the data sets in which they occur. See Interleaving SAS Data Sets for an example of BY-group processing with the SET statement.
Use a single SET statement that lists multiple data sets to concatenate them. That is, the number of observations in the new data set is the sum of the number of observations in the original data sets, and the order is all the observations from the first data set followed by all observations from the second data set, and so on. See Concatenating SAS Data Sets for an example of concatenating data sets.
Use a single SET statement with a BY statement to interleave the specified data sets. The observations in the new data set are arranged by the values of the BY variable or variables, and within each BY group, by the order of the data sets in which they occur. See Interleaving SAS Data Sets for an example of interleaving data sets.
Use multiple SET statements to perform one-to-one reading (also called one-to-one matching) of the specified data sets. The new data set contains all the variables from all the input data sets. The number of observations in the new data set is the number of observations in the smallest original data set. If the data sets contain common variables, the values that are read in from the last data set replace those read in from earlier ones. See Combining One Observation with Many, Performing a Table-Lookup, and Performing a Table-Lookup When the Master File Contains Duplicate Observations for examples of one-to-one reading of data sets.
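One-to-one reading with multiple SET statements can be sketched as follows. This is a minimal illustration, not an example from the original text: the data sets LEFT_SIDE, RIGHT_SIDE, and BOTH and the variables ID, X, and Y are hypothetical names chosen for the sketch.

```sas
/* Hypothetical input data sets, created only for this sketch. */
data left_side;
   input id x;
   datalines;
1 10
2 20
3 30
;
run;

data right_side;
   input id y;
   datalines;
7 100
8 200
;
run;

/* Each SET statement reads one observation per iteration of the DATA step. */
/* The step ends as soon as either SET statement runs out of observations,  */
/* so BOTH receives as many observations as the smaller input (two here).   */
/* ID exists in both inputs, so the value read from RIGHT_SIDE replaces the */
/* value that was read from LEFT_SIDE.                                      */
data both;
   set left_side;
   set right_side;
run;

proc print data=both;
run;
```

Under these assumptions, the third observation of LEFT_SIDE is read but never written, because the second SET statement reaches the end of RIGHT_SIDE before the implicit output at the bottom of the step.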
For extensive examples, see Combining and Modifying SAS Data Sets: Examples.
Comparisons |
Examples |
If more than one data set name appears in the SET statement, the resulting output data set is a concatenation of all the data sets that are listed. SAS reads all observations from the first data set, then all from the second data set, and so on until all observations from all the data sets have been read. This example concatenates the three SAS data sets into one output data set named FITNESS:
data fitness;
   set health exercise well;
run;
To interleave two or more SAS data sets, use a BY statement after the SET statement:
data april;
   set payable recvable;
   by account;
run;
In this DATA step, each observation in the data set NC.MEMBERS is read into the program data vector. Only those observations whose value of CITY is Raleigh are output to the new data set RALEIGH.MEMBERS:
data raleigh.members;
   set nc.members;
   if city='Raleigh';
run;
An observation to be merged into an existing data set can be one that is created by a SAS procedure or another DATA step. In this example, the data set AVGSALES has only one observation:
data national;
   if _n_=1 then set avgsales;
   set totsales;
run;
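A self-contained version of this technique might look like the sketch below. The contents of AVGSALES and TOTSALES and the variables AVG, REGION, SALES, and DIFF are hypothetical; the original text does not show them.

```sas
/* Hypothetical one-observation data set holding an overall average. */
data avgsales;
   avg=150;
run;

/* Hypothetical detail data set. */
data totsales;
   input region $ sales;
   datalines;
East 100
West 200
;
run;

/* AVGSALES is read only on the first iteration. Variables that are read  */
/* with SET are retained, so AVG keeps its value for every observation    */
/* that is read from TOTSALES.                                            */
data national;
   if _n_=1 then set avgsales;
   set totsales;
   diff=sales-avg;   /* hypothetical derived variable */
run;
```

The condition `_n_=1` matters: without it, the first SET statement would reach the end of AVGSALES on the second iteration and stop the DATA step after a single observation.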
In this example, SAS treats each SET statement independently; that is, it reads from one data set as if it were reading from two separate data sets:
data drugxyz;
   set trial5(keep=sample);
   if sample>2;
   set trial5;
run;
For each iteration of the DATA step, the first SET statement reads one observation. The next time the first SET statement is executed, it reads the next observation. Each SET statement can read a different observation in the same iteration of the DATA step.
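The independence of the two SET statements can be traced with a small sketch. The contents of TRIAL5 and the variable DOSE are hypothetical and exist only to make the behavior visible.

```sas
/* Hypothetical contents for TRIAL5. */
data trial5;
   input sample dose;
   datalines;
1 10
3 30
2 20
4 40
;
run;

data drugxyz;
   set trial5(keep=sample);   /* first reader: advances on every iteration      */
   if sample>2;               /* subsetting IF: failing iterations stop here    */
   set trial5;                /* second reader: advances only when the IF passes */
run;

proc print data=drugxyz;
run;
```

With this input, the subsetting IF passes on the second and fourth iterations only. On those iterations the second SET statement reads the first and then the second observation of TRIAL5, so DRUGXYZ should contain the observations with SAMPLE values 1 and 3, not the observations that satisfied the condition.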
You can subset observations from one data set and combine them with observations from another data set by using direct access methods, as follows:
data south;
   set revenue;
   if region=4;
   set expense point=_n_;
run;
This example illustrates using the KEY= option to perform a table-lookup. The DATA step reads a primary data set that is named INVTORY and a lookup data set that is named PARTCODE. It uses the index PARTNO to read PARTCODE nonsequentially, by looking for a match between the PARTNO value in each data set. The purpose is to obtain the appropriate description, which is available only in the variable DESC in the lookup data set, for each part that is listed in the primary data set:
data combine;
   set invtory(keep=partno instock price);
   set partcode(keep=partno desc) key=partno;
run;
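The lookup above does not check whether a match was found. One way to add that check, using the _IORC_ automatic variable and the SYSRC autocall macro that are described for the KEY= option, is sketched below. The sample rows are hypothetical, and the handling chosen for unmatched parts (a blank DESC) is an assumption made for the illustration. The mnemonic _SOK denotes a successful operation.

```sas
/* Hypothetical sample data; this section does not show INVTORY or PARTCODE. */
data invtory;
   input partno instock price;
   datalines;
101 12 4.50
102 0 9.25
999 3 1.00
;
run;

/* The lookup data set needs an index on PARTNO for KEY=PARTNO to work. */
data partcode(index=(partno));
   input partno desc $;
   datalines;
101 washer
102 gasket
;
run;

data combine;
   set invtory(keep=partno instock price);
   set partcode(keep=partno desc) key=partno;
   select (_iorc_);
      when (%sysrc(_sok)) output;      /* a matching PARTNO was found */
      when (%sysrc(_dsenom)) do;       /* no matching PARTNO */
         desc=' ';                     /* clear the value retained from the last match */
         _error_=0;                    /* reset the error flag that the failed lookup set */
         output;
      end;
      otherwise do;                    /* any other I/O condition */
         put 'Unexpected _IORC_ value: ' _iorc_=;
         stop;
      end;
   end;
run;
```

Clearing DESC in the no-match branch matters because variables read with SET are retained. Without it, part 999 would carry the description of the previously matched part.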
This example uses the KEY= option to perform a table lookup. The DATA step reads a primary data set that is named INVTORY, which is indexed on PARTNO, and a lookup data set named PARTCODE. PARTCODE contains quantities of new stock (variable NEW_STK). The UNIQUE option ensures that, if there are any duplicate observations in INVTORY, values of NEW_STK are added only to the first observation of the group:
data combine;
   set partcode(keep=partno new_stk);
   set invtory(keep=partno instock price) key=partno/unique;
   instock=instock+new_stk;
run;
These statements select a subset of 50 observations from the data set DRUGTEST by using the POINT= option to access observations directly by number:
data sample;
   do obsnum=1 to 100 by 2;
      set drugtest point=obsnum;
      if _error_ then abort;
      output;
   end;
   stop;
run;
These statements use NOBS= to set the termination value for DO-loop processing. The value of the temporary variable LAST is the total number of observations in SURVEY1 and SURVEY2:
do obsnum=1 to last by 100;
   set survey1 survey2 point=obsnum nobs=last;
   output;
end;
stop;
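Placed in a complete DATA step, the fragment might look like the following sketch. The data set EVERY_THIRD, the variable RESPONDENT, the small input sizes, and the step value of 3 are hypothetical and chosen so that the result is easy to follow.

```sas
/* Hypothetical inputs with five observations each. */
data survey1;
   do respondent=1 to 5;
      output;
   end;
run;

data survey2;
   do respondent=6 to 10;
      output;
   end;
run;

/* LAST is assigned at compilation time, so the DO statement can use it    */
/* before the SET statement executes. Here LAST is 10 (5 + 5).             */
/* OBSNUM and LAST are available in the step but are not written to the    */
/* output data set.                                                        */
data every_third;
   do obsnum=1 to last by 3;
      set survey1 survey2 point=obsnum nobs=last;
      output;
   end;
   stop;   /* required with POINT=, because direct access never signals end of file */
run;
```

Under these assumptions, EVERY_THIRD should contain the observations numbered 1, 4, 7, and 10 of the concatenated input.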
This example uses the END= variable LAST to tell SAS to assign a value to the variable REVENUE and write an observation only after the last observation of RENTAL has been read:
set rental end=last;
totdays + days;
if last then
   do;
      revenue=totdays*65.78;
      output;
   end;
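As a complete DATA step, the statements could be used as in this sketch. The contents of RENTAL and the output data set name TOTAL are hypothetical.

```sas
/* Hypothetical input: number of rental days per transaction. */
data rental;
   input days;
   datalines;
3
5
2
;
run;

/* The sum statement retains TOTDAYS across iterations. Because the only */
/* OUTPUT statement runs when LAST is 1, TOTAL has a single observation. */
/* LAST itself is not written to the output data set.                    */
data total(keep=totdays revenue);
   set rental end=last;
   totdays + days;
   if last then
      do;
         revenue=totdays*65.78;
         output;
      end;
run;
```

An explicit OUTPUT statement suppresses the implicit output at the end of each iteration, which is why no observation is written until the last one has been read.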
See Also |
"Rules for Words and Names" in SAS Language Reference: Concepts
"Reading, Modifying, and Combining SAS Data Sets" in SAS Language Reference: Concepts
Data Set Options
Combining and Modifying SAS Data Sets: Examples
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.