APPEND Statement
Adds the observations from one SAS data set to the end of another
SAS data set.
Reminder: |
You can use data set options with the BASE=
and DATA= options.
See Statements with the Same Function in Multiple Procedures for details. You can also use any global statements as well. See
Global Statements for a list.
|
Restriction: |
The BASE= data set must be a member of a
SAS library that supports UPDATE access. The TAPE engine and the XPORT engine
are examples of engines that do not support UPDATE access.
|
Restriction: |
If the BASE= data set is accessed through
a SAS server and if no other user has the data set open at the time the APPEND
statement begins processing, the BASE= data set defaults to CNTLLEV=MEMBER.
When this happens, no other user can update the file while the data set is
processed.
|
Featured
in: |
Concatenating Two SAS Data Sets
|
APPEND
BASE=<libref.>SAS-data-set <options>
;
|
- BASE=<libref.> SAS-data-set
- names the data set to which you want to add observations.
- libref
- specifies the library that contains the SAS data set. If
you omit libref, the default is the libref for the procedure
input library. If you are using PROC APPEND, the default for libref is either WORK or USER.
- SAS-data-set
- names a SAS data set. If the APPEND statement cannot find
an existing data set with this name, it creates a new data set in the library.
In other words, you can use the APPEND statement to create a data set by
specifying a new data set name in the BASE= argument.
The BASE= data set is the current SAS data set after all append operations
regardless of whether you are creating a new data set or appending to an existing
data set.
- APPENDVER=V6
- uses the Version 6 behavior for appending observations to
the BASE= data set. You must specify V6.
- DATA=<libref.>
SAS-data-set
- names the SAS data set containing observations that you
want to append to the end of the SAS data set specified in the BASE= argument.
- libref
- specifies the library that contains the SAS data set. If
you omit libref, the default is the libref for the procedure
input library. The DATA= data set can be from any SAS data library, but you
must use the two-level name if the data set resides in a library other than
the procedure input library.
- SAS-data-set
- names a SAS data set. If the APPEND statement cannot find
an existing data set with this name, it stops processing.
- FORCE
- forces the APPEND statement to concatenate data sets when
the DATA= data set contains variables that either
- are not in the BASE= data set
- do not have the same type as the variables in the BASE= data
set
- are longer than the variables in the BASE= data set.
You can use the
WHERE= data set option with the DATA= data set
to restrict the observations that are appended. Likewise, you can use the
WHERE statement to restrict the observations from the DATA= data set. The
WHERE statement has no affect on the BASE= data set. If you use the WHERE=
data set option with the BASE= data set, WHERE= has no affect.
If
you use the SET statement in a DATA step to concatenate two data sets, the
SAS System must process all the observations in both data sets to create a
new one. The APPEND statement bypasses the processing of data in the original
data set and adds new observations directly to the end of the original data
set. Using the APPEND statement can be more efficient than using a SET statement
if
The APPEND statement is especially useful if you frequently add observations
to a SAS data set (for example, in production programs that are constantly
appending data to a journal-type data set).
In order to use the APPEND statement, you need read access to
the DATA= data set and write access to the BASE= data set. To gain access,
use the READ= and WRITE= data set options in the APPEND statement the way
you would use them in any other SAS statement, in parentheses immediately
after the data set name. When you are appending password-protected data sets,
remember the following guidelines:
- If you do not give the read password for the DATA= data set in
the APPEND statement, by default the procedure looks for the read password
for the DATA= data set in the PROC DATASETS statement. However, the procedure
does not look for the write password for the BASE= data set in the PROC DATASETS
statement. Therefore, you must specify the write password for the BASE= data
set in the APPEND statement.
- If the BASE= data set is read-protected only, you must specify
its read password in the APPEND statement.
You
can concatenate compressed SAS data sets. Either or both of the BASE= and
DATA= data sets can be compressed. If the BASE= data set allows the reuse
of space from deleted observations, the APPEND statement may insert the observations
into the middle of the BASE= data set to make use of available space.
For information on the COMPRESS= and REUSE= data set and system options,
see SAS Language: Reference.
Beginning with Version 7, the behavior of appending to an indexed
data set has changed to improve performance.
- In Version 6, when you appended to an indexed data set, the
index was updated for each added observation. Index updates tend to be random;
therefore, disk I/O could have been high.
- Currently, SAS does not update the index until all observations
are added to the data set. After the append, SAS internally sorts the
observations and inserts the data into the index in sequential order, which
reduces most of the disk I/O and results in a faster append method.
The current method is used by default when the following requirements
are met; otherwise, the Version 6 method is used:
- The BASE= data set is open for member-level locking.
- The BASE= data set does not contain
referential integrity constraints.
- The BASE= data set is not accessed using the Cross Environment
Data Access (CEDA) facility.
- The BASE= data set is not using a WHERE= data set option.
To display information in the SAS log about
the append method that is
being used, you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
Either a
message displays if the fast-append method is in use or a message or messages
display as to why the fast-append method is not in use.
The current append method initially adds observations to the BASE= data
set regardless of the restrictions that are determined by the index. For example,
a variable that has an index that was created with the UNIQUE option does
not have its values validated for uniqueness until the index is updated. Then,
if a nonunique value is detected, the offending observation is deleted from
the data set. This means that after observations are appended, some of them
may subsequently be deleted.
For a simple example, consider that the BASE= data set has ten observations
numbered from 1 to 10 with a UNIQUE index for the variable ID. You append
a data set that contains five observations numbered from 1 to 5, and observations
3 and 4 both contain the same value for ID. The following occurs
- After the observations are appended, the BASE= data set contains
15 observations numbered from 1 to 15.
- SAS updates the index for ID, validates the values, and determines
that observations 13 and 14 contain the same value for ID.
- SAS deletes one of the observations from the BASE= data set,
resulting in 14 observations that are numbered from 1 to 15. For example,
observation 13 is deleted. Note that you cannot predict which observation
will be deleted, because the internal sort may place either observation first.
(In Version 6, you could predict that observation 13 would be added and
observation 14 would be rejected.)
If you do not want the current behavior (which could result in deleted
observations) or if you want to be able to predict which observations are
appended, request the Version 6 append method by specifying the APPENDVER=V6
option:
proc data sets;
append base=a data=b appendver=v6;
run;
Note: In Version 6, deleting the index and then recreating it after
the append could improve performance. The current method may eliminate the
need to do that. However, the performance depends on the nature of your data.
If
the DATA= data set contains variables that are not in the
BASE= data set, use the FORCE option in the APPEND statement to force the
concatenation of the two data sets. The APPEND statement drops the extra variables
and issues a warning message.
If the BASE= data set contains a variable that is not in the DATA= data
set, the APPEND statement concatenates the data sets, but the observations
from the DATA= data set have a missing value for the variable that was not
present in the DATA= data set. The FORCE option is not necessary in this
case.
If a variable has different attributes in the BASE= data set
than it does in the DATA= data set, the attributes in the BASE= data set prevail.
If the length of a variable is longer in the DATA= data set than in
the BASE= data set, or if the same variable is a character variable in one
data set and a numeric variable in the other, use the FORCE option. Using
FORCE has these consequences:
- The length of the variables in the BASE= data set takes precedence.
The SAS System truncates values from the DATA= data set to fit them into
the length that is specified in the BASE= data set.
- The type of the variables in the BASE= data set takes precedence.
The APPEND statement replaces values of the wrong type (all values for the
variable in the DATA= data set) with missing values.
If the DATA= data set contains integrity constraints and the
BASE= data set does not exist, the APPEND statement copies both general and
referential integrity constraints. If the BASE= data set exists, the APPEND
action copies only observations.
You can
use the GENNUM= data set option to append to a specific generation file. Here
are examples:
SAS Statements |
Result |
proc datasets;
append base=a
data=b(gennum=2);
|
appends B (GENNUM=2) to A |
proc datasets;
append base=a(gennum=2)
data=b(gennum=2);
|
appends B (GENNUM=2) to A(GENNUM=2) |
The only difference between the APPEND procedure and the APPEND
statement in PROC DATASETS, is the default for libref in the
BASE= and DATA= arguments. For PROC APPEND, the default is either WORK or
USER. For the APPEND statement, the default is the libref of the procedure
input library.
If a system failure or some other type of interruption
occurs
while the procedure is executing, the append operation may not be successful;
it is possible that not all, perhaps none, of the observations will be added
to the BASE= data set. In addition, the BASE= data set may suffer damage.
The APPEND operation performs an update in place, which means that it does
not make a copy of the original data set before it begins to append observations.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.