| To do this | Use this option |
|---|---|
| Specify the input data set | DATA= |
| Create an output data set | OUT= |
| Specify the collating sequence | |
|    Specify ASCII | ASCII |
|    Specify EBCDIC | EBCDIC |
|    Specify Danish | DANISH |
|    Specify Finnish | FINNISH |
|    Specify Norwegian | NORWEGIAN |
|    Specify Swedish | SWEDISH |
|    Specify a customized sequence | NATIONAL |
|    Specify any of these collating sequences: ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, SPANISH, SWEDISH | SORTSEQ= |
| Specify the output order | |
|    Reverse the order for character variables | REVERSE |
|    Maintain the order within BY groups | EQUALS |
|    Allow for variation within BY groups | NOEQUALS |
| Eliminate duplicate observations | |
|    Delete observations with common BY values | NODUPKEY |
|    Delete observations that have duplicate values | NODUPRECS |
| Specify the available memory | SORTSIZE= |
| Force redundant sorting | FORCE |
| Reduce temporary disk usage | TAGSORT |
-
ASCII
- sorts character variables using the ASCII
collating sequence. You need this option only when you sort by ASCII on a
system where EBCDIC is the native collating sequence.
-
DANISH
NORWEGIAN
- sort characters according to the Danish
and Norwegian national standard.
The Danish and Norwegian collating sequence is shown
in National Collating Sequences of Alphanumeric Characters.
Operating Environment Information: For
information about operating environment-specific behavior,
see the SAS documentation for your operating environment.
Restriction: You can specify only one collating sequence option
in a PROC SORT step.
-
DATA= SAS-data-set
- identifies the input SAS data set.
-
EBCDIC
- sorts character variables using the EBCDIC
collating sequence. You need this option only when you sort by EBCDIC on a
system where ASCII is the native collating sequence.
-
EQUALS | NOEQUALS
- specifies the order of the observations
in the output data set. For observations with identical BY-variable values,
EQUALS maintains the order from the input data set in the output data set.
NOEQUALS does not necessarily preserve this order in the output data set.
Default: EQUALS
Interaction: When you use NODUPRECS to remove consecutive duplicate
observations in the output data set, the choice of EQUALS or NOEQUALS
can affect which observations are removed.
Tip: Using NOEQUALS can save CPU time and memory.
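The following is a minimal sketch of the two settings, not an official example. The data set WORK.VISITS and its variables are hypothetical and exist only for illustration.

```sas
/* Hypothetical input: two visits per patient, in visit order. */
data work.visits;
   input patient $ visit;
   datalines;
B 1
A 1
B 2
A 2
;
run;

/* EQUALS (the default): within each PATIENT value, observations */
/* keep the relative order that they had in WORK.VISITS.         */
proc sort data=work.visits out=work.visits_eq equals;
   by patient;
run;

/* NOEQUALS: the order within each PATIENT value is not guaranteed. */
proc sort data=work.visits out=work.visits_noeq noequals;
   by patient;
run;
```

With EQUALS, visit 1 precedes visit 2 for each patient in the output. With NOEQUALS, either order within a patient is possible.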
-
FINNISH
SWEDISH
- sort characters according to the Finnish
and Swedish national standard. The Finnish and Swedish collating sequence
is shown in National Collating Sequences of Alphanumeric Characters.
Operating Environment Information: For
information about operating environment-specific behavior,
see the SAS documentation for your operating environment.
Restriction: You can specify only one collating sequence option
in a PROC SORT step.
-
FORCE
- sorts and replaces an indexed or subsetted
data set when the OUT= option is not specified. Without the FORCE option,
PROC SORT does not sort and replace an indexed data set because sorting destroys
user-created indexes for the data set. When you specify FORCE, PROC SORT sorts
and replaces the data set and destroys all user-created indexes for the data
set. Indexes that were created or required by integrity constraints are preserved.
Tip: By default, PROC SORT does not re-sort a data set that is already
sorted in the requested order. You can use FORCE to override this behavior.
This might be necessary if the SAS System cannot verify the sort
specification in the data set option SORTEDBY=. For information about
SORTEDBY=, see the section on data set options in SAS Language Reference: Dictionary.
Restriction: You cannot use PROC SORT with the FORCE option and without
the OUT= option on data sets that were created with the Version 5
compatibility engine or with a sequential engine such as a tape format engine.
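As an illustration only, the sketch below sorts an indexed data set in place. The data set WORK.PARTS, its variables, and the index on PARTNO are assumptions made for the example.

```sas
/* Hypothetical data set used only for this illustration. */
data work.parts;
   input partno qty;
   datalines;
30 5
10 2
20 9
;
run;

/* Create a simple user-defined index on PARTNO. */
proc datasets library=work nolist;
   modify parts;
   index create partno;
quit;

/* No OUT= option, so the sort replaces WORK.PARTS.          */
/* FORCE permits the sort; the user-created index is lost.   */
proc sort data=work.parts force;
   by partno;
run;
```

Writing the sorted result to a separate data set with OUT= avoids the need for FORCE and leaves the indexed data set intact.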
-
NATIONAL
- sorts character variables using an alternate
collating sequence, as defined by your installation, to reflect a country's
National Use Differences. To use this option, your site must have a customized
national sort sequence defined. Check with the SAS Installation Representative
at your site to determine if a customized national sort sequence is available.
Restriction: You can specify only one collating sequence option
in a PROC SORT step.
-
NODUPKEY
- checks for and eliminates observations with
duplicate BY values. If you specify this option, PROC SORT compares all BY
values for each observation to those for the previous observation written
to the output data set. If an exact match is found, the observation is not
written to the output data set.
Operating Environment Information: If you use the
VMS operating
environment sort, the observation that is written to the output data set is
not always the first observation of the BY group.
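A minimal sketch of NODUPKEY, assuming a hypothetical data set WORK.ORDERS that contains several observations for the same customer:

```sas
/* Hypothetical input: customer A appears three times. */
data work.orders;
   input customer $ item $;
   datalines;
A pen
A ink
B pen
A pen
;
run;

/* Keep one observation for each distinct value of CUSTOMER. */
/* OUT= leaves the original WORK.ORDERS unchanged.           */
proc sort data=work.orders out=work.one_per_customer nodupkey;
   by customer;
run;
```

The output data set contains one observation for customer A and one for customer B. Which observation survives for A depends on the order within the BY group, as described above.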
-
NODUPRECS
- checks for and eliminates duplicate observations.
If you specify this option, PROC SORT compares all variable values for each
observation to those for the previous observation that was written to the
output data set. If an exact match is found, the observation is not written
to the output data set.
Alias: NODUP
Interaction: When you are removing consecutive duplicate observations
in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS
can affect which observations are removed.
Interaction: The action of NODUPRECS is directly related to the setting
of the SORTDUP= system option. When SORTDUP= is set to LOGICAL, NODUPRECS
compares only the variables that remain in the input data set after a DROP
or KEEP operation and removes observations that are duplicates on those
variables. Setting SORTDUP=LOGICAL increases the number of duplicate records
that are removed because it eliminates variables before record comparisons
take place. Also, setting SORTDUP=LOGICAL can improve performance because
dropping variables before sorting reduces the amount of memory required to
perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS compares all
variables in the data set, regardless of whether they have been kept or
dropped. For more information about the SORTDUP= option, see
SAS Language Reference: Dictionary.
Tip: Because NODUPRECS checks only consecutive observations, some
nonconsecutive duplicate observations may remain in the output data set.
You can remove all duplicates with this option by sorting on all variables.
See also: NODUPKEY
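The tip about sorting on all variables can be sketched as follows. The data set WORK.ORDERS is hypothetical, and BY _ALL_ is used here as one way to name every variable in the BY statement.

```sas
/* Hypothetical input: the observation "A pen" occurs twice, */
/* but the two copies are not adjacent.                      */
data work.orders;
   input customer $ item $;
   datalines;
A pen
A ink
B pen
A pen
;
run;

/* Sorting by all variables places identical observations next */
/* to each other, so NODUPRECS can detect and remove them.     */
proc sort data=work.orders out=work.distinct_orders noduprecs;
   by _all_;
run;
```

Sorting by CUSTOMER alone would not guarantee that the two "A pen" observations end up adjacent, so one of the duplicates could remain.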
- NOEQUALS
- See EQUALS | NOEQUALS.
- NORWEGIAN
- See DANISH.
-
OUT=SAS-data-set
- names the output data set. If SAS-data-set does not exist, PROC SORT creates it.
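A minimal sketch of DATA= and OUT= used together. The data set names WORK.STAFF and WORK.STAFF_SORTED and the variables are hypothetical.

```sas
/* Hypothetical input data set; values are illustrative only. */
data work.staff;
   input name $ dept $ salary;
   datalines;
Lee Sales 52000
Adams Admin 48000
Cruz Sales 61000
;
run;

/* Read WORK.STAFF, sort by DEPT, and write the result to      */
/* WORK.STAFF_SORTED. WORK.STAFF itself is left unchanged.     */
proc sort data=work.staff out=work.staff_sorted;
   by dept;
run;
```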
-
REVERSE
- sorts character variables using a collating
sequence that is reversed from the normal collating sequence.
Interaction: Using REVERSE with the DESCENDING option in the BY statement
restores the sequence to the normal order.
See also: The DESCENDING option in the BY statement. The difference is
that the DESCENDING option can be used with both character and numeric
variables.
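For comparison, the sketch below sorts the same hypothetical data set WORK.NAMES once with REVERSE and once with DESCENDING. The data set and variable names are assumptions made for the example.

```sas
/* Hypothetical input data set. */
data work.names;
   input name $;
   datalines;
Cruz
Adams
Lee
;
run;

/* REVERSE applies to the character BY variable NAME. */
proc sort data=work.names out=work.names_rev reverse;
   by name;
run;

/* DESCENDING is specified per BY variable and also works */
/* for numeric variables.                                 */
proc sort data=work.names out=work.names_desc;
   by descending name;
run;
```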
-
SORTSEQ= collating-sequence
- specifies the collating sequence. The value
of collating-sequence can be any one of the individual options
in the PROC SORT statement that specify a collating sequence, or the value
can be the name of a translation table, either a default translation table
or one that you have created with the TRANTAB procedure. For an example of using
PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting.
The available translation tables are Danish, Finnish, Italian, Norwegian,
Spanish, and Swedish.
To see how the alphanumeric characters in each language
will sort, refer to National Collating Sequences of Alphanumeric Characters.
Restriction: You can specify only one collating sequence, either by
SORTSEQ= or by one of the individual options that are available in the
PROC SORT statement.
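A minimal sketch of SORTSEQ= with one of the translation tables listed above. The data set WORK.NAMES is hypothetical, and the example assumes that the session encoding can represent the national characters in the data lines.

```sas
/* Hypothetical input; assumes the session encoding supports */
/* the characters used in these names.                       */
data work.names;
   length name $ 12;
   input name $;
   datalines;
Zorn
Ask
Örn
Åse
;
run;

/* Sort NAME by using the Swedish collating sequence instead */
/* of the native collating sequence.                         */
proc sort data=work.names out=work.names_sv sortseq=swedish;
   by name;
run;
```

Because only one collating sequence can be specified, do not combine SORTSEQ= with an individual option such as SWEDISH in the same step.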
National Collating Sequences of Alphanumeric Characters
-
SORTSIZE=memory-specification
- specifies the maximum amount of memory that
is available to PROC SORT. memory-specification is one of the
following:
- MAX
- specifies that all available memory can
be used.
- n
- specifies the amount of memory in bytes,
where n is a real number.
- nK
- specifies the amount of memory in kilobytes,
where n is a real number.
- nM
- specifies the amount of memory in megabytes,
where n is a real number.
- nG
- specifies the amount of memory in gigabytes,
where n is a real number.
Specifying the SORTSIZE= option in the PROC SORT statement
temporarily overrides the SAS system option SORTSIZE=. For information about
the system option, see the section on SAS system options in SAS Language Reference:
Dictionary.
Operating Environment Information: Some
system sort utilities may treat this option differently.
Refer to the SAS documentation for your operating environment.
Default: the value of the SAS system option SORTSIZE=
Tip: This option can help improve sort performance by restricting the
virtual memory paging that the operating environment controls. If PROC SORT
needs more memory, it uses a temporary utility file. As a general rule, the
value of SORTSIZE= should not exceed the amount of physical memory that will
be available to the sorting process.
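As a sketch only, the step below limits one sort to 64 megabytes. The value 64M and the data set WORK.BIG are illustrative assumptions, not recommendations.

```sas
/* Hypothetical input data set. */
data work.big;
   input id amount;
   datalines;
3 10.5
1 22.0
2 7.25
;
run;

/* Limit this PROC SORT step to 64 megabytes of memory.     */
/* The SORTSIZE= system option is overridden for this step  */
/* only.                                                    */
proc sort data=work.big out=work.big_sorted sortsize=64M;
   by id;
run;
```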
- SWEDISH
- See FINNISH.
-
TAGSORT
- stores only the BY variables and the observation
numbers in temporary files. The BY variables and the observation numbers
are called tags. At the completion of the sorting process, PROC
SORT uses the tags to retrieve records from the input data set in sorted order.
Tip: When the total length of BY variables is small compared with the
record length, TAGSORT reduces temporary disk usage considerably. However,
processing time may be much higher.
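A minimal sketch of the situation that the tip describes: a short BY variable and a comparatively long record. The data set WORK.WIDE and its variables are hypothetical.

```sas
/* Hypothetical input: ID is short, NOTES makes the record long. */
data work.wide;
   length id 8 notes $ 200;
   input id notes $;
   datalines;
3 third
1 first
2 second
;
run;

/* Only ID and the observation numbers (the tags) are stored in */
/* temporary files during the sort.                             */
proc sort data=work.wide out=work.wide_sorted tagsort;
   by id;
run;
```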