The SORT Procedure |
PROC SORT <option(s)> <collating-sequence-option>; |
To do this | Use this option | |
---|---|---|
Specify the input data set | DATA= | |
Create an output data set | OUT= | |
Specify the collating sequence | ||
Specify ASCII | ASCII | |
Specify EBCDIC | EBCDIC | |
Specify Danish | DANISH | |
Specify Finnish | FINNISH | |
Specify Norwegian | NORWEGIAN | |
Specify Swedish | SWEDISH | |
Specify a customized sequence | NATIONAL | |
Specify any of these collating sequences: ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, SPANISH, SWEDISH | SORTSEQ= | |
Specify the output order | ||
Reverse the order for character variables | REVERSE | |
Maintain the order within BY groups | EQUALS | |
Allow for variation within BY groups | NOEQUALS | |
Eliminate duplicate observations | ||
Delete observations with common BY values | NODUPKEY | |
Delete observations that have duplicate values | NODUPRECS | |
Specify the available memory | SORTSIZE= | |
Force redundant sorting | FORCE | |
Reduce temporary disk usage | TAGSORT |
Options |
Restriction: | You can specify only one collating sequence option in a PROC SORT step. |
See also: | Sorting Orders for Character Variables |
Default: | NO |
The Danish and Norwegian collating sequence is shown in National Collating Sequences of Alphanumeric Characters.
Operating Environment Information: For information about operating environment-specific behavior, see the SAS documentation for your operating environment.
Restriction: | You can specify only one collating sequence option in a PROC SORT step. |
Main discussion: | Input Data Sets |
Restriction: | You can specify only one collating sequence option in a PROC SORT step. |
See also: | Sorting Orders for Character Variables |
Default: | EQUALS |
Interaction: | When you use NODUPRECS to remove consecutive duplicate observations in the output data set, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed. |
Tip: | Using NOEQUALS can save CPU time and memory. |
Operating Environment Information: For information about operating environment-specific behavior, see the SAS documentation for your operating environment.
Restriction: | You can specify only one collating sequence option in a PROC SORT step. |
Tip: | By default, PROC SORT does not re-sort a data set that is already sorted in the requested order. Use FORCE to override this behavior. This might be necessary if the SAS System cannot verify the sort specification in the data set option SORTEDBY=. For information about SORTEDBY=, see the section on SAS data set options in SAS Language Reference: Dictionary. |
Restriction: | You cannot use PROC SORT with the FORCE option and without the OUT= option on data sets that were created with the Version 5 compatibility engine or with a sequential engine such as a tape format engine. |
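The following is a minimal sketch of re-sorting a data set that is already marked as sorted. The data set `work.scores` and its variables are hypothetical and exist only for illustration.

```sas
/* Hypothetical sample data */
data work.scores;
   input name $ score;
   datalines;
Lee 82
Ana 91
Kim 77
;
run;

/* First sort: the data set is now marked as sorted by NAME */
proc sort data=work.scores;
   by name;
run;

/* Same BY specification again. FORCE requests the sort even though
   the data set is already marked as sorted by NAME. */
proc sort data=work.scores force;
   by name;
run;
```

Because `work.scores` is an ordinary WORK data set, sorting it in place with FORCE and without OUT= should not run into the restriction above.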
Restriction: | You can specify only one collating sequence option in a PROC SORT step. |
Operating Environment Information: If you use the VMS operating environment sort, the observation that is written to the output data set is not always the first observation of the BY group.
See also: | NODUPRECS |
Featured in: | Displaying the First Observation of Each BY Group |
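As an illustration of NODUPKEY, the sketch below keeps one observation for each BY value. The data set `work.visits` and its variables are hypothetical.

```sas
/* Hypothetical sample data: two observations share the BY value 101 */
data work.visits;
   input id amount;
   datalines;
101 40
102 25
101 60
;
run;

/* Write one observation per ID to a new data set.
   OUT= leaves work.visits unchanged. */
proc sort data=work.visits out=work.one_per_id nodupkey;
   by id;
run;
```

Which of the two observations for ID 101 is kept can depend on the EQUALS or NOEQUALS setting and on the sort utility of the operating environment, as noted above.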
Alias: | NODUP |
Interaction: | When you are removing consecutive duplicate observations in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed. |
Interaction: | The action of NODUPRECS is directly related to the setting of the SORTDUP= system option. When SORTDUP= is set to LOGICAL, NODUPRECS compares observations using only the variables that remain in the input data set after a DROP or KEEP operation. Setting SORTDUP=LOGICAL can increase the number of duplicate records that are removed because variables are eliminated before the record comparisons take place. It can also improve performance because dropping variables before sorting reduces the amount of memory required to perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS compares observations using all variables in the data set, regardless of whether they have been kept or dropped. For more information about the SORTDUP= option, see SAS Language Reference: Dictionary. |
Tip: | Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations may remain in the output data set. You can remove all duplicates with this option by sorting on all variables. |
See also: | NODUPKEY |
Default: | Without OUT=, PROC SORT overwrites the original data set. |
Tip: | You can use data set options with OUT=. |
Featured in: | Sorting by the Values of Multiple Variables |
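A short sketch of OUT= combined with a data set option follows. The data set names and variables are hypothetical. KEEP= is used here only as one example of a data set option.

```sas
/* Hypothetical sample data */
data work.orders;
   input region $ customer $ total;
   datalines;
West Ames 120
East Boyd 75
West Cole 300
;
run;

/* Sort by two variables and write the result to a new data set.
   The KEEP= data set option on OUT= limits the output variables.
   work.orders itself is not overwritten because OUT= is given. */
proc sort data=work.orders
          out=work.orders_sorted(keep=region total);
   by region total;
run;
```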
Interaction: | Using REVERSE with the DESCENDING option in the BY statement restores the sequence to the normal order. |
See also: | The DESCENDING option in the BY statement. The difference is that the DESCENDING option can be used with both character and numeric variables. |
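The two approaches can be compared with a sketch like the one below, which assumes a hypothetical data set `work.names` with one character and one numeric variable.

```sas
/* Hypothetical sample data */
data work.names;
   input name $ age;
   datalines;
Ana 34
Kim 29
Lee 41
;
run;

/* REVERSE is a PROC SORT statement option and affects
   character BY variables such as NAME */
proc sort data=work.names out=work.by_name_rev reverse;
   by name;
run;

/* DESCENDING is a BY statement option and also works for
   numeric variables such as AGE */
proc sort data=work.names out=work.by_age_desc;
   by descending age;
run;
```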
Danish | |
Finnish | |
Italian | |
Norwegian | |
Spanish | |
Swedish |
To see how the alphanumeric characters in each language will sort, refer to National Collating Sequences of Alphanumeric Characters.
Restriction: | You can specify only one collating sequence, either by SORTSEQ= or by one of the individual options that are available in the PROC SORT statement. |
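As a sketch of SORTSEQ=, the step below requests the EBCDIC collating sequence by name instead of using the individual EBCDIC option. The data set `work.codes` is hypothetical, and the resulting order depends on the collating sequences available in your operating environment.

```sas
/* Hypothetical sample data mixing digits, uppercase, and lowercase */
data work.codes;
   input code $;
   datalines;
a1
B2
3c
;
run;

/* Only one collating sequence may be given per PROC SORT step,
   so no other collating-sequence option appears here */
proc sort data=work.codes out=work.codes_ebcdic sortseq=ebcdic;
   by code;
run;
```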
National Collating Sequences of Alphanumeric Characters
Specifying the SORTSIZE= option in the PROC SORT statement temporarily overrides the SAS system option SORTSIZE=. For information about the system option, see the section on SAS system options in SAS Language Reference: Dictionary.
Operating Environment Information: Some system sort utilities may treat this option differently. Refer to the SAS documentation for your operating environment.
Default: | the value of the SAS system option SORTSIZE= |
Tip: | This option can help improve sort performance by restricting the virtual memory paging that the operating environment controls. If PROC SORT needs more memory, it uses a temporary utility file. As a general rule, the value of SORTSIZE should not exceed the amount of physical memory that will be available to the sorting process. |
Tip: | When the total length of BY variables is small compared with the record length, TAGSORT reduces temporary disk usage considerably. However, processing time may be much higher. |
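The sketch below combines TAGSORT with SORTSIZE= for a data set whose BY variable is short relative to the record length. The data set `work.wide`, its variables, and the 16M value are assumptions chosen for illustration, not recommended settings.

```sas
/* Hypothetical sample data: a short numeric key and a long
   character variable that makes each record wide */
data work.wide;
   length id 8 notes $ 200;
   input id notes $;
   datalines;
3 third
1 first
2 second
;
run;

/* TAGSORT reduces temporary disk usage, possibly at the cost of
   longer processing time. SORTSIZE= overrides the SORTSIZE=
   system option for this step only. */
proc sort data=work.wide out=work.wide_sorted tagsort sortsize=16M;
   by id;
run;
```

On a data set this small the options make no practical difference. They matter for large data sets, and some system sort utilities may treat SORTSIZE= differently.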
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.