SAS Companion for the Microsoft Windows Environment
Improving Performance of the SORT Procedure
Two options for the PROC SORT statement are available under
Windows, the SORTSIZE= and TAGSORT options. These two options control the
amount of memory the SORT procedure uses during a sort and are discussed in
the next two sections. Also included is a discussion of determining where
the sorting process occurs for a given data set and determining how much disk
space you need for the sort. For more information about the SORT procedure,
see SORT.
If you do not use the SORTSIZE= option in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE= system option. If the system option is not set, PROC SORT uses all available memory and causes unnecessary amounts of swapping. If you use the SORTSIZE= option to limit the amount of available memory to about 1 or 2 megabytes, most of the unneeded SAS files and operating system files are swapped out, and the 1 to 2 megabytes of sort buffers stay in memory for an optimum sort. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your SASWORK directory to complete the sort.
The default value of this option is 2 megabytes (MB), which is optimal. If your machine has more than 12 MB of physical memory and you are sorting large data sets, setting this option to a value between 4 MB and 8 MB may improve performance.
Note: You can also use the SORTSIZE system option, which
has the same effect as the SORTSIZE= option in the PROC SORT statement.
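As a minimal sketch of the two ways to set the sort memory limit, the statements below reuse the MYLIB.REPORT data set and the NAME variable from the examples in this section. The 4M value is only an illustration for a machine with ample physical memory, not a recommendation.

```sas
/* Limit memory for a single sort with the PROC SORT statement option. */
libname mylib 'c:\sas\mydata';
proc sort data=mylib.report sortsize=4M;
   by name;
run;

/* Or set the system option, which applies to later PROC SORT steps
   that do not specify SORTSIZE= themselves. */
options sortsize=4M;
proc sort data=mylib.report;
   by name;
run;
```

The statement option is the narrower choice when only one sort in a job needs a different limit.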
When you sort a SAS data set, a temporary utility file is opened in the WORK data library (that is, in a subdirectory of the SASWORK directory) if there is not enough memory to hold the data set during the sort. This file has a .sas7butl file extension and can be several times as large as your data set. Before you sort, be sure your WORK data library has room for this temporary utility file.
Note: If you work with especially large data sets, and
you use a Windows NT NTFS disk volume, you should redirect your WORK data
library to that volume. Windows NT with NTFS is not restricted by the 2 gigabyte
file size limit you might encounter under other Windows systems. For more
information, see Using Large Data Sets with Windows NT and NTFS.
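Before a large sort, it can help to confirm which directory the WORK data library points to, so that you can check the free space on that drive. One way to do this, sketched here, is to print the value of the WORK system option to the SAS log.

```sas
/* Display the current value of the WORK system option in the SAS log. */
proc options option=work;
run;
```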
A second file with a .SU7 file extension is also created. If the sort completes successfully, this file is renamed to the name of the data set being sorted (with a .sas7bdat file extension), and the original data set is then deleted. This technique ensures data integrity. Be sure that you have space for this .SU7 file. Use the following rules to determine where the .SU7 file and the resulting sorted data set are created:
libname mylib 'c:\sas\mydata';
proc sort data=mylib.report;
   by name;
run;
Similarly, if you specify a one-level data set name, the .SU7 file is created in your WORK data library.
proc sort data=report out=newrpt;
   by name;
run;

libname january 'f:\jandata';
proc sort data=report out=january.newrpt;
   by name;
run;
Calculating Data Set Size
To estimate the amount of disk space needed for a SAS data set:
For example, for a data set with one character variable and four numeric variables, you would submit the following statements:
data oranges;
   input variety $ flavor texture looks;
   total=flavor+texture+looks;
   datalines;
navel 9 8 6
;
proc contents data=oranges;
   title 'Example for Calculating Data Set Size';
run;
These statements generate the output shown in Example for Calculating Data Set Size with PROC CONTENTS.
Example for Calculating Data Set Size with PROC CONTENTS
Example for Calculating Data Set Size                                    1
                                          07:44 Tuesday, February 2, 1999

                        The CONTENTS Procedure

Data Set Name: WORK.ORANGES                     Observations:         1
Member Type:   DATA                             Variables:            5
Engine:        V8                               Indexes:              0
Created:       7:45 Tuesday February 2, 1999    Observation Length:   40
Last Modified: 7:45 Tuesday February 2, 1999    Deleted Observations: 0
Protection:                                     Compressed:           NO
Data Set Type:                                  Sorted:               NO
Label:

          -----Engine/Host Dependent Information-----

Data Set Page Size:         4096
Number of Data Set Pages:   1
First Data Page:            1
Max Obs per Page:           101
Obs in First Data Page:     1
Number of Data Set Repairs: 0
File Name:                  C:\TEMP\SAS Temporary Files\_Td200\oranges.sas7bdat
Release Created:            8.00.008
Host Created:               WIN_NT

     -----Alphabetic List of Variables and Attributes-----

#    Variable    Type    Len    Pos
-----------------------------------
2    flavor      Num       8      0
4    looks       Num       8     16
3    texture     Num       8      8
5    total       Num       8     24
1    variety     Char      8     32
The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:
number of data pages = 1 + (floor(number of obs / Max Obs per Page))
size = 256 + (Data Set Page Size * number of data pages)
Taking the information shown in Example for Calculating Data Set Size with PROC CONTENTS, you can calculate the size of the example data set:
number of data pages = 1 + (floor(1/101)) = 1
size = 256 + (4096 * 1) = 4352
Thus, the example data set uses 4,352 bytes of storage space.
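The same arithmetic can be sketched in a DATA step, which makes it easy to try other observation counts. The variable names (nobs, max_obs_per_page, page_size, pages, size) are illustrative only; the values come from the PROC CONTENTS output for the example data set.

```sas
/* Estimate data set size from values reported by PROC CONTENTS. */
data _null_;
   nobs             = 1;     /* Observations                          */
   max_obs_per_page = 101;   /* Max Obs per Page                      */
   page_size        = 4096;  /* Data Set Page Size, in bytes          */

   pages = 1 + floor(nobs / max_obs_per_page);
   size  = 256 + (page_size * pages);

   put pages= size=;         /* writes pages=1 size=4352 to the log   */
run;
```

To project the size of a larger data set with the same variables, change nobs to the expected number of observations and resubmit the step.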
Increasing the Efficiency of Interactive Processing
If you are running a SAS job using the SAS System interactively
and the job generates numerous log messages or extensive output, consider
using the AUTOSCROLL command to suppress the scrolling of windows. This makes
your job run faster because the SAS System does not have to use resources
to update the display of the LOG and OUTPUT windows during the job. For example,
issuing
autoscroll 0
in the LOG window causes the LOG window not to scroll
until your job is finished. (For the OUTPUT window, AUTOSCROLL is set to 0
by default.)
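If you prefer to issue the command from within a program rather than typing it on the command line, one possible approach is the DM statement, which submits windowing commands from program code. This is a sketch that assumes an interactive windowing session; the statement has no use in batch mode.

```sas
/* Make the LOG window active, then turn off its scrolling. */
dm 'log; autoscroll 0';
```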
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.