Overview of Techniques for Optimizing I/O
I/O is one of the most important factors for optimizing performance. Most SAS jobs consist of repeated cycles of reading a particular set of data to perform various data analysis and data manipulation tasks. To improve the performance of a SAS job, you must reduce the number of times SAS accesses disk or tape devices.
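To do this, you can modify your SAS programs to process only the necessary variables and observations by using WHERE-expression processing, using DROP and KEEP statements, using LENGTH statements, and using the OBS= and FIRSTOBS= data set options.
You can also modify your programs to reduce the number of times they process the data internally by creating SAS data sets, using indexes, accessing data through views, and using engines efficiently.
In addition, you can reduce the number of data accesses by setting the BUFNO=, BUFSIZE=, CATCACHE=, and COMPRESS= system options, which let SAS process more data each time a device is accessed.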
Sometimes you might be able to use more than one method, making your SAS job even more efficient.
Using WHERE-Expression Processing
Consider the following DATA step, which creates a data set SEATBELT that contains only those observations from the AUTO.SURVEY data set for which the value of SEATBELT is yes. The new data set is then printed.
libname auto '/users/autodata';
data seatbelt;
   set auto.survey;
   if seatbelt='yes';
run;
proc print data=seatbelt;
run;
However, you can get the same output from the PROC PRINT step without creating a data set if you use a WHERE statement in the PROC PRINT step, as in the following example:
proc print data=auto.survey;
   where seatbelt='yes';
run;
The WHERE statement can save resources by reducing the number of times you process the data. In this example, you might be able to use less time and memory by eliminating the DATA step. You also use less I/O because there is no intermediate data set. Note that you cannot use a WHERE statement in a DATA step that reads raw data.
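A minimal sketch of the restriction just described, assuming the AUTO libref from the example above. The first step can use a WHERE statement because SET reads a SAS data set; the second reads raw data, so it has to fall back on a subsetting IF statement. The raw file path, the data set name SEATBELT_RAW, and the INPUT layout (variables ID and SEATBELT) are hypothetical.

```sas
/* Assumes the AUTO libref assigned in the earlier example. */

/* WHERE statement: allowed because SET reads a SAS data set. */
data seatbelt;
   set auto.survey;
   where seatbelt='yes';
run;

/* Raw data: WHERE is not allowed, so use a subsetting IF.  */
/* Hypothetical file path, data set name, and INPUT layout. */
data seatbelt_raw;
   infile '/users/autodata/survey.dat';
   input id seatbelt $;
   if seatbelt='yes';
run;
```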
The extent of savings that you can achieve depends on many factors, including the size of the data set. Test your programs to determine which solution is most efficient. See Deciding Whether to Use a WHERE Expression or a Subsetting IF Statement for more information.
Using DROP and KEEP Statements
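DROP and KEEP statements control which variables are written to an output data set, and the DROP= and KEEP= data set options applied to an input data set also limit which variables are read into the program data vector. Smaller observations mean fewer I/O operations for every later step that reads the data. The sketch below assumes the AUTO libref from the WHERE example; the variables ID and COMMENTS and the output data set names are hypothetical.

```sas
/* Assumes the AUTO libref from the WHERE example.         */
/* ID, COMMENTS, and the output names are hypothetical.    */

/* KEEP statement: only ID and SEATBELT are written out. */
data seatbelt;
   set auto.survey;
   if seatbelt='yes';
   keep id seatbelt;
run;

/* DROP statement: every variable except COMMENTS is written. */
data survey_nocomments;
   set auto.survey;
   drop comments;
run;

/* KEEP= data set option on input: only ID and SEATBELT are */
/* read into the program data vector in the first place.    */
data seatbelt2;
   set auto.survey(keep=id seatbelt);
   if seatbelt='yes';
run;
```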
Using LENGTH Statements
You can also use LENGTH statements to reduce the size of your observations. When you include only the necessary storage space for each variable, you can reduce the number of I/O operations that are required to process the data. Before you change the length of a numeric variable, however, see Specifying Variable Lengths. See SAS Language Reference: Dictionary for more information on the LENGTH statement.
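A sketch of shortening character variables with a LENGTH statement, assuming the AUTO libref from the WHERE example. The LENGTH statement comes before the SET statement so that it defines the lengths before SAS takes them from AUTO.SURVEY. The variable STATE and the chosen lengths are hypothetical, and numeric variables are deliberately left alone because of the caution above. If stored values are longer than the new length, they are truncated, and SAS can write a warning about this to the log.

```sas
/* Assumes the AUTO libref from the WHERE example.     */
/* STATE and the lengths chosen here are hypothetical. */
data survey_short;
   length state $ 2       /* two-letter state code            */
          seatbelt $ 3;   /* longest expected value is 'yes'  */
   set auto.survey;       /* LENGTH precedes SET so it applies */
run;
```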
Using the OBS= and FIRSTOBS= Data Set Options
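FIRSTOBS= specifies the first observation to process, and OBS= specifies the last one (an observation number, not a count), so SAS reads only that range. This is useful, for example, when testing a program against part of a large data set. The sketch assumes the AUTO libref from the WHERE example, and the numbers are arbitrary. OBS= also exists as a system option, which limits every step that follows until it is reset.

```sas
/* Assumes the AUTO libref from the WHERE example. */

/* Read only observations 101 through 200 (100 observations). */
proc print data=auto.survey(firstobs=101 obs=200);
run;

/* System option form: limit every following step while testing. */
options obs=50;
proc means data=auto.survey;
run;
options obs=max;   /* restore the default: process all observations */
```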
Creating SAS Data Sets
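If you read the same raw data repeatedly, reading it once into a SAS data set and using that data set in later steps avoids re-reading and re-parsing the raw file with INPUT every time. A minimal sketch; the file path, the INPUT layout, and the data set name AUTO.RAWSURVEY are hypothetical. Once the data are stored as a SAS data set, later steps can also use a WHERE statement on them.

```sas
libname auto '/users/autodata';

/* Read the raw file once and store the result as a SAS data set. */
/* Hypothetical file path, variables, and data set name.          */
data auto.rawsurvey;
   infile '/users/autodata/survey.dat';
   input id state $ seatbelt $;
run;

/* Later steps read the SAS data set instead of the raw file. */
proc freq data=auto.rawsurvey;
   tables seatbelt;
run;

proc print data=auto.rawsurvey;
   where seatbelt='yes';
run;
```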
Another consideration is whether you are using data sets created with previous releases of SAS. If you frequently process such data sets, it is sometimes more efficient to convert them by re-creating them with the current release of SAS. See Compatibility of Version 8 with Earlier Releases for more information.
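One possible way to do the conversion, sketched under assumptions: the directory paths and the librefs OLD and NEW are hypothetical, and the sketch assumes that the V6 engine at your installation can read the older files. PROC COPY writes each data set into the library that is assigned with the V8 engine, which re-creates it in the Version 8 format. Check the compatibility documentation cited above before relying on this for your files.

```sas
/* Hypothetical directories: OLD holds data sets created by an */
/* earlier release; NEW will hold the converted copies.        */
libname old v6 '/users/myid/olddata';
libname new v8 '/users/myid/newdata';

/* Copy every data set; each copy is written by the V8 engine. */
proc copy in=old out=new memtype=data;
run;
```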
Using Indexes
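An index gives SAS direct access to observations by the values of the indexed variable, so a WHERE expression that selects a small fraction of a large data set can avoid reading every observation. A sketch assuming the AUTO libref from the WHERE example; the variable ID and the value 1234 are hypothetical. The MSGLEVEL=I system option asks SAS to write informational notes, including whether an index was used, to the log.

```sas
/* Assumes the AUTO libref from the WHERE example. */
options msglevel=i;   /* informational notes, such as index usage, go to the log */

/* Create a simple index on the hypothetical variable ID. */
proc datasets library=auto nolist;
   modify survey;
   index create id;
quit;

/* This WHERE expression can use the index (1234 is hypothetical). */
proc print data=auto.survey;
   where id=1234;
run;
```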
Note: Indexing does not, however, always improve the performance of an application. If you continually rewrite a data set, indexing its variables is wasteful because the index must be re-created each time the data set is rewritten.
See SAS Data Files for more information about indexes.
Accessing Data Through Views
You can use the SQL procedure or a DATA step to create views of your data. A view stores a set of instructions rather than the data itself: when a program reads the view, SAS executes those instructions to retrieve the data, so later steps can subset your data with fewer statements. You can also use a view to combine data from several data sets without creating a new data set, saving both processing time and disk space. See SAS Data Views and the SAS Procedures Guide for more details.
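A sketch of both kinds of view, assuming the AUTO libref from the WHERE example; the view names BELTED and BELTED2 and the variable ID are hypothetical. Neither step copies any observations. The stored instructions run only when a later step, here PROC PRINT, reads the view.

```sas
/* Assumes the AUTO libref from the WHERE example. */

/* PROC SQL view: stores the query, not the selected rows. */
proc sql;
   create view work.belted as
      select id, seatbelt
         from auto.survey
         where seatbelt='yes';
quit;

/* DATA step view with the same logic. */
data work.belted2 / view=work.belted2;
   set auto.survey(keep=id seatbelt);
   where seatbelt='yes';
run;

/* The stored instructions run only when a step reads the view. */
proc print data=work.belted;
run;
```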
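Using Engines Efficiently
If you do not specify an engine on a LIBNAME statement, SAS must examine the files in the directory to determine which engine to use, which requires extra processing. The following statement explicitly specifies the V8 engine for the libref FRUITS, so SAS does not need to make that determination:
/* Engine specified. */
libname fruits v8 '/users/myid/mydir';
The following statement does not explicitly specify the V8 engine; notice the NOTE about mixed engine types that is generated: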
/* Engine not specified. */
libname fruits '/users/myid/mydir';
NOTE: Directory for library FRUITS contains files of mixed engine types.
NOTE: Libref FRUITS was successfully assigned as follows:
      Engine:        V8
      Physical Name: /users/myid/mydir
Operating Environment Information: In the OS/390 operating environment, you do not need to specify an engine for certain types of libraries. See SAS I/O Engines for more information about SAS engines.
Setting the BUFNO=, BUFSIZE=, CATCACHE=, and COMPRESS= System Options
Note: You can also use the CBUFNO= system option to control the number of extra page buffers to allocate for each open SAS catalog.
See "System Options" in SAS Language Reference: Dictionary and the SAS documentation for your operating environment for more details on the BUFNO= and CBUFNO= options.
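BUFNO= sets the number of buffers that SAS allocates for processing a SAS data set. More buffers can let SAS move more data per I/O pass, at the cost of more memory. BUFNO= is available both as a system option and as a data set option. The sketch assumes the AUTO libref from the WHERE example; the value 10 is arbitrary, not a recommendation, so experiment to find a value that suits your data and memory.

```sas
/* Assumes the AUTO libref from the WHERE example.        */
/* The value 10 is arbitrary; experiment for your system. */

/* System option: applies to data sets opened afterward. */
options bufno=10;

/* Data set option: applies only to this input data set. */
data work.copy;
   set auto.survey(bufno=10);
run;
```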
If you know that the total amount of data is going to be small, you can set a small page size with the BUFSIZE= option so that the total data set size remains small and you minimize the amount of wasted space on a page. Each page, however, requires some additional overhead, so if you know that a data set will have many observations, choose a larger BUFSIZE= value so that fewer pages, and therefore less total overhead, are needed.
Large data sets that are accessed sequentially benefit from larger page sizes, because larger pages reduce the number of system calls that are required to read the data set. Note that because observations cannot span pages, there is typically some unused space on each page.
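A rough sketch of the unused-space arithmetic, assuming a hypothetical 8,192-byte page and 300-byte observations, and ignoring the per-page overhead mentioned above (so real figures are somewhat lower). The DATA _NULL_ step writes obs_per_page=27 unused=92 to the log: 27 observations fit on a page (27 × 300 = 8,100 bytes), leaving 92 bytes unused on each page.

```sas
/* Hypothetical sizes; per-page overhead is ignored. */
data _null_;
   pagesize = 8192;                            /* bytes per page                 */
   obslen   = 300;                             /* bytes per observation          */
   obs_per_page = floor(pagesize / obslen);    /* observations cannot span pages */
   unused = pagesize - obs_per_page * obslen;  /* leftover bytes on each page    */
   put obs_per_page= unused=;
run;
```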
Calculating Data Set Size discusses how to estimate data set size.
See "System Options" in SAS Language Reference: Dictionary and the SAS documentation for your operating environment for more details on the BUFSIZE= option.
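BUFSIZE= takes effect when a data set is created, and the page size then becomes a permanent attribute of that data set; to change it, you re-create the data set. A sketch assuming the AUTO libref from the WHERE example; the output name AUTO.BIGSURVEY and the 64K page size are hypothetical choices. PROC CONTENTS reports the resulting Data Set Page Size.

```sas
/* Assumes the AUTO libref from the WHERE example.        */
/* AUTO.BIGSURVEY and the 64K page size are hypothetical. */
data auto.bigsurvey(bufsize=64k);
   set auto.survey;
run;

/* The Data Set Page Size appears in the PROC CONTENTS output. */
proc contents data=auto.bigsurvey;
run;
```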
See "System Options" in SAS Language Reference: Dictionary and the SAS documentation for your operating environment for more details on the CATCACHE= option.
See SAS Language Reference: Dictionary for more details on the COMPRESS= option.
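COMPRESS=YES stores observations in compressed form, so a data set can occupy fewer pages and need fewer I/O operations, at the cost of CPU time to compress and decompress the data. A sketch assuming the AUTO libref from the WHERE example; the name AUTO.SURVEY_C is hypothetical. SAS writes a note to the log reporting how compression changed the data set's size. For data sets with short observations, compression can even increase the size, so it is worth checking that note.

```sas
/* Assumes the AUTO libref from the WHERE example. */

/* Data set option: compress only this output data set (hypothetical name). */
data auto.survey_c(compress=yes);
   set auto.survey;
run;

/* System option: compress every data set created from here on. */
options compress=yes;
```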
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.