SAS/SHARE User's Guide
The information in this paper so far has been about SAS files and how they are used by an application. You will be a more effective application developer if, in addition to understanding how to make optimum use of SAS files, you also understand the computer resources that a server consumes. That understanding will allow you to design your applications to make optimum use of a server as well as optimum use of SAS files.
A server is an independently running SAS session that brokers requests for data from other SAS sessions. A server consumes four kinds of computer resources: CPU, I/O, memory, and messages.
CPU, I/O, and memory resources are consumed by every SAS session. "Messages" names one measurable aspect of the complex area of communications resources, which are consumed by SAS/SHARE software and SAS/CONNECT software because these two products enable SAS sessions to communicate with one another.
Any work done by a server consumes more than one kind of resource (if you are looking for simple uncomplicated truths, you may want to skip this section). A server can do several kinds of work and, as you might expect, not all kinds of work consume resources in the same relative amounts. For example, some work a server can do consumes much of the CPU resource but little of the other resources, while other work consumes much of the memory resource, less of the CPU resource, and very little of the other resources.
CPU
Most requests handled by the processes in a server require small bursts of CPU time. But several kinds of requests can consume especially large amounts of CPU time: evaluating WHERE clauses, interpreting PROC SQL views, processing DATA step views and SAS/ACCESS views, and compressing or decompressing observations.
When a SAS data set is accessed through a server, every WHERE clause used to select observations from that data set is evaluated by a process in the server's SAS session. This increases the server's overall use of the CPU resource in exchange for reduced use of the messages resource, because only the observations that satisfy the WHERE clause are transmitted to the user's session. Often, evaluation of a WHERE clause can be optimized by using an index to locate the desired observations. But when an index is not used, or when it selects more observations than satisfy the WHERE clause, the process in the server's session must search for the observations that completely satisfy the WHERE clause. Searching can consume a significant amount of the CPU resource. While a process conducts a search, it yields periodically to allow other processes in the server's session to do work for other users.
A PROC SQL view can consume quite a bit of the CPU resource. The SQL view engine may join tables, it may need to sort intermediate files, and there may be several WHERE clauses in the view that require evaluation. The process in which the SQL view engine executes yields periodically while a view is interpreted.
DATA step views and SAS/ACCESS views also consume the CPU resource. The process in which either of these engines executes does not yield to allow other processes to run, although the server itself allows other processes to run when a group of observations has been prepared for transmission to a user's SAS session. A DATA step view that does a great deal of calculation while preparing each observation can have a visibly harmful impact on a server's response time to other users' requests.
When a compressed SAS data file is read, processes in the server's session decompress each observation; when a compressed SAS data file is created or replaced, a process in the server's session compresses each observation. In many cases the time required to decompress (or compress) is shorter than the time required to read the additional pages of an uncompressed file. In other words, trading increased use of the CPU resource for decreased use of the I/O resource can, on balance, reduce the length of time users wait for a server to respond. While a user processes a compressed data file through a server, other processes in the server's session may execute between groups of observations requested by that user; a SAS data file is not compressed or decompressed in its entirety in a single operation.
The "Programming Techniques" section of this paper offers ideas for reducing the CPU consumption of processes in a server's session under the topics:
I/O
Waiting for I/O operations to complete could, it would seem, become a bottleneck for a server, and in a few situations it does. But in practice most of a server's memory is used for I/O buffers, and processes in a server's session typically satisfy most requests for data from I/O buffers that are already in memory.
A server typically allocates memory for one page of a file each time the file is opened, up to the number of pages in the file. For example, if the application being executed by a user opens a file twice, enough of the server's memory to contain two pages of the file is allocated; if ten users run the application, space for 20 pages of the file is allocated in the server's memory. The number of buffers allocated for a file will not exceed the number of pages in the file.
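The allocation rule above can be illustrated with a short calculation. This is a minimal sketch in Python, not SAS code: the function name `buffers_allocated` and its parameters are hypothetical, and it assumes exactly one page-sized buffer per open of the file, capped at the number of pages in the file, as the paragraph describes.

```python
def buffers_allocated(opens_per_user: int, users: int, pages_in_file: int) -> int:
    """Estimate how many page buffers a server holds for one file.

    Hypothetical model: one buffer per open of the file, never more
    than the number of pages in the file.
    """
    total_opens = opens_per_user * users
    return min(total_opens, pages_in_file)


# One user whose application opens the file twice.
print(buffers_allocated(opens_per_user=2, users=1, pages_in_file=500))   # 2
# Ten users running the same application.
print(buffers_allocated(opens_per_user=2, users=10, pages_in_file=500))  # 20
# A small 8-page file caps the allocation at 8 buffers.
print(buffers_allocated(opens_per_user=2, users=10, pages_in_file=8))    # 8
```

Multiplying the result by the file's page size gives a rough estimate of the server memory devoted to that file's buffers.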
Of course, the pages of the file maintained in memory are not the same set of pages all the time. As users request pages of the file that are not in memory, buffers are reused: if an in-memory page has been modified, it is first written back to the file on disk; if it has not been modified, its buffer is simply reused to read the new page.
A larger page size can reduce the number of I/O operations required to process a SAS data file. But it takes longer to read a large page than it takes to read a small one, so unless most of the observations in a large page are likely to be accessed by users, large page sizes can increase the amount of time required to perform I/O activity in the server's SAS session.
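The trade-off can be made concrete with a rough model. The sketch below is illustrative Python, not SAS code; the function names, the observation length, and the page sizes are all assumptions chosen for the example, and the model ignores any per-page overhead that a real SAS data file carries.

```python
import math


def sequential_reads(n_obs: int, obs_length: int, page_size: int) -> int:
    """I/O operations needed to read every observation once, in order."""
    obs_per_page = page_size // obs_length  # whole observations per page
    return math.ceil(n_obs / obs_per_page)


def bytes_read_sparse(n_requests: int, page_size: int) -> int:
    """Worst-case bytes read when each requested observation lies on a
    different page that is not already in memory."""
    return n_requests * page_size


# 100,000 observations of 100 bytes each (hypothetical file).
print(sequential_reads(100_000, 100, 4096))    # 2500
print(sequential_reads(100_000, 100, 32768))   # 306

# 50 scattered single-observation requests against the same file.
print(bytes_read_sparse(50, 4096))             # 204800
print(bytes_read_sparse(50, 32768))            # 1638400
```

Under these assumptions the larger page cuts the number of I/O operations for a full pass by about a factor of eight, but when only a few scattered observations are needed it moves eight times as much data for the same result.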
There are two patterns in which data is read from or written to SAS files: sequential and random.
When an application processes a SAS file in sequential order, no page of the file is read into or written from the server's memory more than once each time the file is read or written. Also, observations are transmitted to and from users' sessions in groups, which conserves the messages resource.
In many applications that are used with concurrently accessed files, data is accessed in random order, i.e., a user reads the 250th observation, then the 10,000th observation, then the 5th observation, and so forth. When a file is processed in random order, it is much more difficult to predict how many times each page of the file will be read into or written from the memory of a server's SAS session. In addition, only one observation is transmitted on each message between server and user, which does not conserve the messages resource.
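The difference in message traffic between the two access patterns can be sketched as follows. This is illustrative Python, not SAS code: the function names are hypothetical, the group size of 50 observations per message is an arbitrary assumption, and only the messages that carry observations are counted.

```python
import math


def messages_sequential(n_obs: int, obs_per_message: int) -> int:
    """Messages needed when observations travel in groups."""
    return math.ceil(n_obs / obs_per_message)


def messages_random(n_obs: int) -> int:
    """Messages needed when each message carries one observation."""
    return n_obs


# Reading 10,000 observations, assuming 50 observations per group.
print(messages_sequential(10_000, 50))  # 200
print(messages_random(10_000))          # 10000
```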
The "Programming Techniques" section of this paper offers ideas for reducing the I/O load of a server under the topics:
Memory
Large amounts of a server's memory are consumed by I/O buffers, by views that contain an ORDER BY clause, and by indexes.
Because the ORDER BY clause causes the observations produced by a view to be sorted every time the view is interpreted, it requires memory for a work area for the sorting step. Your application should use this clause in its views only when it has a clear benefit for your users.
When a SAS data file is opened, all indexes on the file are opened. Therefore, when a SAS data file has many indexes, a large amount of memory in the server's SAS session can be used to store pages of the index file and related control information. Of course, when many SAS data files that are accessed through a server each have many indexes, this effect is multiplied.
At SAS Institute, we have observed that the majority of servers' memory has been consumed by I/O buffers. Carefully selecting the number of times each file is opened by your application and the page size of each file can have considerable impact on the amount of memory required by a server.
The "Programming Techniques" section of this paper offers ideas for reducing the memory requirements of a server under the topics:
Messages
Messages and replies are transmitted by communications access methods. The cost of a message varies greatly with the access method. Memory-to-memory communication within a single computer, for example via the Cross-Memory Services (COMAMID=XMS) or Inter-User Communications Vehicle (COMAMID=IUCV) access methods, is very rapid, while messages that flow on cables between computers, for example via the DECnet (COMAMID=DECNET) or TCP/IP (COMAMID=TCP) access methods, take much longer to travel between SAS sessions.
SAS Institute has observed that the cost of sending data via most communications access methods is more directly a function of the number of messages than the amount of data. In other words, to move a million characters of data between a user and a server, it takes less time to send the data in 100 messages than to send the data in 10,000 messages.
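A simple cost model shows why the message count dominates. The sketch below is illustrative Python, not SAS code, and its constants are invented for the example rather than measured: it assumes a fixed overhead of 2,000 microseconds per message and a throughput of 10 bytes per microsecond.

```python
def transfer_time_us(total_bytes: int, n_messages: int,
                     per_message_us: int = 2000,
                     bytes_per_us: int = 10) -> int:
    """Estimated transfer time in microseconds.

    Hypothetical model: a fixed overhead per message plus time
    proportional to the amount of data moved.
    """
    return n_messages * per_message_us + total_bytes // bytes_per_us


# Moving one million characters in 100 messages versus 10,000 messages.
print(transfer_time_us(1_000_000, 100))     # 300000
print(transfer_time_us(1_000_000, 10_000))  # 20100000
```

With these assumed constants the data-dependent part of the cost is identical in both cases (100,000 microseconds); the difference comes entirely from the per-message overhead.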
SAS/SHARE software conserves the messages resource by:
The "Programming Techniques" section of this paper offers some ideas for conserving the messages resource under the topics:
The "Tuning Options" section shows options you can use to control the grouping of observations on messages between servers and users:
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.