Chapter Contents |
Previous |
Next |
SAS/ACCESS Software for Relational Databases: Reference |
General Information |
When you use a view descriptor in a DATA step or SAS procedure, columns that you specified in the VAR and KEEP statements are passed to the DBMS for processing. Columns that you specified in DROP statements are not passed to the DBMS. Therefore, only a subset of the columns is returned to the SAS System, and performance is usually enhanced. (This approach can be applied in some of DBMSs, such as CA-OpenIngres, ORACLE, Oracle Rdb, and SYBASE.)
Sorting DBMS Data |
If you reference a view descriptor in a SAS program and the program includes a BY statement for a variable that corresponds to a column in the DBMS table, the SAS/ACCESS interface view engine automatically generates an ORDER BY clause for that variable. Thus, the ORDER BY clause causes the DBMS to sort the data before the SAS procedure or DATA step uses the data in a SAS program. If the DBMS table is very large, this sorting can adversely affect your performance. Use a BY variable that is based on an indexed DBMS column to help reduce this negative impact.
For example, if the view descriptor has an ORDER BY clause and you have specified a BY statement in your SAS program, the BY statement overrides the view descriptor's ORDER BY clause. If you use an ORDER BY clause in an SQL procedure statement that references a view descriptor, this ORDER BY clause also overrides the view descriptor's ORDER BY clause.
Extracting Data Using a View |
A DBMS table is read every time a view descriptor is referred to in a SAS program and the program is executed; the program's output reflects the latest updated level of the DBMS table. If many users are reading the same DBMS table repeatedly, DBMS performance may decrease. If you create several reports during the same SAS session, they may not be based on the same DBMS data due to updating by other users. Therefore, in the following circumstances, it is better to extract data:
If a view descriptor describes a large DBMS table and you plan to use the same DBMS data in several procedures or DATA steps during the same SAS session, you might improve performance by extracting the data. Placing the data into a SAS data file requires a certain amount of disk space to store the data and I/O to write the data. However, SAS data files are organized to provide optimal performance with PROC and DATA steps. Programs that use SAS data files are often more efficient than programs that read DBMS data directly.
(Exception: If you are using a SAS WHERE statement to create small subsets of data and you are reading the subsets based on indexed columns, it may be better not to extract the data from a large DBMS table. In this case, the DBMS can probably retrieve subsets of data based on indexed columns faster.)
If you intend to use DBMS data in a particular sorted order several times, run the SORT procedure on the view descriptor by using the OUT= option to extract the data. This OUT= option is required whenever PROC SORT references a view descriptor. PROC SORT sends an ORDER BY clause to the DBMS so that the data is returned in sorted order to the SAS data file. PROC SORT does not perform any additional sorting of the data. Extracting the data in this way is more efficient than requesting the same sort repeatedly (with an ORDER BY clause) on the DBMS data.
If you are the owner of a DBMS table and do not want anyone else to read the data, you might want to extract the data (or a subset of the data) and not distribute information about either the access descriptor or view descriptor. Or, you might want to assign DBMS security features to your DBMS tables to prevent unauthorized reading or writing to them.
On the SAS System side, you might also want to assign SAS System passwords to your descriptors for additional security. If a view descriptor has a password assigned to it and you extract the data, the new SAS data file is automatically assigned the same password. If a view descriptor does not have a password, you can assign a password to the extracted SAS data file. You can assign a password by using PROC DATASETS on your descriptor. See the SAS Procedures Guide for more information.
Using a Subset of the DBMS Data |
As a general rule, a view descriptor's WHERE clause is passed to the DBMS for processing. How WHERE Clauses Are Processed with View Descriptors and the explanation afterwards describe this process in more detail.
View descriptor with a WHERE clause | System that processes the WHERE clause |
---|---|
used in a DATA step, PROC step, or PROC SQL query | |
without additional WHERE clause | DBMS |
with additional WHERE clause | DBMS: compound WHERE clause built with AND; otherwise, SAS System (table note 1) |
used with a PROC SQL Pass-Through query (table note 2) | |
without additional WHERE clause | DBMS |
with additional WHERE clause | SAS System. Each DBMS table in the join is processed by the DBMS and then returned to the SAS System for final processing. PROC SQL cannot optimize a join of this kind. |
The interface view engine builds a compound WHERE clause using AND operator(s) if the clauses are valid for the DBMS. Otherwise, only the valid part of the WHERE clause is sent to the DBMS for processing and the SAS System processes the remaining part(s). See "Using Multiple WHERE Clauses" below for more information.
When a view descriptor and a Pass-Through
query are used within the same PROC SQL query, they are usually joined in
the FROM clause.
Note: See
SQL Procedure's Interaction with SAS/ACCESS Software for more information on Pass-Through queries.
However, if a WHERE clause contains SAS enhancements or features that are not supported by the DBMS, the WHERE conditions are split up, and only the valid condition (that is, valid DBMS-specific syntax) is sent to the DBMS. The SAS System processes the remaining part(s) of the WHERE clause.
For example, a DBMS-specific SQL cannot parse a colon modifier on a comparison operator, and therefore, the first half of the following WHERE clause cannot be passed to the DBMS.
The second half of the WHERE clause is passed to the DBMS, and the DBMS returns to the SAS System all the rows in the INVOICE table for which PAIDON is on or after 01JAN94. The SAS System must then process the first half of the WHERE clause to subset the rows that were returned from the DBMS for the countries Argentina and Australia:
where country =: 'A' and paidon >= '01JAN94'd;
It is more efficient to use a LIKE operator that is valid in both the DBMS-specific SQL and the SAS System:
where country like 'A%' and paidon >= '01JAN94'd;
In this case, the DBMS can process both halves of the WHERE clause.
Note that the SAS/ACCESS interface view engine can translate certain SAS conventions, such as datetime formats or the IS MISSING operator, to their DBMS-specific SQL equivalents and pass these clauses to the DBMS.
In most cases it is more efficient for the DBMS to process the WHERE conditions. Therefore, write WHERE conditions using features (such as SQL operators) that are valid in both the SAS System and the DBMS so that the DBMS can process the entire WHERE clause.
It is also more efficient to use a SAS WHERE statement
instead of a subsetting IF statement. As just described, a WHERE clause is
passed to the DBMS for processing and returns a subset of rows to the SAS
System for further processing. In contrast, when you use a subsetting IF
statement, every row is returned to the SAS System to be evaluated by the
IF statement. Therefore, using a SAS WHERE statement often improves performance.
You cannot tell the DBMS to use an index, but you can write WHERE clauses that enable it to use its DBMS indexes effectively. Here are some guidelines for writing WHERE clauses that enable the DBMS to use indexes effectively.
Note: The guidelines for each specific
DBMS may vary.
Inefficient:
where zipcode not>8000 | |
Efficient:
where zipcode<=8000 |
Inefficient:
where ZIPCODE>=70000 and ZIPCODE<=80000 | |
Efficient:
where ZIPCODE between 70000 and
80000 |
Inefficient:
where COUNTRY like '%INA' | |
Efficient:
where COUNTRY like 'A%INA' |
Inefficient:
where SALARY>12*4000.00 | |
Efficient:
where SALARY>48000.00 |
For a list of operators that are generally accepted by the SQLs of most DBMSs, see the SAS Procedures Guide.
Chapter Contents |
Previous |
Next |
Top of Page |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.