Chapter Contents |
Previous |
Next |
SAS/ACCESS Software for Relational Databases: Reference |
A major objective of SAS/ACCESS when reading DBMS tables is to take advantage of Teradata's rate of data transfer. This section describes the actions that you can take to ensure that SAS/ACCESS delivers optimal read performance including
Using PreFetch as a LIBNAME Option | |
Using Prefetch as a Global Option. |
About the PreFetch Facility |
PreFetch is a SAS/ACCESS for Teradata facility that speeds up a SAS job by exploiting the parallel processing capability of Teradata. To obtain benefit from the facility, your SAS job must run more than once and have the following characteristics:
|
In brief, the
ideal job is a stable read-only SAS job.
Use of PreFetch is optional. To use the facility, you must explicitly
enable it with the LIBNAME option PREFETCH=.
When reading DBMS tables, SAS/ACCESS submits SQL statements on your behalf to Teradata. Each SQL statement that is submitted has an execution cost: the amount of time Teradata spends processing the statement before it returns the requested data to SAS/ACCESS.
When PreFetch is enabled, the first time you run your SAS job, SAS/ACCESS identifies and selects statements with a high execution cost. SAS/ACCESS then stores (caches) the selected SQL statements to one or more Teradata macros that it creates.
On subsequent runs of the job, when PreFetch is enabled, SAS/ACCESS extracts
statements from the cache and submits them to Teradata in advance. The rows
selected by these SQL statements are immediately available to SAS/ACCESS because
Teradata 'prefetches' them. Your SAS job runs faster because PreFetch reduces
the wait for SQL statements with a high execution cost. However, Prefetch
improves elapsed time only on subsequent runs of a SAS job. During the first
run, SAS/ACCESS only creates the SQL
cache and stores selected SQL statements; no prefetching is performed.
unique_storename | As mentioned, when PreFetch is enabled, SAS/ACCESS creates
one or more Teradata macros to store the selected SQL statements that PreFetch
caches. You can easily distinguish a PreFetch macro from other Teradata macros.
The PreFetch Teradata macro contains a comment that is prefaced with the text,
"SAS/ACCESS PreFetch Cache" .
The name that the PreFetch facility assigns the macro is the value that you enter for the unique_storename argument. The unique_storename name must be unique. Do not specify a name that exists in the Teradata DBMS already for a DBMS table, view or macro. Also, do not enter a name that exists already in another SAS job that employs the Prefetch facility. |
#sessions | This argument specifies how many cached SQL statements SAS/ACCESS submits
in parallel to Teradata. In general, your SAS job completes faster if you
increase the number of statements that Teradata works on in advance. However,
a large number (too many sessions) can strain client and server resources.
A valid value is 1 through 9. If you do not specify a value for this argument,
the default is 3.
In addition to the specified number of sessions, SAS/ACCESS adds an additional session for submitting SQL statements that are not stored in the PreFetch cache. Thus, if the default is 3, SAS/ACCESS actually opens up to 4 sessions on the Teradata server. |
algorithm | This argument is present to handle future enhancements. Currently PreFetch only supports one algorithm, SEQUENTIAL. |
If you have a read-only SAS job that runs frequently, this is an ideal candidate for PreFetch. For example, a daily job that extracts data from Teradata tables. To help you decide when to use PreFetch, consider the following daily jobs:
Reads and collects data from the Teradata DBMS.
Contains a WHERE clause that reads in values from an external, variable data source. As a result, the SQL code that the job submits through a Teradata LIBNAME statement or PROC SQL changes from run to run.
In these examples, Job 1 is an excellent candidate for the facility. In contrast, Job 2 is not. Using PreFetch with Job 2 does not return incorrect results but can impose a performance penalty. PreFetch uses stored SQL statements. Thus, Job 2 is not a good candidate because the SQL statements that the job generates with the WHERE clause change each time the job is run. Consequently, the SQL statements that the job generates never match the statements that are stored.
The impact of Prefetch on processing performance varies by SAS job.
Some jobs improve elapsed time 5% or less; others improve elapsed time 25%
or more.
It is unlikely, but possible, to write a SAS job that delivers unexpected
or incorrect results. This can occur if the job contains code that waits on
some Teradata or system event before proceeding. For example, SAS code that
pauses the SAS job until another user updates a given data item in a Teradata
table. Or, SAS code that pauses the SAS job until a given time; for example,
5:00 p.m. In both cases, PreFetch would generate SQL statements in advance.
But, table results from these SQL statements would not reflect data changes
that will be made by the scheduled Teradata or system event.
PreFetch Facility is designed to handle unusual conditions gracefully. Some of these unusual conditions include:
Using PreFetch as a LIBNAME Option |
If you specify the PreFetch= option in a LIBNAME statement, PreFetch applies the option to tables read by the libref.
Note: If you have more
than one LIBNAME in your SAS job, and you specify PreFetch for each LIBNAME,
remember to make the SQL cache name for each LIBNAME unique.
libname one teradata user=kamdar password=ellis prefetch='pf_store1'; libname two teradata user=larry password=riley;
In Example 1, you apply PreFetch to one of two LIBNAMES. During the
first job run, PreFetch stores SQL statements for tables referenced by the
LIBNAME ONE in a Teradata macro named PF_STORE1 for reuse later.
libname emp teradata user=kamdar password=ellis prefetch='emp_sas_macro'; libname sale teradata user=larry password=riley prefetch='sale_sas_macro';
In Example 2, you apply PreFetch to multiple LIBNAMES. During the first job run, PreFetch stores SQL statements for tables referenced by LIBNAME EMP to a Teradata macro named EMP_SAS_MACRO and SQL statements for tables referenced by LIBNAME SALE to a Teradata macro named SALE_SAS_MACRO.
Using Prefetch as a Global Option |
Unlike other Teradata LIBNAME options, you can also invoke PreFetch globally for a SAS job. To do this, place the OPTION DEBUG= statement in your SAS program before all LIBNAME statements and PROC SQL steps. If your job contains multiple LIBNAME statements, the global PreFetch invocation creates a uniquely named SQL cache name for each of the LIBNAMES.
Note: Do not be confused by the DEBUG option here. It is merely a mechanism
to deliver the PreFetch capability globally. PreFetch is not for debugging;
it is a supported feature of SAS/ACCESS for Teradata.
option debug="PREFETCH(unique_mac,2,SEQUENTIAL)"; libname one teradata user=kamdar password=ellis; libname two teradata user=kamdar password=ellis database=larry; libname three teradata user=kamdar password=ellis database=wayne; proc print data=one.kamdar_goods; run; proc print data=two.larry_services; run; proc print data=three.wayne_miscellaneous; run;
In Example 1, the first time you run the job with Prefetch enabled,
the facility creates 3 Teradata macros: UNIQUE_MAC1, UNIQUE_MAC2, and UNIQUE_MAC3.
In subsequent runs of the job, PreFetch extracts SQL statements from these
Teradata macros, enhancing the job performance across all three LIBNAMES referenced
by the job.
option debug='prefetch(pf_unique_sas,3)';
In Example 2, PreFetch selects the algorithm, that is, the order of
the SQL statements. (The option debug statement must the first statement in
your SAS job.)
option debug='prefetch(sas_pf_store,3,sequential)';
In Example 3, the user specifies for PreFetch to use the SEQUENTIAL algorithm. (The option debug statement must the first statement in your SAS job.)
Matching Teradata Data Types to More Efficient SAS Formats |
You can further optimize read operations, with or without use of the PreFetch facility, by matching the DBMS data types to more efficient SAS formats. Internally, SAS employs floating point numbers and fixed-length character strings. The corresponding Teradata data types are FLOAT and CHAR. When SAS/ACCESS issues SELECT statements on your behalf, it can optimize read processing if the SELECT statement requests FLOAT and CHAR columns from Teradata. In this case, SAS/ACCESS can access the data directly, reading it from the buffers that Teradata controls on the client (your PC, workstation or terminal). SAS/ACCESS then avoids having to move each column from Teradata's buffers into buffers maintained by SAS/ACCESS. Reading data from Teradata's client buffers improves performance. This is true especially when SAS/ACCESS reads a large amount of Teradata data.
In order to take advantage of the optimization just described, you may want to create new tables. For the new tables, where possible define FLOAT columns in place of DECIMAL, INTEGER, SMALLINT, and BYTEINT numeric Teradata columns. Similarly, where possible define CHAR columns in place of VARCHAR and LONG VARCHAR character Teradata columns. These data type changes can speed up processing. But, only when you read the data using SAS/ACCESS. Therefore, create a new table only if you intend to access the table primarily through SAS/ACCESS.
When creating a table, remember that changing other numeric types to FLOAT automatically increases the size of a Teradata table. Other numeric types require less physical storage than does the FLOAT data type. Therefore, before creating a table, consider the overall table design. Additionally, consider running benchmarks with SAS/ACCESS.
Chapter Contents |
Previous |
Next |
Top of Page |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.