SAS Companion for the OpenVMS Operating Environment
In contrast to the V8 engine, the CONCUR engine does not support indexing or compression of observations. The CONCUR engine can access files only on a single machine or within an OpenVMS cluster. Access to SAS data sets on other operating environments, and concurrent read/write access to SAS data sets across DECnet, are provided by SAS/SHARE software; for more information, see SAS/SHARE User's Guide. The CONCUR engine is optimized for random concurrent access, whereas the V8 engine is better suited to sequential access. For example, if you intend to use the FSEDIT procedure or the POINT= option in the SET statement to access your data randomly, the CONCUR engine may be the best choice even if you do not need any of its concurrent access capabilities.
Version 8 of the SAS System introduces support for several new data set features. The CONCUR engine supports many of them: member names up to 32 characters long, variable names up to 32 characters long, and member or variable labels up to 256 characters long. Although the CONCUR engine can create and access files in Version 6 format, these longer names and labels are not allowed when you create or access a Version 6 concurrency engine file. For more information about support for these longer character strings, see SAS Language Reference: Concepts.
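The following is a minimal sketch of a DATA step that relies on the longer names described above. The directory [mydir.share], the data set, and its variables are hypothetical. If the same step also specified FILEFMT=607 to create a Version 6 format file, these long names would not be allowed.

```sas
libname clib concur '[mydir.share]';

/* Member name (23 characters) and variable name (24 characters)
   longer than 8 characters: valid in a Version 8 format CONCUR file */
data clib.quarterly_sales_summary (label='Quarterly sales totals by region');
   input region $ total_revenue_in_dollars;
   datalines;
East 125000
West 98000
;
run;
```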
How to Select the CONCUR Engine
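There are three ways to select the CONCUR engine:
- Specify CONCUR as the engine name in a LIBNAME statement.
- Specify CONCUR as the type in the Engine: field of the New Library dialog box.
- Specify Default as the type in the Engine: field of the New Library dialog box. SAS selects the CONCUR engine automatically.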
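The following is a minimal sketch of the first method. The directory [mydir.share] is hypothetical. The second statement uses the LIST option of the LIBNAME statement to write the library's attributes, including the engine that was assigned, to the SAS log.

```sas
/* Select the CONCUR engine explicitly for library MYLIB */
libname mylib concur '[mydir.share]';

/* Write the attributes of MYLIB (engine, physical name, ...) to the log */
libname mylib list;
```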
The CONCUR engine creates and accesses SAS data sets in a format that supports record-locking and file-sharing.
If you want to share a SAS data set that was created with a different engine, copy it with the CONCUR engine as the output engine. The copy is then in the correct format for sharing. For example, if you want shared update access to a data set that was created with the V8 engine, you can use the following statements to convert it:
libname inlib v8 '[mydir.base]';
libname outlib concur '[mydir.share]';
proc copy in=inlib out=outlib;
run;
After you run this program, every SAS data set in the library referenced by INLIB (data sets created with the V8 engine) is copied to the library referenced by OUTLIB, and the copies are created with the CONCUR engine. To create data sets with the CONCUR engine, your directory must have a version limit greater than 1.
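If you need to convert only one data set, the SELECT statement of PROC COPY limits the copy to the members that you name. The following is a minimal sketch in which the member SALES is hypothetical.

```sas
libname inlib v8 '[mydir.base]';
libname outlib concur '[mydir.share]';

/* Copy only SALES; the new copy in OUTLIB uses the CONCUR engine */
proc copy in=inlib out=outlib;
   select sales;
run;
```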
Member Types Supported
The CONCUR engine supports the Version 8 member type DATA.
Engine/Host Options for the CONCUR Engine
Note: Data sets created with the CONCUR engine have a maximum observation length of 32K (32,767 bytes).
You can use the following engine/host options with the CONCUR engine:
ALQ=
The ALQ= option (allocation quantity) corresponds to the FAB$L_ALQ field in OpenVMS RMS. For additional details, see the data set option ALQ= and Guide to OpenVMS File Applications.
BKS=
The BKS= option (bucket size) corresponds to the FAB$B_BKS field in OpenVMS RMS or the FILE BUCKET_SIZE attribute when using File Definition Language (FDL). When deciding on the bucket size to use, consider whether the file is usually accessed randomly (small bucket size), sequentially (large bucket size), or both (medium bucket size). The bucket size is a permanent attribute of the file, so this option applies to output files only. For additional details, see the data set option BKS= and Guide to OpenVMS File Applications.
DEQ=
The DEQ= option (default file extension quantity) corresponds to the FAB$W_DEQ field in OpenVMS RMS. If the value specified is 0, OpenVMS RMS uses the default value for the process. The DEQ= option defaults to the bucket size. For additional details, see the data set option DEQ= and Guide to OpenVMS File Applications.
FILEFMT=
The following example shows how to create a file in Release 6.07 format:
libname clib concur '[]';
data clib.v607 (filefmt=607);
   /* ... more SAS statements ... */
run;
HOSTFMT=
In the following example, the two DATA steps produce the same results:
data clib.vaxfile (hostfmt=vax);
   /* ... more SAS statements ... */
run;

data clib.vaxfile (outrep=vax_vms);
   /* ... more SAS statements ... */
run;
For more information about the OUTREP= data set option, see SAS Language Reference: Dictionary.
MBF=
The MBF= option (multibuffer count) corresponds to the RAB$B_MBF field in OpenVMS RMS or the CONNECT MULTIBUFFER_COUNT attribute when using FDL. For additional details, see the data set option MBF= and Guide to OpenVMS File Applications.
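The following is a minimal sketch of ALQ= and DEQ= applied when a CONCUR data set is created. The CONCUR engine also accepts these options as data set options. The data sets CLIB.ORDERS and WORK.ORDERS are hypothetical. The values (in 512-byte blocks, as for the corresponding RMS fields) are illustrative, not recommendations.

```sas
libname clib concur '[mydir.share]';

/* Preallocate 2000 blocks for the new file (ALQ=) and extend it
   500 blocks at a time when it fills up (DEQ=). */
data clib.orders (alq=2000 deq=500);
   set work.orders;
run;
```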
Data Set Options Supported by the CONCUR Engine
The CONCUR engine recognizes all data set options that are documented in SAS Language Reference: Dictionary except FILECLOSE=, COMPRESS=, and REUSE=. The portable data set option CNTLLEV= is of special importance to the CONCUR engine (for details, see CNTLLEV=). Other data set options that are likely to be useful include LOCKREAD= and LOCKWAIT= (for details, see LOCKREAD= and LOCKWAIT=). For more information, see SAS Language Reference: Dictionary.
The engine/host options that are discussed in Engine/Host Options for the CONCUR Engine can also be used as data set options when you use the CONCUR engine. For details, see Specifying Data Set Options.
System Option Values Used by the CONCUR Engine
The CONCUR engine does not use the values of any SAS system options.
DECnet Access
libname mylib concur 'mynode::bldgc:[testdata]';
Passwords
Internals of a Concurrency Engine Data Set
A concurrency engine data set is a relative format file. The record length is determined by the length of one observation, with a minimum of 8 bytes. Because the data set is a relative format file, the maximum observation length is 32,767 bytes. The first portion of the file contains header records that give the engine information such as the number of observations in the file, the number of variables, positioning information that optimizes access, the date and time, the SAS System release, and the operating environment on which the data set was created.
Following the header information is information about each variable in the file: a NAMESTR is stored for each variable in the data set. The NAMESTR includes the variable's name, type, label, and size. Multiple NAMESTRs are stored in a single record, as many as fit in the record length.
The observations follow the NAMESTRs, always one observation per record. The record length equals the observation length, with one exception: if the observation length is less than 8 bytes, the record length is 8. When you delete a record in a relative format file, the record still exists in the file but is marked as deleted.
Note: In a concurrency engine data set, deleted observations take up as much disk space as valid observations. To remove the deleted observations, use the COPY procedure to copy the data set to a data set of a different type, such as one created with the V8 or V8TAPE engine.
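The following is a minimal sketch of reclaiming that space. The library paths and the member ACCOUNTS are hypothetical. The first step copies the data set to a V8 library, and the observations marked as deleted are not carried over. The optional second step copies the V8 copy back into a CONCUR library, the same V8-to-CONCUR conversion shown earlier, for cases where shared access is still needed.

```sas
libname clib  concur '[mydir.share]';
libname v8lib v8     '[mydir.base]';

/* Step 1: copy ACCOUNTS to a V8 library; deleted observations are dropped */
proc copy in=clib out=v8lib;
   select accounts;
run;

/* Step 2 (optional): convert the compacted copy back to CONCUR format
   in a separate library */
libname newshr concur '[mydir.share2]';
proc copy in=v8lib out=newshr;
   select accounts;
run;
```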
Although all record-locking capabilities are provided through OpenVMS RMS features, file-sharing capabilities are provided partly by OpenVMS RMS and partly by the engine itself. The engine can correctly set the share options of a file that is opened for input or update, because the SAS System opens the existing data set directly by its name. However, output data sets are created with a temporary name and are renamed to the actual data set name after the data set is closed. This protects any existing data set of the same name if an error occurs while the new data set is being created. Therefore, the engine itself must prevent sharing of output files. It does this by locking specific filenames, which is why your directory must have a version limit of at least 2 to create concurrency engine data sets.
Optimizing the Performance of the CONCUR Engine
Depending on the type of record access your SAS application performs, you need to consider both the size of the buffers (bucket size) and the number of buffers (multibuffer count). For complete details about specifying them, see the BKS= and MBF= data set options.
The two extremes of record access are records that are accessed completely sequentially or completely randomly. For example, many SAS procedures typically access data sets sequentially, processing the records from first to last. On the other hand, you may access observations in a completely random order when using the FSEDIT procedure to edit or browse observations in a data set.
There are also cases in which records are accessed randomly but may be reaccessed frequently. One example is an application that uses a data set in which particular observations contain information that is referred to frequently. Again, using the FSEDIT procedure as an example, the data set can be designed in such a way that you must access the first observation followed by observation 200, then the first observation again followed by observation 300, and so on.
Finally, there are cases in which records are accessed randomly but adjacent records are then likely to be accessed. For example, an application can use the POINT= option in a SET statement to selectively read the first 10 observations out of every 100.
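The following is a minimal sketch of that access pattern, using the standard POINT= and NOBS= options of the SET statement. The data set CLIB.BIGFILE is hypothetical. The NOBS= variable is set when the step is compiled, so the outer loop can use it before SET executes.

```sas
libname clib concur '[mydir.share]';

/* Read observations 1-10, 101-110, 201-210, and so on from CLIB.BIGFILE */
data work.sample;
   drop start;
   do start = 1 to total by 100;
      do p = start to min(start + 9, total);
         set clib.bigfile point=p nobs=total;  /* jump directly to observation p */
         output;
      end;
   end;
   stop;  /* required: with POINT=, SET never reaches end of file */
run;
```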
Most often, an application accesses a data set by a combination of several of these methods. The following list gives suggestions for the number of buffers and bucket size you should use for each method:
If your program accesses the data set by several methods, you must find a compromise between the number of buffers and the bucket size. This is what the SAS System attempts to do with its defaults, because it does not know the intended use of the file. Because you know the intended use of your CONCUR engine data sets, you can improve the CONCUR engine's performance by optimizing the buffer settings.
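The following is a sketch of tuning for two different workloads, based on the bucket size guidance given for BKS= (small buckets for random access, large buckets for sequential access). All data set names are hypothetical, and the numeric values are illustrative only. The use of MBF= for a re-read workload is an assumption about the workload, not a rule from this guide.

```sas
libname clib concur '[mydir.share]';

/* Data set read mostly sequentially: create it with large buckets */
data clib.history (bks=32);
   set work.history;
run;

/* Data set read mostly at random: create it with small buckets */
data clib.lookup (bks=2);
   set work.lookup;
run;

/* Random access that often returns to the same observations: request
   more buffers when the file is opened (assumed to help keep recently
   used buckets in memory) */
proc fsedit data=clib.lookup (mbf=8);
run;
```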
The CNTLLEV= data set option takes one of two values:
- MEM: specifies that the application requires exclusive access to the data set. Member-level control prevents any other application from accessing the data set until the step has completed.
- REC: specifies that concurrent access is allowed and OpenVMS RMS record-level locking is enabled. This value entails more processing overhead and should be used only when necessary.
Each SAS procedure specifies a required control level to the engine, depending on the intended access of the observations. If you use CNTLLEV=REC and the SAS procedure requires member-level control to ensure the integrity of the data during processing, a warning is written to the SAS log indicating that inaccurate or unpredictable results can occur if the data set is being updated by another process during the analysis.
A common example of improving performance by overriding a procedure's default control level is the FSEDIT procedure, which uses CNTLLEV=REC by default. An FSEDIT session with a concurrency engine data set does not need to incur the overhead of record-level locking if concurrent access is not required. Specifying the data set option CNTLLEV=MEM tells the engine to override the procedure's control level, because exclusive member-level access is desired. This disables record-level locking, reduces the overhead of processing the data set, and improves performance. In tests using the SET statement to read a concurrency engine data set, the step with CNTLLEV=MEM used one-third of the CPU time of the same step with CNTLLEV=REC.
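The following is a minimal sketch of both uses of CNTLLEV=MEM described above. The data set CLIB.ACCOUNTS is hypothetical. The FSEDIT procedure is part of SAS/FSP software and runs interactively.

```sas
libname clib concur '[mydir.share]';

/* Interactive editing with exclusive access: no record-level locking */
proc fsedit data=clib.accounts (cntllev=mem);
run;

/* Sequential read with member-level control, as in the test above */
data work.accounts_copy;
   set clib.accounts (cntllev=mem);
run;
```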
For syntax and usage examples for the CNTLLEV= data set option, see CNTLLEV= and SAS Language Reference: Dictionary.
The value of the FIRSTOBS= data set option specifies the first observation that should be included for processing in the SAS DATA step. Some engines have to read the records sequentially, discarding them until the requested observation is reached. Because a concurrency engine data set is a relative format file, the engine can directly access the beginning observation without having to first read any other observations in the file.
Using the OBS= data set option to specify the last observation that you want to process can improve performance, because input stops at that observation instead of continuing to the end of the file.
For more information about the FIRSTOBS= and OBS= data set options, see SAS Language Reference: Dictionary.
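The following is a minimal sketch that combines the two options. The data set CLIB.BIGFILE is hypothetical. OBS= gives the number of the last observation to process, not a count, so this step reads observations 1001 through 2000.

```sas
libname clib concur '[mydir.share]';

/* The engine positions directly at observation 1001 and stops after 2000 */
data work.subset;
   set clib.bigfile (firstobs=1001 obs=2000);
run;
```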
However, the V8 engine uses a file format, for both uncompressed and compressed data sets, that makes more efficient use of disk space.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.