Chapter Contents

Previous

Next
SAS/ACCESS Software for Relational Databases: Reference

Selecting and Combining Data By Using the SQL Procedure


Joining Data from Various Sources

The SQL procedure provides another way to select and combine data. For example, suppose you have three data sets: two view descriptors, VLIB.CUSPHON and VLIB.CUSORDR, which are based on the ORACLE tables CUSTOMERS and ORDERS, respectively, and a SAS data file, DLIB.OUTOFSTK, which contains product names and numbers that are out of stock.

Note:   See the appendix for a description of DLIB.OUTOFSTK.  [cautionend]
You can use the SQL procedure to create a view that joins the data from these three sources and displays their output. The SAS WHERE or subsetting IF statements would not be appropriate in this case because you want to compare variables from several sources, rather than simply merging or concatenating the data.

The following SAS statements select and combine data from the view descriptors and the SAS data file to create a PROC SQL view, SLIB.BADORDR. SLIB.BADORDR retrieves customer and product information that the sales department uses to notify customers of unavailable products.

Note:   Although this example shows you an alternate way to join data by using access descriptors, SAS/ACCESS now provides more efficient ways to join your data by using the LIBNAME statement.  [cautionend]

proc access dbms=oracle;

/* create access descriptor  */

   create adlib.customr.access;
   user=scott;  orapw=tiger;
   path='myorapath';
   table=customers;
   list all;

/* create vlib.cusphon view */

   create vlib.cusphon.view;
   select customer phone name;
   rename customer = custnum;
run;

/* create access descriptor  */

proc access dbms=oracle;
   create adlib.orders.access;
   user=scott;  orapw=tiger;
   path='myorapath';
   table=orders;
   list all;

/* create vlib.cusordr view  */

   create vlib.cusordr.view;
   select ordernum stocknum shipto;
   rename ordernum ordnum;
   format ordernum 5.0
          stocknum 4.0;
run;

proc sql;
   create view slib.badordr as
      select distinct cusphon.custnum, 
        cusphon.name, cusphon.phone,
        cusordr.stocknum, outofstk.fibernam
             as product
        from vlib.cusphon, vlib.cusordr, 
             dlib.outofstk
        where cusordr.stocknum=outofstk.fibernum 
             and cusphon.custnum=cusordr.shipto;
quit;

The CREATE VIEW statement incorporates a WHERE clause as part of its SELECT clause. The DISTINCT keyword eliminates any duplicate rows of customer numbers that occur when companies order an unavailable product more than once.

Note:   It is recommended that you not include an ORDER BY clause in a CREATE VIEW statement. This causes the output data to be sorted every time the PROC SQL view is submitted and might have a negative impact on performance. It is more efficient to add an ORDER BY clause to a SELECT statement that displays your output data, as shown below.  [cautionend]

options linesize=120;
title 'Data Described by SLIB.BADORDR';

select * from slib.badordr
   order by custnum, product;

This SELECT statement uses the PROC SQL view SLIB.BADORDR to display joined ORACLE and SAS data in ascending order by the CUSTNUM column and then by the PRODUCT (that is, FIBERNAM) column. The data is ordered by PRODUCT because one customer might have ordered more than one product. To select all the columns from the view, use an asterisk (*) in place of column names. When an asterisk is used, the columns are displayed in the order specified in the SLIB.BADORDR view. Data Described by the PROC SQL View SLIB.BADORDR shows the data described by the SLIB.BADORDR view.

Data Described by the PROC SQL View SLIB.BADORDR
                                            Data Described by SLIB.BADORDER

        CUSTOMER  NAME                                                          PHONE         STOCKNUM  PRODUCT
        --------------------------------------------------------------------------------------------------------
        15432147  GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS                616/582-3906      4789  dacron
        18543489  LONE STAR STATE RESEARCH SUPPLIERS                            512/478-0788      8934  gold
        29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY                  (0552)715311      3478  olefin
        31548901  NATIONAL COUNCIL FOR MATERIALS RESEARCH                       406/422-3413      8934  gold
        43459747  RESEARCH OUTFITTERS                                           03/734-5111       8934  gold

Although the query uses SAS variable names like CUSTNUM, you may notice that the output uses DBMS column names like CUSTOMER. By default, PROC SQL displays SAS variable labels, which default to DBMS column names. (You can use the NOLABEL option to change this default.)


Creating New Columns and Using the GROUP BY Clause

Instead of creating a new PROC SQL view, you might want to summarize your data and create new columns in a report. Although you cannot use the ACCESS procedure to create new columns, you can easily do this by using the SQL procedure with data that is described by a view descriptor.

This example uses the SQL procedure to retrieve and manipulate data from the view descriptor VLIB.ALLEMP, which is based on the DB2 table SASDEMO.EMPLOYEES. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average salary for each department. The query enables you to manipulate your data and display the results without creating a SAS data set.

Because this example reports on employees' salaries, the view descriptor VLIB.ALLEMP is assigned a SAS System password (MONEY) using the DATASETS procedure. Because of the READ= level of protection, the password must be specified in the PROC SQL SELECT statement before you can see the DB2 data accessed by VLIB.ALLEMP.

In the following example, the DISTINCT keyword in the SELECT statement removes duplicate rows. The AVG function in the SQL procedure is equivalent to the SAS MEAN function.

options linesize=80;

proc access dbms=db2;
/* create access descriptor  */
   create adlib.employ.access;
   ssid=db2;
   table=sasdemo.employees;
   assign=yes;
   format empid     6.0
          salary    dollar12.2
          jobcode   5.0
          birthdate date9.
          hiredate  date9.;
   list all;
run;

/* create vlib.allemp view */
proc access dbms=db2
            accdesc=adlib.employ;
   create vlib.allemp.view;
   select all;
run;

/* assign a password  */
proc datasets library=vlib  memtype=view;
   modify allemp (read=money);
run;

/* example            */
title 'Average Salary Per ACC Department';
proc sql;
   select distinct dept,
          avg(salary) label='Average Salary'
          format=dollar12.2
      from vlib.allemp(pw=money)
      where dept like 'ACC%'
      group by dept;

The columns are displayed in the order specified in the SELECT clause of the query. Data Retrieved by an SQL Procedure Query shows the result of the query.

Data Retrieved by an SQL Procedure Query
                  Average Salary Per ACC Department                      

                                      Average
                        DEPT           Salary
                        --------------------
                        ACC013    $54,591.33
                        ACC024    $55,370.55
                        ACC043    $75,000.34

To delete a password on an access or view descriptor or any SAS data set, put a slash after the password:

/* delete the password  */
proc datasets library=vlib  memtype=view;
   modify allemp (read=money/);
run;


Using Advanced PROC SQL Features

This example combines a number of PROC SQL components including summary functions, a HAVING clause with a subquery, and an inline view within that subquery. The reason you use each component and the way in which each is evaluated are described following the example. The example displays the employees who took the most orders that have already shipped. It uses the VLIB.ALLEMP and VLIB.ALLORDR view descriptors to join data from the tables SASDEMO.EMPLOYEES and SASDEMO.ORDERS.

options ls=80;

proc access dbms=db2;
/* create access descriptor  */
   create adlib.employ.access;
   database=sample;
   table=sasdemo.employees;
   assign=yes;
   format empid     6.0
          salary    dollar12.2
          jobcode   5.0
          birthdate date9.
          hiredate  date9.;

/* create vlib.allemp view */
   create vlib.allemp.view;
   select all;
   list view;
run;

proc access dbms=db2;
/* create access descriptor    */
   create adlib.order.access;
   database=sample;
   table=sasdemo.orders;
   assign=yes;
   rename dateorderd = dateord
          processdby = procesby;
   format dateorderd datetime9.
          shipped datetime9.
          ordernum      5.0
          length        4.0
          stocknum      4.0
          takenby       6.0
          processdby    6.0
          fabcharges    12.2;

/* create vlib.allordr view   */
   create vlib.allordr.view;
   select all;
   list view;
run;
proc sql; 
title 'Employees Who Took the Most Orders That
Shipped';
select distinct lastname label "Took Orders",
takenby, count(shipped) as ordship
   from vlib.allemp, vlib.allordr
   where takenby=empid
   group by takenby
   having ordship=
      (select max(ordship)
          from (select distinct takenby,
                   count(shipped) as ordship
                   from vlib.allemp, vlib.allordr
                   where takenby=empid
                   group by takenby));
quit;
You begin to evaluate the query at its innermost level. Here the query begins by evaluating the inline view:
          from (select distinct takenby,
                   count(shipped) as ordship
                   from vlib.allemp, vlib.allordr
                   where takenby=empid
                   group by takenby));

The inline view lists employees who have taken orders and counts the number of orders that have been shipped. A column alias ORDSHIP is assigned to the number of orders and is used elsewhere in the query.

   TAKENBY   ORDSHIP
   -----------------
   119012         6
   212916         0
   234967         0
   321783         6
   456910         5

The SELECT MAX(ORDSHIP) clause uses the results of the inline view to determine the highest number of orders taken and shipped, 6. When this amount is supplied to the outer query, it evaluates as if the query were written:

proc sql;
select distinct lastname label "Took Orders",
   takenby, count(shipped) as ordship
   from vlib.allemp, vlib.allordr
   where takenby=empid
   group by takenby
   having ordship=6;

The first part of the outer query adds the names of the employees who took the orders and joins the data from the two view descriptors with matching EMPLOYEE numbers ( where takenby=empid). The COUNT function computes the number of shipped orders that were taken by each employee to reduce the number of rows to one per employee. The HAVING expression then selects rows if the number of orders is 6. Data on Orders Shipped shows the results of the PROC SQL query.

Data on Orders Shipped
        Employees Who Took the Most Orders That Shipped               

             Took Orders         TAKENBY   ORDSHIP
             -------------------------------------
             GONZALES             321783         6
             WOLF-PROVENZA        119012         6


Accessing Tables That Are Located on Different Nodes or Databases

In a networking environment, you can often access data from DBMS tables that is stored on different machines or in different databases. When using the SAS/ACCESS Interface to SYBASE, for example, you use the SERVER= and DATABASE= statements to specify the locations of the tables that you want to access. Use TABLE= to specify the names of the SYBASE tables.

In the following example, you create access descriptors and view descriptors for SYBASE tables that have different owners and are stored in databases that reside on different machines. The USER= and PASSWORD= statements identify the owners of the EMPLOYEES and INVOICE tables and their passwords.

After creating the descriptors, you use the SQL procedure to join the tables' data. Because the database identification information is stored permanently in each descriptor, PROC SQL can use the view descriptors to access and join the remote SYBASE data:

/* create access descriptor  */
proc access dbms=sybase;
   create work.employ.access;
   server=server1;
   database=personnel;
   user=carmen;
   password=aria;
   table=employees;

/* create vlib.emp_acc view  */
   create vlib.emp_acc.view;
   select all;
   format empid 6.0
          salary dollar12.2
          jobcode 5.0
          hiredate date9.
          birthdate date9.;
   subset where DEPT like 'ACC%';
   list all;
run;

/* create access descriptor  */
proc access dbms=sybase;
   create work.invoice.access;
   server=server2;
   database=inventory;
   user=joachim;
   password=machauf;
   table=invoice;

/* create vlib.sainv view   */
   create vlib.sainv.view;
   select all;
   rename invoicenum invnum
          amtbilled  amtbilld
          amountinus amtinus;
   format invoicenum 5.
          billedby   6.
          amtbilled  15.2
          amountinus 15.2
          billedon date9.
          paidon   date9.;
   subset where COUNTRY in ('Argentina','Brazil');
   list all;
run;
SYBASE objects, such as table names and columns, are case sensitive. The WHERE clauses in PROC ACCESS SUBSET statements are passed to SYBASE exactly as you type them, so you must use the correct case for SYBASE column names. The database identification statements and column names in all other statements are converted to uppercase unless they are enclosed in quotes.
options linesize=120;
title 'South American Invoices and Who Submitted 
       Them';

proc sql;
   select invnum, country, billedon,
          paidon, billedby, lastname, firstnam
      from vlib.emp_acc, vlib.sainv
      where emp_acc.empid=sainv.billedby;
quit;

Data Joined from Tables in Different Databases shows the results of the PROC SQL query.

Data Joined from Tables in Different Databases
                                     South American Invoices and Who Submitted Them

           INVOICENUM  COUNTRY               BILLEDON    PAIDON     BILLEDBY  LASTNAME            FIRSTNAME
           --------------------------------------------------------------------------------------------------
                12476  Argentina              24DEC1998        .      135673  HEMESLY             STEPHANIE
                11270  Brazil                 05OCT1998  18OCT1998    239185  DOS REMEDIOS        LEONARD
                11285  Argentina              10OCT1998  30NOV1998    239185  DOS REMEDIOS        LEONARD
                11280  Brazil                 07OCT1998  20OCT1998    423286  MIFUNE              YUKIO
                12051  Brazil                 02NOV1998        .      457232  LOVELL              WILLIAM
                12471  Brazil                 27DEC1998        .      457232  LOVELL              WILLIAM


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.