Chapter Contents |
Previous |
Next |
Doing More with SAS/ASSIST Software |
You often need data that are stored in separate tables. For example, you may want to produce a report that gets information about flight delays from one table and boarding capacity from another. Information from separate tables can be combined by matching values in columns that relate the tables. This is called joining tables. You can join a maximum of 16 tables in one query. Note that some limitations apply when you join DB2 tables. For more information, see Additional Considerations for Joining DB2 Tables.
Introducing Table Joins |
To avoid a Cartesian product, you must specify how the tables should be combined. Typically, you want to pair rows based on matching values in one or more key columns of each table. The following example uses the tables in the following figure to illustrate a join.
Joining Two Tables
Pair the SALARY information in Table B with the NAME of the person in Table A by matching the values in the ID columns. Therefore, your join criterion is A.ID=B.ID. In SQL, the code is:
select name, salary from a, b where a.id = b.id;
Output from Joined Tables
You can also join tables that have missing values. For example, ID 3 is missing from Table B. A left join outputs all the names in Table A (the left table) and matches them to the available salary information in Table B. If a row in Table B is missing, missing values are supplied in the joined output. In this case, Michael from Table A is output with a missing value for SALARY next to his name. The following figure shows the results of this query.
Output from Left Joined Tables
A right join works similarly. All the rows from Table B (the right table) are output, and any missing rows from table A are output as missing values (or blanks, if the values are character data). The following figure shows the results of this query.
Output from Right Joined Tables
If you join two tables that both have missing rows, you can combine the effect of a left and a right join by using a full join. The following figure shows the results of this query.
Output from Fully Joined Tables
Inner joins can be performed on up to 16 tables at a time, whereas outer joins (left, right, full) work on only two tables at a time.
Joining Tables Manually |
If tables selected for joining have a
relation defined between
them, Query and Reporting joins them automatically; see Joining Tables Automatically for more information. If
the tables do not have a relation defined between them, or if they have more
than one such relation defined, you have to join the tables manually. The
following sections show examples of how to join tables manually in both cases,
as well as an example of performing outer joins, which must be done manually.
Follow these instructions to join two tables manually:
The Incomplete Inner join message appears under Join. The warning Not all tables are joined appears because the tables do not contain a relation defined on them. Therefore, Query and Reporting has not automatically joined the tables. The following display shows the Query window with the two tables to be joined.
Note: To
clear previously-defined subsets or expressions, select subset or Expression. Select Clear from the
Edit menu, then select Close from the File menu.
Query Window with Tables for Manual Join
Note: If you are joining more than five tables, not all of the table names appear
in the Join window; overwrite the first number
in the Tables field to see more tables.
The following display shows the Join Manager window with AIRLINE.JOBCODES and AIRLINE.PAYROLL before they are joined. The Group field indicates whether groups of tables are joined. 1,2 indicates that this group is not joined.
Join Manager Window With Two Tables To Be Joined
1
next to A.JOBCODE and
1
next to B.JOBCODE
and press ENTER. Note that the column names for the two tables are the same
in this example. The column names do not have to be the same, but their data
types and values must match. You can select Sort by
from the Edit menu to reorder columns in the tables
by their names, join number, index, or matching names.
After you press ENTER and the tables are joined, the highlighting of the table names is removed, the Group field changes to 1,1 (indicating one group), and a message appears saying All tables are joined. The following display shows the Join Manager window after the tables have been joined.
Join Manager Window With Two Joined Tables
The table names are displayed at the top of the Join Manager window. AIRLINE.SCHEDULE is highlighted because it is not yet joined with another table. AIRLINE.SCHEDULE has no table alias under the JOINS/relations field for the same reason. This table is also listed as a separate group in the Group field.
The following display shows the Join Manager window before the third table AIRLINE.SCHEDULE is joined with AIRLINE.JOBCODES and AIRLINE.PAYROLL.
Join Manager Window With Third Table Before Joining
B
in the first Alias field and the alias
C
in the second Alias field. When you press ENTER, the tables' columns are
listed. Type
1
next to B.IDNUM and C.IDNUM, respectively, and press ENTER.
When the three tables are joined, the highlighting is removed, the Group field changes to 1,1,1 (indicating one group), and a message appears saying All tables are joined, as shown in the following display.
Join Manager Table With Three Joined Tables
When you join tables, Query and Reporting uses an inner join by default. However, you can choose another type of join if you are joining only two tables. In DB2, outer joins are allowed on only two tables. Note that there may be other restrictions under DB2; for further information, see Additional Considerations for Joining DB2 Tables. Outer joins must be performed manually.
1
in the A.IDNUM column and
1
in the B.SUPID column and press ENTER. (These steps are described earlier
in Joining Tables Without Defined Relations.)
Show SQL Window for Right Join
Exit the Show SQL window.
Results of Right Join
Defining Relations on Tables |
You define a relation on tables based on one or more columns that tables contain. For example, the SAS AIRLINE.PAYROLL table has a column named IDNUM that has the same attributes as the IDNUM column in AIRLINE.STAFF. Therefore, you would define and name a relation for IDNUM on the two tables.
Defining a relation between or among tables enables you to build intelligence into your query creation process. When one relation is defined on tables that you join in a query, Query and Reporting automatically generates a WHERE or ON clause for that query. This saves you from defining a WHERE predicate for each join query. (See Joining Tables Automatically for more information.) You can, of course, still select Subset from the Query window to add other predicates to your WHERE clause; see Retrieving a Subset of Data for more information.
If no relations are defined or if more than one relation is defined for the tables in the join, you must join the tables manually, as described in Joining Tables Manually and Joining Tables with Multiple Relations.
The examples in this section are used to demonstrate joining tables automatically and joining tables with multiple relations later in this chapter.
1
next to the A.FLIGHT and B.FLIGHT columns and press ENTER. A message indicates that the
tables are joined.
View | Relations | Current Tables |
Relations for Current Tables Window
MARFLT
. A relation
name can be up to eight characters long.
March - Flight Info
Relation
. A description can be up to 32 characters long.
Repeat steps 2 through 9 above to define a relation
on the AIRLINE.DEST and AIRLINE.FLINFO tables. Define your relation on the
A.DEST and B.DEST columns. Name this relation
FLDEST
and give it a description of
Destination Name Join
. Then, repeat steps 5 through 9 to define a second
relation on the AIRLINE.DEST and AIRLINE.FLINFO tables. By typing
2
(instead of
1
) next to the columns, define this relation on the A.DEST and B.ORIG
columns. Name this relation
FLORIG
and give it a description of
Origin Name Join
.
To modify a relation, follow these instructions:
View | Relations | Current Tables |
?
in the Relation name
field and select the relation from the resulting list.
To delete a relation, follow these instuctions:
View | Relations | Current Tables |
?
in the Relation name
field and select the relation from the resulting list.
Joining Tables Automatically |
Query and Reporting automatically joins tables that have exactly one relation defined on them. The default action is to create an inner join. (See Joining Tables Manually for information on joining tables without a defined relation and for performing outer joins.)
In this section you join the AIRLINE.MARCH and AIRLINE.FLINFO tables on which you defined a relation in the previous section.
I
next to the table name in the Select window. The Table Information window
appears, as shown in the following display.
Table Information Window
At this point, you can customize your query, run it, and save it as described in Saving a Query. Alternatively, you can save the query's output to a table, as described in Creating New Tables. If you intend to access the joined data frequently, it is often more efficient to save the output to a SAS table and then specify the new SAS table in your SAS code.
Joining Tables with Multiple Relations |
Tables with multiple relations must be joined manually because you have to decide which columns to use to join the tables. This example uses the relations you defined on the AIRLINE.DEST and AIRLINE.FLINFO tables in Defining Relations on Tables.
Join Manager Window With Multiple Relations
Join Manager Window With Multiple Relations After Joining
Return to the Query window to customize, save, or run your query.
Additional Considerations for Joining DB2 Tables |
If no primary key/foreign key relation is defined or if more than one relation exists between the contributing tables, you must specify manually which columns are used to join the tables (see Joining Tables Manually).
Versions 4 and later of DB2 includes outer join operators. Therefore, when you use the Query window and specify a left, right, or full join, Query and Reporting generates the WHERE or ON clause automatically.
DB2 versions prior to Version 4 lack outer join operators. You can still specify an outer join type in Query and Reporting. If relations are defined on the tables, a WHERE or ON clause is automatically generated. If no relations are defined, you have to join the tables manually. When you review your SQL code (by selecting Show SQL from the View menu), you see that DB2 SQL has used set operators (such as UNION) to produce an outer join result, instead of using outer join operators.
Because earlier versions of DB2 SQL do not have outer join operators, certain limitations apply to outer joins in Query and Reporting:
The following table shows the default values in outer joins.
Data Type | Default Value |
---|---|
Character data | blank (' ') |
Numeric Data | 0 |
Date | January 1, 0001 |
Time | Midnight |
Timestamp | January 1, 0001, midnight |
If referential constraints have been established, the Select window uses
them to generate automatic joins between
tables. In the Query, Select,
or Report Engine windows, you can enter the
I
command next to a table or column name to open the Select Table information or Column information window. These windows display information on
referential constraints
for your convenience.
Chapter Contents |
Previous |
Next |
Top of Page |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.