Using Spatial Data with SAS/GIS Software
SAS/GIS software uses two basic types of data:
- Spatial data contain the coordinates and identifying information describing
the map features, such as streets, rivers, and railroads.
- Attribute data are the information that you want to use for analysis or
presentation. This information must be spatial in nature. Sales figures for
each of your store locations, population data for each county, and total
income for each household in a region are examples of information that is
spatial in nature because the information applies to a specific geographic
feature.
For example, the U.S. Census Bureau distributes both types of data:
- TIGER/Line files contain spatial information that you can use to build maps.
- Summary Tape files contain population and other demographic information
that you can link to the map features.
Attribute data provide the information that you want
to analyze, and spatial data provide the context in which you want to analyze
it. For example, consider the SAS/GIS map
shown in Spatial and Attribute Data in SAS/GIS Maps.
Spatial data provide the boundaries for the map areas, and attribute data
provide the population information that is used to color the map areas.
[Figure: Spatial and Attribute Data in SAS/GIS Maps]
Spatial data contain the coordinates and identifying information that is
necessary to draw maps. For SAS/GIS software, spatial data are stored in
SAS/GIS spatial databases, which consist of collections of SAS data sets and
SAS catalog entries. The primary method for creating a SAS/GIS spatial
database is through the SAS/GIS Import facility, either in batch or in
interactive mode. You can also use the GIS procedure to create, modify, and
manage the catalog entries in a spatial database.
Features in the spatial data are organized into layers. A layer is a collection
of all the features in the map that share some common characteristic. The
various physical aspects of the map--political boundaries, roads, railroads,
waterways, and so forth--are assigned to layers according to their common
spatial data values. Some features can appear in multiple layers. For example,
a street can also be a ZIP code boundary and a city boundary line. The street
could appear in three layers: one containing the streets, one containing the
ZIP code boundaries, and one containing the city boundaries.
Three types of layers can be represented in SAS/GIS maps: points, lines, and
areas. For example, the collection of all the points in the map that represent
park locations can be organized into a point layer for parks, the collection
of all the lines in the map that represent streets can be organized into a
line layer for streets, and the collection of all the areas that represent
census tracts can be organized into an area layer for tracts. When the various
layers are overlaid, they form a map, as shown in Layers Forming a SAS/GIS Map.
[Figure: Layers Forming a SAS/GIS Map]
A layer can be displayed as either
static or thematic. When a layer is displayed as static, it uses the same
graphical characteristics (color, line, width, and so forth) for all features
in that layer. For example, a street layer could use the same color and line
style to display all the streets. When a layer is displayed as thematic, it
uses different graphical characteristics to classify the features in that
layer. For example, a theme representing sales regions could use different
colors to show the quarterly sales performance of each region. A theme in
a layer representing highways could use different line widths to show the
classes of roads. A layer can have multiple themes stored in it, and you can
easily change which theme is currently displayed.
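The difference between static and thematic display can be sketched in a few lines of Python. This is only an illustration of the idea, not SAS/GIS code: the region names, sales values, colors, and the equal-interval classification are all assumptions chosen for simplicity and do not describe how SAS/GIS itself assigns classes.

```python
# Hypothetical quarterly sales per region (illustrative values only).
region_sales = {"East": 100.0, "West": 250.0, "North": 400.0}

# Hypothetical palette, ordered from lowest class to highest class.
theme_colors = ["lightyellow", "orange", "red"]


def static_style(features, color="gray"):
    """Static display: every feature in the layer gets the same color."""
    return {name: color for name in features}


def thematic_style(values, colors):
    """Thematic display: color each feature by an equal-interval class."""
    low, high = min(values.values()), max(values.values())
    # Fall back to a width of 1.0 when all values are identical.
    width = (high - low) / len(colors) or 1.0
    styled = {}
    for name, value in values.items():
        # Clamp so that the maximum value lands in the last class.
        index = min(int((value - low) / width), len(colors) - 1)
        styled[name] = colors[index]
    return styled


static = static_style(region_sales)
thematic = thematic_style(region_sales, theme_colors)
```

Swapping `theme_colors` for a different palette, or `region_sales` for a different attribute, corresponds loosely to changing which theme is displayed for the same layer.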
In SAS/GIS software, maps display only the portion of the spatial data that
falls within a given coverage. A coverage defines a subset of the spatial data
that is available to a map. The coverage can include all the spatial data in
the database, or only selected portions. For example, a spatial database may
contain geographic data for an entire country, but a coverage may restrict the
portion that is available for a given map to only one region. You can define
more than one coverage for each spatial database, although a map uses only one
coverage at a time.
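As a rough mental model, a coverage behaves like a named filter over one shared database. The Python sketch below illustrates that idea only. The record layout, the `REGION` field, and the coverage names are hypothetical and are not part of SAS/GIS.

```python
# Hypothetical spatial records for a whole country, each tagged with a region.
spatial_records = [
    {"id": 1, "REGION": "NE"},
    {"id": 2, "REGION": "SE"},
    {"id": 3, "REGION": "NE"},
    {"id": 4, "REGION": "MW"},
]

# Several coverages can be defined over the same database; each one is
# modeled here as a predicate that decides whether a record is available.
coverages = {
    "ALL": lambda record: True,
    "NORTHEAST": lambda record: record["REGION"] == "NE",
}


def records_for_map(records, coverage_name):
    """Return the records available to a map that uses the named coverage."""
    keep = coverages[coverage_name]
    return [record for record in records if keep(record)]


northeast_ids = [r["id"] for r in records_for_map(spatial_records, "NORTHEAST")]
```

A map in this model would hold a single coverage name, which mirrors the rule that a map uses only one coverage at a time.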
Most operations in SAS/GIS software use composites of spatial data variables
rather than the actual spatial data variables themselves. Composites identify
the relationships and purpose of the variables in the spatial data.
For example, if the spatial data have variables STATEL
and STATER that contain the state ID codes for the left and right sides of
each feature, then the spatial database could define a composite named STATE
that identifies the relationship between these variables and specifies that
they delineate state areas in the map. You would use the STATE composite,
rather than the actual STATEL and STATER variables, to link state areas in
the map to attribute data for the corresponding state.
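The role of a composite can be illustrated with a small Python model. This is a sketch under stated assumptions, not the SAS/GIS implementation: the chain records, the ID values, and the dictionary used to represent the STATE composite are hypothetical. Only the variable names STATEL and STATER come from the example above.

```python
from collections import defaultdict

# Hypothetical line features, each carrying the state ID on its left and
# right sides (the STATEL and STATER variables from the example above).
chains = [
    {"id": 1, "STATEL": 37, "STATER": 37},  # same state on both sides
    {"id": 2, "STATEL": 37, "STATER": 45},  # boundary between 37 and 45
    {"id": 3, "STATEL": 45, "STATER": 45},
    {"id": 4, "STATEL": 45, "STATER": 13},  # boundary between 45 and 13
]

# The composite is modeled as a name plus the pair of variables it relates.
STATE = {"name": "STATE", "left": "STATEL", "right": "STATER"}


def boundary_chains(records, composite):
    """Group the chains that delineate each area named by the composite."""
    areas = defaultdict(list)
    for record in records:
        left = record[composite["left"]]
        right = record[composite["right"]]
        if left != right:  # the line separates two different areas
            areas[left].append(record["id"])
            areas[right].append(record["id"])
    return dict(areas)


state_boundaries = boundary_chains(chains, STATE)
```

Code that links attribute data then needs only the single state ID exposed through the composite, and never has to reason about the left and right variables directly.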
See Details of SAS/GIS Spatial Databases
for more information about the structure of SAS/GIS spatial
databases.
The second type of data that are used in a GIS is attribute data. In SAS/GIS
software, your attribute data must be stored in either a SAS data set or a SAS
view. SAS views allow you to transparently access data in other formats. For
example, you can create a SAS/ACCESS view to access data in a database such as
DB2 using the SAS/ACCESS to DB2 software. A DATA step view or an SQL view also
allows you to access an external file, or any other type of data for which you
can create a SAS view.
Once your data are accessible either as a SAS data set or through a SAS view,
they can be linked to your spatial data for use in labeling, analysis through
an action, or theming. For instance, your spatial data might represent a county
and contain information for city boundaries, census tract boundaries, streets,
and so forth. An attribute data set with population information for each census
tract can be linked to a map using the corresponding tract composite in the
spatial data.
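Conceptually, this link is a key match between the IDs that the map exposes and a key column in the attribute data. The Python sketch below shows the matching logic only. The tract IDs, the `TRACT` and `POP` column names, and the population values are made up for illustration.

```python
# Hypothetical tract IDs exposed by the map through a tract composite.
map_tracts = ["0501", "0502", "0503"]

# Hypothetical attribute rows, as they might be read from a data set or view.
attribute_rows = [
    {"TRACT": "0501", "POP": 4200},
    {"TRACT": "0503", "POP": 3100},
    {"TRACT": "0599", "POP": 800},  # no matching area on the map
]


def link_attributes(map_keys, rows, key):
    """Match attribute rows to map areas on a shared key.

    Returns the row (or None) for each map area, plus the sorted keys of
    attribute rows that have no matching area on the map.
    """
    rows_by_key = {row[key]: row for row in rows}
    linked = {area: rows_by_key.get(area) for area in map_keys}
    unmatched = sorted(set(rows_by_key) - set(map_keys))
    return linked, unmatched


linked, unmatched = link_attributes(map_tracts, attribute_rows, "TRACT")
```

Checking both directions of the match before building a map shows which areas would have no attribute values and which attribute rows would never be displayed.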
Some of the ways in which you can use attribute data
in SAS/GIS software include the following:
- Using values in your attribute data as labels.
For example, you could use attribute data containing population data to provide
the text of labels for each of the census tracts.
- Using the values in your attribute data as themes
for layers. For example, you could use attribute data containing average household
income data as a theme for the census tract layer.
See Chapter 5, "Customizing Maps," in SAS/GIS Software:
Usage and Reference, Version 6 for more information about assigning
themes to map layers.
- Defining actions that display or manipulate the
attribute data when features are selected in the map. This way, you can explore
your attribute data interactively rather than simply view static results.
The actions can range from simple, such as displaying observations from an
attribute data set that relate to features in the map, to complex, such as
submitting procedures from SAS/STAT software
to perform statistical analyses.
Actions can be defined to do the following:
- Display observations from the attribute data sets that relate to the
selected map features.
- Open additional maps that relate to selected map features.
- Display images that relate to the selected map features.
- Interactively subset the attribute data sets according to the subset of
selected map features.
- Submit SAS programs.
- Issue SAS commands.
- Issue host commands.
- Display and edit information for the selected map features.
- Organize area features into groups that are based on the attribute data.
See Chapter 4, "Performing Actions for Selected
Map Features" in SAS/GIS Software:
Usage and Reference, Version 6 for more information on defining and
performing actions.
Designing a SAS/GIS Spatial Database
One of the first steps in a SAS/GIS project is determining the design of your
SAS/GIS spatial database. The database should include all of the spatial data
that the user wants to see and all of the associated attribute data that the
user needs to use for analysis or presentation purposes.
Although your first tendency with a new software product may be to begin
using it immediately after you install it, take some time to draw up an
overview of the system goals and data requirements for your database before
you begin creating it. The time you spend designing your database initially
will save you time and expense later in the project. A well-designed database
is also easier to maintain and document, and you can extend it for future
GIS projects.
Use the following guidelines when determining the information
you want to include in a database:
- Identify the initial objective of the project and its ultimate goal.
Consider any requirements that have been imposed on the project. Determine
whether those requirements are feasible for the initial implementation and,
as far as possible, how future demands are likely to affect them.
- Identify the attribute data that are necessary
to illustrate the project objectives. Determine if you have these data or
can obtain them.
- Identify the spatial features that you want on
your map, for example, states, cities, rivers, roads, railroads, airports,
and so forth.
Once you have determined a preliminary list of the data
that you will need, use these additional factors to help evaluate and refine
your list:
- To utilize attribute data for map actions, themes,
or labeling, the attribute data set must contain the same identification information
as the spatial feature that it describes so that you can link between them.
For example, if one of the items on your attribute data list is the Sales
Revenue for each store, along with the Store ID Number, you probably want
to include the actual location in longitude and latitude for each Store ID
Number on your spatial data list. You can then place a marker at the store
location and also visualize and analyze the corresponding attribute data for
each store.
- Do not use more detail than you need. If your
store locations request the customer ZIP code at the cash register, don't
assume that you need ZIP code boundaries on your map. ZIP code boundaries
may be far too small for your purposes if you have stores nationwide. You
may decide instead that the 3-digit ZIP code boundaries provide fewer, yet
more appropriately sized, areas for your analysis. You can summarize your
attribute data to the 3-digit ZIP code level and use it for your analysis,
reducing both the amount of spatial data and attribute data that you need.
As long as it is appropriate for your analysis, decreasing the amount of required
spatial and attribute data reduces storage space and improves performance.
Reducing the level of detail in the spatial data also saves money if you have
to purchase the data.
- If you plan to summarize your attribute data to
a matching level of your spatial data, make sure the two types of data have
a common level that you can use. For example, ZIP code boundaries can cross
not only county boundaries, but also state boundaries, so there is usually
not a one-to-one correspondence between ZIP codes and states or counties.
If the only information that ties your attribute data to your spatial data
is ZIP codes, you will have difficulties using your ZIP code level attribute
data if you include only state or county boundaries in your spatial data.
For specific, smaller areas of the country, a one-to-one
correspondence may exist that will allow you to summarize your attribute data
to a higher level. However, ZIP codes can change frequently, and this correspondence
may be lost. Also, because ZIP codes change, you must be able to account for
these changes when performing historical analyses. For example, if you are
comparing sales in a specific ZIP code area over a ten-year period, make sure
that the area remained constant during that period.
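Two of the ZIP code guidelines above lend themselves to a quick check before any spatial data are acquired: summarizing attribute data to 3-digit ZIP codes, and testing whether ZIP codes map cleanly onto counties. The Python sketch below is a minimal illustration. The ZIP codes, sales figures, and county labels are invented, and the column names `ZIP` and `SALES` are assumptions.

```python
from collections import defaultdict

# Hypothetical attribute rows: sales recorded per 5-digit customer ZIP code.
sales_rows = [
    {"ZIP": "10001", "SALES": 120.0},
    {"ZIP": "10002", "SALES": 80.0},
    {"ZIP": "10101", "SALES": 50.0},
    {"ZIP": "20001", "SALES": 200.0},
]

# Hypothetical ZIP-to-county lookup. A ZIP code that appears with more than
# one county crosses a county boundary.
zip_county_pairs = [
    ("10001", "County A"),
    ("10002", "County A"),
    ("10002", "County B"),
    ("10101", "County B"),
]


def summarize_to_zip3(rows):
    """Total the sales for each 3-digit ZIP code prefix."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["ZIP"][:3]] += row["SALES"]
    return dict(totals)


def zips_spanning_counties(pairs):
    """List the ZIP codes that fall in more than one county."""
    counties = defaultdict(set)
    for zip_code, county in pairs:
        counties[zip_code].add(county)
    return sorted(z for z, names in counties.items() if len(names) > 1)


zip3_totals = summarize_to_zip3(sales_rows)
ambiguous_zips = zips_spanning_counties(zip_county_pairs)
```

An empty result from `zips_spanning_counties` would suggest that ZIP-level data can be rolled up to counties for that area. Because ZIP codes change, the same check is worth repeating with the lookup for each period covered by a historical analysis.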
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.