Using Spatial Data with SAS/GIS Software

Data in SAS/GIS Applications

SAS/GIS software uses two basic types of data:

Spatial Data: Contain the coordinates and identifying information describing the map features such as streets, rivers, and railroads.
Attribute Data: Contain the information that you want to use for analysis or presentation. This information must be spatial in nature, meaning that each value applies to a specific geographic feature. Sales figures for each of your store locations, population data for each county, and total income for each household in a region are all examples of spatial information.

For example, the U.S. Census Bureau distributes both types of data:

TIGER/Line files: Contain spatial information that you can use to build maps.
Summary Tape files: Contain population and other demographic information that you can link to the map features.

Attribute data provide the information that you want to analyze, and spatial data provide the context in which you want to analyze it. For example, consider the SAS/GIS map shown in Spatial and Attribute Data in SAS/GIS Maps. Spatial data provide the boundaries for the map areas, and attribute data provide the population information that is used to color the map areas.

Spatial and Attribute Data in SAS/GIS Maps

Spatial Data

Spatial data contain the coordinates and identifying information that is necessary to draw maps. For SAS/GIS software, spatial data are stored in SAS/GIS spatial databases, which consist of collections of SAS data sets and SAS catalog entries. The primary method for creating a SAS/GIS spatial database is through the SAS/GIS Import facility, either in batch or in interactive mode. You can also use the GIS procedure to create, modify, and manage the catalog entries in a spatial database.

Features in the spatial data are organized into layers. A layer is a collection of all the features in the map that share some common characteristic. The various physical aspects of the map (political boundaries, roads, railroads, waterways, and so forth) are assigned to layers according to their common spatial data values. Some features can appear in multiple layers. For example, a street can also be a ZIP code boundary and a city boundary line. Such a street could appear in three layers: one containing the streets, one containing the ZIP code boundaries, and one containing the city boundaries.

Three types of layers can be represented in SAS/GIS maps: points, lines, and areas. For example, the collection of all the points in the map that represent park locations can be organized into a point layer for parks, the collection of all the lines in the map that represent streets can be organized into a line layer for streets, and the collection of all the areas that represent census tracts can be organized into an area layer for tracts. When the various layers are overlaid, they form a map, as shown in Layers Forming a SAS/GIS Map.
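The layer concept can be pictured with a small Python sketch. This is not SAS/GIS code: the Feature class, the tag names, and build_layers are hypothetical, chosen only to show that a layer is the set of features that share a geometry type and a common characteristic, and that one street segment (feature 1 below) can belong to the street, ZIP code boundary, and city boundary layers at the same time.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    fid: int
    kind: str  # geometry type: "point", "line", or "area"
    tags: set = field(default_factory=set)  # hypothetical classification flags


def build_layers(features, layer_defs):
    """Return {layer_name: [feature ids]}.

    A layer collects every feature of the given geometry type that carries
    the given tag, so a single feature can belong to several layers.
    """
    return {
        name: [f.fid for f in features if f.kind == kind and tag in f.tags]
        for name, (kind, tag) in layer_defs.items()
    }


features = [
    Feature(1, "line", {"street", "zip_boundary", "city_boundary"}),
    Feature(2, "line", {"street"}),
    Feature(3, "point", {"park"}),
    Feature(4, "area", {"tract"}),
]

layer_defs = {
    "streets": ("line", "street"),
    "zip_boundaries": ("line", "zip_boundary"),
    "city_boundaries": ("line", "city_boundary"),
    "parks": ("point", "park"),
    "tracts": ("area", "tract"),
}

layers = build_layers(features, layer_defs)
for name, ids in layers.items():
    print(name, ids)
```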

Layers Forming a SAS/GIS Map

A layer can be displayed as either static or thematic. When a layer is displayed as static, it uses the same graphical characteristics (color, line style, line width, and so forth) for all features in that layer. For example, a street layer could use the same color and line style to display all the streets. When a layer is displayed as thematic, it uses different graphical characteristics to classify the features in that layer. For example, a theme representing sales regions could use different colors to show the quarterly sales performance of each region. A theme in a layer representing highways could use different line widths to show the classes of roads. A layer can have multiple themes stored in it, and you can easily change which theme is currently displayed.
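The difference between static and thematic display can be sketched in Python as follows. The function names, the break points, the colors, and the sales figures are all hypothetical; the sketch only shows the idea that a static layer assigns one style to every feature, while a theme classifies each feature's value into a range and picks the style for that range, and that a layer can hold several themes while displaying one at a time.

```python
from bisect import bisect_right


def static_style(feature_ids, style):
    """Static display: every feature in the layer gets the same style."""
    return {fid: style for fid in feature_ids}


def thematic_style(values, breaks, styles):
    """Thematic display: classify each feature's value into a range.

    `breaks` are ascending class boundaries; a value equal to a boundary
    falls into the higher class. There must be one more style than breaks.
    """
    if len(styles) != len(breaks) + 1:
        raise ValueError("need exactly one more style than break points")
    return {fid: styles[bisect_right(breaks, v)] for fid, v in values.items()}


class Layer:
    """A layer that stores several named themes and displays one at a time."""

    def __init__(self, themes):
        self.themes = themes  # theme name -> {feature id: style}
        self.current = next(iter(themes))

    def show(self, name):
        if name not in self.themes:
            raise KeyError(name)
        self.current = name
        return self.themes[name]


# Hypothetical quarterly sales per region (arbitrary units).
sales = {"North": 80, "South": 150, "West": 320}
by_sales = thematic_style(sales, breaks=[100, 200], styles=["red", "yellow", "green"])
plain = static_style(sales, {"color": "gray"})

regions = Layer({"static": plain, "sales": by_sales})
print(regions.show("sales"))
```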

In SAS/GIS software, maps display only the portion of the spatial data that falls within a given coverage. A coverage defines a subset of the spatial data that is available to a map. The coverage can include all the spatial data in the database, or only selected portions. For example, a spatial database may contain geographic data for an entire country, but a coverage may restrict the portion that is available for a given map to only one region. You can define more than one coverage for each spatial database, although a map uses only one coverage at a time.
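As a rough model of coverages, the Python sketch below treats each coverage as a predicate over the rows of a spatial database: several coverages can be defined for the same database, and a map is bound to exactly one of them at a time. The row layout, the coverage names, and the Map class are hypothetical illustrations, not the SAS/GIS storage format.

```python
# Hypothetical spatial database rows: one row per feature.
spatial_db = [
    {"id": 1, "state": "A", "region": "East"},
    {"id": 2, "state": "B", "region": "East"},
    {"id": 3, "state": "C", "region": "West"},
]

# Several coverages can be defined for one database; each is modeled here
# as a predicate that decides whether a feature is available to a map.
coverages = {
    "entire_country": lambda row: True,
    "east_only": lambda row: row["region"] == "East",
}


def features_in_coverage(db, coverage):
    """Return the IDs of the features that a map using this coverage can display."""
    return [row["id"] for row in db if coverage(row)]


class Map:
    """A map uses exactly one coverage at a time."""

    def __init__(self, db, coverage_name):
        self.db = db
        self.coverage_name = coverage_name

    def visible(self):
        return features_in_coverage(self.db, coverages[self.coverage_name])


east_map = Map(spatial_db, "east_only")
print(east_map.visible())  # -> [1, 2]
```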

Most operations in SAS/GIS software use composites of spatial data variables rather than the actual spatial data variables themselves. Composites identify the relationships and purpose of the variables in the spatial data.

For example, if the spatial data have variables STATEL and STATER that contain the state ID codes for the left and right sides of each feature, then the spatial database could define a composite named STATE that identifies the relationship between these variables and specifies that they delineate state areas in the map. You would use the STATE composite, rather than the actual STATEL and STATER variables, to link state areas in the map to attribute data for the corresponding state.
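The role of a composite can be sketched in Python: code works with the composite name STATE, and the composite resolves it to the STATEL and STATER variables. In this hypothetical model, a line feature whose left and right state IDs differ lies on a state boundary, and the set of IDs found on either side is what gets linked to state-level attribute data. The feature records, ID codes, and attribute values are made up, and the dictionary layout is an assumption, not how SAS/GIS stores composites.

```python
# Hypothetical line features: each stores the state ID code on its left and right side.
chains = [
    {"chain": 101, "STATEL": "01", "STATER": "01"},
    {"chain": 102, "STATEL": "01", "STATER": "02"},
    {"chain": 103, "STATEL": "02", "STATER": "02"},
]

# A composite records the role of a group of variables: here, that STATEL and
# STATER are the left/right IDs that delineate state areas.
composites = {"STATE": {"type": "area", "left": "STATEL", "right": "STATER"}}


def area_ids(features, composite):
    """All area IDs referenced on either side of any feature."""
    return {f[composite[side]] for f in features for side in ("left", "right")}


def boundary_features(features, composite):
    """Features whose two sides belong to different areas lie on an area boundary."""
    return [f["chain"] for f in features if f[composite["left"]] != f[composite["right"]]]


def link(features, composite, attributes):
    """Map each area ID to its attribute row, or None when no row matches."""
    return {aid: attributes.get(aid) for aid in sorted(area_ids(features, composite))}


state = composites["STATE"]
state_attributes = {"01": {"SALES": 500}, "02": {"SALES": 750}}  # hypothetical values
print(boundary_features(chains, state))  # -> [102]
print(link(chains, state, state_attributes))
```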

See Details of SAS/GIS Spatial Databases for more information about the structure of SAS/GIS spatial databases.

Attribute Data

The second type of data that are used in a GIS is attribute data. In SAS/GIS software, your attribute data must be stored in either a SAS data set or a SAS view. SAS views allow you to transparently access data in other formats. For example, you can create a SAS/ACCESS view to access data in a database such as DB2 using the SAS/ACCESS to DB2 software. A DATA step view or an SQL view also allows you to access an external file, or any other type of data for which you can create a SAS view. Once your data are accessible either as a SAS data set or through a SAS view, they can be linked to your spatial data for use in labeling, analysis through an action, or theming. For instance, your spatial data might represent a county and contain information for city boundaries, census tract boundaries, streets, and so forth. An attribute data set with population information for each census tract can be linked to a map using the corresponding tract composite in the spatial data.
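The Python sketch below loosely mirrors the linking step described above: attribute rows are read on demand from an external text source (a rough stand-in for a view over an external file), then matched to the tract IDs known to the map. It also reports attribute rows with no matching tract and tracts with no attribute data, two cases worth checking before theming. The tract IDs, the CSV content, and the function names are hypothetical.

```python
import csv
import io


def attribute_view(text):
    """Yield attribute rows on demand from CSV text.

    Loosely analogous to a view: rows are read from the external source
    only when the consumer iterates over them.
    """
    yield from csv.DictReader(io.StringIO(text))


def link_to_tracts(tract_ids, rows, key="TRACT"):
    """Split attribute rows into those that match a map tract and those that do not."""
    linked, unmatched = {}, []
    for row in rows:
        if row[key] in tract_ids:
            linked[row[key]] = row
        else:
            unmatched.append(row[key])
    return linked, unmatched


# Hypothetical tract IDs from the tract composite and a made-up population file.
tracts = {"0101", "0102", "0103"}
csv_text = "TRACT,POP\n0101,4200\n0102,3900\n9999,10\n"

linked, unmatched = link_to_tracts(tracts, attribute_view(csv_text))
no_data = sorted(tracts - set(linked))
print(sorted(linked), unmatched, no_data)
```

Keeping the IDs as strings preserves leading zeros, which matters when identification codes such as tract or ZIP codes are used as link keys.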

Some of the ways in which you can use attribute data in SAS/GIS software include the following:

Using values in your attribute data as labels. For example, you could use attribute data containing population data to provide the text of labels for each of the census tracts.
Using the values in your attribute data as themes for layers. For example, you could use attribute data containing average household income data as a theme for the census tract layer.
See Chapter 5, "Customizing Maps," in SAS/GIS Software: Usage and Reference, Version 6 for more information about assigning themes to map layers.
Defining actions that display or manipulate the attribute data when features are selected in the map. This way, you can explore your attribute data interactively rather than simply view static results. The actions can range from simple, such as displaying observations from an attribute data set that relate to features in the map, to complex, such as submitting procedures from SAS/STAT software to perform statistical analyses.

Actions can be defined to do the following:

Display observations from the attribute data sets that relate to the selected map features.
Open additional maps that relate to selected map features.
Display images that relate to the selected map features.
Interactively subset the attribute data sets according to the subset of selected map features.
Submit SAS programs.
Issue SAS commands.
Issue host commands.
Display and edit information for the selected map features.
Organize area features into groups that are based on the attribute data.
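Actions themselves are defined through SAS/GIS software, but the data logic behind two of them (subsetting attribute data by the selected features, and organizing area features into groups based on an attribute variable) can be sketched in Python. The county records, the variable names, and both functions are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical attribute data for area features, keyed by a county ID.
counties = [
    {"COUNTY": "C1", "SALES_REGION": "North", "SALES": 120},
    {"COUNTY": "C2", "SALES_REGION": "South", "SALES": 90},
    {"COUNTY": "C3", "SALES_REGION": "North", "SALES": 200},
]


def subset_by_selection(rows, selected_ids, key):
    """Keep only the attribute rows for the features selected in the map."""
    selected = set(selected_ids)
    return [row for row in rows if row[key] in selected]


def group_by_attribute(rows, key, group_var):
    """Organize features into groups that share a value of an attribute variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(row[key])
    return dict(groups)


picked = subset_by_selection(counties, ["C1", "C3"], key="COUNTY")
print([row["SALES"] for row in picked])  # -> [120, 200]
print(group_by_attribute(counties, "COUNTY", "SALES_REGION"))
```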

See Chapter 4, "Performing Actions for Selected Map Features," in SAS/GIS Software: Usage and Reference, Version 6 for more information about defining and performing actions.

Designing a SAS/GIS Spatial Database

One of the first steps in a SAS/GIS project is determining the design of your SAS/GIS spatial database. The database should include all of the spatial data that the user wants to see and all of the associated attribute data that the user needs to use for analysis or presentation purposes.

Although you may be tempted to start using a new software product as soon as you install it, first take some time to outline the system goals and the data requirements for your database. The time you spend designing your database up front will save time and expense later in the project. A well-designed database is easier to maintain and document, and you can extend it for future GIS projects.

Use the following guidelines when determining the information you want to include in a database:

Identify the initial objective of the project and its ultimate goal. Consider any requirements that have been imposed on the project. Determine whether those requirements are feasible for the initial implementation and, as far as possible, how future demands might affect them.
Identify the attribute data that are necessary to illustrate the project objectives. Determine if you have these data or can obtain them.
Identify the spatial features that you want on your map, for example, states, cities, rivers, roads, railroads, airports, and so forth.

Once you have determined a preliminary list of the data that you will need, use these additional factors to help evaluate and refine your list:

To use attribute data for map actions, themes, or labeling, the attribute data set must contain the same identification information as the spatial feature that it describes so that you can link the two. For example, if one of the items on your attribute data list is the Sales Revenue for each store, along with the Store ID Number, you probably want to include the actual location (longitude and latitude) of each Store ID Number on your spatial data list. You can then place a marker at each store location and also visualize and analyze the corresponding attribute data for each store.
Do not use more detail than you need. If your store locations request the customer ZIP code at the cash register, don't assume that you need ZIP code boundaries on your map. ZIP code boundaries may be far too small for your purposes if you have stores nationwide. You may decide instead that the 3-digit ZIP code boundaries provide fewer, yet more appropriately sized, areas for your analysis. You can summarize your attribute data to the 3-digit ZIP code level and use it for your analysis, reducing both the amount of spatial data and attribute data that you need. As long as it is appropriate for your analysis, decreasing the amount of required spatial and attribute data reduces storage space and improves performance. Reducing the level of detail in the spatial data also saves money if you have to purchase the data.
If you plan to summarize your attribute data to a matching level of your spatial data, make sure the two types of data have a common level that you can use. For example, ZIP code boundaries can cross not only county boundaries, but also state boundaries, so there is usually not a one-to-one correspondence between ZIP codes and states or counties. If the only information that ties your attribute data to your spatial data is ZIP codes, you will have difficulties using your ZIP code level attribute data if you include only state or county boundaries in your spatial data.
For specific, smaller areas of the country, a one-to-one correspondence may exist that will allow you to summarize your attribute data to a higher level. However, ZIP codes can change frequently, and this correspondence may be lost. Also, because ZIP codes change, you must be able to account for these changes when performing historical analyses. For example, if you are comparing sales in a specific ZIP code area over a ten-year period, make sure that the area remained constant during that period.
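The checks suggested above can be sketched in Python: total 5-digit ZIP code values up to the 3-digit ZIP code level, flag ZIP codes that fall in more than one county (so they cannot be rolled up one-to-one), and list ZIP codes that were added or retired between two periods before running a historical comparison. All ZIP codes, amounts, and county names below are hypothetical, and the crosswalk format is an assumption; within SAS itself you would typically do the summary with a procedure such as PROC SQL or PROC MEANS.

```python
from collections import defaultdict


def summarize_to_zip3(sales_by_zip):
    """Total 5-digit ZIP code values up to the 3-digit ZIP code level."""
    totals = defaultdict(float)
    for zip5, amount in sales_by_zip.items():
        totals[zip5[:3]] += amount
    return dict(totals)


def ambiguous_zips(zip_to_counties):
    """ZIP codes that fall in more than one county, so they cannot be rolled up 1:1."""
    return sorted(z for z, counties in zip_to_counties.items() if len(set(counties)) > 1)


def changed_zips(zips_before, zips_after):
    """ZIP codes that appear in only one of two periods (added or retired)."""
    return sorted(set(zips_before) ^ set(zips_after))


# Hypothetical ZIP codes, sales figures, and county crosswalk.
sales = {"12345": 120.0, "12399": 80.0, "98765": 50.0}
crosswalk = {
    "12345": ["County A"],
    "12399": ["County A", "County B"],
    "98765": ["County C"],
}

print(summarize_to_zip3(sales))
print(ambiguous_zips(crosswalk))
print(changed_zips(["12345", "12399"], ["12345", "12400"]))
```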
