SAS/SPECTRAVIEW Software User's Guide |
When you first invoke SAS/SPECTRAVIEW, [Data] is selected by default, ready for you to load data. Note that you can load data at any time during a SAS/SPECTRAVIEW session by reselecting [Data].
Loading Data
Selecting a Libref |
To display the session's assigned librefs:
Selecting a Libref
Selecting a Data Set |
SAS/SPECTRAVIEW works as well with small data sets (such as 20 observations) as it does with large data sets (such as a quarter million observations). The SAS data set that you select must meet these requirements: it must have at least four variables, so that you can specify the three axis variables and the response variable; the response variable must be numeric; and each variable that you specify for SAS/SPECTRAVIEW must contain at least two unique values. If you want to use a BY variable, the data set must have a fifth variable as well. To load a data set that has only three variables, see Loading a Data Set with Only Three Variables.
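The requirements above can be checked before loading. The following Python sketch is an illustration only, not part of SAS/SPECTRAVIEW: the function name check_loadable and the representation of a data set as a list of dictionaries are assumptions made for this example.

```python
def check_loadable(rows, x, y, z, response, by=None):
    """Return a list of problems; an empty list means the stated rules are met.

    rows is a list of dicts, one per observation (a hypothetical stand-in
    for a SAS data set). x, y, z, response, and by are variable names.
    """
    if not rows:
        return ["data set has no observations"]
    problems = []
    roles = [x, y, z, response] + ([by] if by is not None else [])
    if len(set(roles)) != len(roles):
        problems.append("each role needs a different variable")
    for name in dict.fromkeys(roles):  # keeps order, drops repeated names
        if name not in rows[0]:
            problems.append(f"{name}: not in the data set")
            continue
        # None stands for a missing value and does not count as a unique value.
        values = {row.get(name) for row in rows if row.get(name) is not None}
        if len(values) < 2:
            problems.append(f"{name}: fewer than two unique values")
    if response in rows[0]:
        for row in rows:
            value = row.get(response)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if value is not None and not is_number:
                problems.append(f"{response}: response variable must be numeric")
                break
    return problems
```

An empty result means that the selected variables satisfy the rules listed above; it does not guarantee that the data set produces a useful volume grid.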
Select the input data set from the list of names. If the list contains more than 10 names, use the scroll bar to see the rest. Once you select the input data set, the software lists the data set's variables in columns from which you can select SAS/SPECTRAVIEW variables.
Selecting a Data Set
Specifying SAS/SPECTRAVIEW Variables |
To help you select appropriate variables, you can place your cursor on a variable name, and the software displays a short description of it in the text window. For example, the EPA data set contains the variables HOUR, LEVEL, LNGITUDE, LATITUDE, SULFATE, and OZONE. Placing the cursor on a variable name displays a description such as the following:
Type: Num, Label: Sulfate (ppm)
Note that a variable that is appropriate as a Response variable is not a valid choice as an axis variable, and a variable that is appropriate as an axis variable is not a valid choice as a Response variable. Attempting to read a data set with inappropriate variables selected could cause the load to fail. For the axis variables, specify the variables that build as complete a volume grid as possible from actual data points, and avoid axis variables that are sparsely valued or that contain continuous data.
Specifying SAS/SPECTRAVIEW Variables
Once you select the four required variables, the software highlights [Read data], but you still have the option of specifying BY variable processing, duplicate values handling, data categorizing, automatic axis scaling, and data subsetting with a WHERE clause, which are discussed in the following sections.
Grouping Observations with a BY Variable |
A BY variable can be either character or numeric. BY data usually includes multiple response values for a single data point.
For example, in the EPA data set, the variable HOUR contains hour values, which makes it useful as a BY variable. If you imagine that the first four variables generate a cube of data values, then specifying a BY variable generates a sequence of cubes, one for each BY value, that you can cycle through to see how response values change (over time, in this case).
If you select LNGITUDE, LATITUDE, and LEVEL as the axis variables, SULFATE as the Response variable, and HOUR as the BY variable, you create a sequence of volumes of data to be displayed and analyzed.
Specifying a BY Variable
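The idea of a BY variable producing a sequence of volumes can be sketched in Python. This is an illustration only and does not describe how the software is implemented; the function name split_by, the list-of-dictionaries data layout, and the choice to order the volumes by sorted BY value are all assumptions.

```python
from collections import defaultdict

def split_by(rows, by):
    """Group observations into one list per BY value.

    rows is a list of dicts (a hypothetical stand-in for a SAS data set).
    The result maps each BY value to its observations, ordered by BY value,
    so that the groups can be stepped through one volume at a time.
    """
    volumes = defaultdict(list)
    for row in rows:
        volumes[row[by]].append(row)
    return dict(sorted(volumes.items()))

# Made-up observations that use the EPA variable names from the text.
rows = [
    {"HOUR": 2, "LNGITUDE": -80, "LATITUDE": 35, "LEVEL": 1, "SULFATE": 0.00006},
    {"HOUR": 1, "LNGITUDE": -80, "LATITUDE": 35, "LEVEL": 1, "SULFATE": 0.00004},
    {"HOUR": 1, "LNGITUDE": -79, "LATITUDE": 35, "LEVEL": 1, "SULFATE": 0.00005},
]
for hour, volume in split_by(rows, "HOUR").items():
    print(hour, len(volume))
```

Within each group, every x,y,z location has the response values for one BY value only, which is why specifying the BY variable avoids the duplicate response values that time-stamped data would otherwise produce.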
Note: If you do not specify a BY variable but your data contains BY data (like a time variable), you may receive a message in the text window after loading the data. The message warns that there is more than one response value for an x,y,z coordinate. When this occurs, the software handles the response values according to the setting on the Duplicate Values panel.
Handling Duplicate Values |
You determine how the software handles duplicate values by selecting one of the choices under the label Duplicate Values. The default is [Last], which means that the last response value encountered for a data point is used as that location's response value.
Handling Duplicate Values
To specify how the software handles duplicate values, select one of the following options:
When you load data with [Count] specified, each response value for the resulting data points represents a count of the observations for that location. If there are no duplicate observations for a particular x,y,z location, the response value is 1, indicating that only one observation was found for that location. Similarly, if the data includes no observations for a particular x,y,z location, the response value is 0, meaning that the data point is missing. [Count] allows you to find the number of response values that were used to calculate other values, for example, [Mean] or [Sum]. If you load data with [Mean], you may want to know how many values were used to calculate the mean value shown at a particular x,y,z location. You can load the data again using [Count], then probe the data to reveal the number of values used for the mean.
With [Nmiss] specified, every data point has a response value indicating how many missing response values were encountered for that location. If a valid data point has five observations and only three had response values, then that data point's response value is 2, meaning two observations were found missing a response value for that location. [Nmiss] counts only valid data points that have no response value. It does not count filler points generated by the software. If the data does not contain an observation for an x,y,z location, the software inserts a data point that has a missing response value. This means that if you load a data set, display it as a point cloud, and discover there are several missing values in the volume grid, you can reload the data with [Nmiss] selected and determine which missing values are caused by missing response values as opposed to missing axis values.
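The options named in this section ([Last], [Mean], [Sum], [Count], and [Nmiss]) can be illustrated with a short Python sketch. It is not the software's implementation. The function name reduce_duplicates and the tuple layout are hypothetical, and the sketch assumes that [Count] counts every observation at a location (with or without a response value) and that [Mean] and [Sum] skip missing response values, which this section does not state explicitly.

```python
from collections import defaultdict

def reduce_duplicates(observations, method="last"):
    """Collapse duplicate x,y,z locations to one response value each.

    observations is a list of (x, y, z, response) tuples; a response of
    None stands for a missing value. Only locations that appear in the
    input are returned (no filler points are generated here).
    """
    groups = defaultdict(list)
    for x, y, z, response in observations:
        groups[(x, y, z)].append(response)

    result = {}
    for location, responses in groups.items():
        present = [r for r in responses if r is not None]
        if method == "last":
            result[location] = responses[-1]      # last value encountered
        elif method == "count":
            result[location] = len(responses)     # assumption: all observations
        elif method == "nmiss":
            result[location] = len(responses) - len(present)
        elif method == "sum":
            result[location] = sum(present)       # assumption: missing skipped
        elif method == "mean":
            result[location] = sum(present) / len(present) if present else None
        else:
            raise ValueError(f"unknown method: {method}")
    return result

# Made-up data: location (1, 1, 1) has three observations, one with no response.
obs = [(1, 1, 1, 2.0), (1, 1, 1, None), (1, 1, 1, 4.0), (2, 1, 1, 5.0)]
print(reduce_duplicates(obs, "mean"))
print(reduce_duplicates(obs, "nmiss"))
```

Comparing the "mean" and "nmiss" results for the same input mirrors the workflow described above: load once to see the value, then load again to see how many observations stood behind it.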
Categorizing Data |
Continuous data (data with few gaps, whose values vary slightly over a large range, like weight and height) is a good candidate for categorizing. For example, suppose you want to analyze heart rate for a group of people based on their age, activity level, and weight. The weight values, recorded in pounds as values like 139.5 and 143.6, would be considered continuous. That is, it is not likely that any two people (let alone several) would have the same weight but a different age and activity level. Categorizing the weight values, by creating categories for ranges of weight with one value to represent each category, would make the data clearer and easier to use.
Discrete data (containing natural gaps like patient IDs and years) would probably not be as useful to categorize. But discrete data such as hour could be categorized into groups if the degree of precision can be reduced without losing data integrity.
To categorize data:
Categorizing Data
[Lower] | Uses the lower bound value in each range. |
[Midpoint] | Uses the midpoint value in each range. This is the default setting. |
[Upper] | Uses the upper bound value in each range. |
[Bounds] | Uses both the upper and lower bound values in each range. The values display as a range, for example, 125.1-225.1 for each major tick mark. |
For example, suppose values for the X variable are integers from 1 to 100. If you categorize the X values into groups of 10 values, 1-10 would be a single category. The data points 1,1,1 and 2,1,1 and 3,1,1 and so forth are viewed by the software as the same data point in the volume grid, because they would all have the same X, Y, and Z values.
The response values for the 10 data points would appear to be 10 different response values for the same data point. The response values for the duplicate locations are handled according to the method specified for duplicate values handling, with the default being to use the last response value found as the category's response value.
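The example above can be sketched in Python. This is an illustration under stated assumptions, not the software's algorithm: the function name categorize is hypothetical, the values are assumed to be integers, and the ranges are assumed to be equal-width groups that start at the lowest value (1-10, 11-20, and so on). How the software actually chooses range boundaries is not described in this section.

```python
def categorize(value, start=1, width=10, method="midpoint"):
    """Map an integer value to the representative value of its range.

    With start=1 and width=10 the ranges are 1-10, 11-20, ..., matching
    the example in the text. method mirrors the option names in the table:
    "lower", "midpoint", "upper", or "bounds".
    """
    lower = start + ((value - start) // width) * width
    upper = lower + width - 1
    if method == "lower":
        return lower
    if method == "midpoint":
        return (lower + upper) / 2
    if method == "upper":
        return upper
    if method == "bounds":
        return (lower, upper)
    raise ValueError(f"unknown method: {method}")

# The points 1,1,1 and 2,1,1 and 3,1,1 collapse to one location once X is
# categorized, so their response values become duplicates for that location.
points = [(1, 1, 1), (2, 1, 1), (3, 1, 1)]
locations = {(categorize(x), y, z) for x, y, z in points}
print(locations)
```

Because the three points share one categorized location, their response values are then reduced by whichever duplicate values setting is in effect.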
Automatically Scaling Axes |
Note: Once a data set is loaded, [Auto scale] is deselected. To load a subsequent data set with automatic scaling, you must select [Auto scale] again.
Subsetting Data with a WHERE Clause |
Subsetting can change the size and shape of the volume grid. For example, subsetting data can create holes that are replaced with filler points, or subsetting can remove holes in data.
Prior to selecting [Read data], you can specify subsetting conditions using a SAS WHERE clause, for example:
sulfate > .00005060
Subsetting Data
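The effect of subsetting on the volume grid can be sketched in Python. This is an illustration only. The function name build_grid is hypothetical, and the sketch assumes that the volume grid is formed from every combination of the axis values that remain after subsetting, with None standing for a filler point. The condition is the Python equivalent of the WHERE expression sulfate > .00005060, and the data values are made up.

```python
from itertools import product

def build_grid(observations):
    """Return a dict mapping every x,y,z grid location to a response value.

    observations is a list of (x, y, z, response) tuples. Locations with
    no observation get None, standing in for a filler point. If a location
    occurs more than once, the last response value encountered is kept.
    """
    xs = sorted({x for x, _, _, _ in observations})
    ys = sorted({y for _, y, _, _ in observations})
    zs = sorted({z for _, _, z, _ in observations})
    known = {(x, y, z): r for x, y, z, r in observations}
    return {location: known.get(location) for location in product(xs, ys, zs)}

# Made-up sulfate readings on a complete 2 x 2 x 1 grid.
obs = [
    (0, 0, 0, 0.00004),
    (1, 0, 0, 0.00006),
    (0, 1, 0, 0.00007),
    (1, 1, 0, 0.00008),
]
# Equivalent of the WHERE expression: sulfate > .00005060
subset = [o for o in obs if o[3] is not None and o[3] > 0.00005060]

full_grid = build_grid(obs)
subset_grid = build_grid(subset)
print(sum(v is None for v in full_grid.values()))
print(sum(v is None for v in subset_grid.values()))
```

In this example the subset drops one observation but keeps all of its axis values, so the grid keeps its shape and the dropped location becomes a hole that is filled with a filler point.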
Reading the Data Set |
To have the software read the data, select [Read data].
The software loads the input data, applying any optional specifications. For example, if a WHERE clause is specified, the software loads only those observations meeting the criteria, and if categorizing is specified, the software changes the number of data points accordingly. Once the data set is loaded, the variable list disappears, and the software is ready for you to
If you have loading problems, see Resolving Data Loading Problems.
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.