Sample DATA Step
   data total_points (drop=TeamName);                            /* [1] */
      input TeamName $ ParticipantName $ Event1 Event2 Event3;   /* [2] */
      TeamTotal + (Event1 + Event2 + Event3);                    /* [3] */
      datalines;
   Knights Sue 6 8 8
   Cardinals Jane 9 7 8
   Knights John 7 7 7
   Knights Lisa 8 9 9
   Knights Fran 7 6 6
   Knights Walter 9 8 10
   ;

[1] The DROP= data set option prevents the variable TeamName from being written to the output SAS data set called TOTAL_POINTS.
[2] The INPUT statement describes the data by giving a name to each variable, identifying its data type (character or numeric), and identifying its relative location in the data record.
[3] The Sum statement accumulates the scores for three events in the variable TeamTotal.
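For comparison, the Sum statement can be spelled out with a RETAIN statement and the SUM function. The following is a sketch, not part of the original example: the data set name total_points_alt is hypothetical, and only the first two records are used for brevity.

```sas
data total_points_alt (drop=TeamName);   /* hypothetical data set name */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   retain TeamTotal 0;   /* start at 0 and keep the value across iterations */
   /* SUM ignores a missing argument, as the Sum statement does */
   TeamTotal = sum(TeamTotal, Event1 + Event2 + Event3);
   datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
```

With these two records, TeamTotal should be 22 and then 46, the same running totals that the Sum statement produces.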
Creating the Input Buffer and the Program Data Vector
The program data vector (PDV) contains all the variables in the input data set, the variables created in DATA step statements, and the two variables, _N_ and _ERROR_, that are automatically generated for every DATA step. The _N_ variable represents the number of times the DATA step has iterated. The _ERROR_ variable acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist. The following figure shows the input buffer and the PDV after DATA step compilation.
Input Buffer and Program Data Vector
Variables that are created by the INPUT statement (TeamName, ParticipantName, Event1, Event2, and Event3) are set to missing initially. Note that in this representation, numeric variables are initialized with a period and character variables are initialized with blanks. TeamTotal, the accumulator variable of the Sum statement, is initialized to 0 rather than to missing. The automatic variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0.
The variable TeamName is marked Drop in the PDV because of the DROP= data set option in the DATA statement. Dropped variables are not written to the SAS data set. The _N_ and _ERROR_ variables are dropped because automatic variables created by the DATA step are not written to a SAS data set. See SAS Variables for details about automatic variables.
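Although the automatic variables are never written to a data set, they can be inspected while the step runs. The sketch below is an illustration that is not part of the original example: it uses a cut-down two-variable record layout and a deliberately invalid value (x) in the second record, and it writes _N_ and _ERROR_ to the SAS log with a PUT statement.

```sas
data _null_;   /* _NULL_ runs the step without creating a data set */
   input ParticipantName $ Event1;
   /* write the automatic variables and the values just read to the log */
   put _N_= _ERROR_= ParticipantName= Event1=;
   datalines;
Sue 6
Jane x
;
```

For the first record the log line should show _N_=1 and _ERROR_=0. For the second record, the invalid numeric value should leave Event1 missing and set _ERROR_ to 1, and SAS also writes an invalid-data note to the log.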
Reading a Record
Position of the Pointer in the Input Buffer Before SAS Reads Data
The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV, where they become variable values. The following figure shows both the position of the pointer in the input buffer and the values in the PDV after SAS reads the first record.
Values from the First Record are Read into the Program Data Vector
After the INPUT statement reads a value for each variable, SAS executes the Sum statement. SAS computes a value for the variable TeamTotal and writes it to the PDV. The following figure shows the PDV with all of its values before SAS writes the observation to the data set.
Program Data Vector with Computed Value of the Sum Statement
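One way to look at the PDV at this point is to add a PUT _ALL_ statement after the Sum statement. This is a sketch under the assumption that a diagnostic run with DATA _NULL_ (no output data set) and the first two records is sufficient; it is not part of the original example.

```sas
data _null_;   /* diagnostic run: no data set is created */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   put _all_;   /* log every PDV variable, including _N_ and _ERROR_ */
   datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
```

Each log line lists the values that the PDV holds just before the observation would be written. TeamName appears in the log even though the sample step drops it from the output data set, because DROP= affects only what is written, not what the PDV contains. Because list input gives character variables a default length of 8, the second record's TeamName should appear as Cardinal.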
Writing an Observation to the SAS Data Set
The First Observation in Data Set TOTAL_POINTS
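SAS then returns to the DATA statement to begin the next iteration. SAS resets the values in the PDV in the following way: the variables read by the INPUT statement are set to missing; TeamTotal, which is created by the Sum statement, retains its value; _N_ is incremented to 2; and _ERROR_ is reset to 0.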
The following figure shows the current values in the PDV.
Current Values in the Program Data Vector
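The state of the PDV at the start of each iteration can also be observed directly by placing PUT _ALL_ before the INPUT statement. The following sketch assumes a diagnostic run with DATA _NULL_ and the first two records only; it is not part of the original example.

```sas
data _null_;   /* diagnostic run: no data set is created */
   put _all_;   /* PDV values as the iteration begins, before INPUT runs */
   input TeamName $ ParticipantName $ Event1 Event2 Event3;
   TeamTotal + (Event1 + Event2 + Event3);
   datalines;
Knights Sue 6 8 8
Cardinals Jane 9 7 8
;
```

With two records, the log should contain three lines, because the step begins a third iteration and stops only when INPUT finds no more data. The second and third lines should show the INPUT variables reset to missing while TeamTotal keeps the running total (22, then 46) and _N_ advances to 2 and 3.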
Reading the Next Record
Input Buffer, Program Data Vector, and First Two Observations
As SAS continues to read records, the value in TeamTotal grows larger as more participant scores are added to the variable. _N_ is incremented at the beginning of each iteration of the DATA step. This process continues until SAS reaches the end of the input file.
When the DATA Step Finishes Executing
Output from the Walkthrough DATA Step
                  Total Team Scores                 1

       Participant                               Team
Obs    Name          Event1   Event2   Event3   Total

  1    Sue                6        8        8      22
  2    Jane               9        7        8      46
  3    John               7        7        7      67
  4    Lisa               8        9        9      93
  5    Fran               7        6        6     112
  6    Walter             9        8       10     139
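The source does not show the step that produced this listing. A PROC PRINT step along the following lines is one plausible way to generate it, assuming the TOTAL_POINTS data set from the sample DATA step exists in the current session. The exact column-heading layout depends on the output destination and system options.

```sas
/* assumes total_points was created by the sample DATA step */
proc print data=total_points;
   title 'Total Team Scores';
run;
```

TeamName does not appear in the printed output because the DROP= data set option kept it out of TOTAL_POINTS.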
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.