Chapter Contents |
Previous |
Next |
Overview |
The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE statement by controlling when the values in the program data vector are set to missing. During BY-group processing, SAS retains the values of variables until it has copied the last observation it finds for that BY group in any of the data sets. Without the BY statement, the SET statement sets variables to missing when it reads the last observation from any data set, and the MERGE statement does not set variables to missing after the DATA step starts reading observations into the program data vector.
You can process observations conditionally by using the subsetting IF or IF-THEN statements, or the SELECT statement, with the temporary variables FIRST.variable and LAST.variable (set up during BY-group processing). For example, you can use them to perform calculations for each BY group and to write an observation when the first or the last observation of a BY group has been read into the program data vector.
The following example computes annual payroll by department. It uses IF-THEN statements and the values of FIRST.variable and LAST.variable automatic variables to reset the value of PAYROLL to 0 at the beginning of each BY group and to write an observation after the last observation in a BY group is processed.
options pageno=1 nodate linesize=80 pagesize=60; proc sort data=salaries out=temp; by Department; data budget (keep=Department Payroll); set temp; by Department; if WageCategory='Salaried' then YearlyWage=WageRate*12; else if WageCategory='Hourly' then YearlyWage=WageRate*2000; /* Set FIRST.variable to 1 if this is a new department */ /* in the BY group. */ if first.Department then Payroll=0; Payroll+YearlyWage; /* Set LAST.variable to 1 if this is the last department */ /* in the current BY group. */ if last.Department; run; proc print data=budget; format Payroll dollar10.; title 'Annual Payroll by Department'; run;
Output from Conditional BY-Group Processing
Annual Payroll by Department 1 Obs Department Payroll 1 BAD $952,000 2 DDG $448,000 3 PPD $522,000 4 STD $496,000 |
Data Grouped by Ascending Order |
options pageno=1 nodate linesize=80 pagesize=60; data current_inventory; set inventory; by Name; Number=Number+13; run; proc print data=current_inventory; title 'Current Inventory'; run;
Output in Ascending Order of BY Variable
Current Inventory 1 Obs Name Number 1 Fern 63 2 Hosta 31 3 Ivy 24 4 Moss 21 5 Rose 19 6 Vinca 16 |
Data Grouped by Descending Order |
options pageno=1 nodate linesize=80 pagesize=60; data current_inventory; set inventory; by descending Name; Number=Number+33; run; proc print data=current_inventory; title 'Number of Plants on Order'; run;
Output in Decreasing Order of BY Variable
Number of Plants on Order 1 Obs Name Number 1 Vinca 36 2 Rose 39 3 Moss 41 4 Ivy 44 5 Hosta 51 6 Fern 83 |
Data Not in Alphabetic or Numeric Order |
This example assumes that the data is grouped by MONTH. The subsetting IF statement conditionally writes an observation, based on the value of LAST.month. This DATA step writes an observation only after processing the last observation in each BY group.
data total_sale(drop=sales); set region.sales by month notsorted; total+sales; if last.month; run;
Data Grouped by Formatted Values |
Use the GROUPFORMAT option in the BY statement to ensure that
The GROUPFORMAT option is valid only in the DATA step that creates the SAS data set. It is particularly useful with user-defined formats.
This example uses the FORMAT procedure, the GROUPFORMAT option, and the FORMAT statement to create and print a simple data set. The input TEST data set is sorted by ascending values. The NEWTEST data set is arranged by the formatted values of the variable Score.
options pageno=1 nodate linesize=80 pagesize=60; /* Create a user-defined format */ proc format; value Range 1-2='Low' 3-4='Medium' 5-6='High'; run; /* Create the SAS data set */ data newtest; set test; by groupformat Score; format Score Range.; run; /* Print using formatted values */ proc print data=newtest; title 'Score Categories'; var Name Score; by Score; run;
Grouping Observations By Using Formatted Values
Score Categories 1 ---------------------------------- Score=Low ----------------------------------- Obs Name Score 1 Jon Low --------------------------------- Score=Medium --------------------------------- Obs Name Score 2 Anthony Medium 3 Miguel Medium 4 Joseph Medium ---------------------------------- Score=High ---------------------------------- Obs Name Score 5 Ian High 6 Jan High |
Chapter Contents |
Previous |
Next |
Top of Page |
Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.