Stratified Sampling

The SURVEYSELECT Procedure

In this section, stratification is added to the sample design for the customer satisfaction survey. The sampling frame, or list of all customers, is stratified by State and Type. This divides the sampling frame into nonoverlapping subgroups formed from the values of the State and Type variables. Samples are then selected independently within the strata.

PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the Customers data set by the stratification variables State and Type.

   proc sort data=Customers;
      by State Type;
   run;

The following PROC FREQ statements display the crosstabulation of the Customers data set by State and Type.

   proc freq data=Customers;
      tables State*Type;
   run;

Figure 63.4 presents the table of State by Type for the 13,471 customers. There are four states and two levels of Type, forming a total of eight strata.

The FREQ Procedure

Table of State by Type
Cell contents, in order: Frequency, Percent, Row Pct, Col Pct (Total cells: Frequency, Percent)

State	Type=New	Type=Old	Total
AL	1238 9.19 63.68 14.43	706 5.24 36.32 14.43	1944 14.43
FL	2170 16.11 61.30 25.29	1370 10.17 38.70 28.01	3540 26.28
GA	3488 25.89 64.26 40.65	1940 14.40 35.74 39.66	5428 40.29
SC	1684 12.50 65.81 19.63	875 6.50 34.19 17.89	2559 19.00
Total	8580 63.69	4891 36.31	13471 100.00

Figure 63.4: Stratification of Customers by State and Type

The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set according to the stratified sample design.

   title1 'Customer Satisfaction Survey';
   title2 'Stratified Sampling';
   proc surveyselect data=Customers method=srs n=15
         seed=1953 out=SampleStrata;
      strata State Type;
   run;

The STRATA statement names the stratification variables State and Type. In the PROC SURVEYSELECT statement, the METHOD=SRS option specifies simple random sampling. The N=15 option specifies a sample size of 15 customers for each stratum. If you want to specify different sample sizes for different strata, you can use the N=SAS-data-set option to name a secondary data set that contains the stratum sample sizes. The SEED=1953 option specifies '1953' as the initial seed for random number generation.
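As a minimal sketch of the N=SAS-data-set form, the statements below build a secondary data set of stratum sample sizes and pass it to PROC SURVEYSELECT. The data set names StrataCounts, StrataSizes, and SampleUnequal and the allocation rule (20 customers from strata with more than 2,000 customers, 10 otherwise) are hypothetical and are not part of the original example. The sketch assumes that the secondary data set supplies the sample sizes in a variable named _NSIZE_ and that its strata appear in the same order as in the sorted Customers data set. Building the sizes from PROC FREQ output keeps the type and length of State and Type identical to those in Customers.

```sas
/* Count the customers in each stratum; the OUT= data set holds     */
/* State, Type, COUNT, and PERCENT, ordered by State and Type.      */
proc freq data=Customers noprint;
   tables State*Type / out=StrataCounts;
run;

/* Hypothetical allocation rule, for illustration only. */
data StrataSizes;
   set StrataCounts;
   if Count > 2000 then _NSIZE_ = 20;
   else _NSIZE_ = 10;
   keep State Type _NSIZE_;
run;

/* Same design as before, but with stratum-specific sample sizes. */
proc surveyselect data=Customers method=srs n=StrataSizes
      seed=1953 out=SampleUnequal;
   strata State Type;
run;
```

With the stratum counts shown in Figure 63.4, this rule would request 20 customers from the FL/New and GA/New strata and 10 from each of the other six strata.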

Figure 63.5 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 120 customers are selected.

Customer Satisfaction Survey

Stratified Sampling

The SURVEYSELECT Procedure

Selection Method	Simple Random Sampling
Strata Variables	State
	Type

Input Data Set	CUSTOMERS
Random Number Seed	1953
Stratum Sample Size	15
Number of Strata	8
Total Sample Size	120
Output Data Set	SAMPLESTRATA

Figure 63.5: Sample Selection Summary
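One way to confirm the allocation reported in the summary is to crosstabulate the output data set by the stratification variables. The following statements are a sketch of such a check, not part of the original example; they suppress the percentages so that only the stratum counts are displayed.

```sas
/* Each of the 8 State-by-Type cells should show a frequency of 15, */
/* for a total of 120 selected customers.                           */
proc freq data=SampleStrata;
   tables State*Type / norow nocol nopercent;
run;
```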

The following PROC PRINT statements display the first 30 observations of the output data set SampleStrata.

   title1 'Customer Satisfaction Survey';
   title2 'Sample Selected by Stratified Design';
   title3 '(First 30 Observations)';
   proc print data=SampleStrata(obs=30);
   run;

Figure 63.6 displays the first 30 observations of the output data set SampleStrata, which contains the sample of 120 customers, 15 customers from each of the 8 strata. The variable SelectionProb contains the selection probability for each customer in the sample. Because customers are selected with equal probability within strata in this design, the selection probability equals the stratum sample size (15) divided by the stratum population size. The selection probabilities differ from stratum to stratum because the stratum population sizes differ. The selection probability for each customer in the first stratum (State='AL' and Type='New') is 0.012116, and the selection probability is 0.021246 for customers in the second stratum (State='AL' and Type='Old'). The variable SamplingWeight contains the sampling weights, which are computed as inverse selection probabilities.
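The arithmetic behind these values can be written out as follows. The symbols n_h (stratum sample size), N_h (stratum population size), pi_h (selection probability), and w_h (sampling weight) are notation assumed here for illustration; the stratum population sizes 1238 and 706 come from Figure 63.4.

```latex
\pi_h = \frac{n_h}{N_h}, \qquad w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h}

\text{AL, New:}\quad \pi_h = \frac{15}{1238} \approx 0.012116, \qquad w_h = \frac{1238}{15} \approx 82.5333

\text{AL, Old:}\quad \pi_h = \frac{15}{706} \approx 0.021246, \qquad w_h = \frac{706}{15} \approx 47.0667
```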

Customer Satisfaction Survey

Sample Selected by Stratified Design

(First 30 Observations)

Obs	State	Type	CustomerID	Usage	SelectionProb	SamplingWeight
1	AL	New	002-26-1498	1189	0.012116	82.5333
2	AL	New	070-86-8494	106	0.012116	82.5333
3	AL	New	121-28-6895	76	0.012116	82.5333
4	AL	New	131-79-7630	265	0.012116	82.5333
5	AL	New	211-88-4991	108	0.012116	82.5333
6	AL	New	222-81-3742	83	0.012116	82.5333
7	AL	New	238-46-3776	278	0.012116	82.5333
8	AL	New	370-01-0671	123	0.012116	82.5333
9	AL	New	407-07-5479	1580	0.012116	82.5333
10	AL	New	550-90-3188	177	0.012116	82.5333
11	AL	New	582-40-9610	46	0.012116	82.5333
12	AL	New	672-59-9114	66	0.012116	82.5333
13	AL	New	848-60-3119	28	0.012116	82.5333
14	AL	New	886-83-4909	170	0.012116	82.5333
15	AL	New	993-31-7677	64	0.012116	82.5333
16	AL	Old	124-60-0495	80	0.021246	47.0667
17	AL	Old	128-54-9590	56	0.021246	47.0667
18	AL	Old	204-05-4017	17	0.021246	47.0667
19	AL	Old	210-68-8704	4363	0.021246	47.0667
20	AL	Old	239-75-4343	430	0.021246	47.0667
21	AL	Old	317-70-6496	452	0.021246	47.0667
22	AL	Old	365-37-1340	21	0.021246	47.0667
23	AL	Old	399-78-7900	108	0.021246	47.0667
24	AL	Old	404-90-6273	824	0.021246	47.0667
25	AL	Old	421-04-8548	1332	0.021246	47.0667
26	AL	Old	604-48-0587	16	0.021246	47.0667
27	AL	Old	774-04-0162	318	0.021246	47.0667
28	AL	Old	849-66-4156	79	0.021246	47.0667
29	AL	Old	937-69-9106	182	0.021246	47.0667
30	AL	Old	985-09-8691	24	0.021246	47.0667

Figure 63.6: Customer Sample (First 30 Observations)
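Because each sampling weight is the inverse of the selection probability, the 15 weights in a stratum should sum to that stratum's population size, and the weights over the whole sample should sum to the 13,471 customers in the sampling frame. The following statements sketch one way to check this; they are an illustration and not part of the original example.

```sas
/* Sum of weights within each stratum; for example, the AL/New      */
/* stratum should sum to its population size of 1238.               */
proc means data=SampleStrata n sum;
   class State Type;
   var SamplingWeight;
run;

/* Sum of weights over the entire sample of 120 customers. */
proc means data=SampleStrata n sum;
   var SamplingWeight;
run;
```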