The SCORE Procedure

Getting Started

The SCORE procedure multiplies the values from two SAS data sets and creates a new data set to contain the results of the multiplication. The variables in the new data set are linear combinations of the variables in the two input data sets. Typically, one of these data sets contains raw data that you want to score, and the other data set contains scoring coefficients.

The following example demonstrates how to use the SCORE procedure to multiply values from two SAS data sets, one containing factor-scoring coefficients and the other containing raw data to be scored using the scoring coefficients.

Suppose you are interested in the performance of three different types of schools: private schools, state-run urban schools, and state-run rural schools. You want to compare the schools' performances as measured by student grades on standard tests in English, mathematics, and biology. You administer these tests and record the scores for each of the three types of schools.

The following DATA step creates the SAS data set Schools. The data are provided by Chaseling (1996).

   data Schools;
      input Type $ English Math Biology @@;
      datalines;
   p  52  55  45  p  42  49  40  p  63  64  54
   p  47  50  51  p  64  69  47  p  63  67  54
   p  59  63  42  p  56  61  41  p  41  44  72
   p  39  42  45  p  56  63  44  p  63  73  42
   p  62  68  46  p  51  61  51  p  45  56  54
   p  63  66  63  p  65  67  57  p  49  50  47
   p  47  48  34  p  53  54  46  p  49  40  43
   p  50  41  50  p  82  72  80  p  68  61  62
   p  68  61  46  p  63  53  48  p  77  72  74
   p  50  47  60  p  61  49  48  p  64  54  45
   p  60  53  40  p  80  69  75  p  76  69  77
   p  55  48  51  p  85  76  80  p  70  64  48
   p  61  51  61  p  51  47  58  p  78  72  79
   p  52  47  46  u  49  47  58  u  64  72  45
   u  36  44  46  u  32  43  46  u  52  57  42
   u  45  47  53  u  44  52  43  u  54  63  42
   u  39  45  49  u  48  51  46  u  53  61  54
   u  28  32  33  u  52  59  44  u  54  61  51
   u  60  65  66  u  60  63  63  u  47  52  49
   u  28  31  32  u  43  46  45  u  40  42  48
   u  66  51  48  u  79  68  77  u  58  52  49
   u  34  29  33  u  47  35  40  u  60  49  49
   u  62  50  51  u  69  50  47  u  59  41  52
   u  56  44  43  u  76  61  74  u  50  36  52
   u  69  56  52  u  57  41  55  u  56  44  51
   u  52  42  42  u  51  36  42  u  44  31  57
   u  79  68  77  u  61  44  41  r  38  28  22
   r  35  28  24  r  50  47  48  r  36  28  38
   r  69  65  53  r  55  44  41  r  62  58  45
   r  57  55  32  r  47  42  66  r  45  38  45
   r  56  55  42  r  39  36  33  r  63  51  42
   r  42  41  48  r  51  44  52  r  47  42  44
   r  53  42  47  r  62  59  48  r  80  74  81
   r  95  79  95  r  65  60  43  r  67  60  53
   r  42  43  50  r  70  68  55  r  63  56  48
   r  37  33  34  r  49  47  49  r  42  43  50
   r  44  46  47  r  62  55  44  r  67  64  52
   r  77  77  69  r  43  42  52  r  51  54  45
   r  67  65  45  r  65  73  49  r  34  29  32
   r  50  47  49  r  55  48  46  r  38  36  51
   ;

The data set Schools contains the character variable Type, which represents the type of school. Valid values are p (private schools), r (state-run rural schools), and u (state-run urban schools).

The three numeric variables in the data set are English, Math, and Biology, which represent the student scores for English, mathematics, and biology, respectively. The double trailing at sign (@@) in the INPUT statement holds the current input line across iterations of the DATA step, so that several observations are read from each line until all values on that line have been read.
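As a quick check that the @@ specifier read every group of four values as its own observation, you can tabulate the variable Type. This step is a minimal sketch and is not part of the original example. Because the DATA step has 40 data lines with three observations on each line, the frequencies should total 120.

```sas
/* Sanity check (not in the original example): count observations by school type */
proc freq data=Schools;
   tables Type;
run;
```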

The following statements invoke the FACTOR procedure to compute the data set of factor scoring coefficients. The statements perform a principal component factor analysis using all three numeric variables in the SAS data set Schools. The SCORE option requests that the scoring coefficients be computed, and the OUTSTAT= option requests that PROC FACTOR write these coefficients, along with other statistics, to the data set Scores. The NOPRINT option suppresses display of the output.

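   /* Compute the scoring coefficients and save them in Scores */
   proc factor data=Schools score outstat=Scores noprint;
      var English Math Biology;
   run;

   /* Apply the coefficients in Scores to the raw data in Schools */
   proc score data=Schools score=Scores out=New;
      var English Math Biology;
      id Type;
   run;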

The SCORE procedure is then invoked using Schools as the raw data set to be scored and Scores as the scoring data set. The OUT= option creates the SAS data set New to contain the linear combinations.

The VAR statement specifies that the variables English, Math, and Biology are used in computing scores. The ID statement copies the variable Type from the Schools data set to the output data set New.

The following statements print the SAS output data set Scores, the first two observations from the original data set Schools, and the first two observations of the resulting data set New.

   title 'Scoring Coefficient Data Set from PROC FACTOR';
   proc print data=Scores;
   run;

   title 'First Two Observations of the Original Schools data set';
   proc print data=Schools(obs=2);
   run;

   title 'First Two Observations of the New Data Set from PROC SCORE';
   proc print data=New(obs=2);
   run;

Figure 57.1 displays the output data set Scores produced by the FACTOR procedure. The last observation (observation number 11) contains the scoring coefficients (_TYPE_='SCORE'). Only one factor has been retained. Figure 57.1 also lists the first two observations of the original SAS data set Schools and the first two observations of the output data set New from the SCORE procedure.

Scoring Coefficient Data Set from PROC FACTOR

Obs	_TYPE_	_NAME_	English	Math	Biology
1	MEAN		55.525	52.325	50.350
2	STD		12.949	12.356	12.239
3	N		120.000	120.000	120.000
4	CORR	English	1.000	0.833	0.672
5	CORR	Math	0.833	1.000	0.594
6	CORR	Biology	0.672	0.594	1.000
7	COMMUNAL		0.881	0.827	0.696
8	PRIORS		1.000	1.000	1.000
9	EIGENVAL		2.405	0.437	0.159
10	PATTERN	Factor1	0.939	0.910	0.834
11	SCORE	Factor1	0.390	0.378	0.347

First Two Observations of the Original Schools data set

Obs	Type	English	Math	Biology
1	p	52	55	45
2	p	42	49	40

First Two Observations of the New Data Set from PROC SCORE

Obs	Type	Factor1
1	p	-0.17604
2	p	-0.80294

Figure 57.1: Views of the Scores, Schools, and New Data Sets
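If you want to display only the scoring coefficients rather than the full OUTSTAT= listing, a WHERE statement can subset the observations by the value of _TYPE_. The following is a minimal sketch that assumes the Scores data set created earlier. The title text is illustrative only.

```sas
/* Show only the observation(s) that hold scoring coefficients */
title 'Scoring Coefficients Only';
proc print data=Scores noobs;
   where _TYPE_ = 'SCORE';
run;
title;   /* clear the title so that it does not carry over to later output */
```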

The score variable Factor1 in the New data set is named according to the value of the _NAME_ variable in the Scores data set. The values of the variable Factor1 are computed as follows. First, the original variables are standardized to a mean of 0 and a variance of 1 by using the means and standard deviations stored in the Scores data set (the observations with _TYPE_='MEAN' and _TYPE_='STD'). Each standardized variable is then multiplied by its standardized scoring coefficient from the data set Scores. These products are summed over all three variables, and the sum is the value of the new variable Factor1. The first two values of the scored variable Factor1 are obtained as follows:

   [(52 - 55.525)/12.949] × 0.390 + [(55 - 52.325)/12.356] × 0.378 + [(45 - 50.350)/12.239] × 0.347 = -0.17604

   [(42 - 55.525)/12.949] × 0.390 + [(49 - 52.325)/12.356] × 0.378 + [(40 - 50.350)/12.239] × 0.347 = -0.80294
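The same arithmetic can be reproduced in a DATA step. The following is a minimal sketch, not part of the original example. The data set name Check and the variable name Factor1_check are hypothetical, and the means, standard deviations, and coefficients are typed in from the printed listing in Figure 57.1. Because those printed values are rounded to three decimal places, the result should agree with Factor1 in the New data set only approximately (to about three decimal places).

```sas
/* Recompute the first two scores by hand from the rounded values in Figure 57.1 */
data Check;
   set Schools(obs=2);
   Factor1_check = ((English - 55.525) / 12.949) * 0.390
                 + ((Math    - 52.325) / 12.356) * 0.378
                 + ((Biology - 50.350) / 12.239) * 0.347;
run;

proc print data=Check;
run;
```

PROC SCORE works with the unrounded values stored in the Scores data set, which accounts for the small differences in the last decimal places.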

The following statements request that the GCHART procedure produce a horizontal bar chart of the variable Type. The length of each bar represents the mean of the variable Factor1.

   proc gchart data=New;
      hbar type/type=mean sumvar=Factor1;
   run;

Figure 57.2: Bar Chart of School Type; Length is Value of the Variable Factor1

Figure 57.2 displays the mean score of the variable Factor1 for each of the three school types. For private schools (Type=p), the average value of the variable Factor1 is 0.384, while for state-run schools the average value is much lower. The state-run urban schools (Type=u) have the lowest mean value of -0.202, and the state-run rural schools (Type=r) have a mean value of -0.183.
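If you prefer a table to a chart, or if SAS/GRAPH software is not available, the group means can also be listed with the MEANS procedure. This is a minimal sketch that assumes the New data set created by PROC SCORE. The reported means should correspond to the bar lengths in Figure 57.2.

```sas
/* Mean factor score for each school type, displayed to three decimal places */
proc means data=New mean maxdec=3;
   class Type;
   var Factor1;
run;
```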
