PROC PRINCOMP Statement
- PROC PRINCOMP < options > ;
The PROC PRINCOMP statement starts the PRINCOMP procedure
and, optionally, identifies input and output data sets, specifies
details of the analysis, or suppresses the display of output.
You can specify the following options in the PROC PRINCOMP statement.
Task                            | Options
Specify data sets               | DATA=, OUT=, OUTSTAT=
Specify details of analysis     | COV, N=, NOINT, PREFIX=, SINGULAR=, STD, VARDEF=
Suppress the display of output  | NOPRINT
The following list provides details on these options.
- COVARIANCE
- COV
-
computes the principal components from the covariance matrix.
If you omit the COV option,
the correlation matrix is analyzed.
Use of the COV option causes variables with large variances
to be more strongly associated with components with large
eigenvalues and causes variables with small variances to be more
strongly associated with components with small eigenvalues.
You should not specify the COV option unless the units
in which the variables are measured are comparable
or the variables are standardized in some way.
If you specify the COV option, the procedure calculates scores using
the centered variables rather than the standardized variables.
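As a rough illustration of the scaling issue, the following sketch builds a small synthetic data set in which one variable has a much larger variance than the others, and then runs the analysis once on the correlation matrix (the default) and once with the COV option. The data set name Measures and the variables x1, x2, and x3 are made up for this example.

```sas
/* Synthetic data, assumed for illustration only: x3 is on a much larger scale */
data Measures;
   do i = 1 to 50;
      x1 = rannor(123);
      x2 = x1 + rannor(123);
      x3 = 100 * rannor(123);
      output;
   end;
   drop i;
run;

/* Default: components are computed from the correlation matrix */
proc princomp data=Measures;
   var x1 x2 x3;
run;

/* COV: components are computed from the covariance matrix */
proc princomp data=Measures cov;
   var x1 x2 x3;
run;
```

In the second run, x3 can be expected to be associated mainly with the component that has the largest eigenvalue, because its variance is far larger than that of x1 or x2.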
- DATA=SAS-data-set
-
specifies the SAS data set to be analyzed.
The data set can be an ordinary SAS data set or
a TYPE=ACE, TYPE=CORR, TYPE=COV, TYPE=FACTOR, TYPE=SSCP,
TYPE=UCORR, or TYPE=UCOV data set
(see Appendix A, "Special SAS Data Sets").
Also, the PRINCOMP procedure can read the _TYPE_='COVB'
matrix from a TYPE=EST data set.
If you omit the DATA= option, the procedure
uses the most recently created SAS data set.
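One way to obtain a TYPE=CORR input data set is the OUTP= option of the CORR procedure. The sketch below assumes that a data set named Measures with numeric variables x1, x2, and x3 already exists; those names, and the name CorrMat, are hypothetical.

```sas
/* Assumes a data set Measures with numeric variables x1-x3 (hypothetical names) */
proc corr data=Measures outp=CorrMat noprint;
   var x1 x2 x3;
run;

/* CorrMat is a TYPE=CORR data set; PROC PRINCOMP reads the correlations from it */
proc princomp data=CorrMat;
   var x1 x2 x3;
run;
```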
- N=number
-
specifies the number of principal components to be computed.
The default is the number of variables.
The value of the N= option must be an integer greater than or equal to zero.
- NOINT
-
omits the intercept from the model.
In other words, the NOINT option requests that the covariance
or correlation matrix not be corrected for the mean.
As a result, the standard deviations that the procedure reports
are not corrected for the mean either.
If you need standard deviations corrected for
the mean, you can obtain them from a procedure such as the MEANS procedure.
If you use a TYPE=SSCP data set as input to the PRINCOMP
procedure and list the variable Intercept in the VAR
statement, the procedure acts as if you had also specified the NOINT
option. If you use NOINT and also create an OUTSTAT= data set, the
data set is TYPE=UCORR or TYPE=UCOV rather than TYPE=CORR or TYPE=COV.
- NOPRINT
-
suppresses the display of all output. Note that this option
temporarily disables the Output Delivery System (ODS).
For more information, see Chapter 15, "Using the Output Delivery System."
- OUT=SAS-data-set
-
creates an output SAS data set that contains all the
original data as well as the principal component scores.
If you want to create a permanent SAS data set, you must
specify a two-level name (refer to SAS Language Reference:
Concepts for information on permanent SAS data sets).
- OUTSTAT=SAS-data-set
-
creates an output SAS data set that contains means,
standard deviations, number of observations, correlations
or covariances, eigenvalues, and eigenvectors.
If you specify the COV option, the data set is TYPE=COV
or TYPE=UCOV, depending on the NOINT option, and it contains
covariances; otherwise, the data set is TYPE=CORR or TYPE=UCORR,
depending on the NOINT option, and it contains correlations.
If you specify the PARTIAL statement, the OUTSTAT=
data set contains R-squares as well.
If you want to create a permanent SAS data set, you must
specify a two-level name (refer to SAS Language
Reference: Concepts for information on permanent SAS data sets).
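The following sketch shows the OUT= and OUTSTAT= options used together with two-level names so that both output data sets are permanent. The library reference mylib, its path, and the data set and variable names are placeholders assumed for this example.

```sas
/* Hypothetical library reference; replace the path with a real directory */
libname mylib 'your-directory-path';

/* Assumes a data set Measures with numeric variables x1-x3 (hypothetical names) */
proc princomp data=Measures out=mylib.Scores outstat=mylib.Stats;
   var x1 x2 x3;
run;

/* List the statistics, eigenvalues, and eigenvectors stored in the OUTSTAT= data set */
proc print data=mylib.Stats;
run;
```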
- PREFIX=name
-
specifies a prefix for naming the principal components.
By default, the names are Prin1, Prin2, ..., Prinn,
where n is the number of components.
If you specify PREFIX=ABC, the components are
named ABC1, ABC2, ABC3, and so on.
The number of characters in the prefix plus the number of digits
required to designate the variables should not exceed the
current name length defined by the VALIDVARNAME= system option.
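A minimal sketch that combines the N= and PREFIX= options follows. It assumes a data set named Measures with numeric variables x1, x2, and x3; these names and the prefix Comp are chosen only for illustration.

```sas
/* Assumes a data set Measures with numeric variables x1-x3 (hypothetical names) */
proc princomp data=Measures n=2 prefix=Comp out=Scores;
   var x1 x2 x3;
run;

/* Scores contains the original variables plus the score variables Comp1 and Comp2 */
proc print data=Scores;
run;
```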
- SINGULAR=p
- SING=p
-
specifies the singularity criterion, where 0<p<1.
If a variable in a PARTIAL statement has an R-square
as large as 1-p when predicted from the variables
listed before it in the statement, the variable
is assigned a standardized coefficient of 0.
By default, SINGULAR=1E-8.
- STANDARD
- STD
-
standardizes the principal component scores
in the OUT= data set to unit variance.
If you omit the STANDARD option, the scores
have variance equal to the corresponding eigenvalue.
Note that STANDARD has no effect on the eigenvalues themselves.
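To check the effect of the STANDARD option, you can compute the variances of the score variables in the OUT= data set, as in the sketch below. It assumes a data set named Measures with numeric variables x1, x2, and x3 (hypothetical names) and the default component names.

```sas
/* Assumes a data set Measures with numeric variables x1-x3 (hypothetical names) */
proc princomp data=Measures std out=StdScores noprint;
   var x1 x2 x3;
run;

/* With STD, each score variable is expected to have a variance of 1 */
proc means data=StdScores var;
   var Prin1-Prin3;
run;
```

Running the same steps without STD should instead give variances equal to the corresponding eigenvalues.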
- VARDEF=DF | N | WDF | WEIGHT | WGT
-
specifies the divisor used in calculating
variances and standard deviations.
By default, VARDEF=DF.
The following table displays the values and associated divisors.
Value          | Divisor                    | Formula
DF             | error degrees of freedom   | n - i (before partialling); n - p - i (after partialling)
N              | number of observations     | n
WEIGHT or WGT  | sum of weights             | Σ w_j
WDF            | sum of weights minus one   | (Σ w_j) - i (before partialling); (Σ w_j) - p - i (after partialling)
In these formulas, n is the number of observations and w_j is the
weight of the jth observation, with the sum taken over all n observations.
In the formulas for VARDEF=DF and VARDEF=WDF, p is the number
of degrees of freedom of the variables in the PARTIAL statement,
and i is 0 if the NOINT option is specified and 1 otherwise.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.