What's New in SAS Software

SAS/INSIGHT Software

Principal Component Analysis

Principal component analysis was available in Version 6 of SAS/INSIGHT software and has been extended in the current version. Given a set of Y variables, principal component analysis produces a set of linear combinations of the Y variables. The coefficients of the linear combinations are the eigenvectors of the covariance or correlation matrix of the Y variables. Specifically, principal components are formed as follows:

The first principal component is the linear combination of the Y variables that accounts for the greatest possible variance.
Each subsequent principal component is the linear combination of the Y variables that has the greatest possible variance and is uncorrelated with the previously defined components.
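
The construction above can be illustrated outside SAS/INSIGHT software. The following Python sketch uses NumPy, and it is an illustration under stated assumptions rather than the product's implementation. The function name principal_components, the use_correlation flag, and the simulated data are all hypothetical. The sketch takes the eigenvectors of the covariance (or correlation) matrix as coefficients and orders the components by decreasing variance.

```python
import numpy as np

def principal_components(Y, use_correlation=False):
    """Return (eigenvalues, eigenvectors, scores), ordered by decreasing variance.

    Hypothetical helper for illustration; not part of SAS/INSIGHT software.
    """
    Y = np.asarray(Y, dtype=float)
    Z = Y - Y.mean(axis=0)                  # center each Y variable
    if use_correlation:
        Z = Z / Z.std(axis=0, ddof=1)       # standardize, so S below is the correlation matrix
    S = np.cov(Z, rowvar=False)             # covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so the first component has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                    # linear combinations of the Y variables
    return eigvals, eigvecs, scores

# Simulated data (an assumption made only for this example)
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
Y = rng.normal(size=(200, 3)) @ mixing

eigvals, eigvecs, scores = principal_components(Y)
score_cov = np.cov(scores, rowvar=False)    # diagonal: components are uncorrelated
```

In this sketch the covariance matrix of the scores is diagonal, with the eigenvalues on the diagonal. This is the numerical counterpart of the statement that each component is uncorrelated with the previously defined components.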

Component Rotation

You can generate tables of output from principal component rotation by setting options in the Rotation Options dialog or by selecting Component Rotation from the Tables menu. You specify the number of components and type of rotation in the Rotation Options dialog.

Figure 1.11: Principal Components Rotation Dialog

The Orthogonal Rotation Matrix table displays the orthogonal rotation matrix used to compute the rotated principal components from the standardized principal components.

The Correlations (Structure) and Covariances tables include the correlations and covariances between the Y variables and the rotated principal components.

The scoring coefficients are the coefficients of the Y variables used to generate rotated principal components. The Std Scoring Coefs table includes the scoring coefficients of the standardized Y variables and the Raw Scoring Coefs table includes the scoring coefficients of the centered Y variables.

The Communality Estimates table gives the standardized variance of each Y variable explained by the rotated principal components.

The Redundancy table gives the variances of the standardized Y variables explained by each rotated principal component.
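
The relationships among these tables can be sketched numerically. The following Python example uses NumPy and is an illustration only: the variable names, the simulated data, the choice of two components, and the 30-degree rotation angle are all assumptions. It does not reproduce any particular rotation method offered in the dialog. It analyzes the correlation matrix, rotates the standardized principal components with an orthogonal matrix, and then derives the structure correlations, scoring coefficients, communalities, and redundancies.

```python
import numpy as np

# Simulated data (an assumption made only for this example)
rng = np.random.default_rng(1)
Y = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
sd = Y.std(axis=0, ddof=1)
centered = Y - Y.mean(axis=0)
Z = centered / sd                                    # standardized Y variables

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]                # keep the two largest components
lam, V = eigvals[order], eigvecs[:, order]

std_scores = (Z @ V) / np.sqrt(lam)                  # standardized principal components
theta = np.deg2rad(30.0)                             # arbitrary angle, for illustration only
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # orthogonal rotation matrix
rotated = std_scores @ T                             # rotated principal components

# Correlations between the Y variables and the rotated components
structure = (Z.T @ rotated) / (len(Z) - 1)

# Scoring coefficients for standardized and for centered Y variables
std_coefs = (V / np.sqrt(lam)) @ T
raw_coefs = std_coefs / sd[:, None]

communality = (structure ** 2).sum(axis=1)           # variance of each Y explained by the components
redundancy = (structure ** 2).sum(axis=0)            # variance explained by each rotated component
unrotated_communality = ((V * np.sqrt(lam)) ** 2).sum(axis=1)
```

Under these assumptions, an orthogonal rotation redistributes the explained variance among the components, so the redundancies change. The communality of each Y variable stays the same as before rotation.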

Figure 1.12: Rotation Matrix, Correlation, and Covariance Tables

Figure 1.13: Scoring Coefficients, Communality, and Redundancy Tables

Biplots

You can use principal component analysis to transform the Y variables into a smaller number of principal components that account for most of the variance of the Y variables. The plots of the first few components can reveal useful information about the distribution of the data, such as identifying different groups of the data or identifying observations with extreme values (possible outliers).

You can request a plot of the first two principal components or the first three principal components from the Principal Components Options dialog, or by selecting Principal Components from the Graphs menu.

In the dialog, you choose a principal component scatter plot (Scatter Plot), a principal component biplot with standardized Y variables (Biplot (Std Y)), or a principal component biplot with centered Y variables (Biplot (Raw Y)).

Figure 1.14: Principal Component Plots Dialog

A biplot is a joint display of two sets of variables. The data points are first displayed in a scatter plot of principal components. When the approximated Y variable axes are also displayed in the scatter plot, the data values of the Y variables can be estimated graphically. A biplot is a useful tool for examining data patterns and outliers.

The Y variable axes are generated from the regression coefficients of the Y variables on the principal components. The lengths of the axes are approximately proportional to the standard deviations of the variables. A closer parallel between a Y variable axis and a principal component axis indicates a higher correlation between the two variables.

For a Y variable Y1, the Y1 value of a data point y in a principal component biplot is evaluated geometrically as follows:

Drop a perpendicular from point y onto the Y1 axis.
Measure the distance from the origin to the foot of this perpendicular.
Multiply the distance by the length of the Y1 axis. The result approximates the Y1 value for point y.
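
The three steps above amount to taking the inner product of the point's coordinates with the axis vector. The following Python sketch shows the arithmetic. The function name approximate_value and the coordinates are hypothetical. The result is on the scale of the standardized or centered Y variable, whichever the biplot uses.

```python
import math

def approximate_value(point, axis):
    """Approximate a Y variable's value for a data point from its biplot axis.

    point and axis are coordinates in the plane of the plotted components.
    Hypothetical helper for illustration only.
    """
    axis_length = math.hypot(*axis)
    # Steps 1 and 2: signed distance from the origin to the foot of the perpendicular
    distance = sum(p * a for p, a in zip(point, axis)) / axis_length
    # Step 3: multiply the distance by the length of the axis
    return distance * axis_length

point = (2.0, 1.0)      # data point y in the biplot (made-up coordinates)
y1_axis = (3.0, 4.0)    # endpoint of the Y1 axis (made-up coordinates, length 5)
value = approximate_value(point, y1_axis)
```

With these made-up coordinates the distance along the axis is 2 and the axis length is 5, so the approximated value is 10.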

Two sets of variables are used in creating principal component biplots. One set is the Y variables. Either standardized or centered Y variables are used, as specified in the Principal Component Plots dialog.

The other set is the principal component variables. These variables have variances equal either to one or to the corresponding eigenvalues. You specify the principal component variable variance in the Multivariate Method Options dialog.

Note: A biplot with principal component variable variances equal to one is called a GH' biplot, and a biplot with principal component variable variances equal to the corresponding eigenvalues is called a JK' biplot.
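
The two scalings can be sketched with a singular value decomposition of the centered Y variables. The following Python example uses NumPy and is an illustration only: the simulated data and the names G, H, J, and K are assumptions chosen to match the biplot names, not output from SAS/INSIGHT software. Both factorizations reproduce the centered data; they differ only in how the variance is split between the component variables and the Y variable axes.

```python
import numpy as np

# Simulated data (an assumption made only for this example)
rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 3))
Z = Y - Y.mean(axis=0)                       # centered Y variables
n = len(Z)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
eigenvalues = s ** 2 / (n - 1)               # eigenvalues of the covariance matrix

# GH' scaling: component variables have variance one
G = U * np.sqrt(n - 1)                       # component variables (one row per observation)
H = Vt.T * (s / np.sqrt(n - 1))              # Y variable axes (one row per Y variable)

# JK' scaling: component variables have variances equal to the eigenvalues
J = U * s
K = Vt.T

axis_lengths = np.linalg.norm(H, axis=1)     # lengths of the Y variable axes under GH' scaling
```

When all components are kept, as in this sketch, the GH' axis lengths equal the standard deviations of the Y variables. A plot of only the first two or three components keeps the leading columns of G and H, which is why the lengths are then only approximately proportional to the standard deviations.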

Figure 1.15: Principal Component Plots
