Example 24.1: Simple Correspondence Analysis of Cars and Their Owners
In this example, PROC CORRESP creates a contingency table from
categorical data and performs a simple correspondence analysis. The
data are from a sample of individuals who were asked to provide information
about themselves and their cars. The questions included the origin of the
car (American, Japanese, or European) and family status (single, married,
single with children, or married with children).
These data are used again in Example 24.2.
The first steps read the input data and assign formats. PROC CORRESP
is used to perform the simple correspondence analysis. The ALL option
displays all tables, including the contingency table,
chi-square information, profiles, and all results of the correspondence
analysis. The OUTC= option creates an output coordinate data set. The
TABLES statement specifies the row and column categorical variables. The
%PLOTIT macro is used to plot the results.
Normally, you only need to tell the %PLOTIT macro the name of
the input data set, DATA=Coor, and
the type of analysis performed on the data, DATATYPE=CORRESP.
The following statements produce Output 24.1.1 and Output 24.1.2:
title 'Car Owners and Car Origin';

* Formats that map the numeric codes to category labels;
proc format;
   value Origin  1 = 'American' 2 = 'Japanese' 3 = 'European';
   value Size    1 = 'Small'    2 = 'Medium'   3 = 'Large';
   value Type    1 = 'Family'   2 = 'Sporty'   3 = 'Work';
   value Home    1 = 'Own'      2 = 'Rent';
   value Sex     1 = 'Male'     2 = 'Female';
   value Income  1 = '1 Income' 2 = '2 Incomes';
   value Marital 1 = 'Single with Kids' 2 = 'Married with Kids'
                 3 = 'Single'           4 = 'Married';
run;

data Cars;
   * Let the letter A in the input stand for a special missing value;
   missing a;
   * Read eight one-digit fields per record, several records per line;
   input (Origin Size Type Home Income Marital Kids Sex) (1.) @@;
   * Check for End of Line;
   if n(of Origin -- Sex) eq 0 then do; input; return; end;
   * Add 2 when Kids is 0 or missing, giving 3 = Single and 4 = Married;
   marital = 2 * (kids le 0) + marital;
   format Origin Origin. Size Size. Type Type. Home Home.
          Sex Sex. Income Income. Marital Marital.;
   output;
   datalines;
131112212121110121112201131211011211221122112121131122123211222212212201
121122023121221232211101122122022121110122112102131112211121110112311101
211112113211223121122202221122111311123131211102321122223221220221221101
122122022121220211212201221122021122110132112202213112111331226122221101
1212110231AA220232112212113112112121220212212202112111022222110212121221
211211012211222212211101313112113121220121112212121112212211222221112211
221111011112220122212201131211013121220113112222131112012131110221112211
121112212211121121112201321122311311221113112212213211013121220221221101
133211011212220233311102213111023211122121312222212212111111222121112211
133112011212112212112212212222022131222222121101111122022211220113112212
211112012232220121221102213211011131220121212201211122112331220233312202
222122012111220212112201221122112212220222212211311122012111110112212212
112222011131112221212202322211021222110121221101333211012232110132212101
223222013111220112211101211211022112110212211102221122021111220112111211
111122022121110113311122322111122221210222211101212122021211221232112202
1331110113112211213222012131221211112212221122021331220212121112121.2212
121122.22121210233112212222121011311122121211102211122112121110121212101
311212022231221112112211211211312221221213112212221122022222110131212202
213122211311221212112222113122221221220213111221121211221211221221221102
131122211211220221222101223112012111221212111102223122111311222121111102
2121110121112202133122222311122121312212112.2101312122012111122112112202
111212023121110111112221212111012211220221321101221211122121220112111112
212211022111110122221101121112112122110122122232221122212211221212112202
213122112211110212121201113211012221110232111102212211012112220121212202
221112011211220121221101211211022211221112121101111112212121221111221201
211122122122111212112221111122312132110113121101121122222111220222121102
221211012122110221221102312111012122220121121101121122221111222212221102
212122021222120113112202121122212121110113111101123112212111220113111101
221112211321210131212211121211011222110122112222123122023121223112212202
311211012131110131221102112211021131220213122201222111022121221221312202
131.22523221110122212221131112412211220221121112131222022122220122122201
212111011311220221312202221122123221210121222202223122121211221221111112
211111121211221221212201113122122131220222112222211122011311110112312211
211222013221220121211211312122122221220122112201111222011211110122311112
312111021231220122121101211112112.22110222112212121122122211110121112101
121211013211222121112222321112112112110121321101113111012221220121312201
213211012212220221211101321122121111220221121101122211021122110213112212
212122011211122131221101121211022212220212121101
;
*---Perform Simple Correspondence Analysis---;
proc corresp all data=Cars outc=Coor;
   tables Marital, Origin;
run;

*---Plot the Simple Correspondence Analysis Results---;
%plotit(data=Coor, datatype=corresp)
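Before plotting, it can help to look at what the OUTC= data set holds. The following step is a minimal sketch and is not part of the original example. It assumes only that the PROC CORRESP step above has run and created the Coor data set, and it lists that data set so that the coordinates and summary statistics can be compared with the displayed tables.

```sas
*---Sketch: list the coordinate data set created by OUTC=---;
proc print data=Coor;
run;
```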
Correspondence analysis locates all the categories in a Euclidean space.
The first two dimensions of this space are plotted to examine the
associations among the categories. Since the smaller dimension of this
table is three, at most two dimensions are needed, so there is no loss of
information when only two dimensions are plotted. The plot should be
thought of as two different overlaid
plots, one for each categorical variable. Distances between
points within a variable have meaning, but distances between
points from different variables do not.
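The dimension count can be checked with a little arithmetic. The following is a worked sketch that uses values reported in Output 24.1.1 and the standard relationships for simple correspondence analysis. The symbols r, c, n, and \lambda_k (rows, columns, sample size, and principal inertias) are notation introduced here, not taken from the example.

```latex
\begin{aligned}
\text{dimensions} &= \min(r, c) - 1 = \min(4, 3) - 1 = 2 \\
\text{degrees of freedom} &= (r - 1)(c - 1) = 3 \times 2 = 6 \\
\text{total inertia} &= \frac{\chi^2}{n} = \frac{8.34947}{339} \approx 0.02463 \\
\lambda_1 + \lambda_2 &= 0.15122^2 + 0.04200^2 \approx 0.02287 + 0.00176 = 0.02463
\end{aligned}
```

Because the two principal inertias add up to the total inertia, a plot of both dimensions displays 100% of the inertia.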
Output 24.1.1: Simple Correspondence Analysis of a Contingency Table
Car Owners and Car Origin

Contingency Table

                      American   European   Japanese        Sum
Married                     37         14         51        102
Married with Kids           52         15         44        111
Single                      33         15         63        111
Single with Kids             6          1          8         15
Sum                        128         45        166        339

Chi-Square Statistic Expected Values

                      American   European   Japanese
Married                38.5133    13.5398    49.9469
Married with Kids      41.9115    14.7345    54.3540
Single                 41.9115    14.7345    54.3540
Single with Kids        5.6637     1.9912     7.3451

Observed Minus Expected Values

                      American   European   Japanese
Married                -1.5133     0.4602     1.0531
Married with Kids      10.0885     0.2655   -10.3540
Single                 -8.9115     0.2655     8.6460
Single with Kids        0.3363    -0.9912     0.6549

Contributions to the Total Chi-Square Statistic

                      American   European   Japanese        Sum
Married                0.05946    0.01564    0.02220    0.09730
Married with Kids      2.42840    0.00478    1.97235    4.40553
Single                 1.89482    0.00478    1.37531    3.27492
Single with Kids       0.01997    0.49337    0.05839    0.57173
Sum                    4.40265    0.51858    3.42825    8.34947

Row Profiles

                      American   European   Japanese
Married               0.362745   0.137255   0.500000
Married with Kids     0.468468   0.135135   0.396396
Single                0.297297   0.135135   0.567568
Single with Kids      0.400000   0.066667   0.533333

Column Profiles

                      American   European   Japanese
Married               0.289063   0.311111   0.307229
Married with Kids     0.406250   0.333333   0.265060
Single                0.257813   0.333333   0.379518
Single with Kids      0.046875   0.022222   0.048193

Inertia and Chi-Square Decomposition

Singular   Principal       Chi-             Cumulative
   Value     Inertia     Square   Percent      Percent      19   38   57   76   95
                                                         ----+----+----+----+----+---
 0.15122     0.02287    7.75160     92.84        92.84   ************************
 0.04200     0.00176    0.59787      7.16       100.00   **
   Total     0.02463    8.34947    100.00

Degrees of Freedom = 6

Row Coordinates

                          Dim1       Dim2
Married                -0.0278     0.0134
Married with Kids       0.1991     0.0064
Single                 -0.1716     0.0076
Single with Kids       -0.0144    -0.1947

Summary Statistics for the Row Points

                       Quality       Mass    Inertia
Married                 1.0000     0.3009     0.0117
Married with Kids       1.0000     0.3274     0.5276
Single                  1.0000     0.3274     0.3922
Single with Kids        1.0000     0.0442     0.0685

Partial Contributions to Inertia for the Row Points

                          Dim1       Dim2
Married                 0.0102     0.0306
Married with Kids       0.5678     0.0076
Single                  0.4217     0.0108
Single with Kids        0.0004     0.9511

Indices of the Coordinates that Contribute Most to Inertia for the Row Points

                          Dim1       Dim2       Best
Married                      0          0          2
Married with Kids            1          0          1
Single                       1          0          1
Single with Kids             0          2          2

Squared Cosines for the Row Points

                          Dim1       Dim2
Married                 0.8121     0.1879
Married with Kids       0.9990     0.0010
Single                  0.9980     0.0020
Single with Kids        0.0054     0.9946

Column Coordinates

                          Dim1       Dim2
American                0.1847    -0.0166
European                0.0013     0.1073
Japanese               -0.1428    -0.0163

Summary Statistics for the Column Points

                       Quality       Mass    Inertia
American                1.0000     0.3776     0.5273
European                1.0000     0.1327     0.0621
Japanese                1.0000     0.4897     0.4106

Partial Contributions to Inertia for the Column Points

                          Dim1       Dim2
American                0.5634     0.0590
European                0.0000     0.8672
Japanese                0.4366     0.0737

Indices of the Coordinates that Contribute Most to Inertia for the Column Points

                          Dim1       Dim2       Best
American                     1          0          1
European                     0          2          2
Japanese                     1          0          1

Squared Cosines for the Column Points

                          Dim1       Dim2
American                0.9920     0.0080
European                0.0001     0.9999
Japanese                0.9871     0.0129

Output 24.1.2: Plot of Simple Correspondence Analysis of a Contingency Table
To interpret the plot, start by interpreting the row points separately
from the column points. Among the column points, the European point has a
coordinate of almost zero on dimension one and lies closer to the
centroid than the American and Japanese points. It makes a relatively
small contribution to the chi-square statistic (because it is
comparatively near the centroid and has the smallest mass), and it
contributes almost nothing to the inertia of dimension one (since its
coordinate on dimension one has a small absolute value relative to the
other column points). It makes a relatively large contribution to the
inertia of dimension two (since its coordinate on dimension two has a
large absolute value relative to the other column points). Its squared
cosines for dimensions one and two, approximately 0 and 1, respectively,
indicate that its position is almost completely determined by its location on
dimension two. Its quality of display is 1.0, indicating perfect
quality, since the table is two-dimensional after the centering. The
American and Japanese points are far from the centroid, and they lie along
dimension one. They make relatively large contributions to the
chi-square statistic and the inertia of dimension one. The horizontal
dimension seems to be largely determined by Japanese versus American car
ownership.
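These statements about the European point can be checked against Output 24.1.1. The following worked sketch assumes the usual definitions for points in principal coordinates: the partial contribution of a point to a dimension is its mass times its squared coordinate divided by the principal inertia of that dimension, and the squared cosine is the squared coordinate divided by the squared distance from the centroid. The symbols m, x_k, and \lambda_k are notation introduced here. Because the inputs are the rounded values displayed in the output, the results agree with the displayed tables only approximately.

```latex
\begin{aligned}
\text{contribution to dimension 2} &= \frac{m\,x_2^2}{\lambda_2}
  = \frac{0.1327 \times 0.1073^2}{0.00176} \approx 0.87 \\
\text{squared cosine, dimension 1} &= \frac{x_1^2}{x_1^2 + x_2^2}
  = \frac{0.0013^2}{0.0013^2 + 0.1073^2} \approx 0.0001 \\
\text{squared cosine, dimension 2} &= \frac{x_2^2}{x_1^2 + x_2^2}
  = \frac{0.1073^2}{0.0013^2 + 0.1073^2} \approx 0.9999 \\
\text{share of total chi-square} &= \frac{0.51858}{8.34947} \approx 0.0621
\end{aligned}
```

The last line matches the Inertia value of 0.0621 reported for the European point in the Summary Statistics for the Column Points table.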
Among the row points, the Married point is near the centroid, and the
Single with Kids point has a coordinate on dimension one that is
near zero. The horizontal dimension seems to be largely determined by the
Single versus the Married with Kids points. Taken together, the two
interpretations of dimension one show an association between being
married with kids and owning an American car, and between being single
and owning a Japanese car.
The fact that the Married with Kids point is close to the American
point and the fact that the Japanese point is near the Single point
should be ignored. Distances between row and column points are not
defined.
The plot shows that more people who are married with kids drive an
American car than you would expect if the rows and columns were
independent, and more people who are single drive a Japanese car than
you would expect if the rows and columns were independent.
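The comparison with independence can also be examined cell by cell. The following step is a minimal sketch rather than part of the original example. It assumes the Cars data set and the formats created earlier, and it uses PROC FREQ to print the observed frequencies, expected frequencies, deviations, and cell chi-square contributions for the same two variables.

```sas
*---Sketch: observed versus expected counts under independence---;
proc freq data=Cars;
   tables Marital * Origin / expected deviation cellchi2 chisq
                             norow nocol nopercent;
run;
```

In Output 24.1.1, the Observed Minus Expected Values table shows the same pattern: the deviations are 10.0885 for Married with Kids and American, and 8.6460 for Single and Japanese.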
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.