Example 30.6: Multivariate Analysis of Variance
The following example employs multivariate analysis of variance
(MANOVA) to measure differences in the chemical characteristics of
ancient pottery found at four kiln sites in Great Britain. The data
are from Tubb, Parker, and Nickless (1980), as reported in Hand et al.
(1994).
For each of 26 samples of pottery, the percentages of oxides of five
metals are measured. The following statements create the data set
and invoke the GLM procedure to perform a one-way MANOVA.
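Additionally, it is of interest to know whether the pottery from one
site in Wales (Llanederyn) differs from the samples from the other
sites; a CONTRAST statement is used to test this hypothesis. Because
the levels of Site are sorted alphabetically (AshleyRails, Caldicot,
IslandThorns, Llanederyn), the coefficients 1 1 1 -3 compare
Llanederyn with the unweighted average of the other three sites.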
data pottery;
title1 "Romano-British Pottery";
input Site :$12. Al Fe Mg Ca Na;   /* colon modifier: list input, site names up to 12 characters */
datalines;
Llanederyn 14.4 7.00 4.30 0.15 0.51
Llanederyn 13.8 7.08 3.43 0.12 0.17
Llanederyn 14.6 7.09 3.88 0.13 0.20
Llanederyn 11.5 6.37 5.64 0.16 0.14
Llanederyn 13.8 7.06 5.34 0.20 0.20
Llanederyn 10.9 6.26 3.47 0.17 0.22
Llanederyn 10.1 4.26 4.26 0.20 0.18
Llanederyn 11.6 5.78 5.91 0.18 0.16
Llanederyn 11.1 5.49 4.52 0.29 0.30
Llanederyn 13.4 6.92 7.23 0.28 0.20
Llanederyn 12.4 6.13 5.69 0.22 0.54
Llanederyn 13.1 6.64 5.51 0.31 0.24
Llanederyn 12.7 6.69 4.45 0.20 0.22
Llanederyn 12.5 6.44 3.94 0.22 0.23
Caldicot 11.8 5.44 3.94 0.30 0.04
Caldicot 11.6 5.39 3.77 0.29 0.06
IslandThorns 18.3 1.28 0.67 0.03 0.03
IslandThorns 15.8 2.39 0.63 0.01 0.04
IslandThorns 18.0 1.50 0.67 0.01 0.06
IslandThorns 18.0 1.88 0.68 0.01 0.04
IslandThorns 20.8 1.51 0.72 0.07 0.10
AshleyRails 17.7 1.12 0.56 0.06 0.06
AshleyRails 18.3 1.14 0.67 0.06 0.05
AshleyRails 16.7 0.92 0.53 0.01 0.05
AshleyRails 14.8 2.74 0.67 0.03 0.05
AshleyRails 19.1 1.64 0.60 0.10 0.03
;
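proc glm data=pottery;
   class Site;
   model Al Fe Mg Ca Na = Site;                 /* one-way model, five responses */
   /* Site levels sort as AshleyRails Caldicot IslandThorns Llanederyn */
   contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;
   manova h=_all_ / printe printh;              /* multivariate tests; print E and H */
run;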
After the summary information, displayed in Output 30.6.1, PROC GLM
produces the univariate analyses for each of the dependent variables,
as shown in Output 30.6.2. These analyses show that the sites differ
significantly for each oxide considered individually (every p-value is
0.0003 or smaller). You can suppress these univariate analyses by
specifying the NOUNI option in the MODEL statement.
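As a cross-check on the first of these tables, the following Python sketch (an illustration only, not SAS and not PROC GLM's internal algorithm) recomputes the Al analysis from the data lines: the between-site (model) and within-site (error) sums of squares, the overall F, and the sum of squares for the contrast Site 1 1 1 -3. The helper names one_way_anova and contrast_ss are hypothetical. It assumes the contrast SS has the usual one-way form (sum of c_i times mean_i)^2 / sum(c_i^2 / n_i).

```python
# Al (aluminum oxide) percentages by site, copied from the DATALINES above.
al = {
    "AshleyRails": [17.7, 18.3, 16.7, 14.8, 19.1],
    "Caldicot": [11.8, 11.6],
    "IslandThorns": [18.3, 15.8, 18.0, 18.0, 20.8],
    "Llanederyn": [14.4, 13.8, 14.6, 11.5, 13.8, 10.9, 10.1,
                   11.6, 11.1, 13.4, 12.4, 13.1, 12.7, 12.5],
}


def mean(xs):
    return sum(xs) / len(xs)


def one_way_anova(groups):
    """Return (model SS, error SS, model DF, error DF) for a one-way layout."""
    values = [x for xs in groups.values() for x in xs]
    grand = mean(values)
    model_ss = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in groups.values())
    error_ss = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    return model_ss, error_ss, len(groups) - 1, len(values) - len(groups)


def contrast_ss(groups, coefs):
    """SS for the contrast sum(c_i * mean_i): estimate**2 / sum(c_i**2 / n_i)."""
    estimate = sum(c * mean(groups[k]) for k, c in coefs.items())
    return estimate ** 2 / sum(c * c / len(groups[k]) for k, c in coefs.items())


model_ss, error_ss, df_model, df_error = one_way_anova(al)
mse = error_ss / df_error
f_site = (model_ss / df_model) / mse

# Same coefficients as 'Site 1 1 1 -3', matched to the sorted level names.
coefs = {"AshleyRails": 1, "Caldicot": 1, "IslandThorns": 1, "Llanederyn": -3}
ss_contrast = contrast_ss(al, coefs)
f_contrast = ss_contrast / mse  # 1 numerator DF, so the SS is also the mean square

print(f"Model SS={model_ss:.7f}  Error SS={error_ss:.7f}  F={f_site:.2f}")
print(f"Contrast SS={ss_contrast:.8f}  F={f_contrast:.2f}")
```

The printed values should agree with the Al table in Output 30.6.2: model SS 175.6103187, error SS 48.2881429, and F values of 26.67 for Site and 26.69 for the contrast.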
Output 30.6.1: Summary Information on Groups
Class Level Information

Class    Levels    Values
Site          4    AshleyRails Caldicot IslandThorns Llanederyn

Number of observations    26
Output 30.6.2: Univariate Analysis of Variance for Each Dependent Variable
The GLM Procedure
Dependent Variable: Al

Source                     DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                       3       175.6103187     58.5367729      26.67    <.0001
Error                      22        48.2881429      2.1949156
Corrected Total            25       223.8984615

R-Square     Coeff Var      Root MSE       Al Mean
0.784330      10.22284      1.481525      14.49231

Source                     DF         Type I SS    Mean Square    F Value    Pr > F
Site                        3       175.6103187     58.5367729      26.67    <.0001

Source                     DF       Type III SS    Mean Square    F Value    Pr > F
Site                        3       175.6103187     58.5367729      26.67    <.0001

Contrast                   DF       Contrast SS    Mean Square    F Value    Pr > F
Llanederyn vs. the rest     1       58.58336640    58.58336640      26.69    <.0001
Dependent Variable: Fe

Source                     DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                       3       134.2216158     44.7405386      89.88    <.0001
Error                      22        10.9508457      0.4977657
Corrected Total            25       145.1724615

R-Square     Coeff Var      Root MSE       Fe Mean
0.924567      15.79171      0.705525      4.467692

Source                     DF         Type I SS    Mean Square    F Value    Pr > F
Site                        3       134.2216158     44.7405386      89.88    <.0001

Source                     DF       Type III SS    Mean Square    F Value    Pr > F
Site                        3       134.2216158     44.7405386      89.88    <.0001

Contrast                   DF       Contrast SS    Mean Square    F Value    Pr > F
Llanederyn vs. the rest     1       71.15144132    71.15144132     142.94    <.0001
Dependent Variable: Mg

Source                     DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                       3       103.3505270     34.4501757      49.12    <.0001
Error                      22        15.4296114      0.7013460
Corrected Total            25       118.7801385

R-Square     Coeff Var      Root MSE       Mg Mean
0.870099      26.65777      0.837464      3.141538

Source                     DF         Type I SS    Mean Square    F Value    Pr > F
Site                        3       103.3505270     34.4501757      49.12    <.0001

Source                     DF       Type III SS    Mean Square    F Value    Pr > F
Site                        3       103.3505270     34.4501757      49.12    <.0001

Contrast                   DF       Contrast SS    Mean Square    F Value    Pr > F
Llanederyn vs. the rest     1       56.59349339    56.59349339      80.69    <.0001
Dependent Variable: Ca

Source                     DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                       3        0.20470275     0.06823425      29.16    <.0001
Error                      22        0.05148571     0.00234026
Corrected Total            25        0.25618846

R-Square     Coeff Var      Root MSE       Ca Mean
0.799032      33.01265      0.048376      0.146538

Source                     DF         Type I SS    Mean Square    F Value    Pr > F
Site                        3        0.20470275     0.06823425      29.16    <.0001

Source                     DF       Type III SS    Mean Square    F Value    Pr > F
Site                        3        0.20470275     0.06823425      29.16    <.0001

Contrast                   DF       Contrast SS    Mean Square    F Value    Pr > F
Llanederyn vs. the rest     1        0.03531688     0.03531688      15.09    0.0008
Dependent Variable: Na

Source                     DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                       3        0.25824560     0.08608187       9.50    0.0003
Error                      22        0.19929286     0.00905877
Corrected Total            25        0.45753846

R-Square     Coeff Var      Root MSE       Na Mean
0.564424      60.06350      0.095178      0.158462

Source                     DF         Type I SS    Mean Square    F Value    Pr > F
Site                        3        0.25824560     0.08608187       9.50    0.0003

Source                     DF       Type III SS    Mean Square    F Value    Pr > F
Site                        3        0.25824560     0.08608187       9.50    0.0003

Contrast                   DF       Contrast SS    Mean Square    F Value    Pr > F
Llanederyn vs. the rest     1        0.23344446     0.23344446      25.77    <.0001
The PRINTE option in the MANOVA statement displays the elements of the
error matrix E, also called the error sums of squares and crossproducts
(SSCP) matrix; see Output 30.6.3. The diagonal elements of this matrix
are the error sums of squares from the corresponding univariate
analyses.
The PRINTE option also displays the partial correlation matrix
associated with the E matrix. In this example, none of the oxides is
very strongly correlated; the strongest correlation (r = 0.488,
p = 0.0180) is between magnesium oxide and calcium oxide.
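The partial correlations are E rescaled by its diagonal, r_ij = e_ij / sqrt(e_ii * e_jj). A minimal Python sketch, taking the E entries as printed in Output 30.6.3 (the function name partial_correlations is hypothetical), recovers the correlations and picks out the strongest pair. It does not attempt to reproduce the Prob > |r| values.

```python
import math

names = ["Al", "Fe", "Mg", "Ca", "Na"]
# Error SSCP matrix E as printed in Output 30.6.3 (symmetric).
E = [
    [48.288142857, 7.0800714286, 0.6080142857, 0.1064714286, 0.5889571429],
    [7.0800714286, 10.950845714, 0.5270571429, -0.155194286, 0.0667585714],
    [0.6080142857, 0.5270571429, 15.429611429, 0.4353771429, 0.0276157143],
    [0.1064714286, -0.155194286, 0.4353771429, 0.0514857143, 0.0100785714],
    [0.5889571429, 0.0667585714, 0.0276157143, 0.0100785714, 0.1992928571],
]


def partial_correlations(e):
    """Scale E to a correlation matrix: r_ij = e_ij / sqrt(e_ii * e_jj)."""
    d = [math.sqrt(e[k][k]) for k in range(len(e))]
    return [[e[a][b] / (d[a] * d[b]) for b in range(len(e))] for a in range(len(e))]


R = partial_correlations(E)

# Search the off-diagonal pairs for the largest absolute correlation.
pairs = [(a, b) for a in range(5) for b in range(a + 1, 5)]
i, j = max(pairs, key=lambda pair: abs(R[pair[0]][pair[1]]))
print(f"strongest: {names[i]}-{names[j]}  r={R[i][j]:.6f}")
```

Because E is only rescaled, the diagonal of R is exactly 1, and the sign of each correlation matches the sign of the corresponding crossproduct in E.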
Output 30.6.3: Error SSCP Matrix and Partial Correlations
Multivariate Analysis of Variance

E = Error SSCP Matrix

                Al              Fe              Mg              Ca              Na
Al    48.288142857    7.0800714286    0.6080142857    0.1064714286    0.5889571429
Fe    7.0800714286    10.950845714    0.5270571429    -0.155194286    0.0667585714
Mg    0.6080142857    0.5270571429    15.429611429    0.4353771429    0.0276157143
Ca    0.1064714286    -0.155194286    0.4353771429    0.0514857143    0.0100785714
Na    0.5889571429    0.0667585714    0.0276157143    0.0100785714    0.1992928571
Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 22            Al            Fe            Mg            Ca            Na
Al           1.000000      0.307889      0.022275      0.067526      0.189853
                             0.1529        0.9196        0.7595        0.3856
Fe           0.307889      1.000000      0.040547     -0.206685      0.045189
               0.1529                      0.8543        0.3440        0.8378
Mg           0.022275      0.040547      1.000000      0.488478      0.015748
               0.9196        0.8543                      0.0180        0.9431
Ca           0.067526     -0.206685      0.488478      1.000000      0.099497
               0.7595        0.3440        0.0180                      0.6515
Na           0.189853      0.045189      0.015748      0.099497      1.000000
               0.3856        0.8378        0.9431        0.6515
The PRINTH option produces the SSCP matrix for each hypothesis being
tested (Site and the contrast); see Output 30.6.4. Because Type III SS
are the highest-level SS that PROC GLM produces by default, and because
the HTYPE= option is not specified, the SSCP matrix for Site is the
Type III H matrix. The diagonal elements of this matrix are the model
sums of squares from the corresponding univariate analyses.
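For reference, these are the standard one-way MANOVA definitions of the two matrices. The notation (y_ij for the vector of five oxide percentages of sample j at site i, n_i for the number of samples at site i, c_i for the contrast coefficients) is introduced here and is not part of the SAS output. In a one-way model the Type III H matrix for Site is the between-site SSCP matrix, and the last line works through the Al entry of the contrast H matrix from the site means (17.32, 11.70, 18.18, and 12.5643 for AshleyRails, Caldicot, IslandThorns, and Llanederyn).

```latex
\mathbf{E} = \sum_{i=1}^{4}\sum_{j=1}^{n_i}
  (\mathbf{y}_{ij}-\bar{\mathbf{y}}_{i\cdot})(\mathbf{y}_{ij}-\bar{\mathbf{y}}_{i\cdot})^{\top},
\qquad
\mathbf{H}_{\text{Site}} = \sum_{i=1}^{4} n_i\,
  (\bar{\mathbf{y}}_{i\cdot}-\bar{\mathbf{y}}_{\cdot\cdot})(\bar{\mathbf{y}}_{i\cdot}-\bar{\mathbf{y}}_{\cdot\cdot})^{\top}

\mathbf{H}_{c} = \frac{\hat{\boldsymbol\psi}\,\hat{\boldsymbol\psi}^{\top}}{\sum_{i} c_i^{2}/n_i},
\qquad
\hat{\boldsymbol\psi} = \sum_{i} c_i\,\bar{\mathbf{y}}_{i\cdot}

(\mathbf{H}_{c})_{\text{Al,Al}}
  = \frac{(17.32 + 11.70 + 18.18 - 3 \times 12.5643)^{2}}{1/5 + 1/2 + 1/5 + 9/14}
  = \frac{(9.50714)^{2}}{54/35} \approx 58.5834
```

Because H_c is the outer product of a single vector, it has rank 1, so E^{-1}H_c has only one nonzero characteristic root.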
Four multivariate tests are computed, all based on the characteristic
roots and vectors of $E^{-1}H$. These roots and vectors are displayed
along with the tests. Each of the four statistics can be transformed to
a statistic with an approximate F distribution under the null
hypothesis; the transformation is exact when S=1, and the F for Roy's
greatest root is an upper bound. The four tests give identical results
for the contrast because it has only one degree of freedom, so
$E^{-1}H$ has a single nonzero root. In this case, the multivariate
analysis matches the univariate results: there is an overall
difference between the chemical composition of samples from different
sites, and the samples from Llanederyn differ from the average of the
other sites.
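In terms of the nonzero roots lambda_1 >= ... >= lambda_s of E^{-1}H, the four criteria and the S, M, N parameters that appear in the headings of the test tables can be written as follows. The symbols p (number of dependent variables, 5 here), q (hypothesis degrees of freedom), and v (error degrees of freedom, 22 here) are introduced for this sketch.

```latex
\Lambda = \prod_{k=1}^{s} \frac{1}{1+\lambda_k}, \qquad
V = \sum_{k=1}^{s} \frac{\lambda_k}{1+\lambda_k}, \qquad
U = \sum_{k=1}^{s} \lambda_k, \qquad
\Theta = \lambda_1

S = \min(p,q), \qquad M = \tfrac{1}{2}\left(\lvert p-q \rvert - 1\right), \qquad N = \tfrac{1}{2}(v-p-1)

\text{Site: } p=5,\ q=3,\ v=22 \;\Rightarrow\; S=3,\ M=0.5,\ N=8;
\qquad \text{contrast: } q=1 \;\Rightarrow\; S=1,\ M=1.5,\ N=8
```

When S = 1, as for the contrast, all four criteria are monotone functions of the single root, which is why they yield the same F statistic.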
Output 30.6.4: Hypothesis SSCP Matrix and Multivariate Tests
H = Type III SSCP Matrix for Site

                Al              Fe              Mg              Ca              Na
Al    175.61031868     -149.295533    -130.8097066    -5.889163736    -5.372264835
Fe     -149.295533    134.22161582    117.74503516    4.8217865934    5.3259491209
Mg    -130.8097066    117.74503516    103.35052703    4.2091613187    4.7105458242
Ca    -5.889163736    4.8217865934    4.2091613187    0.2047027473     0.154782967
Na    -5.372264835    5.3259491209    4.7105458242     0.154782967    0.2582456044
Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Site
E = Error SSCP Matrix

Characteristic             Characteristic Vector  V'EV=1
          Root   Percent           Al           Fe           Mg           Ca           Na
    34.1611140     96.39   0.09562211  -0.26330469  -0.05305978  -1.87982100  -0.47071123
     1.2500994      3.53   0.02651891  -0.01239715   0.17564390  -4.25929785   1.23727668
     0.0275396      0.08   0.09082220   0.13159869   0.03508901  -0.15701602  -1.39364544
     0.0000000      0.00   0.03673984  -0.15129712   0.20455529   0.54624873  -0.17402107
     0.0000000      0.00   0.06862324   0.03056912  -0.10662399   2.51151978   1.23668841
MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Site Effect
H = Type III SSCP Matrix for Site
E = Error SSCP Matrix

S=3    M=0.5    N=8

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.01230091      13.09        15    50.091    <.0001
Pillai's Trace              1.55393619       4.30        15        60    <.0001
Hotelling-Lawley Trace     35.43875302      40.59        15     29.13    <.0001
Roy's Greatest Root        34.16111399     136.64         5        20    <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
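The F values and degrees of freedom in this table can be reproduced from the three nonzero roots alone. The Python sketch below assumes the conversions described in the SAS/STAT documentation on multivariate tests: Rao's F for Wilks' lambda, the standard Pillai conversion, a McKeon-type approximation for the Hotelling-Lawley trace (the branch for N > 0), and the upper bound for Roy's greatest root. The variable names are illustrative. With p = 5, q = 3, and v = 22 it matches the printed values to the displayed precision.

```python
import math

# Nonzero characteristic roots of E^-1 H for Site, as printed in Output 30.6.4.
roots = [34.1611140, 1.2500994, 0.0275396]
p, q, v = 5, 3, 22   # dependent variables, hypothesis DF, error DF

s = min(p, q)
m = (abs(p - q) - 1) / 2
n = (v - p - 1) / 2

# The four criteria, written in terms of the roots.
wilks = math.prod(1 / (1 + lam) for lam in roots)
pillai = sum(lam / (1 + lam) for lam in roots)
hotelling = sum(roots)
roy = max(roots)

# Wilks' lambda: Rao's F approximation.
t = math.sqrt((p**2 * q**2 - 4) / (p**2 + q**2 - 5)) if p**2 + q**2 > 5 else 1.0
r = v - (p - q + 1) / 2
u = (p * q - 2) / 4
wilks_df = (p * q, r * t - 2 * u)
wilks_f = (1 - wilks ** (1 / t)) / wilks ** (1 / t) * wilks_df[1] / wilks_df[0]

# Pillai's trace.
pillai_df = (s * (2 * m + s + 1), s * (2 * n + s + 1))
pillai_f = (2 * n + s + 1) / (2 * m + s + 1) * pillai / (s - pillai)

# Hotelling-Lawley trace: McKeon-type approximation (this branch assumes n > 1).
b = (p + 2 * n) * (q + 2 * n) / (2 * (2 * n + 1) * (n - 1))
hl_df = (p * q, 4 + (p * q + 2) / (b - 1))
c = (2 + (p * q + 2) / (b - 1)) / (2 * n)
hl_f = hotelling / c * hl_df[1] / hl_df[0]

# Roy's greatest root: an upper bound on F, not an approximation.
big = max(p, q)
roy_df = (big, v - big + q)
roy_f = roy * roy_df[1] / roy_df[0]

for name, f, df in [("Wilks", wilks_f, wilks_df), ("Pillai", pillai_f, pillai_df),
                    ("Hotelling-Lawley", hl_f, hl_df), ("Roy", roy_f, roy_df)]:
    print(f"{name:17s} F={f:7.2f}  DF=({df[0]:g}, {df[1]:.3f})")
```

Only the Wilks and Hotelling-Lawley conversions produce fractional denominator degrees of freedom, which is why those two rows show 50.091 and 29.13.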
H = Contrast SSCP Matrix for Llanederyn vs. the rest

                Al              Fe              Mg              Ca              Na
Al    58.583366402    -64.56230291    -57.57983466    -1.438395503    -3.698102513
Fe    -64.56230291    71.151441323    63.456352116    1.5851961376    4.0755256878
Mg    -57.57983466    63.456352116    56.593493386    1.4137558201    3.6347541005
Ca    -1.438395503    1.5851961376    1.4137558201    0.0353168783    0.0907993915
Na    -3.698102513    4.0755256878    3.6347541005    0.0907993915    0.2334444577
Characteristic Roots and Vectors of: E Inverse * H, where
H = Contrast SSCP Matrix for Llanederyn vs. the rest
E = Error SSCP Matrix

Characteristic             Characteristic Vector  V'EV=1
          Root   Percent           Al           Fe           Mg           Ca           Na
    16.1251646    100.00  -0.08883488   0.25458141   0.08723574   0.98158668   0.71925759
     0.0000000      0.00  -0.00503538   0.03825743  -0.17632854   5.16256699  -0.01022754
     0.0000000      0.00   0.00162771  -0.08885364  -0.01774069  -0.83096817   2.17644566
     0.0000000      0.00   0.04450136  -0.15722494   0.22156791   0.00000000   0.00000000
     0.0000000      0.00   0.11939206   0.10833549   0.00000000   0.00000000   0.00000000
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Llanederyn vs. the rest Effect
H = Contrast SSCP Matrix for Llanederyn vs. the rest
E = Error SSCP Matrix

S=1    M=1.5    N=8

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.05839360      58.05         5        18    <.0001
Pillai's Trace              0.94160640      58.05         5        18    <.0001
Hotelling-Lawley Trace     16.12516462      58.05         5        18    <.0001
Roy's Greatest Root        16.12516462      58.05         5        18    <.0001
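For the one-degree-of-freedom contrast there is a single nonzero root lambda, and each criterion is a monotone function of it, so all four give the exact statistic F = lambda(v - p + 1)/p on p and v - p + 1 degrees of freedom. A short Python check using the printed root (names are illustrative):

```python
# Single nonzero root of E^-1 H for the contrast, as printed in Output 30.6.4.
lam = 16.1251646
p, v = 5, 22                      # dependent variables, error DF
df1, df2 = p, v - p + 1           # exact F degrees of freedom when S = 1

wilks = 1 / (1 + lam)
pillai = lam / (1 + lam)
hotelling = roy = lam             # with one root, the trace equals the largest root

# Each statistic maps back to the same quantity lam, hence the same F.
f_wilks = (1 - wilks) / wilks * df2 / df1
f_pillai = pillai / (1 - pillai) * df2 / df1
f_hl = hotelling * df2 / df1
f_roy = roy * df2 / df1

print(f"Wilks={wilks:.8f}  Pillai={pillai:.8f}  F={f_hl:.2f} on ({df1}, {df2}) DF")
```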
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.