PROC GLM for Unbalanced ANOVA

The GLM Procedure

Analysis of variance, or ANOVA, typically refers to partitioning the variation in a variable's values into variation between and within several groups or classes of observations. The GLM procedure can perform simple or complicated ANOVA for balanced or unbalanced data.

This example discusses a 2×2 ANOVA model. The experimental design is a full factorial, in which each level of one treatment factor occurs at each level of the other treatment factor. The data are shown in a table and then read into a SAS data set.

B	A=1	A=2
1	12, 14	20, 18
2	11, 9	17

   title 'Analysis of Unbalanced 2-by-2 Factorial';
   data exp;
      input A $ B $ Y @@;
      datalines;
   A1 B1 12 A1 B1 14     A1 B2 11 A1 B2 9
   A2 B1 20 A2 B1 18     A2 B2 17
   ;

Note that there is only one value for the cell with A='A2' and B='B2'. Since one cell contains a different number of values from the other cells in the table, this is an unbalanced design.
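One way to confirm the cell counts before fitting the model is to cross-tabulate the two factors. The following step is a minimal sketch that assumes the data set exp created above; the NOROW, NOCOL, and NOPERCENT options only suppress percentages so that the table shows frequencies alone.

   /* Sketch: count the observations in each A-by-B cell */
   proc freq data=exp;
      tables A*B / norow nocol nopercent;
   run;

For these data, three cells should show a frequency of 2 and the A2, B2 cell a frequency of 1.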

The following PROC GLM invocation produces the analysis.

   proc glm;
      class A B;
      model Y=A B A*B;
   run;

Both treatments are listed in the CLASS statement because they are classification variables. A*B denotes the interaction of the A effect and the B effect. The results are shown in Figure 30.1 and Figure 30.2.
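As an illustrative alternative, the same model can be written with the bar operator, which expands A|B to A B A*B. This sketch also names the input data set explicitly with DATA=exp (an assumption that exp is the data set you intend to analyze) and requests the Type I and Type III sums of squares with the SS1 and SS3 options, which PROC GLM displays by default in any case.

   /* Sketch: equivalent specification using the bar operator */
   proc glm data=exp;
      class A B;
      model Y=A|B / ss1 ss3;   /* A|B expands to A B A*B */
   run;

Naming the data set avoids relying on the most recently created data set, which is what PROC GLM uses when DATA= is omitted.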

Analysis of Unbalanced 2-by-2 Factorial

The GLM Procedure

Class Level Information
Class	Levels	Values
A	2	A1 A2
B	2	B1 B2

Number of observations	7

Figure 30.1: Class Level Information

Figure 30.1 displays information about the classes as well as the number of observations in the data set. Figure 30.2 shows the ANOVA table, simple statistics, and tests of effects.

Analysis of Unbalanced 2-by-2 Factorial

The GLM Procedure

Dependent Variable: Y

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	91.71428571	30.57142857	15.29	0.0253
Error	3	6.00000000	2.00000000
Corrected Total	6	97.71428571

R-Square	Coeff Var	Root MSE	Y Mean
0.938596	9.801480	1.414214	14.42857

Source	DF	Type I SS	Mean Square	F Value	Pr > F
A	1	80.04761905	80.04761905	40.02	0.0080
B	1	11.26666667	11.26666667	5.63	0.0982
A*B	1	0.40000000	0.40000000	0.20	0.6850

Source	DF	Type III SS	Mean Square	F Value	Pr > F
A	1	67.60000000	67.60000000	33.80	0.0101
B	1	10.00000000	10.00000000	5.00	0.1114
A*B	1	0.40000000	0.40000000	0.20	0.6850

Figure 30.2: ANOVA Table and Tests of Effects

The degrees of freedom can be used to check your data. The Model degrees of freedom for a 2×2 factorial design with interaction are (ab-1), where a is the number of levels of A and b is the number of levels of B; in this case, (2×2-1) = 3. The Corrected Total degrees of freedom are always one less than the number of observations used in the analysis; in this case, 7-1=6.
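The degrees-of-freedom check can be written out as follows. The symbol $N$ for the number of observations is introduced here for illustration, and the Error line assumes that no cell is empty, as is the case for these data.

$$\text{df}_{\text{Model}} = ab - 1 = 2 \times 2 - 1 = 3$$
$$\text{df}_{\text{Error}} = N - ab = 7 - 4 = 3$$
$$\text{df}_{\text{Corrected Total}} = N - 1 = 7 - 1 = 6 = \text{df}_{\text{Model}} + \text{df}_{\text{Error}}$$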

The overall F test is significant (F=15.29, p=0.0253), indicating strong evidence that the means for the four A×B cells are not all equal. You can further analyze this difference by examining the individual tests for each effect.
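The summary statistics in Figure 30.2 can be reproduced from the ANOVA table itself. The following arithmetic is a sketch using the displayed values, so the last digits are subject to rounding.

$$F = \frac{\text{MS}_{\text{Model}}}{\text{MS}_{\text{Error}}} = \frac{91.71428571 / 3}{6.00000000 / 3} = \frac{30.57142857}{2.00000000} \approx 15.29$$
$$R^2 = \frac{\text{SS}_{\text{Model}}}{\text{SS}_{\text{Corrected Total}}} = \frac{91.71428571}{97.71428571} \approx 0.938596$$
$$\text{Root MSE} = \sqrt{2.00000000} \approx 1.414214, \qquad \text{Coeff Var} = 100 \times \frac{1.414214}{14.42857} \approx 9.8015$$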

Four types of estimable functions of parameters are available for testing hypotheses in PROC GLM. For data with no missing cells, the Type III and Type IV estimable functions are the same and test the same hypotheses that would be tested if the data were balanced. Type I and Type III sums of squares are typically not equal when the data are unbalanced; Type III sums of squares are preferred in testing effects in unbalanced cases because they test a function of the underlying parameters that is independent of the number of observations per treatment combination.
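The difference between the two types can be seen directly in Figure 30.2. The Type I sums of squares are sequential, so they add up to the Model sum of squares, whereas the Type III sums of squares for these unbalanced data do not:

$$80.04761905 + 11.26666667 + 0.40000000 \approx 91.71428571 = \text{SS}_{\text{Model}}$$
$$67.60 + 10.00 + 0.40 = 78.00 \neq \text{SS}_{\text{Model}}$$

As a sketch of what the Type III test for A compares, assume the usual single-degree-of-freedom contrast formula applied to the cell means. The notation $\bar{y}_{ij}$ (mean of the cell at level $i$ of A and level $j$ of B) and $n_{ij}$ (its count) is introduced here for illustration. From the data, $\bar{y}_{11}=13$, $\bar{y}_{12}=10$, $\bar{y}_{21}=19$, $\bar{y}_{22}=17$, with $n_{11}=n_{12}=n_{21}=2$ and $n_{22}=1$. The contrast averages the cell means with equal weights, regardless of the cell counts:

$$\text{SS}_{A}^{\text{III}} = \frac{\left[\tfrac{1}{2}(13 + 10) - \tfrac{1}{2}(19 + 17)\right]^2}{\tfrac{1}{4}\left(\tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{1}\right)} = \frac{(-6.5)^2}{0.625} = 67.6$$

This agrees with the Type III sum of squares for A in Figure 30.2.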

At a significance level of 5% ($\alpha = 0.05$), the A*B interaction is not significant (F=0.20, p=0.6850), so there is no evidence that the effect of A depends on the level of B, or vice versa. The tests for the individual effects can therefore be interpreted directly: the Type III tests show a significant A effect (F=33.80, p=0.0101) but no significant B effect (F=5.00, p=0.1114).
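If you want to see the estimated means behind these tests, one possible follow-up is to add an LSMEANS statement. This is a hedged sketch rather than part of the original example: it assumes the data set exp, and the STDERR and PDIFF options request standard errors and p-values for differences between the least squares means.

   /* Sketch: least squares means for the main effects */
   proc glm data=exp;
      class A B;
      model Y=A B A*B;
      lsmeans A B / stderr pdiff;
   run;

Least squares means weight the cells equally, so they correspond to the hypotheses tested by the Type III sums of squares rather than to the raw marginal means.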
