Example 28.4: Analyzing a 2x2 Contingency Table
This example computes chi-square tests and Fisher's exact
test to compare the probability of coronary heart disease
for two types of diet. It also estimates the relative risks
and computes exact confidence limits for the odds ratio.
The data set FatComp contains hypothetical data for a
case-control study of high-fat diet and the risk of coronary
heart disease. The data are recorded as cell counts, where
the variable Count contains the frequency for each
exposure and response combination. PROC SORT sorts the data
set in descending order by the variables Exposure and
Response, so that the first cell of the 2×2 table
contains the frequency of positive exposure and positive
response. The FORMAT procedure creates formats that
label the exposure and response values with character
strings.
proc format;
   value ExpFmt 1='High Cholesterol Diet'
                0='Low Cholesterol Diet';
   value RspFmt 1='Yes'
                0='No';
run;
data FatComp;
   input Exposure Response Count;
   label Response='Heart Disease';
   datalines;
0 0 6
0 1 2
1 0 4
1 1 11
;
proc sort data=FatComp;
   by descending Exposure descending Response;
run;
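As an optional check that is not part of the original example, the sorted data set can be listed to confirm that the first observation is the positive exposure, positive response cell. This is a minimal sketch that assumes the PROC FORMAT, DATA, and PROC SORT steps above have already run in the same session.

```sas
/* Optional check: list the sorted cell counts with the formats applied. */
/* The first row listed should be the High Cholesterol Diet / Yes cell.  */
proc print data=FatComp noobs;
   format Exposure ExpFmt. Response RspFmt.;
run;
```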
In the following statements, the WEIGHT statement names Count
as the variable that holds the cell frequencies, and the TABLES
statement requests a two-way table of Exposure by Response.
The ORDER=DATA option in the PROC FREQ statement orders the
contingency table values by their order in the data set.
The CHISQ option produces several chi-square tests, and
the RELRISK option produces relative risk measures. The
EXACT statement requests the exact Pearson chi-square test
and exact confidence limits for the odds ratio. These
statements produce Output 28.4.1 through Output 28.4.3.
proc freq data=FatComp order=data;
   weight Count;
   tables Exposure*Response / chisq relrisk;
   exact pchi or;
   format Exposure ExpFmt. Response RspFmt.;
   title 'Case-Control Study of High Fat/Cholesterol Diet';
run;
Output 28.4.1: Contingency Table
Case-Control Study of High Fat/Cholesterol Diet

Table of Exposure by Response

Each cell lists Frequency, Percent, Row Pct, Col Pct (top to bottom).

Exposure                 Response(Heart Disease)
                         Yes        No         Total
High Cholesterol Diet    11         4          15
                         47.83      17.39      65.22
                         73.33      26.67
                         84.62      40.00
Low Cholesterol Diet     2          6          8
                         8.70       26.09      34.78
                         25.00      75.00
                         15.38      60.00
Total                    13         10         23
                         56.52      43.48      100.00

The contingency table in Output 28.4.1 displays the variable
values in data set order, so the first table cell contains the
frequency for the first observation in the sorted data set,
the frequency of positive exposure and positive response.
Output 28.4.2: Chi-Square Statistics
Case-Control Study of High Fat/Cholesterol Diet

Statistics for Table of Exposure by Response

Statistic                      DF    Value     Prob
Chi-Square                     1     4.9597    0.0259
Likelihood Ratio Chi-Square    1     5.0975    0.0240
Continuity Adj. Chi-Square     1     3.1879    0.0742
Mantel-Haenszel Chi-Square     1     4.7441    0.0294
Phi Coefficient                      0.4644
Contingency Coefficient              0.4212
Cramer's V                           0.4644

WARNING: 50% of the cells have expected counts less than 5.
(Asymptotic) Chi-Square may not be a valid test.

Pearson Chi-Square Test
Chi-Square                     4.9597
DF                             1
Asymptotic Pr > ChiSq          0.0259
Exact Pr >= ChiSq              0.0393

Fisher's Exact Test
Cell (1,1) Frequency (F)       11
Left-sided Pr <= F             0.9967
Right-sided Pr >= F            0.0367

Table Probability (P)          0.0334
Two-sided Pr <= P              0.0393

Because the expected counts in two of the four cells are
less than 5, PROC FREQ displays a warning
that the asymptotic chi-square tests may not be appropriate.
In this case, the exact tests in
Output 28.4.2 are appropriate. The alternative hypothesis
for this analysis states that coronary heart disease is more
likely to be associated with a high-fat diet, so a
one-sided test is desired. Fisher's exact right-sided
test analyzes whether the probability of heart disease in the
high-fat group exceeds the probability of heart disease in
the low-fat group. Because this p-value is small (0.0367), the
alternative hypothesis is supported.
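The two quantities behind this discussion, the expected cell counts and Fisher's right-sided p-value, can be recomputed by hand from the table margins. The DATA step below is a sketch for illustration only, not how PROC FREQ computes them internally. The variable names are made up for this sketch, and the margins are typed in from Output 28.4.1 rather than read from FatComp.

```sas
/* Sketch: recompute expected counts and Fisher's right-sided p-value */
/* from the margins of Output 28.4.1. Variable names are illustrative. */
data _null_;
   n  = 23;                     /* total count                        */
   r1 = 15;  r2 = 8;            /* row totals: high, low diet         */
   c1 = 13;  c2 = 10;           /* column totals: disease yes, no     */

   /* Expected count under independence: row total * column total / n */
   e11 = r1*c1/n;  e12 = r1*c2/n;
   e21 = r2*c1/n;  e22 = r2*c2/n;
   put e11= 8.4 e12= 8.4 e21= 8.4 e22= 8.4;

   /* Right-sided p-value: hypergeometric probability that cell (1,1) */
   /* is 11 or larger when all four margins are held fixed.           */
   p_right = 0;
   do k = 11 to min(r1, c1);
      p_right = p_right + comb(r1, k)*comb(r2, c1 - k)/comb(n, c1);
   end;
   put p_right= 6.4;
run;
```

The two expected counts in the Low Cholesterol Diet row come out below 5, which corresponds to the 50% figure in the warning, and the summed probability should agree with the Right-sided Pr >= F value of 0.0367 in Output 28.4.2.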
Output 28.4.3: Relative Risk
Case-Control Study of High Fat/Cholesterol Diet

Statistics for Table of Exposure by Response

Estimates of the Relative Risk (Row1/Row2)
Type of Study                  Value     95% Confidence Limits
Case-Control (Odds Ratio)      8.2500    1.1535    59.0029
Cohort (Col1 Risk)             2.9333    0.8502    10.1204
Cohort (Col2 Risk)             0.3556    0.1403    0.9009

Odds Ratio (Case-Control Study)
Odds Ratio                     8.2500

Asymptotic Conf Limits
95% Lower Conf Limit           1.1535
95% Upper Conf Limit           59.0029

Exact Conf Limits
95% Lower Conf Limit           0.8677
95% Upper Conf Limit           105.5488

The odds ratio, displayed in Output 28.4.3, provides an
estimate of the relative risk when an event is rare. This
estimate indicates that the odds of heart disease are 8.25
times as high in the high-fat diet group as in the low-fat
diet group; however, the wide confidence limits (0.8677 to
105.5488 for the exact limits) indicate that this estimate
has low precision.
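The odds ratio and its asymptotic limits can also be checked by hand from the four cell counts. The sketch below assumes the usual large-sample interval on the log odds ratio scale, with standard error sqrt(1/a + 1/b + 1/c + 1/d). That choice is an assumption of this sketch, made because it matches the asymptotic limits shown in Output 28.4.3. The variable names are illustrative, and the exact limits need an iterative calculation that is not attempted here.

```sas
/* Sketch: odds ratio and asymptotic 95% limits from the cell counts. */
/* Variable names are illustrative; counts are from Output 28.4.1.    */
data _null_;
   a = 11;  b = 4;              /* high diet: disease yes, no         */
   c = 2;   d = 6;              /* low diet:  disease yes, no         */

   oddsratio = (a*d)/(b*c);     /* cross-product ratio                */
   se = sqrt(1/a + 1/b + 1/c + 1/d);   /* std. error of log odds ratio */
   z  = probit(0.975);          /* two-sided 95% normal quantile      */

   lower = exp(log(oddsratio) - z*se);
   upper = exp(log(oddsratio) + z*se);
   put oddsratio= 8.4 lower= 8.4 upper= 8.4;
run;
```

The values written to the log should agree with the Case-Control (Odds Ratio) row of Output 28.4.3: 8.2500, 1.1535, and 59.0029.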
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.