Example 22.3: Logistic Regression, Standard Response Function
In this data set, from Cox and Snell (1989), ingots are
prepared with different heating and soaking times and tested
for their readiness to be rolled. The response variable
Y has value 1 for ingots that are not ready and value 0
otherwise. The explanatory variables are Heat and
Soak.
title 'Maximum Likelihood Logistic Regression';
data ingots;
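   input Heat Soak nready ntotal @@;   /* @@ reads several records per data line */
   Count=nready;                       /* number of ingots not ready for rolling */
   Y=1;
   output;
   Count=ntotal-nready;                /* number of ingots ready for rolling */
   Y=0;
   output;
   drop nready ntotal;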
datalines;
7 1.0 0 10 14 1.0 0 31 27 1.0 1 56 51 1.0 3 13
7 1.7 0 17 14 1.7 0 43 27 1.7 4 44 51 1.7 0 1
7 2.2 0 7 14 2.2 2 33 27 2.2 0 21 51 2.2 0 1
7 2.8 0 12 14 2.8 0 31 27 2.8 1 22 51 4.0 0 1
7 4.0 0 9 14 4.0 0 19 27 4.0 1 16
;
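Before fitting the model, it can help to confirm that the DATA step expanded the counts as intended. The following is a minimal sketch (not part of the original example) that tabulates Y weighted by Count; the weighted total should agree with the Total Frequency of 387 reported in Output 22.3.1.

```sas
/* Sanity check on the expanded data set (illustrative only). */
proc freq data=ingots;
   weight Count;   /* each row stands for Count ingots */
   tables Y;       /* Y=1: not ready, Y=0: ready       */
run;
```

Rows with a zero Count contribute nothing to the weighted table, which is why the number of rows in the data set can exceed the number of observations that enter the analysis.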
Logistic regression analysis is often used to investigate
the relationship between discrete response variables and
continuous explanatory variables. For logistic regression,
the continuous design-effects are declared in a
DIRECT statement. The following statements produce
Output 22.3.1 through Output 22.3.7.
proc catmod data=ingots;
   weight Count;
   direct Heat Soak;
   model Y=Heat Soak / freq covb corrb;
quit;
Output 22.3.1: Maximum Likelihood Logistic Regression
Maximum Likelihood Logistic Regression

Response            Y         Response Levels      2
Weight Variable     Count     Populations         19
Data Set            INGOTS    Total Frequency    387
Frequency Missing   0         Observations        25

Population Profiles

Sample   Heat   Soak   Sample Size
     1      7    1              10
     2      7    1.7            17
     3      7    2.2             7
     4      7    2.8            12
     5      7    4               9
     6     14    1              31
     7     14    1.7            43
     8     14    2.2            33
     9     14    2.8            31
    10     14    4              19
    11     27    1              56
    12     27    1.7            44
    13     27    2.2            21
    14     27    2.8            22
    15     27    4              16
    16     51    1              13
    17     51    1.7             1
    18     51    2.2             1
    19     51    4               1

You can verify that the populations are defined as you
intended by looking at the "Population Profiles"
table in Output 22.3.1.
Output 22.3.2: Response Summaries
Maximum Likelihood Logistic Regression

Response Profiles

Response   Y
       1   0
       2   1

Response Frequencies

           Response Number
Sample        1        2
     1       10        0
     2       17        0
     3        7        0
     4       12        0
     5        9        0
     6       31        0
     7       43        0
     8       31        2
     9       31        0
    10       19        0
    11       55        1
    12       40        4
    13       21        0
    14       21        1
    15       15        1
    16       10        3
    17        1        0
    18        1        0
    19        1        0

Since the "Response Profiles" table shows the
response level ordering as 0, 1, the default response
function, the logit, is defined as $\log\left(\frac{p_{Y=0}}{p_{Y=1}}\right)$.
Output 22.3.3: Iteration History
Maximum Likelihood Logistic Regression

Maximum Likelihood Analysis

              Sub         -2 Log    Convergence       Parameter Estimates
Iteration  Iteration  Likelihood      Criterion        1         2          3
        0          0   536.49592         1.0000        0         0          0
        1          0   152.58961         0.7156   2.1594   -0.0139  -0.003733
        2          0   106.76066         0.3003   3.5334   -0.0363    -0.0120
        3          0   96.692171         0.0943   4.7489   -0.0640    -0.0299
        4          0   95.383825         0.0135   5.4138   -0.0790    -0.0498
        5          0   95.345659       0.000400   5.5539   -0.0819    -0.0564
        6          0   95.345613      4.8289E-7   5.5592   -0.0820    -0.0568
        7          0   95.345613       7.73E-13   5.5592   -0.0820    -0.0568

Maximum likelihood computations converged.

Seven Newton-Raphson iterations are required to find the
maximum likelihood estimates.
Output 22.3.4: Analysis of Variance Table
Maximum Likelihood Logistic Regression

Maximum Likelihood Analysis of Variance

Source             DF   Chi-Square   Pr > ChiSq
Intercept           1        24.65       <.0001
Heat                1        11.95       0.0005
Soak                1         0.03       0.8639
Likelihood Ratio   16        13.75       0.6171

The analysis of variance table (Output 22.3.4) shows that
the model fits since the likelihood ratio goodness-of-fit
test is nonsignificant. It also shows that the length of
heating time is a significant factor with respect to
readiness but that length of soaking time is not.
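The p-values in Output 22.3.4 can be checked by hand from the chi-square statistics and their degrees of freedom. The sketch below is illustrative only: it uses the rounded statistics printed in the table, so the results agree with the listed p-values only approximately. The 16 degrees of freedom for the likelihood ratio test correspond to the 19 populations minus the 3 estimated parameters.

```sas
/* Upper-tail chi-square probabilities for statistics in Output 22.3.4. */
/* Variable names are hypothetical; inputs are the rounded table values. */
data _null_;
   p_lr   = 1 - probchi(13.75, 16);   /* likelihood ratio goodness of fit */
   p_heat = 1 - probchi(11.95, 1);    /* Wald test for Heat               */
   put p_lr= 6.4 p_heat= 6.4;
run;
```

A large p-value for the likelihood ratio statistic gives no evidence of lack of fit, whereas the small p-value for Heat indicates that heating time is related to readiness.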
Output 22.3.5: Maximum Likelihood Estimates
Maximum Likelihood Logistic Regression

Analysis of Maximum Likelihood Estimates

                                    Standard         Chi-
Effect      Parameter   Estimate       Error       Square   Pr > ChiSq
Intercept           1     5.5592      1.1197        24.65       <.0001
Heat                2    -0.0820      0.0237        11.95       0.0005
Soak                3    -0.0568      0.3312         0.03       0.8639

Output 22.3.6: Covariance Matrix
Maximum Likelihood Logistic Regression

Covariance Matrix of the Maximum Likelihood Estimates

              1            2            3
1     1.2537133   -0.0215664   -0.2817648
2    -0.0215664    0.0005633    0.0026243
3    -0.2817648    0.0026243    0.1097020

Output 22.3.7: Correlation Matrix
Maximum Likelihood Logistic Regression

Correlation Matrix of the Maximum Likelihood Estimates

            1          2          3
1     1.00000   -0.81152   -0.75977
2    -0.81152    1.00000    0.33383
3    -0.75977    0.33383    1.00000

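The COVB and CORRB options request the matrices in Output 22.3.6 and Output 22.3.7. As a rough check of how they relate, the following sketch (variable names are hypothetical) recovers the standard errors as square roots of the covariance diagonal and the correlations as covariances divided by the product of the corresponding standard errors, using entries copied from Output 22.3.6.

```sas
/* Relating the covariance matrix to standard errors and correlations. */
data _null_;
   /* Entries copied from Output 22.3.6 (1=Intercept, 2=Heat, 3=Soak) */
   v11 =  1.2537133;  v22 = 0.0005633;  v33 = 0.1097020;
   v12 = -0.0215664;  v13 = -0.2817648; v23 = 0.0026243;

   /* Standard errors: square roots of the diagonal entries */
   se1 = sqrt(v11);  se2 = sqrt(v22);  se3 = sqrt(v33);

   /* Correlations: covariance divided by the two standard errors */
   r12 = v12 / (se1*se2);
   r13 = v13 / (se1*se3);
   r23 = v23 / (se2*se3);

   put se1= 8.4 se2= 8.4 se3= 8.4;
   put r12= 8.5 r13= 8.5 r23= 8.5;
run;
```

The computed values should match the Standard Error column of Output 22.3.5 and the off-diagonal entries of Output 22.3.7 up to rounding.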
From the table of maximum likelihood estimates
(Output 22.3.5), the fitted model is

$$E(\mathrm{logit}(p)) = 5.559 - 0.082\,(\mathit{Heat}) - 0.057\,(\mathit{Soak})$$

For example, for Sample 1 with Heat=7 and
Soak=1, the estimate (computed from the four-decimal
coefficients in Output 22.3.5) is

$$E(\mathrm{logit}(p)) = 5.5592 - 0.0820(7) - 0.0568(1) = 4.9284$$

Predicted values of the logits, as well as the probabilities
of readiness, could be obtained by specifying PRED=PROB in
the MODEL statement. For the example of Sample 1 with
Heat=7 and Soak=1, PRED=PROB would give an
estimate of the probability of readiness equal to 0.9928
since

$$4.9284 = \log\left(\frac{p}{1-p}\right)$$

implies that

$$p = \frac{e^{4.9284}}{1 + e^{4.9284}} = 0.9928$$

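A minimal sketch of both routes to the predicted probability follows. The first step reruns the model with PRED=PROB added to the MODEL statement; the second reproduces the Sample 1 calculation by hand in a DATA step. The variable names logit and p_ready are illustrative and do not appear in the original example.

```sas
/* Request predicted probabilities for each population. */
proc catmod data=ingots;
   weight Count;
   direct Heat Soak;
   model Y=Heat Soak / pred=prob;
quit;

/* Hand calculation for Sample 1 (Heat=7, Soak=1), using the */
/* four-decimal estimates from Output 22.3.5.                */
data _null_;
   logit   = 5.5592 - 0.0820*7 - 0.0568*1;    /* log(p(Y=0)/p(Y=1))         */
   p_ready = exp(logit) / (1 + exp(logit));   /* inverse logit: p(Y=0)      */
   put logit= 8.4 p_ready= 8.4;
run;
```

Because the logit is defined with Y=0 (ready) in the numerator, the inverse logit gives the probability of readiness directly.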
As another consideration, since soaking time is
nonsignificant, you could fit another model that deleted the
variable Soak.
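One way to fit such a reduced model is sketched below; it is an illustration under stated assumptions rather than part of the original example. The POPULATION statement is included on the assumption that you want to keep the same 19 populations defined by Heat and Soak; without it, PROC CATMOD defines the populations from the variables in the MODEL statement, so the goodness-of-fit test would be based on the four Heat levels only.

```sas
/* Reduced model: Heat only (illustrative sketch). */
proc catmod data=ingots;
   weight Count;
   population Heat Soak;   /* assumption: keep the original 19 populations */
   direct Heat;
   model Y=Heat / freq;
quit;
```

Comparing the likelihood ratio statistics of the full and reduced models, with populations held fixed, gives another view of whether Soak contributes to the fit.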
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.