Example 55.2: Predicting Weight by Height and Age
In this example, the weights of school children are
modeled as a function of their heights and ages.
Modeling is performed separately for boys and girls.
The example shows the use of a BY statement with
PROC REG, multiple MODEL statements, and the OUTEST=
and OUTSSCP= options, which create data sets.
Since the BY statement is used, interactive
processing is not possible in this example; no
statements can appear after the first RUN statement.
The following statements produce Output 55.2.1
through Output 55.2.4:
*------------Data on Age, Weight, and Height of Children-------*
| Age (months), height (inches), and weight (pounds) were |
| recorded for a group of school children. |
| From Lewis and Taylor (1967). |
*--------------------------------------------------------------*;
data htwt;
input sex $ age :3.1 height weight @@;
datalines;
f 143 56.3 85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0 92.0
f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5 69.0
f 160 62.0 94.5 f 140 53.8 68.5 f 139 61.5 104.0 f 178 61.5 103.5
f 157 64.5 123.5 f 149 58.3 93.0 f 143 51.3 50.5 f 145 58.8 89.0
f 191 65.3 107.0 f 150 59.5 78.5 f 147 61.3 115.0 f 180 63.3 114.0
f 141 61.8 85.0 f 140 53.5 81.0 f 164 58.0 83.5 f 176 61.3 112.0
f 185 63.3 101.0 f 166 61.5 103.5 f 175 60.8 93.5 f 180 59.0 112.0
f 210 65.5 140.0 f 146 56.3 83.5 f 170 64.3 90.0 f 162 58.0 84.0
f 149 64.3 110.5 f 139 57.5 96.0 f 186 57.8 95.0 f 197 61.5 121.0
f 169 62.3 99.5 f 177 61.8 142.5 f 185 65.3 118.0 f 182 58.3 104.5
f 173 62.8 102.5 f 166 59.3 89.5 f 168 61.5 95.0 f 169 62.0 98.5
f 150 61.3 94.0 f 184 62.3 108.0 f 139 52.8 63.5 f 147 59.8 84.5
f 144 59.5 93.5 f 177 61.3 112.0 f 178 63.5 148.5 f 197 64.8 112.0
f 146 60.0 109.0 f 145 59.0 91.5 f 147 55.8 75.0 f 145 57.8 84.0
f 155 61.3 107.0 f 167 62.3 92.5 f 183 64.3 109.5 f 143 55.5 84.0
f 183 64.5 102.5 f 185 60.0 106.0 f 148 56.3 77.0 f 147 58.3 111.5
f 154 60.0 114.0 f 156 54.5 75.0 f 144 55.8 73.5 f 154 62.8 93.5
f 152 60.5 105.0 f 191 63.3 113.5 f 190 66.8 140.0 f 140 60.0 77.0
f 148 60.5 84.5 f 189 64.3 113.5 f 143 58.3 77.5 f 178 66.5 117.5
f 164 65.3 98.0 f 157 60.5 112.0 f 147 59.5 101.0 f 148 59.0 95.0
f 177 61.3 81.0 f 171 61.5 91.0 f 172 64.8 142.0 f 190 56.8 98.5
f 183 66.5 112.0 f 143 61.5 116.5 f 179 63.0 98.5 f 186 57.0 83.5
f 182 65.5 133.0 f 182 62.0 91.5 f 142 56.0 72.5 f 165 61.3 106.5
f 165 55.5 67.0 f 154 61.0 122.5 f 150 54.5 74.0 f 155 66.0 144.5
f 163 56.5 84.0 f 141 56.0 72.5 f 147 51.5 64.0 f 210 62.0 116.0
f 171 63.0 84.0 f 167 61.0 93.5 f 182 64.0 111.5 f 144 61.0 92.0
f 193 59.8 115.0 f 141 61.3 85.0 f 164 63.3 108.0 f 186 63.5 108.0
f 169 61.5 85.0 f 175 60.3 86.0 f 180 61.3 110.5 m 165 64.8 98.0
m 157 60.5 105.0 m 144 57.3 76.5 m 150 59.5 84.0 m 150 60.8 128.0
m 139 60.5 87.0 m 189 67.0 128.0 m 183 64.8 111.0 m 147 50.5 79.0
m 146 57.5 90.0 m 160 60.5 84.0 m 156 61.8 112.0 m 173 61.3 93.0
m 151 66.3 117.0 m 141 53.3 84.0 m 150 59.0 99.5 m 164 57.8 95.0
m 153 60.0 84.0 m 206 68.3 134.0 m 250 67.5 171.5 m 176 63.8 98.5
m 176 65.0 118.5 m 140 59.5 94.5 m 185 66.0 105.0 m 180 61.8 104.0
m 146 57.3 83.0 m 183 66.0 105.5 m 140 56.5 84.0 m 151 58.3 86.0
m 151 61.0 81.0 m 144 62.8 94.0 m 160 59.3 78.5 m 178 67.3 119.5
m 193 66.3 133.0 m 162 64.5 119.0 m 164 60.5 95.0 m 186 66.0 112.0
m 143 57.5 75.0 m 175 64.0 92.0 m 175 68.0 112.0 m 175 63.5 98.5
m 173 69.0 112.5 m 170 63.8 112.5 m 174 66.0 108.0 m 164 63.5 108.0
m 144 59.5 88.0 m 156 66.3 106.0 m 149 57.0 92.0 m 144 60.0 117.5
m 147 57.0 84.0 m 188 67.3 112.0 m 169 62.0 100.0 m 172 65.0 112.0
m 150 59.5 84.0 m 193 67.8 127.5 m 157 58.0 80.5 m 168 60.0 93.5
m 140 58.5 86.5 m 156 58.3 92.5 m 156 61.5 108.5 m 158 65.0 121.0
m 184 66.5 112.0 m 156 68.5 114.0 m 144 57.0 84.0 m 176 61.5 81.0
m 168 66.5 111.5 m 149 52.5 81.0 m 142 55.0 70.0 m 188 71.0 140.0
m 203 66.5 117.0 m 142 58.8 84.0 m 189 66.3 112.0 m 188 65.8 150.5
m 200 71.0 147.0 m 152 59.5 105.0 m 174 69.8 119.5 m 166 62.5 84.0
m 145 56.5 91.0 m 143 57.5 101.0 m 163 65.3 117.5 m 166 67.3 121.0
m 182 67.0 133.0 m 173 66.0 112.0 m 155 61.8 91.5 m 162 60.0 105.0
m 177 63.0 111.0 m 177 60.5 112.0 m 175 65.5 114.0 m 166 62.0 91.0
m 150 59.0 98.0 m 150 61.8 118.0 m 188 63.3 115.5 m 163 66.0 112.0
m 171 61.8 112.0 m 162 63.0 91.0 m 141 57.5 85.0 m 174 63.0 112.0
m 142 56.0 87.5 m 148 60.5 118.0 m 140 56.8 83.5 m 160 64.0 116.0
m 144 60.0 89.0 m 206 69.5 171.5 m 159 63.3 112.0 m 149 56.3 72.0
m 193 72.0 150.0 m 194 65.3 134.5 m 152 60.8 97.0 m 146 55.0 71.5
m 139 55.0 73.5 m 186 66.5 112.0 m 161 56.8 75.0 m 153 64.8 128.0
m 196 64.5 98.0 m 164 58.0 84.0 m 159 62.8 99.0 m 178 63.8 112.0
m 153 57.8 79.5 m 155 57.3 80.5 m 178 63.5 102.5 m 142 55.0 76.0
m 164 66.5 112.0 m 189 65.0 114.0 m 164 61.5 140.0 m 167 62.0 107.5
m 151 59.3 87.0
;
title '----- Data on age, weight, and height of children ------';
proc reg outest=est1 outsscp=sscp1 rsquare;
by sex;
eq1: model weight=height;
eq2: model weight=height age;
proc print data=sscp1;
title2 'SSCP type data set';
proc print data=est1;
title2 'EST type data set';
run;
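BY-group processing expects the input data set to be ordered by the BY variable. The htwt data above already satisfy this, since all records with sex=f precede those with sex=m. The following is only a sketch of what could be done if the data were not in that order; the data set name htwt_sorted is a hypothetical name introduced for illustration.

```sas
/* Sketch: sort by the BY variable before running PROC REG.       */
/* htwt_sorted is a hypothetical name, not part of the example.   */
proc sort data=htwt out=htwt_sorted;
   by sex;
run;

/* Same analysis as above, reading the sorted copy explicitly.    */
proc reg data=htwt_sorted outest=est1 outsscp=sscp1 rsquare;
   by sex;
   eq1: model weight=height;
   eq2: model weight=height age;
run;
```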
Output 55.2.1: Height and Weight Data: Female Children
----- Data on age, weight, and height of children ------

The REG Procedure
Model: EQ1
Dependent Variable: weight

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1            21507         21507    141.09   <.0001
Error             109            16615     152.42739
Corrected Total   110            38121

Root MSE          12.34615   R-Square   0.5642
Dependent Mean    98.87838   Adj R-Sq   0.5602
Coeff Var         12.48620

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -153.12891         21.24814     -7.21     <.0001
height       1              4.16361          0.35052     11.88     <.0001

----- Data on age, weight, and height of children ------

The REG Procedure
Model: EQ2
Dependent Variable: weight

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2            22432         11216     77.21   <.0001
Error             108            15689     145.26700
Corrected Total   110            38121

Root MSE          12.05268   R-Square   0.5884
Dependent Mean    98.87838   Adj R-Sq   0.5808
Coeff Var         12.18939

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -150.59698         20.76730     -7.25     <.0001
height       1              3.60378          0.40777      8.84     <.0001
age          1              1.90703          0.75543      2.52     0.0130

Output 55.2.2: Height and Weight Data: Male Children
----- Data on age, weight, and height of children ------

The REG Procedure
Model: EQ1
Dependent Variable: weight

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1            31126         31126    206.24   <.0001
Error             124            18714     150.92222
Corrected Total   125            49840

Root MSE          12.28504   R-Square   0.6245
Dependent Mean   103.44841   Adj R-Sq   0.6215
Coeff Var         11.87552

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -125.69807         15.99362     -7.86     <.0001
height       1              3.68977          0.25693     14.36     <.0001

----- Data on age, weight, and height of children ------

The REG Procedure
Model: EQ2
Dependent Variable: weight

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2            32975         16487    120.24   <.0001
Error             123            16866     137.11922
Corrected Total   125            49840

Root MSE          11.70979   R-Square   0.6616
Dependent Mean   103.44841   Adj R-Sq   0.6561
Coeff Var         11.31945

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -113.71346         15.59021     -7.29     <.0001
height       1              2.68075          0.36809      7.28     <.0001
age          1              3.08167          0.83927      3.67     0.0004

For both females and males, the overall F statistics
for both models are significant, indicating that each model
explains a significant portion of the variation in the data.
For females, the full model is
   weight = -150.60 + 3.60 × height + 1.91 × age
and, for males, the full model is
   weight = -113.71 + 2.68 × height + 3.08 × age
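The fitted full models can be applied directly in a DATA step. The following is a minimal sketch, not part of the original example: the data set name predict, the variable pred_weight, and the two input records are hypothetical. The coefficients are copied from the EQ2 parameter estimates. Note that age must be on the scale stored in htwt: because age is read with the 3.1 informat and the raw values contain no decimal point, a recorded value of 150 is stored as 15.0.

```sas
/* Sketch: apply the fitted EQ2 models to hypothetical children.   */
/* predict and pred_weight are illustrative names only.            */
data predict;
   input sex $ height age;
   /* Coefficients taken from the EQ2 parameter estimates above.   */
   if sex = 'f' then
      pred_weight = -150.59698 + 3.60378*height + 1.90703*age;
   else if sex = 'm' then
      pred_weight = -113.71346 + 2.68075*height + 3.08167*age;
   datalines;
f 60 15.0
m 60 15.0
;

proc print data=predict;
run;
```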
Output 55.2.3: SSCP Matrix
----- Data on age, weight, and height of children ------
SSCP type data set

Obs   sex   _TYPE_   _NAME_      Intercept      height       weight         age
  1   f     SSCP     Intercept       111.0     6718.40     10975.50     1824.90
  2   f     SSCP     height         6718.4   407879.32    669469.85   110818.32
  3   f     SSCP     weight        10975.5   669469.85   1123360.75   182444.95
  4   f     SSCP     age            1824.9   110818.32    182444.95    30363.81
  5   f     N                        111.0      111.00       111.00      111.00
  6   m     SSCP     Intercept       126.0     7825.00     13034.50     2072.10
  7   m     SSCP     height         7825.0   488243.60    817919.60   129432.57
  8   m     SSCP     weight        13034.5   817919.60   1398238.75   217717.45
  9   m     SSCP     age            2072.1   129432.57    217717.45    34515.95
 10   m     N                        126.0      126.00       126.00      126.00

The OUTSSCP= data set is shown in Output 55.2.3.
Note how the BY groups are separated.
Observations with _TYPE_='N' contain the number
of observations in the associated BY group.
Observations with _TYPE_='SSCP' contain the rows of
the uncorrected sums of squares and crossproducts matrix.
The observations with _NAME_='Intercept'
contain crossproducts for the intercept.
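Because the intercept column is a column of ones, the Intercept row of the SSCP matrix holds the group size and the plain sums of each variable, so group means can be recovered from it. The following is a rough sketch under that reading; the data set name means and the variables n, mean_height, mean_weight, and mean_age are hypothetical names introduced here. It also assumes that _NAME_ is spelled 'Intercept' exactly as displayed in Output 55.2.3.

```sas
/* Sketch: recover BY-group means from the Intercept row of the    */
/* OUTSSCP= data set. All new names here are illustrative only.    */
data means (keep=sex n mean_height mean_weight mean_age);
   set sscp1;
   /* Assumes _NAME_ is stored as shown in the printed output.     */
   where _TYPE_ = 'SSCP' and _NAME_ = 'Intercept';
   n           = Intercept;          /* sum of ones = group size   */
   mean_height = height / Intercept; /* sum of height / n          */
   mean_weight = weight / Intercept; /* sum of weight / n          */
   mean_age    = age    / Intercept; /* sum of age / n             */
run;

proc print data=means;
run;
```

For females, for instance, the mean weight works out to 10975.5 / 111, or about 98.878, which agrees with the Dependent Mean reported in Output 55.2.1.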
Output 55.2.4: OUTEST Data Set
----- Data on age, weight, and height of children ------
EST type data set

Obs   sex   _MODEL_   _TYPE_   _DEPVAR_    _RMSE_   Intercept    height   weight       age   _IN_   _P_   _EDF_     _RSQ_
  1   f     EQ1       PARMS    weight     12.3461    -153.129   4.16361       -1         .      1     2     109   0.56416
  2   f     EQ2       PARMS    weight     12.0527    -150.597   3.60378       -1   1.90703      2     3     108   0.58845
  3   m     EQ1       PARMS    weight     12.2850    -125.698   3.68977       -1         .      1     2     124   0.62451
  4   m     EQ2       PARMS    weight     11.7098    -113.713   2.68075       -1   3.08167      2     3     123   0.66161

The OUTEST= data set is displayed in Output 55.2.4;
again, the BY groups are separated.
The _MODEL_ column contains the
labels for models from the MODEL statements.
If no labels were specified, the defaults MODEL1
and MODEL2 would appear as values for _MODEL_.
Note that _TYPE_='PARMS' for all observations, indicating
that all observations contain parameter estimates.
The _DEPVAR_ column displays the dependent variable, and the _RMSE_ column
gives the Root Mean Square Error for the associated model.
The Intercept column gives the estimate for the intercept for
the associated model, and variables with the same
name as variables in the original data set (height,
age) give parameter estimates for those variables.
The dependent variable,
weight, is shown with a value of -1. The _IN_ column contains
the number of regressors in the model not including the intercept;
_P_ contains the number of parameters in the model; _EDF_
contains the error degrees of freedom; and _RSQ_ contains the
R-square statistic. Finally, note that the
_IN_, _P_, _EDF_ and _RSQ_ columns
appear in the OUTEST= data set since the RSQUARE option is specified in
the PROC REG statement.
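The extra columns make it possible to derive further fit statistics from the OUTEST= data set alone. As one illustration, the following sketch reconstructs the adjusted R-square for each model, assuming every model includes an intercept so that the corrected total degrees of freedom equal _EDF_ + _P_ - 1. The data set name adjrsq and the variable adj_rsq are hypothetical.

```sas
/* Sketch: adjusted R-square from the OUTEST= columns.             */
/* Assumes each model has an intercept; adjrsq and adj_rsq are     */
/* illustrative names, not part of the original example.           */
data adjrsq;
   set est1;
   /* corrected total df = _EDF_ + _P_ - 1; error df = _EDF_       */
   adj_rsq = 1 - (1 - _RSQ_) * (_EDF_ + _P_ - 1) / _EDF_;
run;

proc print data=adjrsq;
   var sex _MODEL_ _RSQ_ adj_rsq;
run;
```

For the female EQ1 model this gives 1 - (1 - 0.56416) × 110 / 109, or about 0.5602, matching the Adj R-Sq value in Output 55.2.1.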
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.