OUTPUT Statement
- OUTPUT < OUT=SAS-data-set > < options >;
The OUTPUT statement creates a new SAS data set that
contains all the variables in the input data set and, optionally, the
estimated linear predictors and
their standard error estimates,
the estimates of the cumulative or individual response probabilities,
and the confidence limits
for the cumulative probabilities.
Regression diagnostic statistics and estimates of cross-validated
response probabilities are also available for binary response
models. Formulas for the statistics are given in
the "Linear Predictor, Predicted Probability, and Confidence Limits" section and the "Regression Diagnostics" section.
If you use the single-trial syntax,
the data set may also contain a variable named
_LEVEL_, which indicates the level of the response that the
given row of output refers to. For instance, the value of the
cumulative probability
variable is the probability that
the response variable is less than or equal to the corresponding value of
_LEVEL_.
For details, see the section "OUT= Output Data Set".
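As a rough illustration of the one-row-per-_LEVEL_ layout, the Python sketch below (not SAS code) builds the rows for a single observation under a cumulative logit model, in which the cumulative probability for _LEVEL_ i is the inverse logit of the linear predictor α_i + x'β. The intercepts and the x'β value are made-up inputs, the function name is hypothetical, and emitting rows only for the first k-1 levels is an assumption of this sketch; the "OUT= Output Data Set" section describes the layout SAS actually produces.

```python
import math


def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def cumulative_rows(levels, intercepts, xb):
    """One row per _LEVEL_ for a single observation (hypothetical layout).

    levels     -- ordered response levels, e.g. [1, 2, 3]
    intercepts -- assumed fitted intercepts alpha_1 < ... < alpha_{k-1}
    xb         -- assumed value of x'beta for this observation
    The last level is skipped because Pr(Y <= last level) is always 1.
    """
    rows = []
    for level, alpha in zip(levels[:-1], intercepts):
        eta = alpha + xb  # linear predictor (XBETA) for this _LEVEL_
        rows.append({"_LEVEL_": level, "XBETA": eta, "CUMPROB": logistic(eta)})
    return rows
```

For example, `cumulative_rows([1, 2, 3], [-1.0, 1.0], 0.0)` yields two rows whose cumulative probabilities are about 0.269 and 0.731.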
The estimated linear predictor, its standard error estimate, all
predicted probabilities, and the confidence limits for the cumulative
probabilities are computed
for all observations in which the explanatory variables have no
missing values, even if the response is missing. By adding observations
with missing response values to the input data set, you can compute
these statistics for new observations or for settings of the
explanatory variables not present in the data without affecting the
model fit.
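The scoring idea can be sketched in Python: once the parameter estimates and their covariance matrix are known, the linear predictor, its standard error, the predicted probability, and the confidence limits depend only on the explanatory variables, which is why a missing response does not block the computation. The sketch assumes a binary logit model and forms the limits on the linear-predictor scale before applying the inverse logit (the usual approach for this link); `score` and the numeric inputs are hypothetical, and the exact formulas SAS uses are in the "Linear Predictor, Predicted Probability, and Confidence Limits" section.

```python
import math
from statistics import NormalDist


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def score(x, beta, cov, alpha=0.05):
    """Predicted probability and confidence limits for one covariate vector.

    x     -- covariates with a leading 1 for the intercept
    beta  -- assumed parameter estimates (same order as x)
    cov   -- assumed covariance matrix of the estimates
    Limits are built for the linear predictor and then mapped through the
    inverse logit, so they always lie strictly inside (0, 1).
    """
    k = len(x)
    eta = sum(x[i] * beta[i] for i in range(k))                      # XBETA
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)                                              # STDXBETA
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "XBETA": eta,
        "STDXBETA": se,
        "PRED": logistic(eta),
        "LOWER": logistic(eta - z * se),
        "UPPER": logistic(eta + z * se),
    }
```

For example, `score([1.0, 2.5], beta, cov)` returns the statistics for a covariate value of 2.5 even if no observation in the data has that value.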
- OUT= SAS-data-set
-
names the output data set.
If you omit the OUT= option, the output data set is created
and given a default name using the DATAn convention.
The following sections explain options in the
OUTPUT statement, divided into statistic options for
any type of response variable, statistic options only for binary
response, and other options.
The statistic options specify the statistics to be included in the
output data set and
name the new variables that contain the statistics.
- LOWER=name
- L=name
-
specifies the lower confidence limit for the probability of an event response
if events/trials syntax is specified, or the lower confidence
limit for the probability that the response is less than or equal to the
value of
_LEVEL_ if single-trial syntax is specified. See the
ALPHA= option.
- PREDICTED=name
- PRED=name
- PROB=name
- P=name
-
specifies the predicted probability of an event response if events/trials
syntax is specified, or the predicted probability that the response
variable is less than or equal to the value of _LEVEL_ if
single-trial syntax is specified (in other words, Pr(Y ≤ _LEVEL_), where Y is
the response variable).
- PREDPROBS=(keywords)
-
requests
individual, cumulative, or cross-validated predicted probabilities.
Descriptions of the keywords are as follows.
- INDIVIDUAL | I
- requests the predicted probability of each
response level.
For a response variable Y with three levels, 1, 2, and 3, the
individual probabilities are Pr(Y=1),
Pr(Y=2), and Pr(Y=3).
- CUMULATIVE | C
- requests the cumulative predicted probability of
each response level.
For a response variable Y with three response levels, 1, 2, and 3, the
cumulative probabilities are Pr(Y ≤ 1), Pr(Y ≤ 2),
and Pr(Y ≤ 3). The cumulative probability for the last
response level always has the constant value of 1.
- CROSSVALIDATE | XVALIDATE | X
- requests the cross-validated
individual predicted
probability of each response level. These probabilities are derived from
the leave-one-out principle; that is, they are computed by dropping the data of one subject and
re-estimating the parameters. PROC LOGISTIC
uses a less expensive one-step approximation to compute these parameter
estimates. Note that, for ordinal models, the cross-validated
probabilities are not computed and are set to missing.
See the end of this section for further details regarding the PREDPROBS=
option.
- STDXBETA=name
-
specifies the standard error estimate of
XBETA.
- UPPER=name
- U=name
-
specifies the upper confidence limit for the probability of an event response
if events/trials syntax is specified, or the upper confidence
limit for the probability that the response is less than or equal to the
value of
_LEVEL_ if single-trial syntax is specified. See the
ALPHA= option.
- XBETA=name
-
specifies the estimate of the linear predictor
$\alpha_i + \mathbf{x}'\boldsymbol{\beta}$, where i
is the corresponding ordered
value of _LEVEL_.
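The INDIVIDUAL and CUMULATIVE keywords of PREDPROBS= are linked by simple differencing, Pr(Y=i) = Pr(Y≤i) - Pr(Y≤i-1), and both start from the linear predictors described for XBETA=. The following Python sketch assumes a cumulative logit model with made-up intercepts and x'β; `predprobs` is a hypothetical helper, not part of PROC LOGISTIC.

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def predprobs(intercepts, xb):
    """Cumulative and individual probabilities for a cumulative logit model.

    intercepts -- assumed alpha_1 < ... < alpha_{k-1}
    xb         -- assumed x'beta for one observation
    Returns (cumulative, individual), each of length k.
    """
    # Pr(Y <= i) for i = 1..k-1, plus the constant 1 for the last level
    cumulative = [logistic(a + xb) for a in intercepts] + [1.0]
    individual = []
    previous = 0.0
    for c in cumulative:
        individual.append(c - previous)  # Pr(Y = i) = Pr(Y <= i) - Pr(Y <= i-1)
        previous = c
    return cumulative, individual
```

With intercepts -1 and 1 and x'β = 0, the three individual probabilities come out symmetric and sum to 1.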
- C=name
-
specifies the confidence interval displacement diagnostic that measures the
influence of individual observations on the regression
estimates.
- CBAR=name
-
specifies another confidence interval displacement diagnostic, which measures the
overall change in the global regression estimates due to deleting
an individual observation.
- DFBETAS= _ALL_
- DFBETAS=var-list
-
specifies the standardized differences in the regression estimates for assessing
the effects of individual observations on the estimated regression
parameters in the fitted model. You can specify a list of
up to s+1 variable names, where
s is the number of explanatory variables in the
MODEL statement, or you can specify just the keyword _ALL_. In
the former specification, the first variable contains the
standardized differences in the intercept estimate, the
second variable contains the standardized differences in the
parameter estimate for the first explanatory variable
in the MODEL statement, and so on. In the latter specification,
the DFBETAS statistics are named DFBETA_xxx, where xxx is
the name of the regression parameter. For example, if the model
contains two variables X1 and X2, the specification DFBETAS=_ALL_
produces three DFBETAS statistics named DFBETA_Intercept,
DFBETA_X1, and DFBETA_X2.
If an explanatory variable
is not included in the final model, the corresponding
output variable named in DFBETAS=var-list
contains missing
values.
- DIFCHISQ=name
-
specifies the change in the chi-square goodness-of-fit statistic attributable to
deleting the individual observation.
- DIFDEV=name
-
specifies the change in the deviance attributable to deleting the individual
observation.
- H=name
-
specifies the diagonal element of the hat matrix for detecting extreme points in the
design space.
- RESCHI=name
-
specifies the Pearson (Chi) residual for identifying observations
that are poorly accounted for by the model.
- RESDEV=name
-
specifies the deviance residual for identifying poorly fitted
observations.
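To make the binary-response diagnostics concrete, here is a self-contained Python sketch that fits a one-variable logit model to events/trials data by Newton-Raphson and then evaluates each statistic. It uses the standard one-step (Pregibon-style) formulas: h_j = w_j x_j'V x_j with w_j = n_j p_j(1-p_j) and V the inverse information matrix, C = χ²h/(1-h)², CBAR = χ²h/(1-h), DIFCHISQ = χ²/(1-h), DIFDEV = d² + CBAR, and DFBETAS as the one-step change V x_j (r_j - n_j p_j)/(1-h_j) in each estimate divided by that estimate's standard error. These formulas are assumed to agree with the "Regression Diagnostics" section but are not copied from it; the function names, the single explanatory variable X, and the dictionary keys are illustrative only.

```python
import math


def _prob(b0, b1, xj):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xj)))


def _inverse_info(b0, b1, x, n):
    """Inverse of X'WX (2 x 2) with weights w_j = n_j p_j (1 - p_j)."""
    i00 = i01 = i11 = 0.0
    for xj, nj in zip(x, n):
        p = _prob(b0, b1, xj)
        w = nj * p * (1.0 - p)
        i00 += w
        i01 += w * xj
        i11 += w * xj * xj
    det = i00 * i11 - i01 * i01
    return [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]


def fit_logit(x, r, n, iters=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x for events/trials data."""
    b0 = b1 = 0.0
    for _ in range(iters):
        v = _inverse_info(b0, b1, x, n)
        resid = [rj - nj * _prob(b0, b1, xj) for xj, rj, nj in zip(x, r, n)]
        g0 = sum(resid)
        g1 = sum(e * xj for e, xj in zip(resid, x))
        b0, b1 = (b0 + v[0][0] * g0 + v[0][1] * g1,
                  b1 + v[1][0] * g0 + v[1][1] * g1)
    return b0, b1, _inverse_info(b0, b1, x, n)


def _xlog_ratio(a, b):
    """a * log(a / b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)


def diagnostics(x, r, n):
    """Per-observation diagnostics for events r out of trials n at x."""
    b0, b1, v = fit_logit(x, r, n)
    se0, se1 = math.sqrt(v[0][0]), math.sqrt(v[1][1])
    rows = []
    for xj, rj, nj in zip(x, r, n):
        p = _prob(b0, b1, xj)
        w = nj * p * (1.0 - p)
        e = rj - nj * p
        h = w * (v[0][0] + 2.0 * v[0][1] * xj + v[1][1] * xj * xj)    # H
        chi = e / math.sqrt(w)                                       # RESCHI
        dev2 = 2.0 * (_xlog_ratio(rj, nj * p)
                      + _xlog_ratio(nj - rj, nj * (1.0 - p)))
        d = math.copysign(math.sqrt(max(dev2, 0.0)), e)              # RESDEV
        cbar = chi * chi * h / (1.0 - h)                             # CBAR
        # one-step change in (intercept, slope) from deleting observation j
        delta0 = (v[0][0] + v[0][1] * xj) * e / (1.0 - h)
        delta1 = (v[1][0] + v[1][1] * xj) * e / (1.0 - h)
        rows.append({
            "H": h,
            "RESCHI": chi,
            "RESDEV": d,
            "C": chi * chi * h / (1.0 - h) ** 2,
            "CBAR": cbar,
            "DIFCHISQ": chi * chi / (1.0 - h),
            "DIFDEV": d * d + cbar,
            "DFBETA_Intercept": delta0 / se0,
            "DFBETA_X": delta1 / se1,
        })
    return rows
```

A quick sanity check on any fit is that the H values sum to the number of parameters, two in this sketch.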
- ALPHA=value
-
sets the significance level α, so that 100(1-α)% confidence limits are computed for the
appropriate response probabilities. The value must be between 0 and 1.
By default, ALPHA=0.05, which results in the calculation of
95% confidence limits.
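The role of ALPHA= can be seen in a few lines of Python: the two-sided normal critical value z at probability 1-α/2 is the multiplier applied to the standard error when the limits are formed. The helper names are hypothetical; only the normal quantile comes from the Python standard library.

```python
from statistics import NormalDist


def z_critical(alpha=0.05):
    """Two-sided normal critical value z_{1 - alpha/2}."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("ALPHA= value must be between 0 and 1")
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def confidence_percent(alpha=0.05):
    """Nominal confidence level, 100(1 - alpha) percent."""
    return 100.0 * (1.0 - alpha)
```

The default ALPHA=0.05 gives a multiplier of about 1.96 and 95% limits; ALPHA=0.10 gives about 1.645 and 90% limits.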
You can request any combination of the three types of predicted probabilities.
For example, you can request
both the individual predicted probabilities and
the cross-validated probabilities by specifying PREDPROBS=(I X).
When you specify the PREDPROBS= option,
two automatic variables _FROM_ and _INTO_
are included for the single-trial syntax and only one
variable, _INTO_,
is included for the events/trials syntax. The _FROM_ variable contains
the formatted value of the observed response. The variable
_INTO_ contains
the formatted value of the response level with the largest individual
predicted probability.
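Deriving _INTO_ amounts to taking the level with the largest individual predicted probability and comparing it with _FROM_. In the Python sketch below, ties are resolved in favor of the first level in the given order, which is an assumption (no tie rule is stated here), and the `misclassified` flag and record layout are hypothetical conveniences rather than OUTPUT statement variables.

```python
def into_level(levels, individual_probs):
    """Formatted level with the largest individual predicted probability.

    Ties go to the first such level in the given order (an assumption).
    """
    best = max(range(len(levels)), key=lambda i: individual_probs[i])
    return levels[best]


def classify(records):
    """Attach _INTO_ to records that already carry _FROM_ and probabilities."""
    out = []
    for rec in records:
        into = into_level(rec["levels"], rec["probs"])
        out.append({"_FROM_": rec["_FROM_"], "_INTO_": into,
                    "misclassified": rec["_FROM_"] != into})
    return out
```

For example, an observation with _FROM_ equal to None and individual probabilities 0.3 and 0.7 for None and Mild gets _INTO_ equal to Mild.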
If you specify PREDPROBS=INDIVIDUAL,
the OUTPUT data set contains k additional
variables representing the individual probabilities, one for each
response level, where k is the
maximum number of response levels across all BY-groups. The names of
these variables have the form IP_xxx, where xxx represents the
particular level. The representation depends on the following
situations.
- If you specify events/trials syntax, xxx is either
'Event' or 'Nonevent'. Thus, the variable containing the event
probabilities is
named IP_Event and the variable containing the nonevent probabilities
is named IP_Nonevent.
- If you specify the single-trial syntax with more than
one BY group,
xxx is 1 for the first ordered level of the response, 2 for
the second ordered level of the response, and so forth,
as given in the "Response Profile" table. The variable containing
the predicted probabilities Pr(Y=1) is named
IP_1,
where Y is
the response variable. Similarly, IP_2 is the name of the
variable containing the predicted probabilities
Pr(Y=2), and so on.
- If you specify the single-trial syntax with no BY-group processing,
xxx is the left-justified formatted value of the response
level (the value may be
truncated so that IP_xxx does not exceed 32 characters).
For example, if Y is the response variable with response levels
'None', 'Mild', and 'Severe', the variables representing
individual probabilities Pr(Y='None'), Pr(Y='Mild'), and
Pr(Y='Severe') are named
IP_None, IP_Mild, and IP_Severe, respectively.
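The three naming rules can be summarized as a small Python function. It is a sketch under assumptions: the prefix argument lets the same rule serve IP_ and CP_ names (XP_ names for events/trials follow a different rule), "left-justified" is modeled by stripping leading blanks, and any further conversion SAS may apply to formatted values that are not valid variable names is not modeled.

```python
def prob_var_names(prefix, levels, events_trials=False, by_groups=False):
    """Hypothetical PREDPROBS= variable names, e.g. with prefix 'IP_' or 'CP_'."""
    if events_trials:
        return [prefix + "Event", prefix + "Nonevent"]
    if by_groups:
        # ordinal position of the level, as in the "Response Profile" table
        return [prefix + str(i) for i in range(1, len(levels) + 1)]
    # left-justified formatted value, truncated so the name is <= 32 characters
    return [(prefix + str(v).lstrip())[:32] for v in levels]
```

For the example above, `prob_var_names("IP_", ["None", "Mild", "Severe"])` returns IP_None, IP_Mild, and IP_Severe.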
If you specify PREDPROBS=CUMULATIVE, the OUTPUT data set contains
k additional
variables representing the cumulative probabilities, one for each
response level, where k is the
maximum number of response levels across all BY-groups. The names of
these variables have the form CP_xxx, where xxx represents the
particular response level.
The naming convention is similar to that given by
PREDPROBS=INDIVIDUAL. The PREDPROBS=CUMULATIVE values are the same as those
output by the PREDICTED= option, but they are arranged in variables on each output observation rather than in multiple output observations.
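The difference between the PREDPROBS=CUMULATIVE layout and the PREDICTED= layout is a long-to-wide reshape. This Python sketch, with hypothetical row keys (`obs`, `prob`) and a caller-supplied mapping from _LEVEL_ values to CP_-style names, collects one-row-per-_LEVEL_ records into one record per observation.

```python
def long_to_wide(long_rows, names):
    """Rearrange one-row-per-_LEVEL_ output into one record per observation.

    long_rows -- dicts with keys 'obs', '_LEVEL_', and 'prob'
    names     -- mapping from _LEVEL_ value to output variable name
    """
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row["obs"], {})
        record[names[row["_LEVEL_"]]] = row["prob"]
    return wide
```

Each observation then carries its cumulative probabilities side by side, for example CP_1 and CP_2, instead of spreading them over several rows.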
If you specify PREDPROBS=CROSSVALIDATE, the OUTPUT data set contains
k additional
variables representing the cross-validated predicted probabilities
of the k response levels, where k is the
maximum number of response levels across all BY-groups.
The names of
these variables have the form XP_xxx, where xxx represents the
particular level. The representation is the same as that given by
PREDPROBS=INDIVIDUAL except that for the events/trials syntax
there are four variables for the cross-validated predicted probabilities
instead of two:
- XP_EVENT_R1E
- is the cross-validated predicted probability of
an event when a current event trial is removed.
- XP_NONEVENT_R1E
- is the cross-validated predicted probability of
a nonevent when a current event trial is removed.
- XP_EVENT_R1N
- is the cross-validated predicted probability of
an event when a current nonevent trial is removed.
- XP_NONEVENT_R1N
- is the cross-validated predicted probability of
a nonevent when a current nonevent trial is removed.
The cross-validated predicted probabilities are precisely those
used in the CTABLE option. Refer to the "Predicted Probability of an Event for Classification" section
for details of the computation.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.