Getting Started
This example illustrates how you can use PROC NPAR1WAY
to perform a one-way nonparametric analysis.
The data consist of
weight gain measurements for five different levels of gossypol
additive. Gossypol is a substance contained in cottonseed
shells, and these data were collected to study the effect of
gossypol on animal nutrition.
The following DATA step statements create the SAS
data set Gossypol:
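data Gossypol;
input Dose n;      /* dose level and the number of Gain values at that dose */
do i=1 to n;
input Gain @@;     /* read the next weight gain; @@ holds the data line     */
output;            /* write one observation per Gain value                  */
end;
datalines;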
0 16
228 229 218 216 224 208 235 229 233 219 224 220 232 200 208 232
.04 11
186 229 220 208 228 198 222 273 216 198 213
.07 12
179 193 183 180 143 204 114 188 178 134 208 196
.10 17
130 87 135 116 118 165 151 59 126 64 78 94 150 160 122 110 178
.13 11
154 130 130 118 118 104 112 134 98 100 104
;
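As an optional check that the DATA step read the data as intended, a PROC MEANS step (not part of the original example) can count the observations and average the weight gains at each dose level. The counts and means it prints should agree with the N and Mean columns that PROC NPAR1WAY reports in the analysis of variance output later in this section.

```sas
/* Sketch: confirm that the DATA step read 16, 11, 12, 17, */
/* and 11 Gain values for the five dose levels.            */
proc means data=Gossypol n mean maxdec=6;
   class Dose;
   var Gain;
run;
```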
The data set Gossypol contains the variable
Dose, which represents the amount of gossypol additive,
and the variable Gain, which represents the weight
gain.
Researchers are interested in whether there is a difference
in weight gain among the different dose levels of gossypol.
The following statements invoke the NPAR1WAY procedure to
perform a nonparametric analysis of this problem.
proc npar1way data=Gossypol;
class Dose;
var Gain;
run;
The CLASS statement names Dose as the classification variable, and the VAR
statement specifies Gain as the response variable.
The CLASS statement is required, and you must name only one CLASS
variable. You may name one or more analysis variables in the VAR
statement. If you omit the VAR statement, PROC NPAR1WAY analyzes
all numeric variables in the data set except for the CLASS variable,
the FREQ variable, and the BY variables.
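Note that the DATA step above also keeps the loop variables i and n, which are numeric, so omitting the VAR statement for the Gossypol data set would analyze i and n along with Gain. A minimal sketch of one way to avoid this keeps only Dose and Gain first; the data set name GossypolGain is hypothetical.

```sas
/* Keep only the CLASS variable and the response, so that  */
/* omitting the VAR statement analyzes Gain alone.         */
data GossypolGain;                /* hypothetical data set name */
   set Gossypol(keep=Dose Gain);
run;

proc npar1way data=GossypolGain;
   class Dose;    /* no VAR statement: every numeric variable */
run;              /* except the CLASS variable is analyzed    */
```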
Since no analysis options are specified in the PROC NPAR1WAY statement,
the ANOVA, WILCOXON, MEDIAN, VW, SAVAGE, and EDF options are invoked by
default. The results of these analyses are shown in the following
tables.
Analysis of Variance for Variable Gain Classified by Variable Dose

Dose     N          Mean
0       16    222.187500
0.04    11    217.363636
0.07    12    175.000000
0.1     17    120.176471
0.13    11    118.363636

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Among      4     140082.986077    35020.74652    55.8143    <.0001
Within    62      38901.998997      627.45160

Average scores were used for ties.
These tables are produced with the ANOVA option. For each level of
the CLASS variable Dose, PROC NPAR1WAY displays the number of
observations and the mean of the analysis variable Gain.
PROC NPAR1WAY displays a standard analysis of variance on the raw
data. This gives the same results as the GLM and ANOVA procedures.
The p-value for the F test is <.0001, which indicates that
Dose accounts for a significant portion of the variability in
the dependent variable Gain.
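To see this agreement directly, you can request the ANOVA option by itself and fit the same one-way model with PROC GLM; the F value and p-value from the two steps should match. This is a minimal sketch; PROC GLM is not otherwise used in this example.

```sas
/* ANOVA option alone: the parametric one-way analysis of variance */
proc npar1way data=Gossypol anova;
   class Dose;
   var Gain;
run;

/* The same one-way model fitted with PROC GLM for comparison */
proc glm data=Gossypol;
   class Dose;
   model Gain = Dose;
run;
quit;
```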
Wilcoxon Scores (Rank Sums) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16           890.50                544.0           67.978966     55.656250
0.04    11           555.00                374.0           59.063588     50.454545
0.07    12           395.50                408.0           61.136622     32.958333
0.1     17           275.50                578.0           69.380741     16.205882
0.13    11           161.50                374.0           59.063588     14.681818

Average scores were used for ties.

Kruskal-Wallis Test
Chi-Square         52.6656
DF                       4
Pr > Chi-Square     <.0001
The WILCOXON option produces these tables. PROC NPAR1WAY
first provides a summary of the Wilcoxon scores for the analysis variable
Gain by class level. For each level of the CLASS variable
Dose, PROC NPAR1WAY displays the following information:
number of observations, sum of the Wilcoxon scores, expected sum
under the null hypothesis of no difference among class levels, standard
deviation under the null hypothesis, and mean score.
Next, PROC NPAR1WAY displays the one-way ANOVA statistic, which for
Wilcoxon scores is known as the Kruskal-Wallis test. The statistic
equals 52.6656,
with four degrees of freedom, which is the number of class levels minus
one. The p-value, or probability of a larger statistic under the
null hypothesis, is <.0001. This leads to rejection of the null
hypothesis that there is no difference in location for Gain
among the levels of Dose. This p-value is asymptotic, computed
from the asymptotic chi-square distribution of the test statistic.
For certain data sets it may also be useful to compute the exact
p-value; for example, for small data sets, or data sets that
are sparse, skewed, or heavily tied. You can use the EXACT
statement to request exact p-values for any of the location
or scale tests available in PROC NPAR1WAY.
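A minimal sketch of the EXACT statement follows. Exact computations can be time-consuming for larger samples, so this sketch restricts the analysis to the two lowest dose levels (27 observations) with the same WHERE statement used later in this section; that restriction is an illustrative choice, not a requirement of the EXACT statement.

```sas
/* Exact Wilcoxon test for the two lowest dose levels.   */
/* The WHERE statement keeps the problem small, which    */
/* helps keep the exact computation fast.                */
proc npar1way data=Gossypol wilcoxon;
   where Dose <= .04;
   class Dose;
   var Gain;
   exact wilcoxon;
run;
```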
Median Scores (Number of Points Above Median) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16             16.0             7.880597            1.757902          1.00
0.04    11             11.0             5.417910            1.527355          1.00
0.07    12              6.0             5.910448            1.580963          0.50
0.1     17              0.0             8.373134            1.794152          0.00
0.13    11              0.0             5.417910            1.527355          0.00

Average scores were used for ties.

Median One-Way Analysis
Chi-Square         54.1765
DF                       4
Pr > Chi-Square     <.0001
Van der Waerden Scores (Normal) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16        16.116474                  0.0            3.325957      1.007280
0.04    11         8.340899                  0.0            2.889761      0.758264
0.07    12        -0.576674                  0.0            2.991186     -0.048056
0.1     17       -14.688921                  0.0            3.394540     -0.864054
0.13    11        -9.191777                  0.0            2.889761     -0.835616

Average scores were used for ties.

Van der Waerden One-Way Analysis
Chi-Square         47.2972
DF                       4
Pr > Chi-Square     <.0001
Savage Scores (Exponential) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16        16.074391                  0.0            3.385275      1.004649
0.04    11         7.693099                  0.0            2.941300      0.699373
0.07    12        -3.584958                  0.0            3.044534     -0.298746
0.1     17       -11.979488                  0.0            3.455082     -0.704676
0.13    11        -8.203044                  0.0            2.941300     -0.745731

Average scores were used for ties.

Savage One-Way Analysis
Chi-Square         39.4908
DF                       4
Pr > Chi-Square     <.0001
These tables display the analyses produced by the MEDIAN,
VW, and SAVAGE options. For each score type, PROC NPAR1WAY
provides a summary of scores and the one-way ANOVA statistic,
as previously described for Wilcoxon scores. Other score types
available in PROC NPAR1WAY are Siegel-Tukey, Ansari-Bradley,
Klotz, and Mood, which are used to test for scale differences.
Additionally, you can request the SCORES=DATA option, which
uses the raw data as scores. This option gives you the
flexibility to construct any scores for your data with the
DATA step and then analyze these scores with PROC NPAR1WAY.
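The following sketch requests the scale analyses and a SCORES=DATA analysis. The option names ST, AB, KLOTZ, and MOOD are assumed here to be the PROC NPAR1WAY statement options for Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores; check the syntax section for your release. The data set GossypolScores, the variable LogGain, and the choice of log scores are hypothetical, chosen only to show a DATA step building scores.

```sas
/* Scale tests with Siegel-Tukey, Ansari-Bradley, Klotz,   */
/* and Mood scores (option names assumed; see lead-in).    */
proc npar1way data=Gossypol st ab klotz mood;
   class Dose;
   var Gain;
run;

/* User-defined scores: build them in a DATA step, then    */
/* analyze them as given with SCORES=DATA.                 */
data GossypolScores;              /* hypothetical data set name */
   set Gossypol;
   LogGain = log(Gain);           /* illustrative score choice  */
run;

proc npar1way data=GossypolScores scores=data;
   class Dose;
   var LogGain;
run;
```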
Kolmogorov-Smirnov Test for Variable Gain Classified by Variable Dose

Dose     N    EDF at Maximum    Deviation from Mean at Maximum
0       16          0.000000                         -1.910448
0.04    11          0.000000                         -1.584060
0.07    12          0.333333                         -0.499796
0.1     17          1.000000                          2.153861
0.13    11          1.000000                          1.732565
Total   67          0.477612

Maximum Deviation Occurred at Observation 36
Value of Gain at Maximum = 178.0

Kolmogorov-Smirnov Statistics (Asymptotic)
KS     0.457928
KSa    3.748300
Cramer-von Mises Test for Variable Gain Classified by Variable Dose

Dose     N    Summed Deviation from Mean
0       16                      2.165210
0.04    11                      0.918280
0.07    12                      0.348227
0.1     17                      1.497542
0.13    11                      1.335745

Cramer-von Mises Statistics (Asymptotic)
CM     0.093508
CMa    6.265003
These tables display the empirical distribution function
statistics, comparing the distribution of Gain for
the different levels of Dose. These tables are
produced by the EDF option, and they include
Kolmogorov-Smirnov statistics and Cramer-von Mises statistics.
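The displayed values are consistent with the relations below, which you can check by hand. Here n_i is the number of observations in class i, n = 67, F_i is the EDF of class i, and F is the pooled EDF shown in the Total row. The Deviation from Mean at Maximum column equals \sqrt{n_i}(F_i - F) at the point of maximum deviation, KS combines those deviations, KSa rescales KS by \sqrt{n}, and CMa is the sum of the Summed Deviation column. This is a worked check of the tables, not a full statement of how PROC NPAR1WAY computes the statistics.

```latex
\begin{aligned}
\text{Dev}_i &= \sqrt{n_i}\,\bigl(F_i(x) - F(x)\bigr), \qquad
  \text{Dev}_{0} = \sqrt{16}\,(0.000000 - 0.477612) = -1.910448 \\
KS  &= \sqrt{\tfrac{1}{n}\textstyle\sum_i \text{Dev}_i^2}
     = \sqrt{\tfrac{14.0498}{67}} \approx 0.457928 \\
KSa &= \sqrt{n}\,KS = \sqrt{67} \times 0.457928 \approx 3.7483 \\
CMa &= \textstyle\sum_i (\text{Summed Deviation})_i
     = 2.165210 + 0.918280 + 0.348227 + 1.497542 + 1.335745 \approx 6.2650 \\
CM  &= CMa / n = 6.265003 / 67 \approx 0.093508
\end{aligned}
```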
In the preceding example, the CLASS variable Dose has five levels,
and the analysis examines possible differences among these five
levels, or samples. The following statements invoke the NPAR1WAY
procedure to perform a nonparametric analysis of the two lowest
levels of Dose.
proc npar1way data=Gossypol;
where Dose <= .04;
class Dose;
var Gain;
run;
The following tables show the results.
Analysis of Variance for Variable Gain Classified by Variable Dose

Dose     N          Mean
0       16    222.187500
0.04    11    217.363636

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
Among      1        151.683712     151.683712     0.5587    0.4617
Within    25       6786.982955     271.479318

Average scores were used for ties.
Wilcoxon Scores (Rank Sums) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16           253.50                224.0           20.221565     15.843750
0.04    11           124.50                154.0           20.221565     11.318182

Average scores were used for ties.

Wilcoxon Two-Sample Test
Statistic         124.5000

Normal Approximation
Z                  -1.4341
One-Sided Pr < Z    0.0758
Two-Sided Pr > |Z|  0.1515

t Approximation
One-Sided Pr < Z    0.0817
Two-Sided Pr > |Z|  0.1635

Z includes a continuity correction of 0.5.

Kruskal-Wallis Test
Chi-Square          2.1282
DF                       1
Pr > Chi-Square     0.1446
These tables are produced by the WILCOXON option.
PROC NPAR1WAY provides a summary of the Wilcoxon scores
for the analysis variable Gain for each of the two
class levels. Since there are only two levels, PROC
NPAR1WAY displays the two-sample test, based on the simple
linear rank statistic with Wilcoxon scores. The normal
approximation includes a continuity correction. To remove
this, you can specify the CORRECT=NO option. PROC NPAR1WAY
also gives a t approximation for the Wilcoxon
two-sample test. As in the multisample analysis, PROC
NPAR1WAY computes a one-way ANOVA statistic, which for Wilcoxon
scores is known as the Kruskal-Wallis test. None of these
p-values indicates a significant difference in Gain between the two
Dose levels at the 0.05 level of significance.
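A minimal sketch of the CORRECT=NO option for the same two-level analysis follows. The option goes in the PROC NPAR1WAY statement; only the Wilcoxon analysis is requested here to keep the output short.

```sas
/* Two-sample Wilcoxon test without the 0.5 continuity */
/* correction in the normal and t approximations.      */
proc npar1way data=Gossypol wilcoxon correct=no;
   where Dose <= .04;
   class Dose;
   var Gain;
run;
```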
Median Scores (Number of Points Above Median) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16              9.0             7.703704            1.299995      0.562500
0.04    11              4.0             5.296296            1.299995      0.363636

Average scores were used for ties.

Median Two-Sample Test
Statistic           4.0000
Z                  -0.9972
One-Sided Pr < Z    0.1593
Two-Sided Pr > |Z|  0.3187

Median One-Way Analysis
Chi-Square          0.9943
DF                       1
Pr > Chi-Square     0.3187
Van der Waerden Scores (Normal) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16         3.346520                  0.0            2.320336      0.209157
0.04    11        -3.346520                  0.0            2.320336     -0.304229

Average scores were used for ties.

Van der Waerden Two-Sample Test
Statistic          -3.3465
Z                  -1.4423
One-Sided Pr < Z    0.0746
Two-Sided Pr > |Z|  0.1492

Van der Waerden One-Way Analysis
Chi-Square          2.0801
DF                       1
Pr > Chi-Square     0.1492
Savage Scores (Exponential) for Variable Gain Classified by Variable Dose

Dose     N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
0       16         1.834554                  0.0            2.401839      0.114660
0.04    11        -1.834554                  0.0            2.401839     -0.166778

Average scores were used for ties.

Savage Two-Sample Test
Statistic          -1.8346
Z                  -0.7638
One-Sided Pr < Z    0.2225
Two-Sided Pr > |Z|  0.4450

Savage One-Way Analysis
Chi-Square          0.5834
DF                       1
Pr > Chi-Square     0.4450
These tables display the two-sample analyses produced by the
MEDIAN, VW, and SAVAGE options.
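The two-sample Z values in these tables, and in the Wilcoxon table above, can be checked by hand. In each case the statistic S is the sum of scores for Dose = 0.04, the smaller sample (n_1 = 11 of N = 27), and Z standardizes S by its expected value and standard deviation under H0. The one-way chi-square equals the square of the uncorrected Z. The Wilcoxon Z also adds the 0.5 continuity correction, which moves S toward its expected value because S lies below it. This is a worked check of the displayed numbers, not a restatement of the documented formulas.

```latex
\begin{aligned}
\text{Wilcoxon: } & E_0(S) = \tfrac{n_1 (N+1)}{2} = \tfrac{11 \times 28}{2} = 154.0, \\
  & Z = \frac{124.50 - 154.0 + 0.5}{20.221565} \approx -1.4341, \qquad
    \chi^2_{KW} = \Bigl(\frac{124.50 - 154.0}{20.221565}\Bigr)^2 \approx 2.1282 \\
\text{Median: } & Z = \frac{4.0 - 5.296296}{1.299995} \approx -0.9972, \qquad Z^2 \approx 0.9943 \\
\text{Van der Waerden: } & Z = \frac{-3.346520 - 0.0}{2.320336} \approx -1.4423, \qquad Z^2 \approx 2.0801 \\
\text{Savage: } & Z = \frac{-1.834554 - 0.0}{2.401839} \approx -0.7638, \qquad Z^2 \approx 0.5834
\end{aligned}
```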
Kolmogorov-Smirnov Test for Variable Gain Classified by Variable Dose

Dose     N    EDF at Maximum    Deviation from Mean at Maximum
0       16          0.250000                         -0.481481
0.04    11          0.545455                          0.580689
Total   27          0.370370

Maximum Deviation Occurred at Observation 4
Value of Gain at Maximum = 216.0

Kolmogorov-Smirnov Two-Sample Test (Asymptotic)
KS          0.145172
D           0.295455
KSa         0.754337
Pr > KSa      0.6199
Cramer-von Mises Test for Variable Gain Classified by Variable Dose

Dose     N    Summed Deviation from Mean
0       16                      0.098638
0.04    11                      0.143474

Cramer-von Mises Statistics (Asymptotic)
CM     0.008967
CMa    0.242112
Kuiper Test for Variable Gain Classified by Variable Dose

Dose     N    Deviation from Mean
0       16               0.090909
0.04    11               0.295455

Kuiper Two-Sample Test (Asymptotic)
K           0.386364
Ka          0.986440
Pr > Ka       0.8383
These tables display the empirical distribution function
statistics, comparing the distribution of Gain for
the two levels of Dose. The p-value for the
Kolmogorov-Smirnov two-sample test is 0.6199, so the
null hypothesis that the Gain distributions are identical
for the two levels of Dose is not rejected.
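The two-sample EDF statistics can likewise be checked by hand from the displayed values, with n_1 = 16, n_2 = 11, and n = 27. D is the absolute difference between the two EDF at Maximum values, KSa rescales D by \sqrt{n_1 n_2 / n}, the Kuiper statistic K equals the sum of the two values in its Deviation from Mean column, and Ka uses the same scaling as KSa. This is a worked check of the tables, not a restatement of the procedure's documented computations.

```latex
\begin{aligned}
D   &= |0.250000 - 0.545455| = 0.295455 \\
KSa &= \sqrt{\tfrac{n_1 n_2}{n}}\,D = \sqrt{\tfrac{16 \times 11}{27}} \times 0.295455
     \approx 2.553139 \times 0.295455 \approx 0.7543 \\
K   &= 0.090909 + 0.295455 = 0.386364 \\
Ka  &= \sqrt{\tfrac{n_1 n_2}{n}}\,K \approx 2.553139 \times 0.386364 \approx 0.9864
\end{aligned}
```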
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.