STAT 330

Assignment 7: First SAS Assignment

In this handout I begin by showing you examples of one-sample and two-sample procedures in SAS. Then I describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands which you submit, the SAS output you get, and a short (two or three sentence) summary of the practical conclusions. Uninterpreted computer output cannot get more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)

One sample tests and confidence intervals

The data for this example are taken from question 42 in chapter 7, which you should see for an explanation of the setting. I ran the following SAS code, which is in the file n:\stat\330\asbestos.sas.

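  options pagesize=60 linesize=80;
  data asbestos;
  infile 'n:\stat\330\asbestos.dat';
  input comply;
  complyd=comply-200;   /* deviation from the hypothesized mean of 200 */
  /* t and prt test the hypothesis of mean 0 for each variable */
  proc means mean std stderr t prt maxdec=2;
  run;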

The keywords mean, std, stderr, t, and prt in the proc means statement request the sample mean, the sample standard deviation, the standard error of the mean, the t statistic for testing the hypothesis that the population mean is 0, and the two-sided P-value for that t-test. The option maxdec=2 limits the printout to 2 decimal places.
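For reference, here is a sketch of the standard textbook formulas behind these keywords, for a sample x_1, ..., x_n and a null hypothesis of mean 0 (the notation is mine, not taken from the SAS documentation):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
t = \frac{\bar{x}}{\mathrm{SE}}, \qquad
P = 2\,\Pr\bigl(T_{n-1} \ge |t|\bigr)
```

Here T_{n-1} has a t distribution with n-1 degrees of freedom. Because the test is always against mean 0, testing a different hypothesized mean requires subtracting that value first, which is why the data step creates complyd=comply-200.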

The output from proc means is

                        The SAS System                                9
                                       12:47 Thursday, October 12, 1995

 Variable        Mean       Std Dev     Std Error          T  Prob>|T|
 ----------------------------------------------------------------------
 COMPLY        209.75        24.16         6.04         34.73    0.0001
 COMPLYD         9.75        24.16         6.04          1.61    0.1273
 ----------------------------------------------------------------------
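Notice that the second line tests the hypothesis that the mean of COMPLY is 200, since COMPLYD = COMPLY - 200 has mean 0 exactly when COMPLY has mean 200. The two-sided P-value is about 13%, indicating only very weak evidence against this null hypothesis. To compute a 95% confidence interval for the mean of COMPLY, take $\bar{x} \pm t_{0.025,\,n-1}\, s/\sqrt{n}$, that is, the sample mean plus or minus the t critical value times the standard error. I don't know if I can get SAS to actually do this little piece of arithmetic easily.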
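A worked sketch of that arithmetic for COMPLY: the output does not print n, but it can be recovered from s/SE = 24.16/6.04 = 4 = sqrt(n), so n = 16 and there are 15 degrees of freedom. The critical value t_{0.025,15} of about 2.131 is read from a t table.

```latex
\bar{x} \pm t_{0.025,15}\,\mathrm{SE}
  = 209.75 \pm 2.131 \times 6.04
  = 209.75 \pm 12.87
  \;\Longrightarrow\; (196.88,\ 222.62)
```

The interval contains 200, which agrees with the P-value of 0.1273 being well above 0.05. As a check on the printed statistic, t = 9.75/6.04, which is about 1.61.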

Two sample tests and confidence intervals

The data for the question about Michelson's measurements of the speed of light from Assignment 4 are in the file n:\stat\330\michlson.dat, and I use proc ttest to test the hypothesis of no change in mean between the two sets.

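  options pagesize=60 linesize=80;
  data michlson;
  infile 'n:\stat\330\michlson.dat';
  input set $ speed ;   /* set is a character variable: First or Second */
  proc sort data=michlson;   /* BY-group processing below needs sorted data */
   by set;
  proc ttest cochran;   /* cochran adds the Cochran-Cox approximation */
   class set;
   var speed;
  run;
  proc univariate plot normal;
   by set;
  run;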
To get high-resolution graphs, replace the proc univariate step with:
  proc rank data=michlson normal=vw out=rmich;
   by set;
   var speed;
   ranks normscr;
  proc gplot data=rmich;
   title3 'normal probability plot';
   by set;
   plot speed*normscr;
  run;
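The normal=vw option asks proc rank for van der Waerden normal scores. As I understand the SAS documentation, the observation with rank r_i among the n observations in its BY group gets the standard normal quantile shown below, so plotting speed against normscr gives a normal probability plot that should be roughly straight if the data are normal.

```latex
\text{normscr}_i = \Phi^{-1}\!\left(\frac{r_i}{n+1}\right), \qquad i = 1,\dots,n
```

Here Phi is the standard normal cumulative distribution function. For example, with n = 20 the smallest speed in a set (r_i = 1) gets a score of Phi^{-1}(1/21), which is about -1.67.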
The (low resolution) output is
                            The SAS System                                1
                                             14:31 Monday, October 16, 1995

                                TTEST PROCEDURE

Variable: SPEED

SET          N         Mean      Std Dev    Std Error      Minimum      Maximum
-------------------------------------------------------------------------------
First       20  909.0000000  104.9260391  23.46217561  650.0000000  1070.000000
Second      20  831.5000000   54.2193401  12.12381302  740.0000000   950.000000

Variances        T    Method              DF    Prob>|T|
--------------------------------------------------------
Unequal     2.9346    Satterthwaite     28.5      0.0065
                      Cochran           19.0      0.0085
Equal       2.9346                      38.0      0.0056

For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060
The line labelled "Equal" gives the usual two-sample t statistic for testing equality of means using a pooled estimate of the variance, together with its degrees of freedom and two-sided P-value. Below the table, the line beginning "For H0" tests the hypothesis that the two variances are equal: the statistic F' is the larger sample variance divided by the smaller, and its P-value is two-sided. Notice that the two means are clearly different and that the two variances are also clearly different. The "Unequal" lines report tests which try to adjust for unequal variances; Satterthwaite is the technique mentioned in previous solution sets. Proc ttest does not print confidence intervals, so you have to do that arithmetic yourself. The output of proc univariate is:
                            The SAS System                                1
                                          10:11 Wednesday, October 25, 1995

----------------------------- SET=First -----------------------------------

                         Univariate Procedure

Variable=SPEED

                                    Moments

                    N                20  Sum Wgts         20
                    Mean            909  Sum           18180
                    Std Dev     104.926  Variance   11009.47
                    Skewness   -0.96461  Kurtosis   0.573188
                    USS        16734800  CSS          209180
                    CV         11.54302  Std Mean   23.46218
                    T:Mean=0   38.74321  Pr>|T|       0.0001
                    Num ^= 0         20  Num > 0          20
                    M(Sign)          10  Pr>=|M|      0.0001
                    Sgn Rank        105  Pr>=|S|      0.0001
                    W:Normal   0.920264  Pr<W         0.1059


                                Quantiles(Def=5)

                     100% Max      1070       99%      1070
                      75% Q3        980       95%      1035
                      50% Med       940       90%      1000
                      25% Q1        850       10%       750
                       0% Min       650        5%       695
                                               1%       650
                     Range          420
                     Q3-Q1          130
                     Mode           980


                                    Extremes

                       Lowest    Obs     Highest    Obs
                          650(      14)      980(      12)
                          740(       2)     1000(      11)
                          760(      15)     1000(      17)
                          810(      16)     1000(      18)
                          850(       6)     1070(       4)


                Stem Leaf                     #             Boxplot
                  10 7                        1                |
                  10 000                      3                |
                   9 566888                   6             +-----+
                   9 033                      3             *--+--*
                   8 558                      3             +-----+
                   8 1                        1                |
                   7 6                        1                |
                   7 4                        1                |
                   6 5                        1                0
                     ----+----+----+----+
                 Multiply Stem.Leaf by 10**+2

                                 The SAS System                                2
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=First -----------------------------------

                              Univariate Procedure

Variable=SPEED

                                 Normal Probability Plot
              1075+                                       +++++*
                  |                                  *+*++*
                  |                          ** *++*+
                  |                      ** ++++
               875+                  **+*+++
                  |               +*+++
                  |          ++++*
                  |      ++++ *
               675+ +++++*
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2


                                 The SAS System                                3
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

                              Univariate Procedure

Variable=SPEED

                                    Moments

                    N                20  Sum Wgts         20
                    Mean          831.5  Sum           16630
                    Std Dev    54.21934  Variance   2939.737
                    Skewness   0.692545  Kurtosis   0.328607
                    USS        13883700  CSS           55855
                    CV         6.520666  Std Mean   12.12381
                    T:Mean=0   68.58403  Pr>|T|       0.0001
                    Num ^= 0         20  Num > 0          20
                    M(Sign)          10  Pr>=|M|      0.0001
                    Sgn Rank        105  Pr>=|S|      0.0001
                    W:Normal   0.934107  Pr<W         0.1953


                                Quantiles(Def=5)

                     100% Max       950       99%       950
                      75% Q3        870       95%       945
                      50% Med       810       90%       915
                      25% Q1        805       10%       770
                       0% Min       740        5%       750
                                               1%       740
                     Range          210
                     Q3-Q1           65
                     Mode           810


                                    Extremes

                       Lowest    Obs     Highest    Obs
                          740(      14)      870(      12)
                          760(       5)      870(      20)
                          780(       3)      890(       1)
                          790(       7)      940(      16)
                          800(      18)      950(      17)


                Stem Leaf                     #             Boxplot
                   9 5                        1                |
                   9 4                        1                |
                   8 57779                    5             +-----+
                   8 011111124                9             *--+--*
                   7 689                      3                |
                   7 4                        1                |
                     ----+----+----+----+
                 Multiply Stem.Leaf by 10**+2


                                 The SAS System                                4
                                               10:11 Wednesday, October 25, 1995

---------------------------------- SET=Second ----------------------------------

                              Univariate Procedure

Variable=SPEED

                                 Normal Probability Plot
               975+                                            *  ++++
                  |                                      +*+++++++
                  |                             *+*++*+*+
                  |                  **++*+*++**
                  |          +*++*+*+++
               725+ +++++*+++
                   +----+----+----+----+----+----+----+----+----+----+
                       -2        -1         0        +1        +2


                                 The SAS System                                5
                                               10:11 Wednesday, October 25, 1995

                              Univariate Procedure
                                Schematic Plots

Variable=SPEED

                          |
                     1100 +
                          |
                          |            |
                          |            |
                     1050 +            |
                          |            |
                          |            |
                          |            |
                     1000 +            |
                          |            |
                          |         +-----+
                          |         |     |
                      950 +         |     |        |
                          |         *-----*        |
                          |         |     |        |
                          |         |  +  |        |
                      900 +         |     |        |
                          |         |     |        |
                          |         |     |     +-----+
                          |         |     |     |     |
                      850 +         +-----+     |     |
                          |            |        |  +  |
                          |            |        |     |
                          |            |        *-----*
                      800 +            |        +-----+
                          |            |           |
                          |            |           |
                          |            |           |
                      750 +            |           |
                          |            |           |
                          |
                          |
                      700 +
                          |
                          |
                          |
                      650 +            0
                           ------------+-----------+-----------
                      SET             First      Second

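You will see that the normal probability plots are reasonably straight (the Shapiro-Wilk W statistics have P-values of 0.1059 and 0.1953, so neither set shows strong evidence against normality), but they are basically horrible to look at; other packages produce better graphs easily.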
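Since proc ttest leaves the confidence interval to you, here is a worked sketch for the Michelson data using only the TTEST summary lines. Both samples have n_1 = n_2 = 20, SE_1 and SE_2 denote the "Std Error" column, and the critical value t_{0.025,38} of about 2.024 is read from a t table.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
       = \frac{19(104.926)^2 + 19(54.219)^2}{38} \approx 6974.6,
       \qquad s_p \approx 83.51,\\
\mathrm{SE}(\bar{x}_1-\bar{x}_2) &= s_p\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}
       \approx 83.51 \times 0.3162 \approx 26.41,
       \qquad t = \frac{909.0-831.5}{26.41} \approx 2.935,\\
\text{95\% CI} &: \; 77.5 \pm 2.024 \times 26.41 = 77.5 \pm 53.45
       \;\Longrightarrow\; (24.05,\ 130.95),\\
\nu_{\text{Satt}} &= \frac{(\mathrm{SE}_1^2+\mathrm{SE}_2^2)^2}
       {\mathrm{SE}_1^4/(n_1-1)+\mathrm{SE}_2^4/(n_2-1)}
       = \frac{(23.462^2+12.124^2)^2}{(23.462^4+12.124^4)/19} \approx 28.5.
\end{aligned}
```

With equal sample sizes the pooled standard error equals the unpooled one, sqrt(SE_1^2 + SE_2^2), which is why the "Equal" and "Unequal" lines show the same T = 2.9346 and differ only in degrees of freedom.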

Two sample paired comparisons

You do this with proc means applied to the within-pair differences:

  options pagesize=60 linesize=80;
  data michpair;
  infile 'n:\stat\330\michpair.dat';
  input speed1 speed2 ;
    diff=speed1-speed2;   /* within-pair difference */
  /* t and prt test the hypothesis of mean 0 for each variable */
  proc means mean std stderr t prt maxdec=2;
  proc univariate plot normal;
   var speed1 diff;
  run;
The output is
                                 The SAS System                                2
                                                  14:31 Monday, October 16, 1995

   Variable          Mean       Std Dev     Std Error             T  Prob>|T|
   --------------------------------------------------------------------------
   SPEED1          909.00        104.93         23.46         38.74    0.0001
   SPEED2          831.50         54.22         12.12         68.58    0.0001
   DIFF             77.50        109.78         24.55          3.16    0.0052
   --------------------------------------------------------------------------

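Only the third line actually matters: the DIFF line tests the hypothesis that the mean within-pair difference is 0, and its P-value of 0.0052 is strong evidence against that hypothesis.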
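The paired confidence interval is again your own arithmetic. A sketch using the DIFF line: there are n = 20 pairs (consistent with 109.78/sqrt(20) being about 24.55), hence 19 degrees of freedom, and t_{0.025,19} of about 2.093 is read from a t table.

```latex
\bar{d} \pm t_{0.025,19}\,\frac{s_d}{\sqrt{n}}
  = 77.50 \pm 2.093 \times 24.55
  = 77.50 \pm 51.38
  \;\Longrightarrow\; (26.12,\ 128.88)
```

The interval excludes 0, matching the small P-value on the DIFF line.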

Your Assignment

  1. The file n:\stat\330\glucose.dat contains blood glucose levels for 52 women after their first pregnancy and then their second. The following SAS commands read the file and print out the data set.
      options pagesize=60 linesize=80;
      data glucose;
       infile 'n:\stat\330\glucose.dat';
       input frstpreg scndpreg ;
      proc print;
      run;

    1. Get 95% confidence intervals for first pregnancy mean, second pregnancy mean and difference in means.
    2. Is there a difference in blood glucose levels between the two pregnancies?
    3. Does the population look reasonably normal?

  2. For the body fat data in the introductory handout on SAS, do men and women have different average percent body fat? Do they have different population standard deviations? Are the normality assumptions adequate? (The data are in n:\stat\330\bodyfat.dat.)
  3. The file n:\stat\330\iris.dat contains measurements of 4 dimensions on each of 50 flowers of 2 species of iris. Read them with the statement input species $ sepallen; which ignores the 3 other columns in the file. Do Versicolor and Virginica irises have different average sepal lengths?

DUE: Friday end of Week 9



Richard Lockhart
Fri Mar 6 15:06:27 PST 1998