In this document I begin by showing you examples of one-sample and two-sample procedures using SAS. Then I describe several data sets which I expect you to analyze. You must use SAS. I want you to hand in: copies of the SAS commands which you submit, the SAS output you get, and a short (two or three sentences) summary of the practical conclusions. Uninterpreted computer output cannot get more than 25% of the possible marks. (At the same time, without the SAS input and output you won't get anything.)
One sample tests and confidence intervals
The data for this example are taken from question 42 in chapter 7, which you should see for an explanation of the setting. I ran the following SAS code, which is in the file g:asbestos.sas (called Macintosh HD:Student Folder:asbestos.sas on the Macs). (In order to duplicate this analysis on a Mac you must make copies of the files CLASS:STAT:330:asbestos.sas and CLASS:STAT:330:asbestos.dat in the folder Macintosh HD:Student Folder or on your own floppy disk; the Mac version of SAS cannot read files from the folder CLASS:STAT:330 where the instructor normally puts files for students to use.)
    options pagesize=60 linesize=80;
    data asbestos;
      infile 'g:asbestos.dat';
      /* on the Macs use instead: infile 'Macintosh HD:Student Folder:asbestos.dat'; */
      input comply;
      complyd=comply-200;
    proc means mean std stderr t prt maxdec=2;
    run;
The words mean, std, stderr, t, and prt after means in the proc means statement request the computation of the sample mean, the sample standard deviation, the standard error of the mean, the value of the t statistic for testing the hypothesis that the mean is 0, and the two-sided P-value for a t-test of that null hypothesis. The expression maxdec=2 limits the printout to 2 decimal places for means and such.
The output from proc means is
                             The SAS System                            9
                                        12:47 Thursday, October 12, 1995

    Variable          Mean       Std Dev     Std Error         T    Prob>|T|
    ----------------------------------------------------------------------
    COMPLY          209.75         24.16          6.04     34.73      0.0001
    COMPLYD           9.75         24.16          6.04      1.61      0.1273
    ----------------------------------------------------------------------

Notice that the second line tests the hypothesis that the mean of COMPLY is actually 200, because COMPLYD is COMPLY minus 200. The two-sided P-value is about 13%, indicating that there is only very weak evidence against this null hypothesis. To compute a 95% confidence interval take the sample mean plus or minus the appropriate t critical value times the standard error of the mean. I don't know if I can get SAS to actually do this little piece of arithmetic easily.
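One possible way to have SAS do this arithmetic is sketched below. It assumes a SAS release in which proc means accepts the clm keyword and the alpha= option and in which the tinv function is available in the data step; neither appears in the example above, so check your version. The data set name asbci and the variable names n, xbar, se, tcrit, lower and upper are made up for the illustration, and n=16 is inferred from the printed Std Dev and Std Error rather than read from the output.

```sas
/* Sketch 1: ask proc means for 95% confidence limits directly.      */
/* Assumes the clm keyword and alpha= option exist in your release.  */
proc means data=asbestos n mean stderr clm alpha=0.05 maxdec=2;
  var comply;
run;

/* Sketch 2: do the arithmetic by hand from the printed summary.     */
data asbci;
  n     = 16;                 /* assumed: (24.16/6.04)**2 = 16        */
  xbar  = 209.75;             /* sample mean of COMPLY                */
  se    = 6.04;               /* standard error of the mean           */
  tcrit = tinv(0.975, n-1);   /* upper 2.5% point of t on n-1 df      */
  lower = xbar - tcrit*se;    /* lower 95% confidence limit           */
  upper = xbar + tcrit*se;    /* upper 95% confidence limit           */
run;
proc print data=asbci;
run;
```

With 15 degrees of freedom the tabled critical value is about 2.13, so the interval should come out near 209.75 plus or minus 12.9. It contains 200, which agrees with the P-value of about 13%.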
Two sample tests and confidence intervals
The data for the question about Michelson's measurements of the speed of light from Assignment 4 are in the file g:michlson.dat (called CLASS:STAT:330:michlson.dat on the Macs). I use proc ttest to test for no change in mean.
    options pagesize=60 linesize=80;
    data michlson;
      infile 'g:michlson.dat';
      /* on the Macs use instead: infile 'Macintosh HD:Student Folder:michlson.dat'; */
      input set $ speed ;
    proc sort data=michlson;
      by set;
    proc ttest cochran;
      class set;
    proc univariate plot normal;
      by set;
    run;
The output is
                             The SAS System                            1
                                          14:31 Monday, October 16, 1995

    TTEST PROCEDURE

    Variable: SPEED

    SET        N          Mean       Std Dev     Std Error       Minimum       Maximum
    -------------------------------------------------------------------------------
    First     20   909.0000000   104.9260391   23.46217561   650.0000000   1070.000000
    Second    20   831.5000000    54.2193401   12.12381302   740.0000000    950.000000

    Variances        T  Method             DF    Prob>|T|
    --------------------------------------------------------
    Unequal     2.9346  Satterthwaite    28.5      0.0065
                        Cochran          19.0      0.0085
    Equal       2.9346                   38.0      0.0056

    For H0: Variances are equal, F' = 3.75    DF = (19,19)    Prob>F' = 0.0060

Notice that the two means are clearly different and that the two variances are also clearly different. The "Unequal" lines report on tests which try to adjust for unequal variances; Satterthwaite is the technique mentioned in previous solution sets. You have to do your own arithmetic to get confidence intervals. The output of proc univariate is:
                             The SAS System                            1
                                       10:11 Wednesday, October 25, 1995

    ---------------------------------- SET=First -----------------------------------

    Univariate Procedure

    Variable=SPEED

                        Moments
    N                20    Sum Wgts          20
    Mean            909    Sum            18180
    Std Dev     104.926    Variance    11009.47
    Skewness   -0.96461    Kurtosis    0.573188
    USS        16734800    CSS           209180
    CV         11.54302    Std Mean    23.46218
    T:Mean=0   38.74321    Pr>|T|        0.0001
    Num ^= 0         20    Num > 0           20
    M(Sign)          10    Pr>=|M|       0.0001
    Sgn Rank        105    Pr>=|S|       0.0001
    W:Normal   0.920264    Pr<W          0.1059

                   Quantiles(Def=5)
    100% Max       1070    99%       1070
     75% Q3         980    95%       1035
     50% Med        940    90%       1000
     25% Q1         850    10%        750
      0% Min        650     5%        695
                            1%        650
    Range           420
    Q3-Q1           130
    Mode            980

                       Extremes
    Lowest    Obs     Highest    Obs
       650(    14)        980(    12)
       740(     2)       1000(    11)
       760(    15)       1000(    17)
       810(    16)       1000(    18)
       850(     6)       1070(     4)

    Stem Leaf          #   Boxplot
      10 7             1      |
      10 000           3      |
       9 566888        6   +-----+
       9 033           3   *--+--*
       8 558           3   +-----+
       8 1             1      |
       7 6             1      |
       7 4             1      |
       6 5             1      0
         ----+----+----+----+
    Multiply Stem.Leaf by 10**+2

    [Normal Probability Plot: SPEED, SET=First; vertical axis 675 to 1075, horizontal axis -2 to +2]

    ---------------------------------- SET=Second ----------------------------------

    Univariate Procedure

    Variable=SPEED

                        Moments
    N                20    Sum Wgts          20
    Mean          831.5    Sum            16630
    Std Dev    54.21934    Variance    2939.737
    Skewness   0.692545    Kurtosis    0.328607
    USS        13883700    CSS            55855
    CV         6.520666    Std Mean    12.12381
    T:Mean=0   68.58403    Pr>|T|        0.0001
    Num ^= 0         20    Num > 0           20
    M(Sign)          10    Pr>=|M|       0.0001
    Sgn Rank        105    Pr>=|S|       0.0001
    W:Normal   0.934107    Pr<W          0.1953

                   Quantiles(Def=5)
    100% Max        950    99%        950
     75% Q3         870    95%        945
     50% Med        810    90%        915
     25% Q1         805    10%        770
      0% Min        740     5%        750
                            1%        740
    Range           210
    Q3-Q1            65
    Mode            810

                       Extremes
    Lowest    Obs     Highest    Obs
       740(    14)        870(    12)
       760(     5)        870(    20)
       780(     3)        890(     1)
       790(     7)        940(    16)
       800(    18)        950(    17)

    Stem Leaf          #   Boxplot
       9 5             1      |
       9 4             1      |
       8 57779         5   +-----+
       8 011111124     9   *--+--*
       7 689           3      |
       7 4             1      |
         ----+----+----+----+
    Multiply Stem.Leaf by 10**+2

    [Normal Probability Plot: SPEED, SET=Second; vertical axis 725 to 975, horizontal axis -2 to +2]

    Univariate Procedure
    Schematic Plots

    Variable=SPEED

    [Schematic Plots: side-by-side boxplots of SPEED for SET=First and SET=Second; vertical axis 650 to 1100]
You will see that the normal probability plots are reasonably straight but basically horrible to look at; other packages produce better graphs easily.
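Returning to the proc ttest output: the confidence interval arithmetic that SAS leaves to you can be worked roughly as follows. This is a sketch and not part of the SAS output. The critical values 2.047 (28.5 degrees of freedom, interpolated) and 2.024 (38 degrees of freedom) are taken from t tables, and the symbols are introduced here only for illustration.

```latex
\begin{align*}
\bar{x}_1 - \bar{x}_2 &= 909.0 - 831.5 = 77.5, \\
\mathrm{SE} &= \sqrt{23.462^2 + 12.124^2} \approx 26.41, \\
\text{CI (unequal variances, 28.5 df)} &= 77.5 \pm 2.047 \times 26.41 \approx (23.4,\ 131.6), \\
\text{CI (equal variances, 38 df)} &= 77.5 \pm 2.024 \times 26.41 \approx (24.0,\ 131.0).
\end{align*}
```

Because the two samples are the same size, the pooled standard error equals the unpooled one, which is why both lines of the output show the same T of 2.9346 and differ only in degrees of freedom.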
Two sample paired comparisons
You do this with proc means:
    options pagesize=60 linesize=80;
    data michpair;
      infile 'g:michpair.dat';
      /* on the Macs use instead: infile 'Macintosh HD:Student Folder:michpair.dat'; */
      input speed1 speed2 ;
      diff=speed1-speed2;
    proc means mean std stderr t prt maxdec=2;
    proc univariate plot normal;
      var speed1 diff;
    run;

The output is
                             The SAS System                            2
                                          14:31 Monday, October 16, 1995

    Variable          Mean       Std Dev     Std Error         T    Prob>|T|
    --------------------------------------------------------------------------
    SPEED1          909.00        104.93         23.46     38.74      0.0001
    SPEED2          831.50         54.22         12.12     68.58      0.0001
    DIFF             77.50        109.78         24.55      3.16      0.0052
    --------------------------------------------------------------------------
Only the third line (DIFF) actually matters: it tests the hypothesis that the mean of the paired differences is 0.
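As a sketch of the matching 95% confidence interval for the mean difference, using the DIFF line with 20 pairs (19 degrees of freedom) and the tabled critical value 2.093, which comes from t tables and not from the SAS output:

```latex
\[
\bar{d} \pm t_{0.975,\,19}\,\mathrm{SE}(\bar{d})
  = 77.50 \pm 2.093 \times 24.55
  \approx 77.50 \pm 51.4
  = (26.1,\ 128.9).
\]
```

The interval excludes zero, in agreement with the P-value of 0.0052.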
Your Assignment
    options pagesize=60 linesize=80;
    data glucose;
      infile 'g:glucose.dat';
      /* on the Macs use instead: infile 'Macintosh HD:Student Folder:glucose.dat'; */
      input frstpreg scndpreg ;
    proc print;
    run;
DUE: Wednesday 6 November 1996