
STAT 330: 95-3

Assignment 3 Solutions

Chapter 9, Q 59: Let $\mu_1$ and $\mu_2$ be the mean dry densities for soil samples gathered by the two methods (in practical terms I wonder what the population of soil was). This is a straightforward application of the two independent samples confidence interval methodology. We have , , and . I count and . The pooled estimate of the standard deviation is then which is 3.70, leading to the pooled estimate of the standard error of being

Thus use of the two sample t based confidence interval (on 33 degrees of freedom) gives the interval or -0.19 to 5.29. Using the interval which does not pool seems inadvisable here since one of the two sample sizes is small. However, the unpooled estimated standard error of the difference in means is

Thus this interval, using the normal multiplier 1.96 is -0.1 to 5.2. (Aside: there is an approximation due to Welch or Satterthwaite to the distribution of the t-statistic which leads to using the large sample statistic but with a t multiplier with degrees of freedom being

where

which comes to 20.16 degrees of freedom leading to the multiplier 2.085 and the interval which runs from -0.22 to 5.32.)
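The degrees of freedom calculation in the aside can be sketched as below. This uses the standard textbook form of the Welch-Satterthwaite approximation, which may differ in notation from the formula used in class. The function names `welch_df` and `unpooled_se` are illustrative, and the inputs in the usage line are hypothetical, not the Q 59 data.

```python
import math


def unpooled_se(s1, n1, s2, n2):
    """Unpooled standard error of the difference between two sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (textbook form)."""
    a = s1**2 / n1  # estimated variance of the first sample mean
    b = s2**2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Hypothetical inputs: equal SDs and equal sample sizes give
# 2(n - 1) = 18 degrees of freedom, up to floating point rounding.
print(welch_df(1.0, 10, 1.0, 10))
```

The result is generally not an integer, and it always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.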
Chapter 9, Q 60: Another 2 sample unpaired t problem with no real evidence that the two population standard deviations are unequal so we use the two sample unpaired t-test (two tailed). We get and use 16 degrees of freedom leading to the test statistic t = 6.38. The P value from this statistic is minute - .
Chapter 9, Q 66: Let $p_1$ be the probability that an egg survives at 11 degrees and $p_2$ the probability that an egg survives at 30 degrees. The null hypothesis is that these two probabilities are the same while the alternative appears to be that they are not the same. Under the null hypothesis we estimate the common value by . The test statistic is

leading to a P-value of 0.85% which is strong evidence against equal probabilities.
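A minimal sketch of this kind of test follows, assuming the usual pooled two-proportion z statistic with a two-sided normal P-value. The function name `two_proportion_z` and the counts in the usage line are hypothetical; they are not the egg survival data from the text.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test of equal success probabilities in two independent samples.

    x1, x2 are success counts; n1, n2 are sample sizes.
    Returns the z statistic and the two-sided normal P-value.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under the null hypothesis both samples estimate one common probability.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60 of 100 survive in one group, 40 of 100 in the other.
z, p_value = two_proportion_z(60, 100, 40, 100)
print(round(z, 2), round(p_value, 4))
```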
Chapter 9, Q 70: This is another problem with two independent samples. It might have been paired if there were 8 hospitals involved with one room of each type in each hospital. I don't think that is what is intended here. However: if you did use a paired test you get valid conclusions but only 7 degrees of freedom rather than 14 as in the unpaired case. I get a P-value of 35% and conclude that there is little solid evidence of a difference. If I found out that the two hospitals were of different kinds then I would not know whether the difference in bacteria counts was due to carpeting or to the different natures of the hospitals - this is called confounding. In general confounding can be avoided only by randomization which requires a controlled experiment (and not an observational study) in which the experimenter controls which rooms are carpeted and chooses those rooms at random.
Chapter 9, Q 74: The interval is

or . The subscript 1 refers to foreign drivers.
Chapter 9, Q 75:
A standard two sample problem with 2 independent samples. The z statistic is

Since the sample sizes are equal the pooled and unpooled versions are identical. The statistic works out to -6.4 which is overwhelming evidence that the two filling operations produce different mean weights.
This is a one sample problem testing $H_0: \mu = 1400$ against the alternative that $\mu$ is more than 1400. The statistic is

yielding a one sided P-value of 13.6% from a t distribution on 29 degrees of freedom or of 13.1% from the normal curve, showing that it doesn't really matter which you use. This is only very weak evidence against the null hypothesis, which would be accepted.

In 1879, over the period from June 5 to July 2, Michelson carried out a number of measurements of the speed of light. The first 20 measurements and last 20 measurements (minus 299000 km/sec) and several summary statistics are recorded below.

First 20	Second 20	Difference
850	890	-40
740	840	-100
900	780	120
1070	810	260
930	760	170
850	810	40
950	790	160
980	810	170
980	820	160
880	850	30
1000	870	130
980	870	110
930	810	120
650	740	-90
760	810	-50
810	940	-130
1000	950	50
1000	800	200
960	810	150
960	870	90
Average=909	Average=831.5	Average=77.5
SD=104.9	SD=54.2	SD=109.8

Has the bias of the measurements changed between the first 20 and the last 20?
Solution
This is not a paired data problem; I just calculated the differences to confuse. There is no natural pairing between the first measurement and the first of the last 20. Taking the two independent samples model the question is whether the mean of the first 20 is the same as the mean of the last 20. The idea is that each mean is the true speed of light plus a bias and so the means are equal if and only if the biases are the same.
It is not obvious that the standard deviations are unchanged (and the statistical evidence is that the standard deviation has indeed changed). We thus do not pool the estimates of the standard deviations and our test statistic is

$$ z = \frac{\bar X_1 - \bar X_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{909 - 831.5}{\sqrt{104.9^2/20 + 54.2^2/20}} \approx 2.9 $$

(Numerically, though, since the two sample sizes are the same the statistic has the same value as the two sample t statistic using a pooled variance estimate.) Looking in normal tables and carrying out a 2 tailed test we get a P-value of 0.3% and conclude that the bias has indeed changed.
Although the data are not paired the paired comparisons t-test is still valid and the resulting statistic is 3.16 on 19 degrees of freedom yielding a P-value of 0.5% which leads to the same conclusion. The previous calculation based on the unpaired method would have had 38 degrees of freedom if the assumption of equal variances were valid and an indistinguishable P-value.
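The statistics quoted in this solution can be recomputed from the table as in the sketch below. It uses only the Python standard library, so the P-value is taken from the normal curve; the paired statistic is computed but its t-based P-value is not. All variable names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

# Michelson's measurements (minus 299000 km/sec), copied from the table.
first = [850, 740, 900, 1070, 930, 850, 950, 980, 980, 880,
         1000, 980, 930, 650, 760, 810, 1000, 1000, 960, 960]
second = [890, 840, 780, 810, 760, 810, 790, 810, 820, 850,
          870, 870, 810, 740, 810, 940, 950, 800, 810, 870]

n1, n2 = len(first), len(second)
diff_mean = mean(first) - mean(second)
s1, s2 = stdev(first), stdev(second)  # sample SDs (divisor n - 1)

# Unpooled two sample statistic, referred to the normal curve.
z_unpooled = diff_mean / math.sqrt(s1**2 / n1 + s2**2 / n2)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_unpooled)))

# Pooled statistic: the same value here because n1 == n2.
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_pooled = diff_mean / (s_pooled * math.sqrt(1 / n1 + 1 / n2))

# Paired statistic based on the column of differences (19 degrees of freedom).
diffs = [a - b for a, b in zip(first, second)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

print(round(z_unpooled, 2), round(p_two_sided, 4))
print(round(t_pooled, 2), round(t_paired, 2))
```

The paired statistic comes out larger than the unpooled one only because the SD of the differences happens to be small for this particular ordering of the data; with no natural pairing there is no reason to expect that in general.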
Annual records were kept in the Prussian army for the number of deaths by horsekick.

Year	Number of deaths	Year	Number of deaths
1875	3	1885	5
1876	5	1886	11
1877	7	1887	15
1878	9	1888	6
1879	10	1889	11
1880	18	1890	17
1881	6	1891	12
1882	14	1892	15
1883	11	1893	8
1884	9	1894	4
Total	92		104

Use a Poisson model and obtain a 95% confidence interval for the long run mean number of deaths per year.
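I did this in class and obtained two intervals: the first, for the parameter $\lambda$ (which is the annual rate) is

$$ \bar X \pm 1.96\sqrt{\bar X/n} = 9.8 \pm 1.96\sqrt{9.8/20} = 9.8 \pm 1.37 $$

and the second is the interval for $n\lambda$ based on solving a quadratic

$$ 196 + \frac{1.96^2}{2} \pm 1.96\sqrt{196 + \frac{1.96^2}{4}} = 197.92 \pm 27.51 $$

This last interval translates to one for $\lambda$ by dividing by n=20 to get

$$ 9.90 \pm 1.38 $$

or

$$ 8.52 \text{ to } 11.27 $$

which is about the same width but centred slightly differently. The difference is negligible.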
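The two intervals can be checked numerically as in the sketch below. It assumes the first is the usual normal approximation interval for the rate, and that the quadratic is the one obtained from $(X - n\lambda)^2 = 1.96^2\, n\lambda$, where $X = 196$ is the total count. Variable names are illustrative.

```python
import math

total_deaths = 92 + 104  # totals of the two columns in the table
n_years = 20
z = 1.96                 # normal multiplier for 95% confidence

# Interval 1: estimated rate plus or minus z times its estimated standard error.
rate_hat = total_deaths / n_years
half_width = z * math.sqrt(rate_hat / n_years)
interval_simple = (rate_hat - half_width, rate_hat + half_width)

# Interval 2: roots in mu = n * lambda of (X - mu)^2 = z^2 * mu,
# then divided by n to get an interval for the annual rate.
centre = total_deaths + z**2 / 2
half_width_mu = z * math.sqrt(total_deaths + z**2 / 4)
interval_quadratic = ((centre - half_width_mu) / n_years,
                      (centre + half_width_mu) / n_years)

print([round(v, 2) for v in interval_simple])
print([round(v, 2) for v in interval_quadratic])
```

The second interval is shifted upward by $1.96^2/(2n)$, which is about 0.1 deaths per year here, small relative to the width of either interval.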
Develop a test of the hypothesis that there has been no change in this underlying death rate over the time period in question as follows. Let N be the number of deaths in the first 10 years and M the number in the second 10 years. If the Poisson model with constant death rate is credible then M and N have the same mean. What is the standard error of N-M in terms of the Poisson parameter? How can you estimate this standard error? How can you use this to test the hypothesis that there is no change in mean?
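If $\lambda$ is the per year rate and this is the same for all 20 years then both N and M have Poisson distributions with parameter $10\lambda$. Then $\mathrm{Var}(N-M) = 20\lambda$, which is estimated by $N+M$, and the natural test statistic is

$$ \frac{N-M}{\sqrt{N+M}} = \frac{92-104}{\sqrt{196}} = -0.86 $$

leading to a (two sided) P-value of 39% which is not significant.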
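As a quick numerical check of this test, the sketch below standardizes N - M by the square root of N + M, which estimates the variance of the difference of two independent Poisson counts with a common mean, and refers the result to the normal curve. Variable names are illustrative.

```python
import math
from statistics import NormalDist

N = 92   # deaths in the first 10 years (1875 to 1884)
M = 104  # deaths in the second 10 years (1885 to 1894)

# For independent Poisson counts Var(N - M) = E(N) + E(M),
# and N + M is the natural estimate of that sum.
z_stat = (N - M) / math.sqrt(N + M)
p_two_sided = 2 * NormalDist().cdf(-abs(z_stat))

print(round(z_stat, 3), round(p_two_sided, 3))
```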

Richard Lockhart
Tue Feb 3 10:59:11 PST 1998
