STAT 330: 98-1

Assignment 1 Solutions

  1. The most obvious source of probable bias is this: teachers are likely to move economically disadvantaged children to the group getting free milk. Typically these students will have the most to gain in educational or nutritional or behavioural ways. Thus the effect of the choice will be to increase the apparent effectiveness of a free milk program.
  2. Let $\mu$ be the average weight of a month's production of coffee. You are asked to choose between tex2html_wrap_inline86 and tex2html_wrap_inline88 , the former being the manufacturer's claim. You have data $X_1, \ldots, X_n$, a sample from this population, and observe tex2html_wrap_inline92 . The manufacturer has also said tex2html_wrap_inline94 so we will use this in our test.

    To assess the evidence against the manufacturer's claim you make that claim the null hypothesis: tex2html_wrap_inline96 and tex2html_wrap_inline98 . We use a z test (large sample size, known $\sigma$) and get

    displaymath104

    The alternative predicts large negative z so the P-value is the area under the normal curve to the left of -3 or about 0.0013. The conclusion is that this P-value is so small that the manufacturer's claim is not credible; the packages are underweight.
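The tail area quoted above can be checked numerically. This is a minimal sketch using Python's standard library; the function name `lower_tail_p_value` is made up for illustration, and the only input taken from the solution is the observed z of -3.

```python
from statistics import NormalDist


def lower_tail_p_value(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)


# z = -3 is the value of the test statistic reported in the solution.
p_value = lower_tail_p_value(-3.0)
print(f"P-value for z = -3: {p_value:.4f}")
```

The lower tail is used because the alternative predicts large negative z.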

  3. Let $\mu$ be the true concentration of cadmium in the lake. You are being asked to choose between tex2html_wrap_inline114 and tex2html_wrap_inline116 . In this case you really ought to examine the data to see which, if either, of these possibilities is ruled out. Thus you can either see right off that the only possibility which might be ruled out is tex2html_wrap_inline116 and make this the null, or you can do both tests. Either way the t-statistic (small sample, hopefully normal population of concentration measurements) is

    displaymath122

    For tex2html_wrap_inline124 you get P = 0.004 and for tex2html_wrap_inline128 you get P = 0.996. The first P-value means there is strong evidence against tex2html_wrap_inline116 while the second means there is little or no evidence against tex2html_wrap_inline114 . The clear real-world conclusion is that the concentration of cadmium in the lake is virtually certain to be over 200.

    This conclusion summarizes the statistical calculations, which rested on some assumptions. In practice there are serious potential problems which should be examined by the investigator (the subject matter specialist, not the statistician). Are the measurements unbiased? Are they independent, or were they made in batches which would show more homogeneity within a batch than between batches? Are you really measuring the average concentration in the whole lake or only in a part of it?

  4. Page 284, #4. Let $\mu$ be the true average yield point of bars of the new composition. We are told that we have a sample of size 25 from a normally distributed population whose mean is $\mu$ and whose SD is tex2html_wrap_inline142 .

    1. The area to the left of 1.645 under a standard normal curve is 95% so a 90% confidence interval is given by the range tex2html_wrap_inline144 or tex2html_wrap_inline146 .
    2. The multiplier changes from 1.645 to 1.75.
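To see what the two multipliers in parts 1 and 2 mean, the sketch below converts a normal multiplier into its two-sided confidence level. The helper name `two_sided_level` is hypothetical. The level attached to 1.75 is computed here rather than quoted from the exercise, whose wording is not reproduced in this solution.

```python
from statistics import NormalDist


def two_sided_level(multiplier: float) -> float:
    """Confidence level of the interval x-bar +/- multiplier * sigma / sqrt(n)."""
    # Central area between -multiplier and +multiplier under the standard normal curve.
    return 2.0 * NormalDist().cdf(multiplier) - 1.0


level_a = two_sided_level(1.645)  # multiplier used in part 1
level_b = two_sided_level(1.75)   # multiplier quoted in part 2

print(f"multiplier 1.645 -> level {level_a:.3f}")
print(f"multiplier 1.75  -> level {level_b:.3f}")
```

Only the multiplier changes between the two parts; the sample mean and the standard error stay the same.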

  5. Page 284, #6.

    1. You must solve the inequalities as follows

      eqnarray19

      and similarly for the other inequality to get

      displaymath148

      so that the two outer expressions are the endpoints of the interval.

    2. The lengths of the intervals are tex2html_wrap_inline150 times in one case tex2html_wrap_inline152 and in the other tex2html_wrap_inline154 . For tex2html_wrap_inline156 these lengths are 3.92 and 4.02 so that the usual interval is shorter. (It is a theorem that using tex2html_wrap_inline158 produces the shortest interval.)
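The two lengths 3.92 and 4.02 can be reproduced under one reading of the exercise. The sketch below assumes a total error probability of 0.05, split evenly between the tails for the usual interval and split one quarter to three quarters for the other. Both the value 0.05 and the uneven split are assumptions inferred from the numbers 3.92 and 4.02, not taken from this solution; `alpha`, `gamma` and `length_multiplier` are illustrative names.

```python
from statistics import NormalDist


def length_multiplier(alpha: float, gamma: float) -> float:
    """Sum of the two upper-tail normal quantiles when probability gamma
    is cut from one tail and alpha - gamma from the other."""
    std_normal = NormalDist()
    z_first = std_normal.inv_cdf(1.0 - gamma)
    z_second = std_normal.inv_cdf(1.0 - (alpha - gamma))
    return z_first + z_second


ALPHA = 0.05  # assumed total error probability

usual = length_multiplier(ALPHA, ALPHA / 2)   # equal tails
uneven = length_multiplier(ALPHA, ALPHA / 4)  # assumed uneven split

print(f"equal tails:  {usual:.2f}")
print(f"uneven split: {uneven:.2f}")
```

Trying other values of `gamma` between 0 and `ALPHA` gives a quick numerical illustration of the remark that the equal-tail choice is shortest.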

  6. Page 290, #16. Let p be the true proportion of breast cancers which would be detected by this technique. Your model is that the number actually detected in the study, X, has a Binomial distribution with parameters p and n=29. It is not at all clear that the data really apply to any larger population of breast cancer patients - you have no idea how these 29 patients were chosen. You get tex2html_wrap_inline168 and an approximate confidence interval

    displaymath170

    which is, as the text warns, quite wide. The sample size is not terribly large and you might worry about the normal approximation but for a rough idea the normal approximation is ok.
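A rough way to see why an interval based on n=29 must be wide is to bound the half-width of the normal-approximation interval. The sketch below assumes a 95% confidence level (the level used in the solution is not visible in this text) and uses the worst case p = 1/2, which maximizes p(1-p). The function name `max_half_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_half_width(n: int, level: float = 0.95) -> float:
    """Largest possible half-width of p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # p-hat * (1 - p-hat) is at most 1/4, attained at p-hat = 1/2.
    return z * sqrt(0.25 / n)


half_width = max_half_width(29)
print(f"worst-case half-width for n = 29: {half_width:.3f}")
```

The actual half-width depends on the observed proportion and is smaller when that proportion is far from 1/2.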

  7. Page 290, #18. Let $\mu$ be the true average playing time of this population of tapes (in practice -- what population of tapes?). We have $X_1, \ldots, X_n$ sampled from this population. We are given tex2html_wrap_inline176 and s=8 and asked to use tex2html_wrap_inline180 from which the multiplier is tex2html_wrap_inline182 or so. The confidence interval is then tex2html_wrap_inline184 or tex2html_wrap_inline186 . This interval does not contain 360 minutes so it is unlikely that $\mu$ really is 360. I think this data suggests strongly that the manufacturer is exaggerating. This question would be answered by a hypothesis test whose z statistic is 8, leading to a minuscule P-value. There is no doubt whatever that $\mu$ is less than 6 hours.
  8. Page 314, #10.
    1. The null hypothesis of correct calibration is that $\mu = 10$ and the alternative is that it is not.
    2. The probability of recalibration is 1 minus the probability that the sample average is between 9.8968 and 10.1032, that is,

      eqnarray53

      In this part you are told to take $\mu = 10$ and get

      eqnarray66

    3. Now you take tex2html_wrap_inline200 and get

      displaymath202

      and then tex2html_wrap_inline204 and get tex2html_wrap_inline206 .

    4. You made this calculation in b above and got c=2.58.
    5. Just use n=10 in computing z. There is no change to c; that's the point of computing z.
    6. This average can be computed by subtracting 10 and multiplying by 1000 to get the numbers -19, 6, -143, 107, -112, -207, -272, 439, 214 and 190, whose sum is 203. So $\bar{x} = 10.0203$ with n=10, which gives z = 0.32, and the hypothesis is not rejected at the 1% level.
    7. Recalibrate if |Z| > 2.58.
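The arithmetic in parts 6 and 7 can be sketched as follows. The coded values and the cutoff c=2.58 come from the solution above. The value `SIGMA = 0.2` is an assumption: it is not shown in this text, but it is consistent with the reported z = 0.32 and with the cutoffs 9.8968 and 10.1032. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Coded data from part 6: (reading - 10) * 1000.
coded = [-19, 6, -143, 107, -112, -207, -272, 439, 214, 190]

MU_0 = 10.0   # null value: the scale is correctly calibrated
SIGMA = 0.2   # ASSUMPTION: known SD, inferred from the reported z, not quoted
C = 2.58      # cutoff from part 4

n = len(coded)
x_bar = MU_0 + mean(coded) / 1000           # undo the coding
z = (x_bar - MU_0) / (SIGMA / sqrt(n))      # z statistic with n = 10
recalibrate = abs(z) > C                    # rule from part 7

# Probability of recalibrating when the null is true, using cutoff C.
alpha = 2.0 * NormalDist().cdf(-C)

print(f"sum of coded values = {sum(coded)}")
print(f"x-bar = {x_bar:.4f}, z = {z:.2f}, recalibrate = {recalibrate}")
print(f"alpha for c = {C}: {alpha:.4f}")
```

Because the rule is stated in terms of z, the same cutoff applies whatever the sample size; only the standard error in the denominator changes.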



Richard Lockhart
Tue Feb 10 12:18:18 PST 1998