STAT 410 96-2 Final Exam Solutions

There was a very serious misprint in this problem, which most people seem to have figured out and reinterpreted properly. I wanted gas mileage in litres per hundred kilometres, but in the second method, instead of dividing gasoline consumed by distance driven, I had gasoline consumed in both the numerator and the denominator. For people who did not figure this out I marked the exam out of only 46.5, not 50, ignoring the 1 point at the end of Q 7 and the 2.5 / 5 for method 2 in Q 7.

There are 3 possible samples. For method 1 you are to work out y bar for each of the three samples and compute E(y bar) (which is just Y bar), where y is the gas consumption in L per 100 km for an individual trip. If g_i is gas consumed on trip i and d_i is distance driven on trip i, then y_i = 100 g_i / d_i and E(y bar) = Y bar = (10 + 8 + 5)/3 = 7.666... . You can also compute E(y bar) by taking each possible value of y bar, multiplying by the probability of that value (1/3 in each case) and adding. This gives E(y bar) = (9 + 7.5 + 6.5)/3 = 7.666... as before.

For method 2 the estimator is 100 g bar / d bar. The table following shows the three possible values of this ratio estimator. The expected value is then (8.666...+6+6)/3 = 6.888... . The crucial difference is not one of statistical properties such as bias or variance but rather one of what quantity is being estimated: method 1 targets the average of the per-trip mileages, while method 2 targets total gasoline consumed per 100 km of total distance driven.

Sample      g_1 g_2   d_1 d_2   y_1 y_2   y bar   g bar   d bar   ratio est
Trip 1, 2   10  16    100 200   10  8     9       13      150     8.666...
Trip 1, 3   10  20    100 400   10  5     7.5     15      250     6
Trip 2, 3   16  20    200 400   8   5     6.5     18      300     6
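
The two expected values above can be checked by direct enumeration. The following is only a minimal sketch: the function names are my own, and it assumes each of the three samples has probability 1/3, as stated in the solution.

```python
from fractions import Fraction
from itertools import combinations

# Trip data from the table: gasoline (L) and distance (km) for trips 1, 2, 3.
gas = {1: 10, 2: 16, 3: 20}
dist = {1: 100, 2: 200, 3: 400}

# The three equally likely samples of two trips: (1, 2), (1, 3), (2, 3).
samples = list(combinations(sorted(gas), 2))

def mean_of_ratios(sample):
    """Method 1: average the per-trip mileages y_i = 100 g_i / d_i."""
    return sum(Fraction(100 * gas[i], dist[i]) for i in sample) / len(sample)

def ratio_of_means(sample):
    """Method 2: 100 g bar / d bar (equal to 100 * total gas / total distance)."""
    total_gas = sum(gas[i] for i in sample)
    total_dist = sum(dist[i] for i in sample)
    return Fraction(100 * total_gas, total_dist)

# Expected value = sum over samples of (value x probability 1/3).
expected_method1 = sum(mean_of_ratios(s) for s in samples) / len(samples)
expected_method2 = sum(ratio_of_means(s) for s in samples) / len(samples)

print(float(expected_method1), float(expected_method2))
```

Exact fractions are used so that the repeating decimals 7.666... and 6.888... are compared without rounding.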

\item Estimate the total gasoline consumed and give a standard error of this estimate. (8 marks)
\item Estimate my average gasoline consumption in litres per hundred kilometres by estimating total gasoline consumed divided by total distance driven. Give a standard error for this estimate. (8 marks)
\item After fill up number 107 the total distance I had driven since the beginning of my record keeping was 47405 kilometres; after fill-up 27 this figure was 11583.8 km. Use these figures to improve your estimate of total gasoline consumed. Again give a standard error. Your answer will include a reason for selecting the estimate you used, because you have more than 1 possible method to use at this point. (9 marks)
\item From the data provided does it appear that the estimate in question 3 will be a big improvement over the estimate in question 1? (3 marks)
\end{enumerate}
\bigskip
\centerline{\bf Part B}
\medskip
Consider a league of 25 teams, each of 5 players. A simple random sample of 2 of the teams is drawn. For each player on the team we record the player's age; the resulting data are in the data table. From each of the two selected teams two players are selected at random and we determine the years of playing experience. These years of experience are also in the table.
\begin{enumerate}
\setcounter{enumi}{4}
\item Estimate the average age of the players in the league and give a standard error for this estimate. (8 marks)
\item Estimate the average years of playing experience for players in the league and give a standard error for this estimate. (8 marks)
\end{enumerate}
\bigskip
\begin{center} Data for Part B: Questions 5 and 6 \end{center}
\bigskip
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
Team \# & Ages & Average & Variance & Experiences & Average & Variance \\
\hline
11 & 20 21 23 23 25 &22.4 & 3.8 & 4 3 &3.5 & 0.5 \\
\hline
18 & 19 19 22 23 25 & 21.6 & 6.8 & 1 1 & 1 & 0 \\
\hline
\end{tabular}
\end{center}
\bigskip
\begin{center} Data for Part C: Question 7 \end{center}
\bigskip
\begin{tabular}{|c|c|c|c|}
\hline
Fill-up \# & Distance & Gasoline & Mileage \\
& Driven (km) & Used (L) & L per 100 km \\
\hline
\hline
\end{tabular}
\newpage
\begin{center} \bf Data for Part A: questions 1 to 4 \end{center}
\begin{center} Sample of 10 from first 27 fill-ups \end{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total Cost & Km & Price & Gasoline & Mileage \\
& & (\$) & Driven & (cents / L) & Used (L) & L / 100 km \\
\hline
& 1& 9.80&390.4&31.2&31.4&8.04\\
& 2&10.41&413.6&31.4&33.1&8.01\\
& 4&10.42&429.2&31.4&33.1&7.73\\
& 7&10.20&438.7&31.4&32.4&7.40\\
& 12&11.12&445.2&34.2&32.5&7.30\\
& 13&11.56&447.7&34.2&33.8&7.54\\
& 16&12.23&470.1&34.2&35.7&7.60\\
& 19&11.22&440.7&34.2&32.8&7.44\\
& 20&10.75&476.1&34.2&31.4&6.60\\
& 24& 8.92&365.5&34.5&25.8&7.07\\
\hline
Mean & 11.8&10.663&431.72&33.09&32.2&7.48 \\
SD& 8.05 & 0.937 & 34.05 & 1.50 & 2.56 & 0.430\\
VAR & 64.84& 0.8786 & 1159.6 & 2.254 & 6.569 & 0.185 \\
\hline
\end{tabular}
\bigskip
\begin{center} Sample of 10 from last 80 fill-ups \end{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total Cost & Km & Price & Gasoline & Mileage \\
& & (\$) & Driven & (cents/L) & Used (L) & L / 100 km \\
\hline
&34 & 11.14 & 410.1 & 36.1 & 30.9 & 7.52 \\
&48 & 13.12 & 444.0 & 39.8 & 33.0 & 7.42 \\
&54 & 13.64 & 469.4 & 40.1 & 34.0 & 7.25 \\
&61 & 13.80 & 427.5 & 40.6 & 34.0 & 7.95 \\
&67 & 14.01 & 420.7 & 41.0 & 34.2 & 8.12 \\
&77 & 6.33 & 198.4 & 43.3 & 14.6 & 7.37 \\
&80 & 14.65 & 462.0 & 43.3 & 33.8 & 7.32 \\
&87 & 13.25 & 419.9 & 43.3 & 30.6 & 7.29 \\
&95 & 14.49 & 451.2 & 44.5 & 32.6 & 7.22 \\
&98 & 14.59 & 464.5 & 44.5 & 32.8 & 7.06 \\
\hline
Mean & 70.1 & 12.90 & 416.77 & 41.65 & 31.04 & 7.45 \\
SD & 21.0 & 2.53 & 79.5 & 2.64 & 5.91 & 0.334 \\
VAR & 441.4 & 6.38 & 6321.2 & 6.952 & 34.9 & 0.1119 \\
\hline
\end{tabular}
\newpage
\begin{center}
Correlations for Part A data
\bigskip
First sample
\medskip
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total & Distance & Gas & Gas & Mileage \\
& & Cost & Driven & Price & Used & L / 100 km \\
\hline
Fill-up \# & 1 & 0.10 & 0.17 & 0.91 & -0.39 & -0.80 \\
Total Cost & & 1 & 0.84 & 0.38 & 0.86 & 0.04 \\
Distance & & & 1 & 0.31 & 0.73 & -0.33 \\
Price per L& & & & 1 & -0.15 & -0.66 \\
Gas Used & & & & & 1 & -0.39 \\
Mileage & & & & & & 1 \\
\hline
\end{tabular}
\bigskip
Second Sample
\medskip
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total & Distance & Gas & Gas & Mileage \\
& & Cost & Driven & Price & Used & L / 100 km \\
\hline
Fill-up \# & 1 & 0.21 & -0.01 & 0.97 & -0.11 & -0.42 \\
Total Cost & & 1 & 0.95 & 0.14 & 0.95 & 0.02 \\
Distance & & & 1 & -0.08 & 0.97 & -0.08 \\
Price per L& & & & 1 & -0.19 & -0.43 \\
Gas Used & & & & & 1 & 0.16 \\
Mileage & & & & & & 1 \\
\hline
\end{tabular}
\end{center}
\newpage
\end{document}

There are 36 possible results of 2 draws with replacement, though some of them give the same value of y bar; e.g., 8 followed by 3 gives the same value as 3 followed by 8. Here is part of my table of values of (y bar - Y bar)^2 with the corresponding probabilities. You multiply and add and check that the answer is the same as that given by the formulas. The point is that V(y bar) is an expected value, which you compute by taking each value times its probability and adding.

Sample	Probability x 36	(y bar -Y bar)^2
8 8	1	(8-34/6)^2
8 3	2	(11/2-34/6)^2
8 1	2
8 11	2
8 4	2
8 7	2
3 3	1
3 1	2
3 11	2
3 4	2
3 7	2
1 1	1
1 11	2
1 4	2
1 7	2
11 11	1
11 4	2
11 7	2
4 4	1
4 7	2
7 7	1
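
The multiply-and-add computation can be carried out in full by enumerating all 36 ordered pairs. This is a sketch only: the population values are read off the table above, and I am assuming that the formula being checked is the usual sigma^2 / n for sampling with replacement (the variable names are my own).

```python
from fractions import Fraction
from itertools import product

# Population values as they appear in the table of samples.
population = [8, 3, 1, 11, 4, 7]
N = len(population)
Y_bar = Fraction(sum(population), N)  # 34/6

# Each ordered pair of draws with replacement has probability 1/36.
# V(y bar) = sum over outcomes of probability x (y bar - Y bar)^2.
var_by_enumeration = sum(
    Fraction(1, N * N) * (Fraction(a + b, 2) - Y_bar) ** 2
    for a, b in product(population, repeat=2)
)

# Assumed formula: population variance (divisor N) over the number of draws.
sigma2 = sum((Fraction(y) - Y_bar) ** 2 for y in population) / N
var_by_formula = sigma2 / 2

print(var_by_enumeration, var_by_formula)
```

Enumerating ordered pairs at probability 1/36 each is equivalent to the table's unordered pairs with weights 1 or 2 out of 36.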

The estimated population total is about 51,473 and 10% of that is 5147. You are therefore asked to compute the probability that an estimate comes within 5147 of the corresponding parameter value. You convert 5147 to standard units by estimating the standard error of the population total using formula 2.22 in the text and dividing 5147 by this figure. The result is about 1.55. Thus the probability asked for is the probability that a standard normal is between -1.55 and 1.55 which is about 88% or 0.9 roughly, as Cochran gives. Note that this probability is not estimated too precisely because you have had to plug in a guess for 10% of the population total.
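
The last step, turning 1.55 standard units into a probability, can be sketched as follows. The helper name is my own; it uses only the standard normal identity P(|Z| < z) = erf(z / sqrt(2)).

```python
import math

def prob_within(z):
    """P(-z < Z < z) for a standard normal Z."""
    return math.erf(z / math.sqrt(2))

# 1.55 is the ratio 5147 / (estimated standard error) quoted in the solution.
p = prob_within(1.55)
print(round(p, 3))
```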
You need to see that a sample size of 12 makes 2 standard errors less than $200 while a sample size of 11 does not. You are given the value of S^2 for the population of N=36 shelves today and will have to assume that the value of S^2 on the future occasion when the shelves are resampled will be about the same. Then you just plug 12 and 11 for n into 2.13 and compare to the desired value for 1 SE of $100.
When you can carry out separate surveys of owners and renters you are able to choose two separate sample sizes, say r renters and s owners, which add up to the total sample size n. The variance of the estimated difference is about 15/r+15/s, and you must choose r and s so that this is no more than 1 (equivalently, so that the standard error is no more than 1). There are many solutions, so it is up to you to see that you should take the solution which minimizes n=r+s. This gives r=s=30. When you cannot separate out owners and renters in advance you will have to take a SRS of n from the total population. You will then get a value of r approximately equal to 0.25n and s equal to roughly 0.75n, leading to the condition 15/(0.25n) + 15/(0.75n) = 1, giving n=80. Note that the actual sample sizes achieved will be random, and there is a substantial probability that the number of renters will actually be less than 0.25n = 20, in which case the standard error of the difference will be larger than 1.
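
Both sample-size answers can be confirmed by a small brute-force search. This is a sketch under the assumptions stated in the solution (the expression 15/r + 15/s must not exceed 1, and an SRS yields about 25% renters); the function name and search bounds are my own.

```python
from fractions import Fraction

def variance_of_difference(r, s):
    """The quantity 15/r + 15/s from the solution, kept exact."""
    return Fraction(15, r) + Fraction(15, s)

# Separate surveys: the feasible (r, s) pair with the smallest total r + s.
best = min(
    ((r, s) for r in range(1, 201) for s in range(1, 201)
     if variance_of_difference(r, s) <= 1),
    key=lambda rs: rs[0] + rs[1],
)

# Single SRS: smallest n (a multiple of 4, so that 0.25n is an integer)
# for which r = 0.25n and s = 0.75n meet the same condition.
n_srs = next(
    n for n in range(4, 401, 4)
    if variance_of_difference(n // 4, 3 * n // 4) <= 1
)

print(best, n_srs)
```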
Let M be the number of distinct units in the sample. Then any student at this level should be able to compute P_i=Prob(M=i) and get Cochran's answer. Now given M=i the estimate ybar' is the sample mean for a SRS of size i. Thus its conditional mean is Ybar for all values of i and so its unconditional mean is Ybar. Hence you may compute its variance by averaging the conditional variances of ybar' given M=i over values of i. These conditional variances are given by formula 2.8 with n replaced by i. Multiplying the answer in 2.8 by the corresponding P_i and adding gives the formula in Cochran's question in the sentence beginning `One way ...'. The approximation in the previous formula is obtained by multiplying out (2N-1)(N-1)/(6N^2) and discarding the term 1/(6N^2) since this is smaller than terms with only 1/N in them. Finally the inequality V(ybar') < V(ybar) must be verified by comparing the formula for V(ybar') with 2.8 with n=3.
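
The averaging of conditional variances can be checked numerically. The sketch below assumes the sample is 3 draws with replacement (as the comparison with n=3 suggests) and that formula 2.8 is the usual S^2 (1/n - 1/N); it works with the multiplier of S^2 only, and the function names are my own.

```python
from fractions import Fraction
from itertools import product

def distinct_count_probs(N, draws=3):
    """P_i = Prob(M = i), found by enumerating all N**draws ordered outcomes."""
    counts = {}
    for outcome in product(range(N), repeat=draws):
        m = len(set(outcome))
        counts[m] = counts.get(m, 0) + 1
    total = N ** draws
    return {m: Fraction(c, total) for m, c in counts.items()}

def variance_factor(N):
    """Multiplier of S^2 in V(ybar'): sum over i of P_i * (1/i - 1/N)."""
    return sum(
        p * (Fraction(1, m) - Fraction(1, N))
        for m, p in distinct_count_probs(N).items()
    )

# Compare with the closed form (2N-1)(N-1)/(6N^2) for a few population sizes.
for N in range(3, 9):
    assert variance_factor(N) == Fraction((2 * N - 1) * (N - 1), 6 * N * N)

print(distinct_count_probs(5), variance_factor(5))
```

Expanding the closed form gives 1/3 - 1/(2N) + 1/(6N^2), which is where the approximation obtained by dropping the 1/(6N^2) term comes from.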
I will eventually distribute a detailed answer to this question. It can be done as follows: write the estimate as ybar + R where R is a random variable which is equal to c if we get y_1 but not y_N (event A), -c if we get y_N but not y_1 (event B) and 0 otherwise (event C). The expected value of our estimate is E(ybar)+E(R) = Ybar + [cP(A) -cP(B) +0P(C)] which is just Ybar because P(A)=P(B). Next var(ybar+R) = var(ybar) + var(R) + 2 cov(ybar,R). You have a formula, 2.8, for var(ybar) and you already know that E(R)=0 so var(R) = E(R^2) =c^2 P(A) + (-c)^2 P(B). Now you actually need to work out P(A) and P(B) and see that these are N-2 choose n-1 over N choose n. Finally computation of cov(ybar, R) = E(R x ybar) requires you to average separately over the events A and B. On the event A, for example ybar is (y_1 + total of n-1 chosen from the set {y_2,...y_(N-1)} ) divided by n. On the same event R is just c. So E(R x ybar | A ) is c(y_1+ (n-1) Ybar2)/n where Ybar2 is the mean of the N-2 numbers {y_2,...y_(N-1)}. Now you can finish the algebra, particularly since the Ybar2 values cancel out.
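
The unbiasedness claim and the value of P(A) can be checked by enumerating every sample for a small case. Everything numeric here is hypothetical: the six population values, n = 3 and c = 2 are made up for illustration and are not taken from the problem.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population; index 0 plays the role of y_1, index N-1 of y_N.
population = [1, 3, 4, 7, 8, 11]
n, c = 3, 2  # hypothetical sample size and adjustment constant
N = len(population)
Y_bar = Fraction(sum(population), N)

def estimate(sample):
    """ybar + R, with R = c on event A, -c on event B, 0 otherwise."""
    has_first = 0 in sample
    has_last = (N - 1) in sample
    if has_first and not has_last:      # event A
        r = c
    elif has_last and not has_first:    # event B
        r = -c
    else:                               # event C
        r = 0
    return Fraction(sum(population[i] for i in sample), n) + r

# All SRS samples of size n are equally likely.
samples = list(combinations(range(N), n))
expected = sum(estimate(s) for s in samples) / len(samples)
p_A = Fraction(sum(1 for s in samples if 0 in s and N - 1 not in s), len(samples))

print(expected == Y_bar, p_A)
```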
The hard part is the computation of the mean and variance of this estimate. But E(y_1 + 6 ybar2 + y_8) = E(y_1) + 6 E(ybar2) + E(y_8). Remember that y_1 and y_8 are parameters; they do not change from sample to sample. Moreover, E(ybar2) = Ybar2 and 6Ybar2 = y_2+...+y_7. Thus E(y_1 + 6 ybar2 + y_8) = Y and so the estimate in the problem is unbiased. Next var(y_1/8+ 6ybar2/8+ y_8/8) = var(3ybar2/4) because y_1 and y_8 are constants. Thus var(ybarst) = 9var(ybar2)/16 and the latter is the value of 2.8 for N=6, n=2 and the population {y_2,...y_7}. The rest of this problem was plugging in numbers.

The questions.
