STAT 410 96-2 Final Exam Solutions
- There was a very serious misprint in this problem, which most people
seem to have figured out and reinterpreted properly. I wanted gas
mileage in litres per hundred kilometres, but in the second method, instead
of dividing gasoline consumed by distance driven, I had both numerator and
denominator as gasoline consumed. For people who did not figure this out I
marked the exam out of only 46.5, not 50, ignoring the 1 point at the end of
Q 7 and the 2.5 / 5 for method 2 in Q 7.
There are 3 possible samples. For method 1 you are to work out y bar for each
of the three samples and compute E(y bar) (which is just Y bar), where y
is the gas consumption in L per 100 km for an individual trip. If g_i is
gas consumed on trip i and d_i is distance driven on trip i then
y_i = 100 g_i / d_i and E(y bar) = Y bar = (10 + 8 + 5)/3 = 7.666... . You can
also compute E(y bar) by taking each possible value of y bar, multiplying
by the probability of that value (1/3 in each case) and adding. This
gives E(y bar) = (9 + 7.5 + 6.5)/3 = 7.666... as before.
For method 2 the estimator is 100 g bar / d bar. The table following shows
the three possible values of this ratio estimator. The expected value is
then (8.666...+6+6)/3 = 6.888... .
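The crucial difference between the two methods is not one of statistical
properties such as bias or variance but rather one of what quantity is being
estimated: method 1 estimates the average of the per-trip consumptions, while
method 2 estimates total gasoline consumed divided by total distance driven.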
Sample    | g_1 g_2 | d_1 d_2 | y_1 y_2 | y bar | g bar | d bar | ratio est
Trip 1, 2 | 10  16  | 100 200 | 10  8   | 9     | 13    | 150   | 8.666...
Trip 1, 3 | 10  20  | 100 400 | 10  5   | 7.5   | 15    | 250   | 6
Trip 2, 3 | 16  20  | 200 400 | 8   5   | 6.5   | 18    | 300   | 6
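
The two expected values can be checked by brute force. The following is only a sketch in Python (the variable and function names are my own, not part of the exam); it enumerates the three samples from the table and also computes the two population quantities, so you can see which one each method is aiming at.

```python
from fractions import Fraction
from itertools import combinations

# Gas used (L) and distance driven (km) for the three trips, read off the table.
gas = {1: 10, 2: 16, 3: 20}
dist = {1: 100, 2: 200, 3: 400}

samples = list(combinations([1, 2, 3], 2))  # the three equally likely samples

def mean_of_ratios(sample):
    # Method 1: average the per-trip consumptions y_i = 100 g_i / d_i.
    return sum(Fraction(100 * gas[i], dist[i]) for i in sample) / len(sample)

def ratio_of_means(sample):
    # Method 2: 100 * g bar / d bar (the sample size cancels top and bottom).
    return Fraction(100 * sum(gas[i] for i in sample),
                    sum(dist[i] for i in sample))

# Expected values: each sample has probability 1/3.
e_method1 = sum(mean_of_ratios(s) for s in samples) / len(samples)
e_method2 = sum(ratio_of_means(s) for s in samples) / len(samples)

# Population quantities: mean per-trip consumption, and total gas / total distance.
y_bar_pop = sum(Fraction(100 * gas[i], dist[i]) for i in gas) / len(gas)
ratio_pop = Fraction(100 * sum(gas.values()), sum(dist.values()))

print("method 1:", float(e_method1), "target Y bar:", float(y_bar_pop))
print("method 2:", float(e_method2), "target ratio:", float(ratio_pop))
```

Method 1 hits Y bar exactly, while the expected value of the ratio estimator differs somewhat from the population ratio of total gas to total distance, which is the usual small-sample bias of a ratio estimator.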
\item Estimate the total gasoline consumed and give a standard error of
this estimate. (8 marks)
\item Estimate my average gasoline consumption in litres per hundred
kilometres by estimating total gasoline consumed divided by total
distance driven. Give a standard error for this estimate. (8 marks)
\item After fill up number 107 the total distance I had driven since the
beginning of my record keeping was 47405 kilometres; after fill-up 27 this
figure was 11583.8 km. Use these figures to
improve your estimate of total gasoline consumed. Again give a standard
error. Your answer will include a reason for selecting the estimate you
used, because you have more than 1 possible method to use at this point. (9
marks)
\item From the data provided does it appear that the estimate in
question 3 will be a big improvement over the estimate in
question 1? (3 marks)
\end{enumerate}
\bigskip
\centerline{\bf Part B}
\medskip
Consider a league of 25 teams, each of 5 players. A simple random sample
of 2 of the teams is drawn. For each player on the team we record the
player's age; the resulting data are in the data table. From each
of the two selected teams two players are selected at random and we
determine the years of playing experience. These years of experience are
also in the table.
\begin{enumerate}
\setcounter{enumi}{4}
\item Estimate the average age of the players in the league and give a standard
error for this estimate. (8 marks)
\item Estimate the average years of playing experience for players in the
league and give a standard error for this estimate. (8 marks)
\end{enumerate}
\bigskip
\begin{center}
Data for Part B: Questions 5 and 6
\end{center}
\bigskip
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
Team \# & Ages & Average & Variance & Experiences & Average & Variance \\
\hline
11 & 20 21 23 23 25 &22.4 & 3.8 & 4 3 &3.5 & 0.5 \\
\hline
18 & 19 19 22 23 25 & 21.6 & 6.8 & 1 1 & 1 & 0 \\
\hline
\end{tabular}
\end{center}
\bigskip
\begin{center}
Data for Part C: Question 7
\end{center}
\bigskip
\begin{tabular}{|c|c|c|c|}
\hline
Fill-up \# & Distance & Gasoline & Mileage \\
& Driven (km) & Used (L) & L per 100 km \\
\hline
\hline
\end{tabular}
\newpage
\begin{center}
\bf Data for Part A: questions 1 to 4
\end{center}
\begin{center}
Sample of 10 from first 27 fill-ups
\end{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total Cost & Km & Price & Gasoline & Mileage \\
& & (\$) & Driven & (cents / L) & Used (L) & L / 100 km \\
\hline
& 1& 9.80&390.4&31.2&31.4&8.04\\
& 2&10.41&413.6&31.4&33.1&8.01\\
& 4&10.42&429.2&31.4&33.1&7.73\\
& 7&10.20&438.7&31.4&32.4&7.40\\
& 12&11.12&445.2&34.2&32.5&7.30\\
& 13&11.56&447.7&34.2&33.8&7.54\\
& 16&12.23&470.1&34.2&35.7&7.60\\
& 19&11.22&440.7&34.2&32.8&7.44\\
& 20&10.75&476.1&34.2&31.4&6.60\\
& 24& 8.92&365.5&34.5&25.8&7.07\\
\hline
Mean & 11.8&10.663&431.72&33.09&32.2&7.48 \\
SD& 8.05 & 0.937 & 34.05 & 1.50 & 2.56 & 0.430\\
VAR & 64.84& 0.8786 & 1159.6 & 2.254 & 6.569 & 0.185 \\
\hline
\end{tabular}
\bigskip
\begin{center}
Sample of 10 from last 80 fill-ups
\end{center}
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total Cost & Km & Price & Gasoline & Mileage \\
& & (\$) & Driven & (cents/L) & Used (L) & L / 100 km \\
\hline
&34 & 11.14 & 410.1 & 36.1 & 30.9 & 7.52 \\
&48 & 13.12 & 444.0 & 39.8 & 33.0 & 7.42 \\
&54 & 13.64 & 469.4 & 40.1 & 34.0 & 7.25 \\
&61 & 13.80 & 427.5 & 40.6 & 34.0 & 7.95 \\
&67 & 14.01 & 420.7 & 41.0 & 34.2 & 8.12 \\
&77 & 6.33 & 198.4 & 43.3 & 14.6 & 7.37 \\
&80 & 14.65 & 462.0 & 43.3 & 33.8 & 7.32 \\
&87 & 13.25 & 419.9 & 43.3 & 30.6 & 7.29 \\
&95 & 14.49 & 451.2 & 44.5 & 32.6 & 7.22 \\
&98 & 14.59 & 464.5 & 44.5 & 32.8 & 7.06 \\
\hline
Mean & 70.1 & 12.90 & 416.77 & 41.65 & 31.04 & 7.45 \\
SD & 21.0 & 2.53 & 79.5 & 2.64 & 5.91 & 0.334 \\
VAR & 441.4 & 6.38 & 6321.2 & 6.952 & 34.9 & 0.1119 \\
\hline
\end{tabular}
\newpage
\begin{center}
Correlations for Part A data
\bigskip
First sample
\medskip
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total & Distance & Gas & Gas & Mileage \\
& & Cost & Driven & Price & Used & L / 100 km \\
\hline
Fill-up \# & 1 & 0.10 & 0.17 & 0.91 & -0.39 & -0.80 \\
Total Cost & & 1 & 0.84 & 0.38 & 0.86 & 0.04 \\
Distance & & & 1 & 0.31 & 0.73 & -0.33 \\
Price per L& & & & 1 & -0.15 & -0.66 \\
Gas Used & & & & & 1 & -0.39 \\
Mileage & & & & & & 1 \\
\hline
\end{tabular}
\bigskip
Second Sample
\medskip
\begin{tabular}{|c|c|c|c|c|c|c|}
\hline
&Fill-up \# & Total & Distance & Gas & Gas & Mileage \\
& & Cost & Driven & Price & Used & L / 100 km \\
\hline
Fill-up \# & 1 & 0.21 & -0.01 & 0.97 & -0.11 & -0.42 \\
Total Cost & & 1 & 0.95 & 0.14 & 0.95 & 0.02 \\
Distance & & & 1 & -0.08 & 0.97 & -0.08 \\
Price per L& & & & 1 & -0.19 & -0.43 \\
Gas Used & & & & & 1 & 0.16 \\
Mileage & & & & & & 1 \\
\hline
\end{tabular}
\end{center}
\newpage
\end{document}
- There are 36 possible results of 2 draws with replacement, though some
of them give the same value of y bar; e.g., 8 followed by 3 gives the same
value as 3 followed by 8.
Here is part of my table of values of (y bar - Y bar)^2 with the corresponding
probabilities. You multiply and add and check that the answer is the same
as that given by the formulas. The point is that V(ybar) is an expected value
which you compute by taking value times probability and adding.
Sample | Probability x 36 | (y bar -Y bar)^2 |
8 8 | 1 | (8-34/6)^2 |
8 3 | 2 | (11/2-34/6)^2 |
8 1 | 2 |
8 11 | 2 |
8 4 | 2 |
8 7 | 2 |
3 3 | 1 |
3 1 | 2 |
3 11 | 2 |
3 4 | 2 |
3 7 | 2 |
1 1 | 1 |
1 11 | 2 |
1 4 | 2 |
1 7 | 2 |
11 11 | 1 |
11 4 | 2 |
11 7 | 2 |
4 4 | 1 |
4 7 | 2 |
7 7 | 1 |
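
The multiply-and-add step can be done by machine as a check. This is a sketch only: it assumes the formula being checked is the usual one for sampling with replacement, V(ybar) = sigma^2 / n with sigma^2 the population variance using divisor N; the variable names are mine.

```python
from fractions import Fraction
from itertools import product

population = [8, 3, 1, 11, 4, 7]
N, n = len(population), 2
pop_mean = Fraction(sum(population), N)  # Y bar = 34/6

# All 36 equally likely ordered results of 2 draws with replacement.
draws = list(product(population, repeat=n))

# V(ybar) as an expected value: (y bar - Y bar)^2 times probability 1/36, added up.
var_by_enumeration = sum(
    (Fraction(sum(d), n) - pop_mean) ** 2 for d in draws
) / len(draws)

# Assumed with-replacement formula: sigma^2 / n, with divisor N in sigma^2.
sigma2 = sum((y - pop_mean) ** 2 for y in population) / N
var_by_formula = sigma2 / n

print(var_by_enumeration, var_by_formula)
```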
- The estimated population total is about 51,473 and 10% of
that is 5147. You are therefore asked to compute the
probability that an estimate comes within 5147 of the corresponding
parameter value. You convert 5147 to standard units by estimating
the standard error of the population total using formula 2.22 in
the text and dividing 5147 by this figure. The result is about
1.55. Thus the probability asked for is the probability that a
standard normal is between -1.55 and 1.55 which is about 88% or
0.9 roughly, as Cochran gives. Note that this probability is not
estimated too precisely because you have had to plug in a guess
for 10% of the population total.
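
The normal probability can be computed without tables. A small sketch using only the Python standard library (the function name is my own); the 1.55 is the standardized value from the text.

```python
import math

def standard_normal_cdf(z):
    # Phi(z) written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = 1.55  # 5147 divided by the estimated standard error, as in the text
prob = standard_normal_cdf(z) - standard_normal_cdf(-z)
print(round(prob, 3))  # close to the 88% quoted above
```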
- You need to see that a sample size of 12 makes 2 standard errors
less than $200 while a sample size of 11 does not. You are given the
value of S^2 for the population of N=36 shelves today and will have to
assume that the value of S^2 on the future occasion when the shelves
are resampled will be about the same. Then you just plug 12 and 11
for n into 2.13 and compare to the desired value for 1 SE of $100.
- When you can carry out separate surveys of owners and renters
you are able to choose two separate sample sizes, say r and s, which
will add up to the total sample size n. The variance of the
difference is about 15/r+15/s, and for the standard error to be no more
than 1 this must be no more than 1. There are many solutions, so it's up
to you to see that you should
take the solution which minimizes n=r+s. This gives r=s=30. When
you cannot separate out owners and renters in advance you will have to
take a SRS of n from the total population. You will then get a value
of r approximately equal to 0.25n and s equal to roughly 0.75n, leading
to the condition 15/(0.25n) + 15/(0.75n) = 1, giving n=80. Note
that the actual sample sizes achieved will be random and there is
a substantial probability that the number of renters will actually
be less than 0.25n = 20, in which case the standard error
of the difference will be larger than 1.
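
A brute-force check of both sample-size calculations, as a sketch only. The search range and the variable names are my own choices. The last step approximates the number of renters in the SRS by a Binomial(80, 0.25) count, which ignores the finite population and is an assumption made purely for illustration.

```python
from fractions import Fraction
from math import comb

NUMERATOR = 15  # from the expression 15/r + 15/s in the text
BOUND = 1       # required upper limit on that expression

# Separate surveys: search all (r, s) that meet the bound, then minimize r + s.
feasible = [
    (r, s)
    for r in range(1, 201)
    for s in range(1, 201)
    if Fraction(NUMERATOR, r) + Fraction(NUMERATOR, s) <= BOUND
]
best = min(feasible, key=lambda rs: rs[0] + rs[1])

# Single SRS: r is about n/4 and s about 3n/4, so the condition becomes
# 15/(n/4) + 15/(3n/4) = 60/n + 60/(3n) <= 1.
n_srs = next(
    n for n in range(1, 1000)
    if Fraction(4 * NUMERATOR, n) + Fraction(4 * NUMERATOR, 3 * n) <= BOUND
)

# Rough chance of fewer than 20 renters among 80, under the binomial assumption.
p_too_few_renters = sum(
    comb(n_srs, k) * 0.25 ** k * 0.75 ** (n_srs - k) for k in range(20)
)

print(best, n_srs, round(p_too_few_renters, 2))
```

Under that approximation the chance of ending up with fewer than 20 renters is somewhat under one half, which is what "substantial probability" refers to.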
- Let M be the number of distinct units in the sample. Then
any student at this level should be able to compute P_i=Prob(M=i)
and get Cochran's answer. Now given M=i the estimate ybar' is the
sample mean for a SRS of size i. Thus its conditional mean is Ybar
for all values of i and so its unconditional mean is Ybar.
Hence you may compute its variance by averaging the conditional variances
of ybar' given M=i over values of i. These conditional variances are
given by formula 2.8 with n replaced by i. Multiplying the answer in
2.8 by the corresponding P_i and adding gives the formula in Cochran's
question in the sentence beginning `One way ...'. The approximation in
the previous formula is obtained by multiplying out (2N-1)(N-1)/(6N^2)
and discarding the term 1/(6N^2) since this is smaller than terms with only
1/N in them. Finally the inequality V(ybar') < V(ybar) must be verified
by comparing the formula for V(ybar') with 2.8 with n=3.
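
If you want to convince yourself of the formula, it can be checked by enumerating all ordered samples of 3 draws with replacement. This is a sketch on a small made-up population (the six values below are purely illustrative, and the names are mine). It takes ybar to be the ordinary mean of all three draws, repeats included, which is my reading of the question rather than something stated here.

```python
from fractions import Fraction
from itertools import product

population = [8, 3, 1, 11, 4, 7]  # hypothetical values, for illustration only
N = len(population)
pop_mean = Fraction(sum(population), N)
S2 = sum((y - pop_mean) ** 2 for y in population) / (N - 1)

# All N^3 equally likely ordered triples of unit labels.
triples = list(product(range(N), repeat=3))

def mean_distinct(units):
    # ybar': unweighted mean over the distinct units in the sample.
    distinct = set(units)
    return Fraction(sum(population[u] for u in distinct), len(distinct))

def mean_all(units):
    # Ordinary mean of the three draws, repeats included (assumed meaning of ybar).
    return Fraction(sum(population[u] for u in units), len(units))

def expectation(f):
    return sum(f(t) for t in triples) / len(triples)

# P_i = Prob(M = i) by counting.
p = {i: Fraction(sum(1 for t in triples if len(set(t)) == i), len(triples))
     for i in (1, 2, 3)}

mean_of_ybar_prime = expectation(mean_distinct)
v_distinct = expectation(lambda t: (mean_distinct(t) - pop_mean) ** 2)
v_ordinary = expectation(lambda t: (mean_all(t) - pop_mean) ** 2)
v_formula = Fraction((2 * N - 1) * (N - 1), 6 * N ** 2) * S2

print(p, v_distinct, v_formula, v_ordinary)
```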
- I will eventually distribute a detailed answer to this question.
It can be done as follows: write the estimate as ybar + R where R
is a random variable which is equal to c if we get y_1 but not y_N (event A),
-c if we get y_N but not y_1 (event B) and 0 otherwise (event C).
The expected value
of our estimate is E(ybar)+E(R) = Ybar + [cP(A) -cP(B) +0P(C)] which
is just Ybar because P(A)=P(B). Next var(ybar+R) = var(ybar) + var(R)
+ 2 cov(ybar,R). You have a formula, 2.8, for var(ybar) and you already
know that E(R)=0 so var(R) = E(R^2) =c^2 P(A) + (-c)^2 P(B). Now
you actually need to work out P(A) and P(B) and see that these are
N-2 choose n-1 over N choose n. Finally computation of cov(ybar, R)
= E(R x ybar) requires you to average separately over the events A and
B. On the event A, for example ybar is (y_1 + total of n-1 chosen from
the set {y_2,...y_(N-1)} ) divided by n.
On the same event R is just c. So E(R x ybar | A ) is c(y_1+ (n-1) Ybar2)/n
where Ybar2 is the mean of the N-2 numbers {y_2,...y_(N-1)}. Now you
can finish the algebra, particularly since the Ybar2 values cancel out.
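
Until the detailed answer is distributed, the pieces of this argument can at least be checked numerically. The sketch below uses a made-up population and made-up values of n and c (none of these numbers come from the problem, and the names are mine); it enumerates all samples and confirms the probability of event A, the unbiasedness, and the variance decomposition.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

y = [2, 5, 7, 8, 12]  # hypothetical population; y[0] is y_1 and y[-1] is y_N
N, n, c = len(y), 2, Fraction(1)  # n and c are illustrative choices
pop_mean = Fraction(sum(y), N)

samples = list(combinations(range(N), n))  # all equally likely SRS samples

def R(sample):
    has_first, has_last = 0 in sample, (N - 1) in sample
    if has_first and not has_last:   # event A
        return c
    if has_last and not has_first:   # event B
        return -c
    return Fraction(0)               # event C

def ybar(sample):
    return Fraction(sum(y[i] for i in sample), n)

def expectation(f):
    return sum(f(s) for s in samples) / len(samples)

p_A = Fraction(sum(1 for s in samples if R(s) == c), len(samples))
mean_est = expectation(lambda s: ybar(s) + R(s))
var_est = expectation(lambda s: (ybar(s) + R(s) - pop_mean) ** 2)
var_ybar = expectation(lambda s: (ybar(s) - pop_mean) ** 2)
var_R = expectation(lambda s: R(s) ** 2)       # E(R) = 0, so var(R) = E(R^2)
cov = expectation(lambda s: R(s) * ybar(s))    # E(R) = 0, so cov = E(R x ybar)

print(p_A, mean_est, var_est, var_ybar + var_R + 2 * cov)
```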
- The hard part is the computation of the mean and variance of this
estimate. But E(y_1 + 6 ybar2 + y_8) = E(y_1) + 6 E(ybar2) + E(y_8). Remember
that y_1 and y_8 are parameters; they do not change from sample to sample.
Moreover, E(ybar2) = Ybar2 and 6Ybar2 = y_2+...+y_7. Thus E(y_1 + 6 ybar2 +
y_8) = Y and so the estimate in the problem is unbiased. Next
var(y_1/8+ 6ybar2/8+ y_8/8) = var(3ybar2/4) because y_1 and y_8 are
constants. Thus var(ybarst) = 9var(ybar2)/16 and the latter is the
value of 2.8 for N=6, n=2 and the population {y_2,...y_7}. The rest
of this problem was plugging in numbers.
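
The mean and variance claims can be checked by listing all 15 possible samples of 2 from the middle six units. This is a sketch with made-up population values (the actual numbers from the problem are not reproduced here), and it assumes formula 2.8 is the usual SRS result V(ybar) = S^2 (1/n - 1/N).

```python
from fractions import Fraction
from itertools import combinations

y = [1, 4, 5, 6, 7, 9, 10, 20]  # hypothetical population of 8, illustration only
middle = y[1:7]                 # y_2, ..., y_7
pop_mean = Fraction(sum(y), 8)

# All 15 equally likely samples of size 2 from the middle six units.
ybar2_values = [Fraction(a + b, 2) for a, b in combinations(middle, 2)]

# The estimate of the mean: y_1 and y_8 always included, the rest represented by ybar2.
estimates = [(y[0] + 6 * m + y[7]) / 8 for m in ybar2_values]

mean_est = sum(estimates) / len(estimates)
var_est = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)

mid_mean = Fraction(sum(middle), 6)
var_ybar2 = sum((m - mid_mean) ** 2 for m in ybar2_values) / len(ybar2_values)

# Assumed form of formula 2.8 with N = 6, n = 2 applied to {y_2, ..., y_7}.
S2_mid = sum((v - mid_mean) ** 2 for v in middle) / 5
var_ybar2_formula = S2_mid * (Fraction(1, 2) - Fraction(1, 6))

print(mean_est, pop_mean, var_est, Fraction(9, 16) * var_ybar2)
```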