STAT 410 96-2 Assignment 9 Solutions

I intended these to be separate estimates. The formulas involved are 6.44 and 6.45 (with estimates plugged in for the population variances and covariances in 6.45) for the ratio estimate and its standard error, and 7.48 using 7.56 for the regression estimate with 7.58 for its standard error. Using plots, or computing correlation coefficients, I expect to discover that the correlation between government income and total employment income is negative, so that ratio estimates are very poor. The regression estimates will not be much better than the usual N ybar_st unless the correlations are over about 0.3 or so.
In this question I intended you to use combined estimates. For ratio estimates the formulas are 6.48 and 6.51, plugging in estimates for the needed variances and correlations. For regression use the formulas in the middle of page 202 to estimate the slope and then 7.61 to estimate the variance of the estimate.

There are a total of 9 possible samples. For each sample the separate regression estimate of the slope in a stratum is simply the difference of the two y values divided by the difference of the corresponding x values. The pooled slope is estimated by the simple formula for bc' on page 202. We are trying to estimate Y=40. I enumerated the samples in an obvious order and so I am not typing that here.

b1	b2	bc	sep est	comb est
1.50	5.00	2.20	48.50	42.90
1.50	2.00	1.85	36.50	37.27
1.50	0.50	1.00	44.00	42.00
1.67	5.00	2.00	49.33	42.00
1.67	2.00	1.83	37.33	37.50
1.67	0.50	1.31	44.83	43.04
2.00	5.00	3.50	48.00	40.50
2.00	2.00	2.00	36.00	36.00
2.00	0.50	0.80	43.50	45.90

This leads to biases of 3.11 and 0.79 and mean squared errors (the average over the 9 samples of the square of estimate - 40) of 34.58 and 10.05 for the separate and combined estimates respectively. This is not the answer in Cochran. Notice that the bias component and the variance are both bigger for the separate estimates, emphasizing the danger of these estimates in small samples. The bias of 3.11 is squared to compare it to the variance or, alternatively, you compare the bias to the standard error. You see that the bias of about 3 is a substantial fraction of a standard error (around 5), so that the bias is unacceptably large in this tiny problem.
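The bias and mean squared error calculation can be checked with a short Python sketch. It uses only the rounded estimates from the table above, so the last digit or so may differ from the figures quoted; the function name `bias_and_mse` is just an illustrative choice.

```python
# Estimates copied from the table above (rounded to two decimals there).
sep_est = [48.50, 36.50, 44.00, 49.33, 37.33, 44.83, 48.00, 36.00, 43.50]
comb_est = [42.90, 37.27, 42.00, 42.00, 37.50, 43.04, 40.50, 36.00, 45.90]
Y = 40.0  # the population total being estimated


def bias_and_mse(estimates, truth):
    """Average over all equally likely samples: bias and mean squared error."""
    m = len(estimates)
    bias = sum(estimates) / m - truth
    mse = sum((e - truth) ** 2 for e in estimates) / m
    return bias, mse


bias_sep, mse_sep = bias_and_mse(sep_est, Y)
bias_comb, mse_comb = bias_and_mse(comb_est, Y)

# MSE = variance + bias^2, so the variance is what is left over.
var_sep = mse_sep - bias_sep ** 2
var_comb = mse_comb - bias_comb ** 2

print(f"separate: bias={bias_sep:.2f} mse={mse_sep:.2f} var={var_sep:.2f}")
print(f"combined: bias={bias_comb:.2f} mse={mse_comb:.2f} var={var_comb:.2f}")
```

The square root of the separate-estimate variance is the standard error that the bias is being compared with.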

In this question you need to find 4 variances:
1. For the systematic sample there are ten possible values of the sample mean: 22.3, 18.2 and so on. You subtract 4155/200 from each of these, square, add and divide by 10.
2. For an SRS the variance is (23601/199)(19/20)/10.
3. For a one per column sample you must work out the value of formula 5.30, which is (sum of S_h^2)(19/20)/100, since each W_h^2 is (20/200)^2 = 1/100 and n_h = 1. To compute the sum you go to 5.32, where the left hand side is 23601 and the last term on the right hand side is a sum of 10 terms beginning (410/20 - 4155/200)^2, each multiplied by Nh=20. Finally you solve for the sum of S_h^2, remembering that Nh-1 is 19.
4. This is like the last one but you group two columns together to form a stratum. Now the Nh's are 40 and so on.
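As a rough check on the arithmetic, here is a small Python sketch of the SRS variance and of the step where the sum of S_h^2 is recovered from the total sum of squares. The helper name `sum_within_S2` and the tiny two-stratum population are made up for illustration; the real column totals are not reproduced here, so the helper has to be fed those totals from the data table.

```python
# Quantities quoted in the solution: N = 200 units, n = 10 sampled,
# total corrected sum of squares 23601.
N, n = 200, 10
total_ss = 23601.0

# SRS variance of the sample mean: S^2 (1 - n/N) / n with S^2 = 23601/199.
srs_var = (total_ss / (N - 1)) * (1 - n / N) / n
print(f"SRS variance: {srs_var:.4f}")


def sum_within_S2(total_ss, stratum_totals, Nh, grand_mean):
    """Solve the sum-of-squares decomposition for the sum of S_h^2.

    total SS = sum_h (Nh - 1) S_h^2 + sum_h Nh (mean_h - grand_mean)^2,
    assuming every stratum has the same size Nh.
    """
    between = sum(Nh * (t / Nh - grand_mean) ** 2 for t in stratum_totals)
    return (total_ss - between) / (Nh - 1)


# Hypothetical toy population (NOT the assignment data) to check the identity.
toy = [[1, 2, 3, 6], [4, 4, 8, 8]]
all_y = [y for stratum in toy for y in stratum]
toy_mean = sum(all_y) / len(all_y)
toy_total_ss = sum((y - toy_mean) ** 2 for y in all_y)
toy_sum_S2 = sum_within_S2(toy_total_ss, [sum(s) for s in toy], 4, toy_mean)
print(f"toy sum of S_h^2: {toy_sum_S2:.4f}")
```

For the assignment data you would call `sum_within_S2(23601, column_totals, 20, 4155/200)` with the ten column totals, and with Nh = 40 and five totals after pairing the columns.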

The value of P is 81/360, N is 360 and n = 360/8 = 45. From this you work out the variance of the usual p from an SRS of 45. To get the variance for a systematic sample you have to consider each of the 8 possible samples and count how many of the sampled houses are in the list given. For instance, if we sample houses 1, 9, 17, ... I get 7 houses in the list: 33, 41, 89, 313, 321, 329 and 337. This leads to p = 7/45. You get all 8 values of p, subtract 81/360 from each, square, sum and divide by 8.
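The two calculations can be sketched in Python as follows. The SRS line assumes the standard finite population formula PQ/n times (N-n)/(N-1) is the one intended. The function names are illustrative, and the full list of 81 houses is not reproduced here; only the seven houses named above are used, to confirm the count for the sample starting at house 1.

```python
N, k = 360, 8
n = N // k          # 45 houses in each systematic sample
P = 81 / N

# Variance of p for an SRS of n houses (finite population version).
srs_var = P * (1 - P) / n * (N - n) / (N - 1)
print(f"SRS variance of p: {srs_var:.6f}")


def systematic_p_values(listed, N, k):
    """Sample proportion for each of the k possible 1-in-k systematic samples."""
    n = N // k
    return [
        sum(1 for house in listed if (house - start) % k == 0) / n
        for start in range(1, k + 1)
    ]


def systematic_var(listed, N, k):
    """Average squared deviation of the k sample proportions from P."""
    P = len(listed) / N
    ps = systematic_p_values(listed, N, k)
    return sum((p - P) ** 2 for p in ps) / k


# Only the seven houses quoted in the text; the full list has 81 entries.
quoted = [33, 41, 89, 313, 321, 329, 337]
hits_start_1 = sum(1 for house in quoted if (house - 1) % k == 0)
print(f"listed houses in the sample 1, 9, 17, ...: {hits_start_1}")
```

Passing the complete list of 81 house numbers to `systematic_var(listed, 360, 8)` gives the systematic sampling variance to compare with `srs_var`.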

The idea here is that with an average household size of 5 you may well get long stretches in the sample where all the people sampled are children or, conversely, all are male heads of household. This high correlation means that systematic sampling probably has a large variance for estimating the proportion of males or the proportion of children. For Polish origin the clumping of neighbourhoods with large contiguous groups of the same ethnic background is exactly the situation where systematic sampling works best. Cochran says that for the proportion of males the situation is intermediate; this is because the children are, in terms of sex, essentially in random order within families -- a simple fact of life. The adults, however, are in periodic or at least quasi-periodic order, with the males listed first.
