STAT 410 96-2 Assignment 1 Solutions
These solutions simply try to outline some of the issues, as I saw them.
There may well be other important points worth making and perhaps after
they are marked I can add some ideas drawn from students' solutions.
- Write a paragraph discussing methods for estimating the number of
spelling errors in the course text book. Do the same for estimating the number
of occurrences of the letter `w'. Finally compare and contrast the
two problems.
The important difference between these two problems is that there will be lots
of w's and rather few spelling errors. A second important issue is that
of definition. What is a spelling error and how will you detect it? Will
you pick pages at random, then lines at random on the selected pages, and then
look up each word or a randomly selected word in the dictionary? How will
you deal with alternative spellings? How will you deal with hyphenated words?
If you decide to examine each word on a selected line then hyphenated words
are only half on the line. If you include a hyphenated word both at the
beginning of a line and at the end it will get into the sample twice as
often as others. Remember that hyphenated words are longer and, perhaps,
more easily misspelled. As for the w's, remember that they
sometimes show up in formulas. The Greek letter omega looks like a w and
might be in the book. Would you count it? Is a mistake in a formula ever
a spelling error?
- A survey organization carries out two surveys a week apart. In the
first they estimate NDP support at 40\% and in the second at 38\%.
They say that the surveys are each ``accurate to within 2 percentage
points 19 times out of 20''. How strong is the evidence that
NDP support has slipped? Indicate the assumptions you make to answer the
question.
The information about accuracy suggests that the standard errors of the
two numbers are each about 1 percentage point, since a margin quoted as
holding 19 times out of 20 is roughly 2 standard errors. Assuming the two
surveys are independent samples, the standard error of the difference is
about 1.4 percentage points (this is the square root of 2) and the
difference observed is thus about 1.4 standard errors of the difference.
Doing a z-test leads to a 1 sided P-value of about 0.08 (about 0.16 for
2 sided) but I reckon a 2 sided test is probably appropriate since we would
probably ask the same question about a 2 percentage point rise. In either
case the evidence of a real change is weak.
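
The arithmetic above can be checked with a few lines of code. This is only a
sketch: the function name and its arguments are made up for illustration, it
assumes the two surveys are independent and that the estimates are
approximately normal, and it rounds the 1.96 multiplier to 2 as the solution
does.

```python
import math


def poll_difference_z_test(first_pct, second_pct, margin_pct, z_for_margin=2.0):
    """z-test for a change in support between two independent polls.

    margin_pct is the quoted "19 times out of 20" margin of error.
    Dividing it by z_for_margin (2.0 here, a rounding of 1.96) gives the
    standard error of each poll. All names here are hypothetical.
    """
    se_each = margin_pct / z_for_margin
    # Assumption: the polls are independent, so their variances add and
    # the SE of the difference is sqrt(2) times the common SE.
    se_diff = math.sqrt(2.0) * se_each
    z = (first_pct - second_pct) / se_diff
    # Standard normal upper tail: P(Z > z) = erfc(z / sqrt(2)) / 2
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Two sided: probability of a difference at least this large either way
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se_diff, z, p_one_sided, p_two_sided
```

With the numbers in the question, `poll_difference_z_test(40, 38, 2)` gives
z of about 1.41, a 1 sided P-value of about 0.08 and a 2 sided P-value of
about 0.16.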
- Write a paragraph suggesting methods for estimating the number of
dandelions growing in my lawn.
Natural suggestions would involve measuring the size of the lawn and
dividing a map of it into a grid of, say, 50 cm by 50 cm squares. Picking some
squares at random, we would count the dandelions
in each, then scale up by the usual SRS formulas. However, some of my property
has no lawn -- where the house is, for example. If we could measure the total
lawn area and the lawn area in the sampled squares we might get a more
accurate estimate of the total number of dandelions by remembering that the non-lawn
area doesn't count. We might also stratify by front and back yard if a preliminary
glance suggested there were many more in one yard than the other.
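
The two estimates described above might be sketched as follows. This is an
illustration under stated assumptions, not a prescribed method: the function
names are made up, the squares are assumed to be a simple random sample
without replacement from the grid, and the second function reads the remark
about non-lawn area as a ratio estimate (dandelions per unit of lawn area,
scaled up to the total lawn area).

```python
import math


def srs_total(counts, n_squares_in_grid):
    """Expansion estimate of the total from an SRS of grid squares.

    counts: dandelion counts in the sampled squares (at least two squares).
    n_squares_in_grid: number of squares in the whole grid.
    Returns (estimated total, estimated standard error).
    """
    n = len(counts)
    mean = sum(counts) / n
    # Sample variance of the per-square counts
    s2 = sum((c - mean) ** 2 for c in counts) / (n - 1)
    # Finite population correction for sampling without replacement
    fpc = 1.0 - n / n_squares_in_grid
    total = n_squares_in_grid * mean
    se = n_squares_in_grid * math.sqrt(fpc * s2 / n)
    return total, se


def ratio_total(counts, lawn_areas, total_lawn_area):
    """Ratio estimate: dandelions per unit lawn area times total lawn area.

    lawn_areas: lawn area inside each sampled square, in the same units
    as total_lawn_area. Squares with no lawn contribute zero area.
    """
    density = sum(counts) / sum(lawn_areas)
    return density * total_lawn_area
```

Stratifying by front and back yard would amount to applying either function
separately within each yard and adding the two totals.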
- Write a paragraph suggesting methods for estimating the number of
courses having a statistical component taught each year at SFU.
The central problem here is probably definitional. What is a course? Do you
count each section, each semester a course is offered, or each course which
is offered at least once? What about cross listings? What is
`a statistical component'? At least one lecture only about data analysis?
Is `questionnaire design' a statistical component? What about courses
in which standard experimental design is introduced but the course is
mostly about subject matter? Would you stratify by faculty, department,
level? For courses with a calendar description which does not clearly
indicate whether or not a statistical component is present, what would you
do? Who are the respondents? Do you survey departments, instructors,
or even students?