STAT 410 96-2 Assignment 1 Solutions



These solutions simply try to outline some of the issues, as I saw them. There may well be other important points worth making and perhaps after they are marked I can add some ideas drawn from students' solutions.
  1. Write a paragraph discussing methods for estimating the number of spelling errors in the course textbook. Do the same for estimating the number of occurrences of the letter `w'. Finally, compare and contrast the two problems.


  2. The important difference between these two problems is that there will be lots of w's and rather few spelling errors. A second important issue is that of definition. What is a spelling error and how will you detect it? Will you pick pages at random, then lines at random on the selected pages, and then look up each word, or a randomly selected word, in the dictionary? How will you deal with alternative spellings? How will you deal with hyphenated words? If you decide to examine each word on a selected line then a word hyphenated across a line break is only half on the line. If you count such a word both with the line where it begins and with the line where it ends, it will get into the sample twice as often as other words. Remember that hyphenated words are longer and, perhaps, more easily spelled incorrectly. As for the w's, remember that they sometimes show up in formulas. The Greek letter omega looks like a w and might be in the book. Would you count it? Is a mistake in a formula ever a spelling error?

  3. A survey organization carries out two surveys a week apart. In the first they estimate NDP support at 40\% and in the second at 38\%. They say that the surveys are each ``accurate to within 2 percentage points 19 times out of 20''. How strong is the evidence that NDP support has slipped? Indicate the assumptions you make to answer the question.


  4. The information about accuracy suggests that the standard errors of the two estimates are each about 1 percentage point, since ``19 times out of 20'' corresponds to roughly 2 standard errors. Assuming the two surveys are independent, the standard error of the difference is about 1.4 percentage points (this is the square root of 2) and the observed difference of 2 points is thus about 1.4 standard errors of the difference. Doing a z-test leads to a 1 sided P-value of about 0.08 but I reckon a 2 sided test, with a P-value of about 0.16, is probably appropriate since we would probably ask the same question about a 2 percentage point rise. In either case the evidence of a real change is weak.
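
The arithmetic in this answer can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, the standard error of 1 point is the rounded value used above, and the two surveys are assumed to be independent.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def change_in_support(p1, p2, se_each=1.0):
    """z-test for a change between two independent survey percentages.

    se_each is the standard error of each estimate in percentage points;
    1.0 is the rounded value used in the solution (2 points / about 2).
    Independence of the two surveys is an assumption.
    """
    se_diff = math.sqrt(2.0) * se_each      # about 1.4 points
    z = (p1 - p2) / se_diff                 # about 1.4
    p_one_sided = 1.0 - normal_cdf(z)       # tests for a drop only
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_one_sided, p_two_sided


z, p_one, p_two = change_in_support(40.0, 38.0)
print(f"z = {z:.2f}, one-sided P = {p_one:.3f}, two-sided P = {p_two:.3f}")
```

The printed P-values should agree with the rough figures of 0.08 and 0.16. Using 2/1.96 rather than 1 for the standard error changes them only slightly.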

  5. Write a paragraph suggesting methods for estimating the number of dandelions growing in my lawn.


  6. Natural suggestions would involve measuring the size of the lawn and dividing a map of the property into a grid of, say, 50 cm by 50 cm squares. Picking some squares at random, we would count the dandelions in each, then scale up by the usual SRS formulas. However, some of my property has no lawn -- where the house is, for example. If we could measure the total lawn area and the lawn area in the sampled squares we might get a more accurate estimate of the total number of dandelions by remembering that the non-lawn area doesn't count. We might also stratify by front and back yard if a preliminary glance suggested there were many more in one yard than the other.
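
As a rough illustration of the two estimates just described, here is a minimal Python sketch. All of the numbers (the counts, the lawn areas, the 400 grid squares and the 80 square metres of lawn) are invented for the example, and the function names are hypothetical. The first function is the usual SRS expansion estimate with its standard error; the second is a ratio-type estimate that uses lawn area as the auxiliary variable.

```python
import math
import statistics


def srs_total(counts, n_squares_total):
    """Expansion estimate of the total from an SRS of grid squares.

    Returns (estimated total, standard error), using the finite
    population correction 1 - n/N.
    """
    n = len(counts)
    mean = statistics.mean(counts)
    s2 = statistics.variance(counts)        # sample variance, divisor n - 1
    fpc = 1.0 - n / n_squares_total
    total = n_squares_total * mean
    se = n_squares_total * math.sqrt(fpc * s2 / n)
    return total, se


def ratio_total(counts, lawn_areas, total_lawn_area):
    """Ratio estimate: dandelions per unit of sampled lawn area,
    scaled up to the total lawn area (same area units throughout)."""
    density = sum(counts) / sum(lawn_areas)
    return density * total_lawn_area


# Made-up data: six sampled 50 cm x 50 cm squares (0.25 square metres each).
counts = [3, 0, 5, 2, 0, 4]                         # dandelions per square
lawn_areas = [0.25, 0.0, 0.25, 0.25, 0.10, 0.25]    # lawn in each, in m^2

total_srs, se_srs = srs_total(counts, n_squares_total=400)
total_ratio = ratio_total(counts, lawn_areas, total_lawn_area=80.0)
print(f"SRS estimate: {total_srs:.0f} (SE {se_srs:.0f})")
print(f"Ratio estimate: {total_ratio:.0f}")
```

With only six squares the standard error is of course large; the point is just to show where the lawn-area measurements enter the calculation.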

  7. Write a paragraph suggesting methods for estimating the number of courses having a statistical component taught each year at SFU.


  8. The central problem here is probably definitional. What is a course (count each section, each semester a course is offered, each course which is offered at least once, what about cross listings)? What is `a statistical component'? At least one lecture only about data analysis? Is `questionnaire design' a statistical component? What about courses in which standard experimental design is introduced but the course is mostly about subject matter? Would you stratify by faculty, department, level? For courses with a calendar description which does not clearly indicate whether or not a statistical component is present, what would you do? Who are the respondents? Do you survey departments, instructors, or even students?