STAT 350: Lecture 10 Example

Confidence intervals for the mean of Y

Refer to the polynomial regression example (data on insurance costs). I fit polynomials of degree 1 through 5. Each model gives a vector of fitted parameters $\hat\beta$, and to predict the mean value of Y at time t we use $\hat\beta_0 + \hat\beta_1 t + \cdots + \hat\beta_p t^p$ when the fitted polynomial has degree p. The SAS code below computes both this fitted value and its standard error for each of the 5 models. Notice how I run proc glm 5 times to get the 5 different values.

data insure;
  infile 'insure.dat' firstobs=2;
  input year cost;
  code = year - 1975.5 ;
proc glm data=insure;
  model cost = code ;
  estimate 'fit1982.25' intercept 1 code 6.75 / E;
run ;
proc glm data=insure;
  model cost = code code*code;
  estimate 'fit1982.25' intercept 1 code 6.75 code*code 45.5625 / E;
run ;
proc glm data=insure;
  model cost = code code*code code*code*code;
  estimate 'fit1982.25' intercept 1 code 6.75 code*code 45.5625
           code*code*code 307.5469 / E;
run ;
proc glm data=insure;
  model cost = code code*code code*code*code code*code*code*code;
  estimate 'fit1982.25' intercept 1 code 6.75 code*code 45.5625
           code*code*code 307.5469 code*code*code*code 2075.9414 / E;
run ;
proc glm data=insure;
  model cost = code code*code code*code*code code*code*code*code
               code*code*code*code*code;
  estimate 'fit1982.25' intercept 1 code 6.75 code*code 45.5625
           code*code*code 307.5469 code*code*code*code 2075.9414
           code*code*code*code*code 14012.6045 / E;
run ;

The line estimate ... is probably unfamiliar to you. You have to give the value of each column of the design matrix at the place where you want to estimate the mean of Y. Here that place is 1982.25, whose coded value is 1982.25 - 1975.5 = 6.75. Notice that I had to work out each power of 6.75 by hand.
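The powers of 6.75 that go into the estimate statement can be checked without hand calculation. The following is a minimal sketch in Python (not SAS, and not part of the original analysis); the helper name `design_row` is hypothetical and simply lists the design-matrix values 1, t, t^2, ..., t^p at the coded time.

```python
def design_row(t, degree):
    """Return [1, t, t**2, ..., t**degree], the design-matrix row at time t."""
    return [t ** k for k in range(degree + 1)]

# Coded time for the forecast, as in the SAS data step: code = year - 1975.5
code = 1982.25 - 1975.5

row = design_row(code, 5)
for k, value in enumerate(row):
    # Four decimals, to match the precision typed into the estimate statement
    print(f"code^{k}: {value:.4f}")
```

The printed values can be compared with the coefficients that SAS echoes back under "Coefficients for estimate fit1982.25" when the E option is used.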
Now have a look at the edited output. I show here only the 5th degree polynomial results.
General Linear Models Procedure

Coefficients for estimate fit1982.25

INTERCEPT                        1
CODE                          6.75
CODE*CODE                  45.5625
CODE*CODE*CODE            307.5469
CODE*CODE*CODE*CODE      2075.9414
COD*COD*COD*COD*CODE    14012.6045

Dependent Variable: COST

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               5  3935.2507732   787.0501546   2147.50   0.0001
Error               4     1.4659868     0.3664967
Corrected Total     9  3936.7167600

R-Square       C.V.    Root MSE    COST Mean
0.999628   0.851438   0.6053897    71.102000

Source                 DF      Type I SS    Mean Square   F Value   Pr > F
CODE                    1   3328.3209709   3328.3209709   9081.45   0.0001
CODE*CODE               1    298.6522917    298.6522917    814.88   0.0001
CODE*CODE*CODE          1    278.9323940    278.9323940    761.08   0.0001
CODE*CODE*CODE*CODE     1      0.0006756      0.0006756      0.00   0.9678
COD*COD*COD*COD*CODE    1     29.3444412     29.3444412     80.07   0.0009

Source                 DF    Type III SS    Mean Square   F Value   Pr > F
CODE                    1     0.88117350     0.88117350      2.40   0.1959
CODE*CODE               1    20.86853994    20.86853994     56.94   0.0017
CODE*CODE*CODE          1    72.35876312    72.35876312    197.43   0.0001
CODE*CODE*CODE*CODE     1     0.00067556     0.00067556      0.00   0.9678
COD*COD*COD*COD*CODE    1    29.34444115    29.34444115     80.07   0.0009

                                    T for H0:   Pr > |T|   Std Error of
Parameter                Estimate  Parameter=0                 Estimate
fit1982.25             70.2630583         4.33     0.0123    16.2154539

                                    T for H0:   Pr > |T|   Std Error of
Parameter                Estimate  Parameter=0                 Estimate
INTERCEPT             64.88753906       176.14     0.0001    0.36839358
CODE                  -0.50238411        -1.55     0.1959    0.32399642
CODE*CODE              0.75623470         7.55     0.0017    0.10021797
CODE*CODE*CODE         0.80157430        14.05     0.0001    0.05704706
CODE*CODE*CODE*CODE   -0.00020251        -0.04     0.9678    0.00471673
COD*COD*COD*COD*CODE  -0.01939615        -8.95     0.0009    0.00216764

While we have this output, notice the value of $R^2$, which is quite close to 1, and the t-tests of the hypotheses that various parameters are 0.
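As a rough illustration of where these summary numbers come from, the Python sketch below (a cross-check, not part of the SAS run) recomputes R-Square, the Mean Squared Error, the overall F value and the t statistic for the forecast from the sums of squares and the estimate printed above. The variable names are my own.

```python
# Values copied from the SAS output for the degree-5 fit.
model_ss, error_ss, total_ss = 3935.2507732, 1.4659868, 3936.7167600
model_df, error_df = 5, 4

r_square = model_ss / total_ss          # proportion of variation explained
mse = error_ss / error_df               # Error Mean Square
f_value = (model_ss / model_df) / mse   # overall F statistic

# t statistic for the forecast: estimate divided by its standard error.
estimate, std_error = 70.2630583, 16.2154539
t_value = estimate / std_error

print(f"R-Square = {r_square:.6f}")
print(f"MSE = {mse:.7f}, Root MSE = {mse ** 0.5:.7f}")
print(f"F = {f_value:.2f}")
print(f"t = {t_value:.2f}")
```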
Here is a table of the results of all the forecasts with associated standard errors:

Degree   Estimate      SE
   1       113.98    7.04
   2       142.04   12.06
   3       204.74    9.45
   4       204.50   25.24
   5        70.26   16.22

Notice that the standard errors are so small that there is no way that the forecasts from fits of different degrees can be reconciled. The problem is that a crucial assumption is very likely not right for this problem, namely, the assumption that the mean of Y is exactly a polynomial of degree 5 (or 3 or whatever). Notice also that adding a term to the model without improving the fit, as in going from degree 3 to degree 4, increases the SE of the prediction greatly.
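To make the disagreement concrete, here is a hedged Python sketch that turns each forecast and SE in the table into a 95% confidence interval of the form estimate ± t × SE. It assumes each fit used all n = 10 observations (the Corrected Total DF of 9 in the degree-5 output suggests this), so the error degrees of freedom are 10 - (degree + 1). The t quantiles are hard-coded from a standard t table rather than computed.

```python
# Forecasts and standard errors from the table (full data set).
forecasts = {1: (113.98, 7.04), 2: (142.04, 12.06), 3: (204.74, 9.45),
             4: (204.50, 25.24), 5: (70.26, 16.22)}

# Upper 2.5% points of Student's t, copied from a standard t table,
# keyed by error degrees of freedom.
T_975 = {8: 2.306, 7: 2.365, 6: 2.447, 5: 2.571, 4: 2.776}

def confidence_interval(estimate, se, df):
    """95% confidence interval: estimate +/- t(0.975, df) * se."""
    half_width = T_975[df] * se
    return estimate - half_width, estimate + half_width

n = 10  # assumed number of observations (Corrected Total DF + 1)
intervals = {}
for degree, (estimate, se) in forecasts.items():
    df = n - (degree + 1)  # error df for a polynomial of this degree
    intervals[degree] = confidence_interval(estimate, se, df)
    low, high = intervals[degree]
    print(f"degree {degree}: df = {df}, 95% CI = ({low:.2f}, {high:.2f})")
```

Under these assumptions the degree 3 interval lies entirely above the degree 5 interval, so the two fits cannot both be describing the same mean.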
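One final point. The calculations give a confidence interval for $x^T\beta$ based on the distribution of $x^T\hat\beta - x^T\beta$, where x is the vector of design-matrix values at the point of interest. For the insurance data the quantity of interest is $Y - x^T\hat\beta$. In this formula, $Y$ is a future value associated with the covariate value x. The prediction error can be split up, if the model is correct, as

$$Y - x^T\hat\beta = (Y - x^T\beta) + (x^T\beta - x^T\hat\beta)$$

which is a sum of two independent random variables. The first has variance $\sigma^2$ while the second has variance $\sigma^2 x^T(X^TX)^{-1}x$, where X is the design matrix. An estimate of the square root of the second quantity was printed out by SAS (the Std Error of Estimate, 16.2154539). The Mean Squared Error (0.3664967) is an estimate of the first. The estimated standard deviation of $Y - x^T\hat\beta$ is the square root of the sum of the Mean Squared Error and the square of the printed standard error, which comes, for the 5th degree polynomial, to $\sqrt{0.3664967 + 16.2154539^2}$. This is only slightly larger than the standard error itself: $16.23. Notice that this accuracy is spurious; the major source of error is in the model for the mean of Y, which is surely not a 5th degree polynomial.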
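The arithmetic for the prediction standard error can be checked with a few lines of Python. This is only a sketch of the calculation described above, using the Std Error of Estimate and the Error Mean Square from the degree-5 SAS output; the variable names are my own.

```python
import math

se_mean = 16.2154539   # Std Error of Estimate for fit1982.25 (degree 5)
mse = 0.3664967        # Error Mean Square, the estimate of the error variance

# Variances of independent pieces add; take the square root at the end.
se_prediction = math.sqrt(mse + se_mean ** 2)

print(f"SE for the mean:            {se_mean:.2f}")
print(f"SE for an individual value: {se_prediction:.2f}")
```

Because the Error Mean Square is tiny compared with the squared standard error of the fitted mean, the two standard errors agree to within about a cent.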
You can see the principle by deleting the observation for 1980 and then fitting the different polynomials:

Degree   Estimate      SE
   1        91.50    4.26
   2        98.22    7.34
   3       121.39    4.74
   4       132.40    6.73
   5       472.34   66.15

The true value is actually $115.19, so most of the forecasts would have been dreadful. Again, all the SEs are for errors in predicting the mean, not an individual value, but for the higher degree polynomials this makes no difference.
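One way to see how dreadful is to express each forecast error as a multiple of its reported SE. The Python sketch below does this for the table above; it is an illustrative calculation only, and it uses the SE for the mean because that is what the table reports.

```python
true_value = 115.19

# Forecasts and standard errors with the 1980 observation deleted.
forecasts = {1: (91.50, 4.26), 2: (98.22, 7.34), 3: (121.39, 4.74),
             4: (132.40, 6.73), 5: (472.34, 66.15)}

errors = {}
for degree, (estimate, se) in forecasts.items():
    error = estimate - true_value   # forecast minus the value actually observed
    errors[degree] = error
    print(f"degree {degree}: error = {error:8.2f}, error / SE = {error / se:6.2f}")
```

Only the cubic comes within about one SE of the truth; the degree 1 and degree 5 forecasts each miss by more than five of their own standard errors.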


Richard Lockhart
Mon Mar 3 11:18:48 PST 1997
