
STAT 330 Lecture 33

Reading for Today's Lecture: 12.4, 12.5, 13 (all).

Goals of Today's Lecture:


Correlation Analysis

Correlation Coefficient (population and sample):

\[ \rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} \]

\[ r = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2 \sum_{i=1}^n (y_i - \bar y)^2}} \]
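As a numerical illustration (not part of the original notes), a short Python sketch computing the sample correlation $r$ directly from the formula above:

```python
import math

def sample_correlation(x, y):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear toy data has r = 1.
print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```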

Example: Father-son height data:

$n = 1078$, with sample means $\bar x$ and $\bar y$, sample standard deviations $s_x$ and $s_y$, and $r = 0.5$.

Regression line:

\[ \hat y = \bar y + r\,\frac{s_y}{s_x}\,(x - \bar x) \]

Confidence intervals for $\rho$.

Step A: Get a confidence interval for Fisher's $z$ transform of $\rho$, namely,

\[ \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right) , \]

by taking

\[ V \pm 2\sqrt{\frac{1}{n-3}}, \qquad \text{where } V = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right) , \]

which is just

\[ 0.549 \pm 2(0.0305) \]

or

\[ (0.488,\ 0.610) . \]
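The Step A arithmetic can be checked in Python; the values are those of the father-son example, and the multiplier 2 is the rough 95% normal critical value:

```python
import math

r, n = 0.5, 1078                         # father-son height example
V = 0.5 * math.log((1 + r) / (1 - r))    # Fisher z transform of r
se = math.sqrt(1 / (n - 3))              # approximate standard deviation of V
lo, hi = V - 2 * se, V + 2 * se
print(f"{V:.3f} {lo:.3f} {hi:.3f}")  # 0.549 0.488 0.610
```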

Step B: Get a confidence interval for $\rho$ by undoing the ends of the interval, solving the equation

\[ y = \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right) \]

for $\rho$ to get

\[ \rho = \frac{e^{2y}-1}{e^{2y}+1} , \]

where now we plug in for $y$ the two ends of the interval in Step A. In our example the interval in A runs from 0.488 to 0.610, so our interval for $\rho$ runs from

\[ \frac{e^{2(0.488)}-1}{e^{2(0.488)}+1} \quad\text{to}\quad \frac{e^{2(0.610)}-1}{e^{2(0.610)}+1} , \]

or from 0.453 to 0.544.
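Undoing the transform is just the hyperbolic tangent; a quick check of the Step B arithmetic:

```python
import math

def inv_fisher(z):
    """Invert z = 0.5*log((1+rho)/(1-rho)): rho = (e^{2z}-1)/(e^{2z}+1) = tanh(z)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

# Back-transform the two ends of the Step A interval.
lo, hi = inv_fisher(0.488), inv_fisher(0.610)
print(f"{lo:.3f} {hi:.3f}")  # 0.453 0.544
```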

Hypothesis tests for $H_0:\ \rho = \rho_0$:

Compute

\[ Z = \sqrt{n-3}\,\left\{ V - \frac{1}{2}\log\left(\frac{1+\rho_0}{1-\rho_0}\right) \right\} \]

and get P from normal tables.
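A sketch of the test computation in Python; the null value $\rho_0 = 0.45$ is a made-up illustration, not a value from the notes:

```python
import math
from statistics import NormalDist

# Test of H0: rho = rho0 using the father-son data (rho0 is hypothetical).
r, n, rho0 = 0.5, 1078, 0.45
V = 0.5 * math.log((1 + r) / (1 - r))            # Fisher z of r
z0 = 0.5 * math.log((1 + rho0) / (1 - rho0))     # Fisher z of the null value
Z = math.sqrt(n - 3) * (V - z0)
p = 2 * (1 - NormalDist().cdf(abs(Z)))           # two-sided P from normal tables
print(round(Z, 2), round(p, 3))  # 2.12 0.034
```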

These inferences for $\rho$ are based on:

Fact: For large n and bivariate normal data,

\[ V = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right) \]

has approximately a normal distribution with mean

\[ \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right) \]

and standard deviation

\[ \frac{1}{\sqrt{n-3}} . \]

Remark: This is an example of what statisticians call "large sample theory" or "asymptotics". The formulas for the mean and variance of V are not exact. It is not possible to compute E(V) or Var(V) analytically. Instead the theory is based on "expansions" valid approximately for large n. Much of the research of academic statisticians is focused on deriving such approximations for new statistics.
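The large-sample claim can be checked by simulation; the sketch below (sample size and replication counts are illustrative choices, not from the lecture) generates bivariate normal samples and compares the empirical standard deviation of $V$ with $1/\sqrt{n-3}$:

```python
import math
import random

# Simulate bivariate normal data with correlation rho and check that the
# Fisher z statistic V has standard deviation close to 1/sqrt(n-3).
random.seed(0)
rho, n, reps = 0.5, 100, 2000
vs = []
for _ in range(reps):
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    syy = sum((b - ybar) ** 2 for b in ys)
    r = sxy / math.sqrt(sxx * syy)
    vs.append(0.5 * math.log((1 + r) / (1 - r)))
mean_v = sum(vs) / reps
sd_v = math.sqrt(sum((v - mean_v) ** 2 for v in vs) / (reps - 1))
print(sd_v, 1 / math.sqrt(n - 3))  # the two numbers should be close
```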

Linear Models and Multiple Regression

Model equations: all the model equations we have seen have the form:

\[ Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i \]

(except that we sometimes used the letter X where I have Y, that the index i labelling the different data points was sometimes a double or even triple subscript, and that the parameters were denoted with different Greek letters).

Examples:

Simple Linear Regression:

\[ Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

Notice that the $x_{i1}$ in the general equation above is just the number 1 here and $x_{i2}$ above is just $x_i$ in the simple linear regression equation. Notice also that $\beta_1$ in the general equation is the intercept, previously denoted by $\beta_0$, and $\beta_2$ is the slope, previously denoted by $\beta_1$.
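To see simple linear regression as a special case of the general form, one can build the two-column design matrix explicitly, with a column of 1s for the intercept. The toy data below are made up so that $y = 1 + 2x$ exactly:

```python
# Simple linear regression in general linear-model form: x_{i1} = 1 for
# every observation and x_{i2} = x_i.  Fit by the 2x2 normal equations.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x
n = len(xs)

s1 = n                              # sum of x_{i1}^2 (column of ones)
sx = sum(xs)                        # sum of x_{i1} * x_{i2}
sxx = sum(x * x for x in xs)        # sum of x_{i2}^2
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

det = s1 * sxx - sx * sx
b1 = (sxx * sy - sx * sxy) / det    # intercept (beta_1 here, beta_0 before)
b2 = (s1 * sxy - sx * sy) / det     # slope (beta_2 here, beta_1 before)
print(b1, b2)  # 1.0 2.0
```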

One Way Layout:

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \qquad i = 1, \ldots, p;\ j = 1, \ldots, n_i \]

Special points. Using all the parameters $\mu, \alpha_1, \ldots, \alpha_p$ ``overparametrizes'' the model. Remember we defined $\alpha_i = \mu_i - \mu$ and so $\sum_{i=1}^p \alpha_i = 0$, or

\[ \alpha_p = -(\alpha_1 + \cdots + \alpha_{p-1}) . \]

We use this to replace $\alpha_p$ in our model equations and get, for instance,

\[ Y_{pj} = \mu - \alpha_1 - \cdots - \alpha_{p-1} + \epsilon_{pj} . \]
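The substitution amounts to "effect coding" of the group labels: the last group's row of the design matrix is all $-1$s. A small sketch with $p = 3$ groups and made-up parameter values:

```python
# Sum-to-zero (effect) coding for a one-way layout with p = 3 groups:
# after substituting alpha_3 = -(alpha_1 + alpha_2), each observation's
# design row is (1, z1, z2), with the last group coded z1 = z2 = -1.
codes = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}

def group_mean(mu, a1, a2, group):
    z1, z2 = codes[group]
    return mu + a1 * z1 + a2 * z2

mu, a1, a2 = 10.0, 2.0, -1.0        # illustrative parameter values
a3 = -(a1 + a2)                     # implied effect for the third group
means = [group_mean(mu, a1, a2, g) for g in (1, 2, 3)]
print(means)  # [12.0, 9.0, 9.0]
assert a1 + a2 + a3 == 0            # the restriction holds by construction
```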

Two Way Layout without replicates:

\[ Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij} \]

with the restrictions $\sum_{i=1}^p \alpha_i = 0$ and $\sum_{j=1}^q \beta_j = 0$ becomes:

\begin{eqnarray*}
Y_{ij} & = & \mu + \alpha_i + \beta_j + \epsilon_{ij}, \qquad i < p,\ j < q \\
Y_{pj} & = & \mu - (\alpha_1 + \cdots + \alpha_{p-1}) + \beta_j + \epsilon_{pj}, \qquad j < q \\
Y_{iq} & = & \mu + \alpha_i - (\beta_1 + \cdots + \beta_{q-1}) + \epsilon_{iq}, \qquad i < p
\end{eqnarray*}

Multiple Regression

In multiple regression we have an equation like the above but with the $x_{ij}$ filled in with the values of more than one independent variable:

\[ Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i \]

Example: We now regress hardness on SAND and FIBRE content. Previously we had treated each of these variables as merely having 3 categories. Now we use the values of those categories.
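A sketch of fitting such a model by solving the normal equations $X'X\beta = X'y$; the SAND/FIBRE values below are made-up stand-ins, not the course data, and the hardness values are generated from an exact linear rule so the fit recovers the coefficients:

```python
def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

sand = [15, 15, 30, 30, 45, 45]     # hypothetical SAND values
fibre = [0, 25, 0, 25, 0, 25]       # hypothetical FIBRE values
# hardness generated from 60 + 0.5*sand - 0.2*fibre, so the fit is exact
hard = [60 + 0.5 * s - 0.2 * f for s, f in zip(sand, fibre)]

X = [[1.0, s, f] for s, f in zip(sand, fibre)]          # design matrix
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * y for r, y in zip(X, hard)) for i in range(3)]
beta = solve3(XtX, Xty)
print([round(b, 6) for b in beta])  # [60.0, 0.5, -0.2]
```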





Richard Lockhart
Tue Mar 17 15:49:26 PST 1998