No Title

Normal Equations for Simple Linear Regression

Dose Count

0 27043

0 26902

0 25959

150 27700

150 27530

150 27460

420 30650

420 30150

420 29480

900 34790

900 32020

1800 42280

1800 39370

1800 36200

3600 53230

3600 49260

3600 53030

You can compute

in Minitab or Splus. That matrix has 4 numbers each of which is computed as the dot product of 2 columns of X. For instance the first column dotted with itself produces

. Here is an example S session which reads in the data, produces the design matrices for the two models and computes

[36]skekoowahts% S S-PLUS : Copyright (c) 1988, 1995 MathSoft, Inc. S : Copyright AT&T. Version 3.3 Release 1 for Sun SPARC, SunOS 5.3 : 1995 Working data will be in .Data # # The data are in a file called linear. The ! # tells S that what follows is not an S command but a standard # UNIX (or DOS) command # > !more linear Dose Count 0 27043 0 26902 0 25959 150 27700 150 27530 150 27460 420 30650 420 30150 420 29480 900 34790 900 32020 1800 42280 1800 39370 1800 36200 3600 53230 3600 49260 3600 53030 # # The function help(function) produces help for a function such as # > help(read.table) # # Read in the data from a file. The file has 18 lines: # 17 lines of data and a first line which has the names of the variables. # the function read.table reads such data and header=T warns S # that the first line is variable names. The first argument of # read.table is a character string containing the name of the file # to read from. # > x_read.table("linear",header=T) > x Dose Count 1 0 27043 2 0 26902 3 0 25959 4 150 27700 5 150 27530 6 150 27460 7 420 30650 8 420 30150 9 420 29480 10 900 34790 11 900 32020 12 1800 42280 13 1800 39370 14 1800 36200 15 3600 53230 16 3600 49260 17 3600 53030 # # the design matrix has a column of 1s and also # a column consisting of the first column of x # which is just the list of covariate values # The notation x[,1] picks out the first column of x # > design.mat_cbind(rep(1,17),x[,1]) # # To print out an object you type its name! # > design.mat [,1] [,2] [1,] 1 0 [2,] 1 0 [3,] 1 0 [4,] 1 150 [5,] 1 150 [6,] 1 150 [7,] 1 420 [8,] 1 420 [9,] 1 420 [10,] 1 900 [11,] 1 900 [12,] 1 1800 [13,] 1 1800 [14,] 1 1800 [15,] 1 3600 [16,] 1 3600 [17,] 1 3600 # # Compute X^T X -- this uses %*% to multiply matrices # and t(x) to compute the transpose of a matrix x. # > xprimex <- t(design.mat)%*% design.mat > xprimex [,1] [,2] [1,] 17 19710 [2,] 19710 50816700 # # Compute X^T Y # > xprimey <- t(design.mat)%*% x[,2] > xprimey [,1] [1,] 593054 [2,] 882452100 # # Next compute the least squares estimates by solving the # normal equations # > solve(xprimex,xprimey) [,1] [1,] 26806.734691 [2,] 6.968012 # # solve(A,b) computes the solution of Ax=b for A a # square matrix and b a vector. Of course x=A^{-1}b. # # # The next pice of code regresses the variable Count on # Dose taking the data from x. # > lm( Count~Dose,data=x) Call: lm(formula = Count ~ Dose, data = x) Coefficients: (Intercept) Dose 26806.73 6.968012 Degrees of freedom: 17 total; 15 residual Residual standard error: 1521.238 # # Notice that the estimates agree with our calculations # The residual standard error is the usual estimate of sigma # namely the square root of the Mean Square for Error. # # # Now we add a third column to fit the quadratic model # > design.mat2_cbind(design.mat,design.mat[,2]^2) # # Here is X^T X # > t(design.mat2)%*% design.mat2 [,1] [,2] [,3] [1,] 17 19710 5.081670e+07 [2,] 19710 50816700 1.591544e+11 [3,] 50816700 159154389000 5.367847e+14 # # Here is X^T Y # > t(design.mat2)%*% x[,2] [,1] [1,] 5.930540e+05 [2,] 8.824521e+08 [3,] 2.469275e+12 # # But the following illustrates the dangers of doing computations # blindly on the computer. The trouble is that the design matrix # has a third column which is so much larger that the first two. # > solve(t(design.mat2)%*% design.mat2, t(design.mat2)%*% x[,2]) Error in solve.qr(a, b): apparently singular matrix Dumped # # However, good packages know numerical techniques which avoid # the danger. # > lm(Count ~ Dose+Dose^2,data=x) Call: lm(formula = Count ~ Dose + Dose^2, data = x) Coefficients: (Intercept) Dose I(Dose^2) 26718.11 7.240314 -7.596867e-05 Degrees of freedom: 17 total; 14 residual Residual standard error: 1571.277 > # # WARNING: you can't tell from the size of the estimate # of an estimate such as that of beta_3 whether or not # it is important -- you have to compare it to values # of the corresponding covariate and to its standard error # > q() # Used to quit S: pay attention to () -- that part is essential!

Dose	Count

0	27043
0	26902
0	25959
150	27700
150	27530
150	27460
420	30650
420	30150
420	29480
900	34790
900	32020
1800	42280
1800	39370
1800	36200
3600	53230
3600	49260
3600	53030