STAT 350: Lecture 3 Example
Normal Equations for Simple Linear Regression
See Lecture 1 for the framework. Here I consider two models:
The data are
Dose | Count |
0 | 27043 |
0 | 26902 |
0 | 25959 |
150 | 27700 |
150 | 27530 |
150 | 27460 |
420 | 30650 |
420 | 30150 |
420 | 29480 |
900 | 34790 |
900 | 32020 |
1800 | 42280 |
1800 | 39370 |
1800 | 36200 |
3600 | 53230 |
3600 | 49260 |
3600 | 53030 |
The design matrix for the linear model is
You can compute in Minitab or Splus. That matrix has 4 numbers each of which is computed as the dot product of 2 columns of X. For instance the first column dotted with itself produces . Here is an example S session which reads in the data, produces the design matrices for the two models and computes .
[36]skekoowahts% S S-PLUS : Copyright (c) 1988, 1995 MathSoft, Inc. S : Copyright AT&T. Version 3.3 Release 1 for Sun SPARC, SunOS 5.3 : 1995 Working data will be in .Data # # The data are in a file called linear. The ! # tells S that what follows is not an S command but a standard # UNIX (or DOS) command # > !more linear Dose Count 0 27043 0 26902 0 25959 150 27700 150 27530 150 27460 420 30650 420 30150 420 29480 900 34790 900 32020 1800 42280 1800 39370 1800 36200 3600 53230 3600 49260 3600 53030 # # The function help(function) produces help for a function such as # > help(read.table) # # Read in the data from a file. The file has 18 lines: # 17 lines of data and a first line which has the names of the variables. # the function read.table reads such data and header=T warns S # that the first line is variable names. The first argument of # read.table is a character string containing the name of the file # to read from. # > x_read.table("linear",header=T) > x Dose Count 1 0 27043 2 0 26902 3 0 25959 4 150 27700 5 150 27530 6 150 27460 7 420 30650 8 420 30150 9 420 29480 10 900 34790 11 900 32020 12 1800 42280 13 1800 39370 14 1800 36200 15 3600 53230 16 3600 49260 17 3600 53030 # # the design matrix has a column of 1s and also # a column consisting of the first column of x # which is just the list of covariate values # The notation x[,1] picks out the first column of x # > design.mat_cbind(rep(1,17),x[,1]) # # To print out an object you type its name! # > design.mat [,1] [,2] [1,] 1 0 [2,] 1 0 [3,] 1 0 [4,] 1 150 [5,] 1 150 [6,] 1 150 [7,] 1 420 [8,] 1 420 [9,] 1 420 [10,] 1 900 [11,] 1 900 [12,] 1 1800 [13,] 1 1800 [14,] 1 1800 [15,] 1 3600 [16,] 1 3600 [17,] 1 3600 # # Compute X^T X -- this uses %*% to multiply matrices # and t(x) to compute the transpose of a matrix x. # > xprimex <- t(design.mat)%*% design.mat > xprimex [,1] [,2] [1,] 17 19710 [2,] 19710 50816700 # # Compute X^T Y # > xprimey <- t(design.mat)%*% x[,2] > xprimey [,1] [1,] 593054 [2,] 882452100 # # Next compute the least squares estimates by solving the # normal equations # > solve(xprimex,xprimey) [,1] [1,] 26806.734691 [2,] 6.968012 # # solve(A,b) computes the solution of Ax=b for A a # square matrix and b a vector. Of course x=A^{-1}b. # # # The next pice of code regresses the variable Count on # Dose taking the data from x. # > lm( Count~Dose,data=x) Call: lm(formula = Count ~ Dose, data = x) Coefficients: (Intercept) Dose 26806.73 6.968012 Degrees of freedom: 17 total; 15 residual Residual standard error: 1521.238 # # Notice that the estimates agree with our calculations # The residual standard error is the usual estimate of sigma # namely the square root of the Mean Square for Error. # # # Now we add a third column to fit the quadratic model # > design.mat2_cbind(design.mat,design.mat[,2]^2) # # Here is X^T X # > t(design.mat2)%*% design.mat2 [,1] [,2] [,3] [1,] 17 19710 5.081670e+07 [2,] 19710 50816700 1.591544e+11 [3,] 50816700 159154389000 5.367847e+14 # # Here is X^T Y # > t(design.mat2)%*% x[,2] [,1] [1,] 5.930540e+05 [2,] 8.824521e+08 [3,] 2.469275e+12 # # But the following illustrates the dangers of doing computations # blindly on the computer. The trouble is that the design matrix # has a third column which is so much larger that the first two. # > solve(t(design.mat2)%*% design.mat2, t(design.mat2)%*% x[,2]) Error in solve.qr(a, b): apparently singular matrix Dumped # # However, good packages know numerical techniques which avoid # the danger. # > lm(Count ~ Dose+Dose^2,data=x) Call: lm(formula = Count ~ Dose + Dose^2, data = x) Coefficients: (Intercept) Dose I(Dose^2) 26718.11 7.240314 -7.596867e-05 Degrees of freedom: 17 total; 14 residual Residual standard error: 1571.277 > # # WARNING: you can't tell from the size of the estimate # of an estimate such as that of beta_3 whether or not # it is important -- you have to compare it to values # of the corresponding covariate and to its standard error # > q() # Used to quit S: pay attention to () -- that part is essential!