![]() Chapter Contents |
![]() Previous |
![]() Next |
The TPSPLINE Procedure |
Suppose that Hm is a space of functions whose partial derivatives of total order m are in L2(Ed) where Ed is the domain of x.
Now, consider the data model
Using the notation from the section "The Penalized Least Squares Estimate", for a fixed ,estimate f by minimizing the penalized least squares function
There are several ways to define Jm(f). For the thin-plate smoothing spline, with x of dimension d, define Jm(f) as
When d=2 and m=2, Jm(f) is as follows:
In general, m and d must satisfy the condition that 2m - d > 0. For the sake of simplicity, the formulas and equations that follow assume m=2. Refer to Wahba (1990) and Bates et al. (1987) for more details.
Duchon (1976) showed that can be represented as
where
If you
define K = (K)ij = E2(xi-xj)
and T = (T)ij = (xij), the goal is to find
coefficients and
that minimize
A unique solution is guaranteed if the matrix T is of full rank
and .
If
and X = (T:Z), the expression for
becomes
The coefficients and
can be obtained by solving
To compute and
, let the QR decomposition of X be
where (Q1:Q2) is an orthogonal matrix and R is upper triangular, with XT Q2 = 0 (Dongarra et al. 1979).
Since ,
must be in the column space of
Q2. Therefore,
can be expressed
as
for
a vector
. Substituting
into
the preceding equation and multiplying through by Q2 T gives
or
The coefficient can be obtained by solving
The influence matrix is defined as
Similar to the regression case,
and if you consider the trace
of as the degrees of freedom for the information
signal and the trace of
as the degrees of freedom for the noise component, the
estimate
can be represented as
where is the residual sum of squares.
Theoretical properties of these estimates have not yet
been published. However, good
numerical results in simulation studies have been described by several
authors. For more information, refer to O'Sullivan and Wong (1987),
Nychka (1986a, 1986b, and 1988), and Hall and Titterington (1987).
where is the ith diagonal element of the
matrix and
is the
point of
the normal distribution. The confidence intervals are interpreted as
intervals "across the function" as opposed to point-wise
intervals.
Suppose that you fit a
spline estimate to experimental data that consists of a true function
f and a random error term, . In repeated experiments,
it is likely
that about
of the confidence intervals cover the
corresponding true values, although some values are covered every time
and other values are not covered by the confidence intervals most of
the time. This effect is more pronounced when the true surface or
surface has small regions of particularly rapid change.
A large heavily penalizes the mth derivative of the
function, thus forcing f(m) close to 0. The final
estimating function satisfies f(m)(x) = 0.
A small
places less of a penalty on rapid change in
f(m)(x), resulting in an estimate that tends to interpolate
the data points.
The smoothing parameter greatly affects the analysis,
and it should be selected with care. One method is to
perform several analyses with different values for and compare the resulting final estimates.
A more objective way to select the smoothing
parameter is to use the "leave-out-one" cross validation
function, which is an approximation of the predicted mean squares error. A
generalized version of the leave-out-one cross validation function is
proposed by Wahba (1990) and is easy to calculate. This Generalized Cross
Validation (GCV) function
is defined as
The justification for using the GCV function to select relies on asymptotic theory. Thus, you cannot expect good results for
very small sample sizes or when there is not enough information in the
data to separate the information signal from the noise component.
Simulation studies suggest that for independent and identically
distributed Gaussian noise, you can obtain reliable estimates of
for n greater than 25 or 30. Note that, even for large
values of n (say
), in extreme Monte Carlo simulations
there may be a small percentage of unwarranted extreme estimates in
which
or
(Wahba 1983).
Generally, if
is known to within an order of magnitude, the
occasional extreme case can be readily identified. As n gets larger,
the effect becomes weaker.
The GCV function is fairly robust against nonhomogeneity of variances and non-Gaussian errors (Villalobos and Wahba 1987). Andrews (1988) has provided favorable theoretical results when variances are unequal. However, this selection method is likely to give unsatisfactory results when the errors are highly correlated.
The GCV value may be suspect when is extremely small because
computed values may become indistinguishable from zero.
In practice, calculations with
or
near 0 can cause numerical instabilities resulting in an
unsatisfactory solution. Simulation studies have shown that
a
with
is small
enough that the final estimate based on this
almost
interpolates the data points. A GCV value based on a
may not be accurate.
![]() Chapter Contents |
![]() Previous |
![]() Next |
![]() Top |
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.