Identification of Models

Introduction to Structural Equations with Latent Variables

Unfortunately, if you try to fit models of Form B or Form C without additional constraints, you cannot obtain unique estimates of the parameters. These models have four parameters (one coefficient and three variances). The covariance matrix of the observed variables Y and X has only three elements that are free to vary, since Cov(Y,X)=Cov(X,Y). The covariance structure can, therefore, be expressed as three equations in four unknown parameters. Since there are fewer equations than unknowns, there are many different sets of values for the parameters that provide a solution for the equations. Such a model is said to be underidentified.
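
To make the count concrete, the covariance structure can be written out. The sketch below assumes that Form C is written as Y = F_Y + E_Y, X = F_X + E_X, F_Y = beta F_X (the form used in the spleen example later in this section) and that the error terms are uncorrelated with each other and with F_X. The symbols are notation introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(Y)    &= \beta^{2}\,\operatorname{Var}(F_X) + \operatorname{Var}(E_Y) \\
\operatorname{Var}(X)    &= \operatorname{Var}(F_X) + \operatorname{Var}(E_X) \\
\operatorname{Cov}(Y, X) &= \beta\,\operatorname{Var}(F_X)
\end{aligned}
```

Under these assumptions, any trial value c for Var(E_X) with 0 <= c < Var(X) leads to a solution: set Var(F_X) = Var(X) - c, beta = Cov(Y,X) / Var(F_X), and Var(E_Y) = Var(Y) - beta Cov(Y,X), provided the last quantity is not negative. Each choice of c reproduces the same three observed moments, which is why the four parameters cannot be estimated uniquely.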

If the number of parameters equals the number of free elements in the covariance matrix, then there may exist a unique set of parameter estimates that exactly reproduce the observed covariance matrix. In this case, the model is said to be just identified or saturated.

If the number of parameters is less than the number of free elements in the covariance matrix, there may exist no set of parameter estimates that reproduces the observed covariance matrix. In this case, the model is said to be overidentified. Various statistical criteria, such as maximum likelihood, can be used to choose parameter estimates that approximately reproduce the observed covariance matrix. If you use ML or GLS estimation, PROC CALIS can perform a statistical test of the goodness of fit of the model under the assumption of multivariate normality of all variables and independence of the observations.

If the model is just identified or overidentified, it is said to be identified. If you use ML or GLS estimation for an identified model, PROC CALIS can compute approximate standard errors for the parameter estimates. For underidentified models, PROC CALIS obtains approximate standard errors by imposing additional constraints resulting from the use of a generalized inverse of the Hessian matrix.
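
The counting comparison described above can be sketched as a short DATA step. This is only an illustration of the arithmetic, not something PROC CALIS requires. The variable names are made up here, and the parameter counts 4, 3, and 2 are those of the three spleen models fitted later in this section. A nonnegative difference is a necessary condition only; it does not by itself show that a model is identified.

```sas
data _null_;
   length status $ 20;
   p      = 2;                  /* observed variables: Y and X               */
   n_free = p*(p + 1)/2;        /* distinct variances and covariances (3)    */
   do n_parms = 4, 3, 2;        /* parameter counts of the spleen models     */
      df = n_free - n_parms;    /* free elements minus free parameters       */
      if df < 0 then status = 'too many parameters';
      else if df = 0 then status = 'zero df';
      else status = 'positive df';
      put n_parms= df= status=;
   end;
run;
```

The log should show one line per parameter count. A negative difference corresponds to the underidentified case, zero to a model that is at best just identified, and a positive value to a model that is at best overidentified.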

You cannot guarantee that a model is identified simply by counting the parameters. For example, for any latent variable, you must specify a numeric value for the variance, or for some covariance involving the variable, or for a coefficient of the variable in at least one equation. Otherwise, the scale of the latent variable is indeterminate, and the model will be underidentified regardless of the number of parameters and the size of the covariance matrix. As another example, an exploratory factor analysis with two or more common factors is always underidentified because you can rotate the common factors without affecting the fit of the model.

PROC CALIS can usually detect an underidentified model by computing the approximate covariance matrix of the parameter estimates and checking whether any estimate is linearly related to other estimates (Bollen 1989, pp. 248-250), in which case PROC CALIS displays equations showing the linear relationships among the estimates. Another way to obtain empirical evidence regarding the identification of a model is to run the analysis several times with different initial estimates to see if the same final estimates are obtained.

Bollen (1989) provides detailed discussions of conditions for identification in a variety of models.

The following example is inspired by Fuller (1987, pp. 40-41). The hypothetical data are counts of two types of cells, cells forming rosettes and nucleated cells, in spleen samples. It is reasonable to assume that counts have a Poisson distribution; hence, the square roots of the counts should have an approximately constant error variance of 0.25.
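
The value 0.25 follows from a first-order (delta method) approximation: if a count has mean and variance lambda, its square root has variance of roughly lambda times the squared derivative 1/(4 lambda), which is 1/4 regardless of lambda. As a rough numerical check, the following DATA step sums the Poisson probabilities directly. It is a sketch only: the means 5, 20, 100, and 300 are illustrative values chosen here, not quantities from Fuller (1987), and the upper summation limit of 1000 is assumed to be far enough into the tail for these means.

```sas
data _null_;
   do lambda = 5, 20, 100, 300;            /* illustrative Poisson means      */
      mean_sqrt = 0;
      mean_x    = 0;
      do k = 0 to 1000;                    /* sum well into the upper tail    */
         pk        = pdf('POISSON', k, lambda);
         mean_sqrt = mean_sqrt + sqrt(k)*pk;
         mean_x    = mean_x + k*pk;
      end;
      /* Var(sqrt X) = E[X] - (E[sqrt X])**2 */
      var_sqrt = mean_x - mean_sqrt**2;
      put lambda= var_sqrt= 8.4;
   end;
run;
```

The variance should be close to 0.25 for the larger means, with a somewhat larger departure for small means, where the first-order approximation is weakest.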

You can use PROC CALIS to fit a model of Form C to the square roots of the counts without constraints on the parameters, as shown in the following statements. The option OMETHOD=QUANEW is used in the PROC CALIS statement because in this case it produces more rapid convergence than the default optimization method.

   /* Cell counts per spleen sample; the analysis uses their square roots */
   data spleen;
      input rosette nucleate;
      sqrtrose=sqrt(rosette);
      sqrtnucl=sqrt(nucleate);
      datalines;
   4 62
   5 87
   5 117
   6 142
   8 212
   9 120
   12 254
   13 179
   15 125
   19 182
   28 301
   51 357
   ;

   /* Form C with all four parameters free: beta, v_rose, v_nucl, v_factnu */
   proc calis data=spleen cov omethod=quanew;
      lineqs sqrtrose=factrose + err_rose,
             sqrtnucl=factnucl + err_nucl,
             factrose=beta factnucl;
      std err_rose=v_rose,
          err_nucl=v_nucl,
          factnucl=v_factnu;
   run;

This model is underidentified. PROC CALIS displays the following warning:

   WARNING: Problem not identified: More parameters to estimate ( 4 ) 
            than given values in data matrix ( 3 ).

and diagnoses the indeterminacy as follows:

   NOTE: Hessian matrix is not full rank. Not all parameters are identified.
         Some parameter estimates are linearly related to other parameter
         estimates as shown in the following equations:

   v_nucl  =  -10.554977 - 0.036438 * beta + 1.00000 * v_factnu 
              + 0.149564 * v_rose
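
The empirical check mentioned earlier, refitting with different initial estimates, can be sketched for this underidentified model as follows. The sketch assumes that the RANDOM= option of the PROC CALIS statement, which seeds randomly generated initial values, is available in your release; the seed values 12345 and 67890 are arbitrary choices made here.

```sas
/* Same underidentified model, started from two different random points */
proc calis data=spleen cov omethod=quanew random=12345;
   lineqs sqrtrose=factrose + err_rose,
          sqrtnucl=factnucl + err_nucl,
          factrose=beta factnucl;
   std err_rose=v_rose,
       err_nucl=v_nucl,
       factnucl=v_factnu;
run;

proc calis data=spleen cov omethod=quanew random=67890;
   lineqs sqrtrose=factrose + err_rose,
          sqrtnucl=factnucl + err_nucl,
          factrose=beta factnucl;
   std err_rose=v_rose,
       err_nucl=v_nucl,
       factnucl=v_factnu;
run;
```

If the two runs converge to noticeably different parameter estimates while reaching the same fit, that is further evidence that the parameters are not uniquely determined.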

The constraint that the error variances equal 0.25 can be imposed by modifying the STD statement:

   /* Fix both error variances at 0.25; only beta and v_factnu remain free */
   proc calis data=spleen cov stderr;
      lineqs sqrtrose=factrose + err_rose,
             sqrtnucl=factnucl + err_nucl,
             factrose=beta factnucl;
      std err_rose=.25,
          err_nucl=.25,
          factnucl=v_factnu;
   run;

The resulting parameter estimates are displayed in Figure 14.2.

The CALIS Procedure

Covariance Structure Analysis: Maximum Likelihood Estimation

factrose  =   0.4034 * factnucl
Std Err       0.0508   beta
t Value       7.9439

Variances of Exogenous Variables

Variable   Parameter   Estimate   Standard Error   t Value
factnucl   v_factnu    10.45846          4.56608      2.29
err_rose                0.25000
err_nucl                0.25000

Figure 14.2: Spleen Data: Parameter Estimates for Overidentified Model

This model is overidentified: two free parameters are fitted to three free covariance elements, which leaves one degree of freedom. The chi-square goodness-of-fit test yields a p-value of 0.0219, as displayed in Figure 14.3.

The CALIS Procedure

Covariance Structure Analysis: Maximum Likelihood Estimation

Fit Function	0.4775
Goodness of Fit Index (GFI)	0.7274
GFI Adjusted for Degrees of Freedom (AGFI)	0.1821
Root Mean Square Residual (RMR)	0.1785
Parsimonious GFI (Mulaik, 1989)	0.7274
Chi-Square	5.2522
Chi-Square DF	1
Pr > Chi-Square	0.0219
Independence Model Chi-Square	13.273
Independence Model Chi-Square DF	1
RMSEA Estimate	0.6217
RMSEA 90% Lower Confidence Limit	0.1899
RMSEA 90% Upper Confidence Limit	1.1869
ECVI Estimate	0.9775
ECVI 90% Lower Confidence Limit	.
ECVI 90% Upper Confidence Limit	2.2444
Probability of Close Fit	0.0237
Bentler's Comparative Fit Index	0.6535
Normal Theory Reweighted LS Chi-Square	9.5588
Akaike's Information Criterion	3.2522
Bozdogan's (1987) CAIC	1.7673
Schwarz's Bayesian Criterion	2.7673
McDonald's (1989) Centrality	0.8376
Bentler & Bonett's (1980) Non-normed Index	0.6535
Bentler & Bonett's (1980) NFI	0.6043
James, Mulaik, & Brett (1982) Parsimonious NFI	0.6043
Z-Test of Wilson & Hilferty (1931)	2.0375
Bollen (1986) Normed Index Rho1	0.6043
Bollen (1988) Non-normed Index Delta2	0.6535
Hoelter's (1983) Critical N	10

Figure 14.3: Spleen Data: Fit Statistics for Overidentified Model
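
The chi-square statistic and its p-value in Figure 14.3 can be approximately reproduced by hand. The sketch below assumes that the ML chi-square equals (N-1) times the fit function, which is consistent with the displayed values, and takes the one degree of freedom as three free covariance elements minus two free parameters. The variable names are chosen here for illustration.

```sas
data _null_;
   n     = 12;                      /* observations in the spleen data       */
   fit   = 0.4775;                  /* Fit Function from Figure 14.3         */
   df    = 3 - 2;                   /* free elements minus free parameters   */
   chisq = (n - 1)*fit;             /* assumed relation for ML estimation    */
   p     = 1 - probchi(chisq, df);  /* upper-tail chi-square probability     */
   put chisq= 8.4 df= p= 8.4;
run;
```

Because the fit function is displayed to only four decimal places, the chi-square computed this way should be close to, but not exactly, the reported 5.2522, and the p-value should be near 0.022.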

The sample size is so small that the p-value should not be taken to be accurate, but obtaining a small p-value from such a small sample suggests that the model may be seriously deficient. The deficiency could be due to any of the following:

The error variances are not both equal to 0.25.
The error terms are correlated with each other or with the true scores.
The observations are not independent.
There is a disturbance in the linear relation between factrose and factnucl.
The relation between factrose and factnucl is not linear.
The actual distributions are not adequately approximated by the multivariate normal distribution.

A simple and plausible modification to the model is to add a "disturbance term" or "error in the equation" to the structural model, as follows:

   /* Add a disturbance term to the structural equation; its variance v_dist is free */
   proc calis data=spleen cov stderr;
      lineqs sqrtrose=factrose + err_rose,
             sqrtnucl=factnucl + err_nucl,
             factrose=beta factnucl + disturb;
      std err_rose=.25,
          err_nucl=.25,
          factnucl=v_factnu,
          disturb=v_dist;
   run;

The resulting parameter estimates are displayed in Figure 14.4.

The CALIS Procedure

Covariance Structure Analysis: Maximum Likelihood Estimation

factrose  =   0.3907 * factnucl  +  1.0000 disturb
Std Err       0.0771   beta
t Value       5.0692

Variances of Exogenous Variables

Variable   Parameter   Estimate   Standard Error   t Value
factnucl   v_factnu    10.50458          4.58577      2.29
err_rose                0.25000
err_nucl                0.25000
disturb    v_dist       0.38153          0.28556      1.34

Figure 14.4: Spleen Data: Parameter Estimates for Just Identified Model

This model is just identified, so there are no degrees of freedom for the chi-square goodness-of-fit test.
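
Because the model is just identified, its estimates can also be recovered by solving the three covariance equations directly: Var(sqrtnucl) = v_factnu + 0.25, Cov(sqrtrose, sqrtnucl) = beta * v_factnu, and Var(sqrtrose) = beta**2 * v_factnu + v_dist + 0.25. The following sketch assumes that the sample covariance matrix uses the divisor N-1 (the PROC CORR default) and that the data set spleen created above exists. The data set name spleen_cov and the DATA step variable names are hypothetical.

```sas
/* Sample covariance matrix of the transformed counts */
proc corr data=spleen cov noprint outp=spleen_cov;
   var sqrtrose sqrtnucl;
run;

data _null_;
   set spleen_cov end=last;
   where _type_ = 'COV';                 /* keep only the covariance rows    */
   retain s_rr s_rn s_nn;
   if upcase(_name_) = 'SQRTROSE' then do;
      s_rr = sqrtrose;                   /* Var(sqrtrose)                    */
      s_rn = sqrtnucl;                   /* Cov(sqrtrose, sqrtnucl)          */
   end;
   else if upcase(_name_) = 'SQRTNUCL' then
      s_nn = sqrtnucl;                   /* Var(sqrtnucl)                    */
   if last then do;
      v_factnu = s_nn - 0.25;            /* remove fixed error variance      */
      beta     = s_rn / v_factnu;
      v_dist   = s_rr - 0.25 - beta*s_rn;
      put beta= 8.4 v_factnu= 10.5 v_dist= 8.5;
   end;
run;
```

Under these assumptions, the values written to the log should agree with the estimates in Figure 14.4 up to rounding. No such closed-form check is available for the standard errors, which depend on the estimation method.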
