The PLS Procedure |
The following example demonstrates issues in spectrometric calibration. The data (Umetrics 1995) consist of spectrographic readings on 33 samples containing known concentrations of two amino acids, tyrosine and tryptophan. The spectra are measured at 30 frequencies across the overall range of frequencies. For example, Figure 51.11 shows the observed spectra for three samples, one with only tryptophan, one with only tyrosine, and one with a mixture of the two, all at a total concentration of 10^-6.
Of the 33 samples, 18 are used as a training set and 15 as a test set. The data originally appear in McAvoy et al. (1989).
These data were created in a lab, with the concentrations fixed in order to provide a wide range of applicability for the model. You want to use a linear function of the logarithms of the spectra to predict the logarithms of tyrosine and tryptophan concentration, as well as the logarithm of the total concentration. Because zeros can occur in both the responses and the predictors, and the logarithm of zero is undefined, slightly different transformations are used; for the responses, the statements below assign the value -8 in place of log10(0) whenever a tyrosine or tryptophan concentration is zero. The following statements create SAS data sets containing the training and test data, named ftrain and ftest, respectively:
   data ftrain;
      input obsnam $ tot tyr f1-f30 @@;
      try = tot - tyr;
      if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;
      if (try) then try_log = log10(try); else try_log = -8;
      tot_log = log10(tot);
      datalines;
17mix35 0.00003 0
-6.215 -5.809 -5.114 -3.963 -2.897 -2.269 -1.675 -1.235 -0.900 -0.659
-0.497 -0.395 -0.335 -0.315 -0.333 -0.377 -0.453 -0.549 -0.658 -0.797
-0.878 -0.954 -1.060 -1.266 -1.520 -1.804 -2.044 -2.269 -2.496 -2.714
19mix35 0.00003 3E-7
-5.516 -5.294 -4.823 -3.858 -2.827 -2.249 -1.683 -1.218 -0.907 -0.658
-0.501 -0.400 -0.345 -0.323 -0.342 -0.387 -0.461 -0.554 -0.665 -0.803
-0.887 -0.960 -1.072 -1.272 -1.541 -1.814 -2.058 -2.289 -2.496 -2.712
21mix35 0.00003 7.5E-7
-5.519 -5.294 -4.501 -3.863 -2.827 -2.280 -1.716 -1.262 -0.939 -0.694
-0.536 -0.444 -0.384 -0.369 -0.377 -0.421 -0.495 -0.596 -0.706 -0.824
-0.917 -0.988 -1.103 -1.294 -1.565 -1.841 -2.084 -2.320 -2.521 -2.729
23mix35 0.00003 1.5E-6
-5.294 -4.705 -4.262 -3.605 -2.726 -2.239 -1.681 -1.250 -0.925 -0.697
-0.534 -0.437 -0.381 -0.359 -0.369 -0.426 -0.499 -0.591 -0.701 -0.843
-0.925 -0.989 -1.109 -1.310 -1.579 -1.852 -2.090 -2.316 -2.521 -2.743
25mix35 0.00003 3E-6
-4.600 -4.069 -3.764 -3.262 -2.598 -2.191 -1.680 -1.273 -0.958 -0.729
-0.573 -0.470 -0.422 -0.407 -0.422 -0.468 -0.538 -0.639 -0.753 -0.887
-0.968 -1.037 -1.147 -1.357 -1.619 -1.886 -2.141 -2.359 -2.585 -2.792
27mix35 0.00003 7.5E-6
-3.812 -3.376 -3.026 -2.726 -2.249 -1.919 -1.541 -1.198 -0.951 -0.764
-0.639 -0.570 -0.528 -0.525 -0.550 -0.606 -0.689 -0.781 -0.909 -1.031
-1.126 -1.191 -1.303 -1.503 -1.784 -2.058 -2.297 -2.507 -2.727 -2.970
29mix35 0.00003 0.000015
-3.053 -2.641 -2.382 -2.194 -1.977 -1.913 -1.728 -1.516 -1.317 -1.158
-1.029 -0.963 -0.919 -0.915 -0.933 -0.981 -1.055 -1.157 -1.271 -1.409
-1.505 -1.546 -1.675 -1.880 -2.140 -2.415 -2.655 -2.879 -3.075 -3.319
28mix35 0.00003 0.0000225
-2.626 -2.248 -2.004 -1.839 -1.742 -1.791 -1.786 -1.772 -1.728 -1.666
-1.619 -1.591 -1.575 -1.580 -1.619 -1.671 -1.754 -1.857 -1.982 -2.114
-2.210 -2.258 -2.379 -2.570 -2.858 -3.117 -3.347 -3.568 -3.764 -4.012
26mix35 0.00003 0.000027
-2.370 -1.990 -1.754 -1.624 -1.560 -1.655 -1.772 -1.899 -1.982 -2.074
-2.157 -2.211 -2.267 -2.317 -2.369 -2.460 -2.545 -2.668 -2.807 -2.951
-3.030 -3.075 -3.214 -3.376 -3.685 -3.907 -4.129 -4.335 -4.501 -4.599
24mix35 0.00003 0.0000285
-2.326 -1.952 -1.702 -1.583 -1.507 -1.629 -1.771 -1.945 -2.115 -2.297
-2.448 -2.585 -2.696 -2.808 -2.913 -3.030 -3.163 -3.265 -3.376 -3.534
-3.642 -3.721 -3.858 -4.012 -4.262 -4.501 -4.704 -4.822 -4.956 -5.292
22mix35 0.00003 0.00002925
-2.277 -1.912 -1.677 -1.556 -1.487 -1.630 -1.791 -1.969 -2.203 -2.437
-2.655 -2.844 -3.032 -3.214 -3.378 -3.503 -3.646 -3.812 -3.958 -4.129
-4.193 -4.262 -4.415 -4.501 -4.823 -5.111 -5.113 -5.294 -5.290 -5.294
20mix35 0.00003 0.0000297
-2.266 -1.912 -1.688 -1.546 -1.500 -1.640 -1.801 -2.011 -2.277 -2.545
-2.823 -3.094 -3.376 -3.572 -3.812 -4.012 -4.262 -4.415 -4.501 -4.705
-4.823 -4.823 -4.956 -5.111 -5.111 -5.516 -5.524 -5.806 -5.806 -5.806
18mix35 0.00003 0.00003
-2.258 -1.900 -1.666 -1.524 -1.479 -1.621 -1.803 -2.043 -2.308 -2.626
-2.895 -3.214 -3.568 -3.907 -4.193 -4.423 -4.825 -5.111 -5.111 -5.516
-5.516 -5.516 -5.516 -5.806 -5.806 -5.806 -5.806 -5.806 -6.210 -6.215
trp2 0.0001 0
-5.922 -5.435 -4.366 -3.149 -2.124 -1.392 -0.780 -0.336 -0.002 0.233
0.391 0.490 0.540 0.563 0.541 0.488 0.414 0.313 0.203 0.063
-0.028 -0.097 -0.215 -0.411 -0.678 -0.953 -1.208 -1.418 -1.651 -1.855
mix5 0.0001 0.00001
-3.932 -3.411 -2.964 -2.462 -1.836 -1.308 -0.796 -0.390 -0.076 0.147
0.294 0.394 0.446 0.460 0.443 0.389 0.314 0.220 0.099 -0.033
-0.128 -0.197 -0.308 -0.506 -0.785 -1.050 -1.313 -1.529 -1.745 -1.970
mix4 0.0001 0.000025
-2.996 -2.479 -2.099 -1.803 -1.459 -1.126 -0.761 -0.424 -0.144 0.060
0.195 0.288 0.337 0.354 0.330 0.274 0.206 0.105 -0.009 -0.148
-0.242 -0.306 -0.424 -0.626 -0.892 -1.172 -1.425 -1.633 -1.877 -2.071
mix3 0.0001 0.00005
-2.128 -1.661 -1.344 -1.160 -0.996 -0.877 -0.696 -0.495 -0.313 -0.165
-0.042 0.032 0.069 0.079 0.050 -0.006 -0.082 -0.179 -0.295 -0.436
-0.523 -0.584 -0.706 -0.898 -1.178 -1.446 -1.696 -1.922 -2.128 -2.350
mix6 0.0001 0.00009
-1.140 -0.757 -0.497 -0.362 -0.329 -0.412 -0.513 -0.647 -0.772 -0.877
-0.958 -1.040 -1.104 -1.162 -1.233 -1.317 -1.425 -1.543 -1.661 -1.804
-1.877 -1.959 -2.034 -2.249 -2.502 -2.732 -2.964 -3.142 -3.313 -3.576
;

   data ftest;
      input obsnam $ tot tyr f1-f30 @@;
      try = tot - tyr;
      if (tyr) then tyr_log = log10(tyr); else tyr_log = -8;
      if (try) then try_log = log10(try); else try_log = -8;
      tot_log = log10(tot);
      datalines;
43trp6 1E-6 0
-5.915 -5.918 -6.908 -5.428 -4.117 -5.103 -4.660 -4.351 -4.023 -3.849
-3.634 -3.634 -3.572 -3.513 -3.634 -3.572 -3.772 -3.772 -3.844 -3.932
-4.017 -4.023 -4.117 -4.227 -4.492 -4.660 -4.855 -5.428 -5.103 -5.428
59mix6 1E-6 1E-7
-5.903 -5.903 -5.903 -5.082 -4.213 -5.083 -4.838 -4.639 -4.474 -4.213
-4.001 -4.098 -4.001 -4.001 -3.907 -4.001 -4.098 -4.098 -4.206 -4.098
-4.213 -4.213 -4.335 -4.474 -4.639 -4.838 -4.837 -5.085 -5.410 -5.410
51mix6 1E-6 2.5E-7
-5.907 -5.907 -5.415 -4.843 -4.213 -4.843 -4.843 -4.483 -4.343 -4.006
-4.006 -3.912 -3.830 -3.830 -3.755 -3.912 -4.006 -4.001 -4.213 -4.213
-4.335 -4.483 -4.483 -4.642 -4.841 -5.088 -5.088 -5.415 -5.415 -5.415
49mix6 1E-6 5E-7
-5.419 -5.091 -5.091 -4.648 -4.006 -4.846 -4.648 -4.483 -4.343 -4.220
-4.220 -4.220 -4.110 -4.110 -4.110 -4.220 -4.220 -4.343 -4.483 -4.483
-4.650 -4.650 -4.846 -4.846 -5.093 -5.091 -5.419 -5.417 -5.417 -5.907
53mix6 1E-6 7.5E-7
-5.083 -4.837 -4.837 -4.474 -3.826 -4.474 -4.639 -4.838 -4.837 -4.639
-4.639 -4.641 -4.641 -4.639 -4.639 -4.837 -4.838 -4.838 -5.083 -5.082
-5.083 -5.410 -5.410 -5.408 -5.408 -5.900 -5.410 -5.903 -5.900 -6.908
57mix6 1E-6 9E-7
-5.082 -4.836 -4.639 -4.474 -3.826 -4.636 -4.638 -4.638 -4.837 -5.082
-5.082 -5.408 -5.082 -5.080 -5.408 -5.408 -5.408 -5.408 -5.408 -5.408
-5.408 -5.900 -5.900 -5.900 -5.900 -5.900 -5.900 -5.900 -6.908 -6.908
41tyro6 1E-6 1E-6
-5.104 -4.662 -4.662 -4.358 -3.705 -4.501 -4.662 -4.859 -5.104 -5.431
-5.433 -5.918 -5.918 -5.918 -5.431 -5.918 -5.918 -5.918 -5.918 -5.918
-5.918 -5.918 -5.918 -6.908 -5.918 -5.918 -6.908 -6.908 -5.918 -5.918
28trp5 0.00001 0
-5.937 -5.937 -5.937 -4.526 -3.544 -3.170 -2.573 -2.115 -1.792 -1.564
-1.400 -1.304 -1.244 -1.213 -1.240 -1.292 -1.373 -1.453 -1.571 -1.697
-1.801 -1.873 -2.008 -2.198 -2.469 -2.706 -2.990 -3.209 -3.384 -3.601
37mix5 0.00001 1E-6
-5.109 -4.865 -4.501 -4.029 -3.319 -3.070 -2.569 -2.207 -1.895 -1.684
-1.516 -1.423 -1.367 -1.348 -1.374 -1.415 -1.503 -1.596 -1.718 -1.839
-1.927 -1.997 -2.118 -2.333 -2.567 -2.874 -3.106 -3.313 -3.579 -3.781
33mix5 0.00001 2.5E-6
-4.366 -4.129 -3.781 -3.467 -3.037 -2.939 -2.593 -2.268 -1.988 -1.791
-1.649 -1.565 -1.520 -1.509 -1.524 -1.580 -1.665 -1.758 -1.882 -2.037
-2.090 -2.162 -2.284 -2.465 -2.761 -3.037 -3.270 -3.520 -3.709 -3.937
31mix5 0.00001 5E-6
-3.790 -3.373 -3.119 -2.915 -2.671 -2.718 -2.555 -2.398 -2.229 -2.085
-1.971 -1.902 -1.860 -1.837 -1.881 -1.949 -2.009 -2.127 -2.230 -2.381
-2.455 -2.513 -2.624 -2.827 -3.117 -3.373 -3.586 -3.785 -4.040 -4.366
35mix5 0.00001 7.5E-6
-3.321 -2.970 -2.765 -2.594 -2.446 -2.548 -2.616 -2.617 -2.572 -2.550
-2.508 -2.487 -2.488 -2.487 -2.529 -2.593 -2.688 -2.792 -2.908 -3.037
-3.149 -3.189 -3.273 -3.467 -3.781 -4.029 -4.241 -4.501 -4.669 -4.865
39mix5 0.00001 9E-6
-3.142 -2.812 -2.564 -2.404 -2.281 -2.502 -2.589 -2.706 -2.842 -2.964
-3.068 -3.103 -3.182 -3.268 -3.361 -3.411 -3.517 -3.576 -3.705 -3.849
-3.932 -3.932 -4.029 -4.234 -4.501 -4.664 -4.860 -5.104 -5.431 -5.433
26tyro5 0.00001 0.00001
-3.037 -2.696 -2.464 -2.321 -2.239 -2.444 -2.602 -2.823 -3.144 -3.396
-3.742 -4.063 -4.398 -4.699 -4.893 -5.138 -5.140 -5.461 -5.463 -5.945
-5.461 -5.138 -5.140 -5.138 -5.138 -5.463 -5.461 -5.461 -5.461 -5.461
tyro2 0.0001 0.0001
-1.081 -0.710 -0.470 -0.337 -0.327 -0.433 -0.602 -0.841 -1.119 -1.423
-1.750 -2.121 -2.449 -2.818 -3.110 -3.467 -3.781 -4.029 -4.241 -4.366
-4.501 -4.366 -4.501 -4.501 -4.668 -4.668 -4.865 -4.865 -5.109 -5.111
;
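As a quick check on the transformation described above, a minimal sketch like the following summarizes the log responses in both data sets. It assumes that the ftrain and ftest data sets have already been created in the WORK library; the use of PROC MEANS and the particular statistics requested are illustrative choices, not part of the original example.

```sas
/* Sketch: check the transformed responses.                      */
/* Assumes the ftrain and ftest data sets created above exist.   */
proc means data=ftrain n nmiss min max;
   var tot_log tyr_log try_log;
run;

proc means data=ftest n nmiss min max;
   var tot_log tyr_log try_log;
run;
```

If the zero handling works as intended, no response should be missing, the counts should match the 18 training and 15 test samples, and the minimum of tyr_log and try_log should be the substituted value -8.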
The following statements fit a PLS model with 10 factors.
   proc pls data=ftrain nfac=10;
      model tot_log tyr_log try_log = f1-f30;
   run;
The table shown in Output 51.3.1 indicates that only three or four factors are required to explain almost all of the variation in both the predictors and the responses.
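The percentages in that table can also be saved to a data set for closer inspection. The following is a minimal sketch, assuming that ftrain exists and that PercentVariation is the ODS name of the variation table in your release; the data set name pctvar is hypothetical.

```sas
/* Sketch: save the variation-explained table to a data set.       */
/* Assumption: PercentVariation is the ODS table name; pctvar is   */
/* an arbitrary (hypothetical) output data set name.               */
ods output PercentVariation=pctvar;
proc pls data=ftrain nfac=10;
   model tot_log tyr_log try_log = f1-f30;
run;

proc print data=pctvar noobs;
run;
```

Scanning the cumulative columns of the printed data set is one way to confirm at which number of factors the explained variation levels off.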
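Output 51.3.1: Amount of Training Set Variation Explained

The following statements refit the model with test set validation. The CV=TESTSET(ftest) option evaluates each number of factors against the test data, and the CVTEST option with STAT=PRESS requests a test that compares models with different numbers of factors based on the predicted residual sum of squares (PRESS).

   proc pls data=ftrain nfac=10 cv=testset(ftest)
            cvtest(stat=press seed=12345);
      model tot_log tyr_log try_log = f1-f30;
   run;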
The results of the test set validation are shown in Output 51.3.2. They indicate that, although five PLS factors give the minimum predicted residual sum of squares, the residuals for four factors are not significantly different from those for five. Thus, the smaller four-factor model is preferred.
Output 51.3.2: Test Set Validation for the Number of PLS Factors
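The validation results can likewise be saved and reviewed as a data set. The sketch below assumes that ftrain and ftest exist and that CVResults is the ODS name of the validation table in your release; the data set name cvres is hypothetical.

```sas
/* Sketch: save the test set validation results to a data set.   */
/* Assumption: CVResults is the ODS table name; cvres is an      */
/* arbitrary (hypothetical) output data set name.                */
ods output CVResults=cvres;
proc pls data=ftrain nfac=10 cv=testset(ftest)
         cvtest(stat=press seed=12345);
   model tot_log tyr_log try_log = f1-f30;
run;

proc print data=cvres noobs;
run;
```

Because the columns of this table are not described in the text, check the printed variable names before using them in further processing.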
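To see how the PLS factors are built from the predictors, the following statements refit the model with four factors, save the predictor loadings to a data set by using the ODS OUTPUT statement, and plot the loadings of each factor against frequency:

   /* Fit the four-factor model and capture the X loadings */
   ods listing close;
   ods output XLoadings=xloadings;
   proc pls data=ftrain nfac=4 details method=pls;
      model tot_log tyr_log try_log = f1-f30;
   run;
   ods listing;

   /* Transpose so that each factor is a column and each row a frequency */
   proc transpose data=xloadings(drop=NumberOfFactors)
                  out =xloadings;
   run;
   data xloadings;
      set xloadings;
      n = _n_;   /* frequency index, 1 through 30 */
      rename col1=Factor1 col2=Factor2
             col3=Factor3 col4=Factor4;
   run;

   /* Overlay the four loading curves */
   goptions border;
   axis1 label=("Loading"  ) major=(number=5) minor=none;
   axis2 label=("Frequency")                  minor=none;
   symbol1 v=none i=join c=red    l=1;
   symbol2 v=none i=join c=green  l=1 /*l= 3*/;
   symbol3 v=none i=join c=blue   l=1 /*l=34*/;
   symbol4 v=none i=join c=yellow l=1 /*l=46*/;
   legend1 label=none cborder=black;
   proc gplot data=xloadings;
      plot (Factor1 Factor2 Factor3 Factor4)*n
           / overlay legend=legend1 vaxis=axis1 haxis=axis2
             vref=0 lvref=2 frame cframe=ligr;
   run; quit;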
The resulting plot is shown in Output 51.3.3.
Output 51.3.3: Predictor Loadings Across Frequencies
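In later SAS releases, a similar overlay can be drawn with PROC SGPLOT in place of SAS/GRAPH. The following is only a sketch under stated assumptions: it requires SAS 9.2 or later, and it relies on the transposed xloadings data set from the preceding step, with the variables n and Factor1 through Factor4. It is an alternative rendering, not the method used to produce Output 51.3.3.

```sas
/* Sketch: overlay the loadings with PROC SGPLOT.                 */
/* Assumptions: SAS 9.2 or later; xloadings already contains the  */
/* variables n and Factor1-Factor4 from the preceding step.       */
proc sgplot data=xloadings;
   series x=n y=Factor1;
   series x=n y=Factor2;
   series x=n y=Factor3;
   series x=n y=Factor4;
   refline 0 / axis=y lineattrs=(pattern=dash);
   xaxis label="Frequency";
   yaxis label="Loading";
run;
```

Each SERIES statement draws one factor's loadings across the 30 frequencies, and the reference line at zero plays the same role as the VREF=0 option in the PROC GPLOT version.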
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.