Paper Summary: Active Shape Models – Their Training and Application, CVIU 1995
T. F. Cootes, C. J. Taylor, D. H. Cooper, J. Graham
Introduction
This paper presents an algorithm for locating non-rigid objects in images in the presence of noise, clutter, and occlusion, addressing a weakness of contemporary deformable models - sacrificing specificity to accommodate variability. The proposed model permits only deformations that are consistent with the class of objects it represents. The authors demonstrate the robustness of the algorithm by applying it to a variety of training image sets - resistors, heart, hand, and worm models.
Method
For each image in the training set, manually label a fixed number of significant points (also known as “landmark points”). They fall into three categories - points marking features of application-dependent significance, points marking application-independent features, and points that can be interpolated from points of the previous two categories.
Let $\boldsymbol{x}_i$ be the $2n$-element vector containing the $n$ points of the $i$th shape in the set. $$ \boldsymbol{x}_i = (x_{i0}, y_{i0}, x_{i1}, y_{i1}, \cdots, x_{ik}, y_{ik}, \cdots, x_{i,n-1}, y_{i,n-1}) $$
To align a pair of shapes $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$, we minimize a weighted sum of squares of distances between equivalent points. $$ E_j = \big(\boldsymbol{x}_i - M(s_j, \theta_j)[\boldsymbol{x}_j] - \boldsymbol{t}_j\big)^T \boldsymbol{W} \big(\boldsymbol{x}_i - M(s_j, \theta_j)[\boldsymbol{x}_j] - \boldsymbol{t}_j\big) $$ where $M(s, \theta)[\boldsymbol{x}]$ is a rotation by $\theta$ and a scaling by $s$, $\boldsymbol{t}_j$ is a translation, and $\boldsymbol{W}$ is a diagonal matrix of per-point weights $$ w_k = \bigg(\sum_{l=0}^{n-1} V_{R_{kl}}\bigg)^{-1} $$ where $V_{R_{kl}}$ is the variance, over the training set, of the distance $R_{kl}$ between points $k$ and $l$. Points that move consistently with respect to the other points thus receive larger weights.
The alignment algorithm can be briefly summarized as follows - align each shape to the first shape by rotation, scaling, and translation; then iterate: calculate the mean of the aligned shapes, normalize the orientation, scale, and origin of the current mean to suitable defaults, and realign each shape with the normalized mean, until convergence. The authors do not formally prove convergence, but the normalization step constrains the mean (without it the problem is ill-defined), and the procedure is found to converge in practice.
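The pairwise and iterative alignment steps above can be sketched as follows. This is a minimal NumPy sketch under simplifying assumptions: the weights are dropped (i.e., $\boldsymbol{W} = \boldsymbol{I}$), the rotation normalization of the mean is omitted, and `align_pair` / `align_shapes` are illustrative names, not from the paper.

```python
import numpy as np

def align_pair(x_ref, x):
    """Fit scale, rotation, and translation mapping x onto x_ref by
    least squares (unweighted sketch; the paper weights each point by
    the stability of its distances to the other points)."""
    # Linear parametrization of the similarity transform:
    # x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    # with a = s*cos(theta), b = s*sin(theta).
    n = x.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2, 0], A[0::2, 1], A[0::2, 2] = x[:, 0], -x[:, 1], 1.0
    A[1::2, 0], A[1::2, 1], A[1::2, 3] = x[:, 1], x[:, 0], 1.0
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, x_ref.reshape(-1), rcond=None)
    return x @ np.array([[a, b], [-b, a]]) + np.array([tx, ty])

def align_shapes(shapes, n_iter=5):
    """Iterative alignment: align all shapes to the first one, then
    repeatedly re-estimate the mean, normalize its scale and origin,
    and realign every shape to the normalized mean."""
    shapes = [align_pair(shapes[0], s) for s in shapes]
    for _ in range(n_iter):
        mean = np.mean(shapes, axis=0)
        mean = mean - mean.mean(axis=0)       # fix the origin
        mean = mean / np.linalg.norm(mean)    # fix the scale
        shapes = [align_pair(mean, s) for s in shapes]
    return shapes, mean
```

Each shape is an `(n, 2)` array of landmark points; after alignment, all shapes lie in the common frame of the normalized mean.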
Next, the paper introduces a Point Distribution Model (PDM) to model the variability of these points. A set of $N$ sample shapes generates a cloud of $N$ points in the $2n$-dimensional space. The inherent assumptions are that the points lie within a region the authors call the “Allowable Shape Domain” and that this region is approximately ellipsoidal, so its effective dimensionality can be much smaller than $2n$.
Given a set of $N$ aligned shapes, the mean shape $\bar{\boldsymbol{x}}$, which represents the center of the ellipsoidal Allowable Shape Domain, is given by $$ \bar{\boldsymbol{x}} = \frac{1}{N} \sum_{i=1}^N \boldsymbol{x}_i $$
Using Principal Component Analysis (PCA), the principal axes of the ellipsoid can be calculated, where each axis represents a “mode of variation”. Let $\boldsymbol{S}$ represent the $2n \times 2n$ covariance matrix and $\boldsymbol{p}_k$ represent the principal axes ($\forall k = 1,\cdots,2n)$. We have $$ \boldsymbol{S} = \frac{1}{N} \sum_{i=1}^N d\boldsymbol{x}_i\, d\boldsymbol{x}_i^T, \quad d\boldsymbol{x}_i = \boldsymbol{x}_i - \bar{\boldsymbol{x}},\ \text{and}$$$$ \boldsymbol{Sp_k} = \lambda_k \boldsymbol{p_k},\ \text{and}$$$$ \boldsymbol{p_k}^T\boldsymbol{p_k} = 1 $$ where $\lambda_k$ represents the $k$th eigenvalue of $\boldsymbol{S}$.
Using PCA, $t$ modes of variation can be chosen such that they explain a sufficiently large fraction of the total variance. With the eigenvalues sorted so that $\lambda_k \geq \lambda_{k+1}$, this means $$ \sum_{i=1}^{t} \lambda_i \approx \sum_{i=1}^{2n} \lambda_i $$
The shapes of the training set can be approximated using the chosen $t$ modes. $$ \boldsymbol{x} = \bar{\boldsymbol{x}} + \boldsymbol{Pb} $$ where $\boldsymbol{P}$ is the $2n \times t$ matrix of the chosen $t$ eigenvectors and $\boldsymbol{b}$ is a vector of $t$ weights. Since the variance of $b_k$ over the training set is $\lambda_k$, and most of the population lies within 3 standard deviations of the mean, each $b_k$ can be varied within suitable limits to generate shapes similar to those in the training set. $$ -3\sqrt{\lambda_k} \leq b_k \leq 3\sqrt{\lambda_k} $$
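The PDM construction above (mean, covariance, eigen-decomposition, mode selection, and clamped reconstruction) can be sketched as follows. This is a minimal NumPy sketch; `build_pdm`, `generate_shape`, and the 98% variance threshold are illustrative choices, not taken from the paper.

```python
import numpy as np

def build_pdm(shapes, frac=0.98):
    """Build a Point Distribution Model from aligned shapes.

    shapes: (N, 2n) array, one aligned shape vector per row.
    Returns the mean shape, the matrix P of the t principal modes,
    and their eigenvalues, with t chosen so the retained modes
    explain at least `frac` of the total variance.
    """
    x_mean = shapes.mean(axis=0)
    dx = shapes - x_mean
    S = dx.T @ dx / shapes.shape[0]          # 2n x 2n covariance matrix
    lam, vecs = np.linalg.eigh(S)            # eigh returns ascending order
    lam, vecs = lam[::-1], vecs[:, ::-1]     # sort descending by variance
    t = int(np.searchsorted(np.cumsum(lam) / lam.sum(), frac)) + 1
    return x_mean, vecs[:, :t], lam[:t]

def generate_shape(x_mean, P, lam, b):
    """x = x_mean + P b, with each b_k clamped to +/- 3 sqrt(lambda_k)."""
    b = np.clip(b, -3.0 * np.sqrt(lam), 3.0 * np.sqrt(lam))
    return x_mean + P @ b
```

Clamping `b` before reconstruction is what keeps every generated shape inside the Allowable Shape Domain.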
To fit the model to an image, each model point is adjusted along the normal to the model boundary, by an amount proportional to the edge strength there. Given the current positions $\boldsymbol{X}$, a vector of suggested adjustments $d\boldsymbol{X}$ is calculated, and $\boldsymbol{X}$ is moved toward this new position by updating the pose and shape parameters to better fit the image. Transforming $d\boldsymbol{X}$ into parameter space and limiting the shape parameters permits only allowable changes, therefore ensuring that the model deforms only in ways consistent with the training set.
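A single parameter-update step of this search can be sketched as follows. This is a minimal NumPy sketch under simplifying assumptions: the pose (scale, rotation, translation) update is omitted, and `update_shape` is an illustrative name, not from the paper.

```python
import numpy as np

def update_shape(x_mean, P, lam, b, dX):
    """One simplified shape-parameter update.

    Projects the suggested point adjustments dX into model parameter
    space, updates b, and clamps each b_k to +/- 3 sqrt(lambda_k) so
    that only allowable shapes are produced.
    """
    db = P.T @ dX                      # adjustments in parameter space
    b = b + db
    lim = 3.0 * np.sqrt(lam)
    b = np.clip(b, -lim, lim)          # stay in the Allowable Shape Domain
    return x_mean + P @ b, b           # updated shape and parameters
```

Because `P` has only `t` columns, components of `dX` outside the span of the retained modes are discarded, which is precisely how disallowed deformations are filtered out.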