Feature Representation and Multi-modal Fusion using Deep Boltzmann Machine

Paper Summary: Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis, NeuroImage 2014

Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, and the Alzheimer’s Disease Neuroimaging Initiative [DOI]

This paper proposes a high-level latent and shared feature representation from neuroimaging modalities (MRI and PET) via deep learning for the diagnosis of Alzheimer's Disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI). In contrast to previous works, where multimodal features were combined by concatenating them into long vectors or transforming them into a high-dimensional kernel space, the authors propose using a Deep Boltzmann Machine (DBM) to find a latent hierarchical representation from a 3D patch, and then to construct "a joint feature representation from the paired patches of MRI and PET with a multimodal DBM."

The structural MR images were acquired using 1.5T scanners and provided in NIfTI format, pre-processed to correct spatial distortion caused by gradient nonlinearity and B1 field inhomogeneity. The FDG-PET (18-Fluoro-DeoxyGlucose PET) images were acquired 30-60 minutes post-injection and were pre-processed (averaged, spatially aligned, interpolated to a standard voxel size, normalized in intensity, and smoothed to a common resolution of 8 mm full width at half maximum).
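
This preprocessing was performed upstream by ADNI rather than by the authors, but as a rough illustration of the smoothing step, a nilearn-based sketch could look like the following. The file path and the plain 8 mm Gaussian kernel are placeholders; the actual pipeline uses scanner-specific filters to reach the common resolution.

```python
# Illustrative only: the ADNI preprocessing was done upstream. This sketch
# just shows what smoothing a PET volume toward an 8 mm FWHM resolution
# could look like with nibabel/nilearn. The file path is hypothetical.
import nibabel as nib
from nilearn import image

pet_img = nib.load("subject_fdg_pet.nii")            # hypothetical NIfTI file
smoothed = image.smooth_img(pet_img, fwhm=8)          # Gaussian smoothing, 8 mm FWHM
smoothed.to_filename("subject_fdg_pet_smoothed.nii")
```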

Given a pair of MRI and PET images, class-discriminative patches are selected by a statistical significance test between classes, based on the statistical significance of the voxels in each patch. A two-sample $t$-test is performed on each voxel, and the voxels with a $p$-value smaller than a predefined threshold are selected. Then, for each patch containing selected voxels, a mean $p$-value is calculated by averaging the $p$-values of all the voxels in the patch. Finally, class-discriminative patches are selected by a greedy search over all patches that satisfy two criteria, as sketched below.
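
A minimal sketch of this selection idea, assuming a 4D array of co-registered images and binary class labels; the patch size, thresholds, non-overlapping grid, and greedy rule are illustrative simplifications, not the paper's exact settings.

```python
# Hedged sketch of voxel-wise t-tests followed by greedy patch selection.
# `images` is assumed to be an (n_subjects, X, Y, Z) array of co-registered
# volumes and `labels` a binary class vector; all parameters are placeholders.
import numpy as np
from scipy.stats import ttest_ind

def select_patches(images, labels, patch_size=11, p_thresh=0.05, max_patches=50):
    # Voxel-wise two-sample t-test between the two classes.
    group0 = images[labels == 0]
    group1 = images[labels == 1]
    _, p_vals = ttest_ind(group0, group1, axis=0)

    # Mean p-value of the voxels inside each candidate patch
    # (non-overlapping grid here, for simplicity).
    candidates = []
    X, Y, Z = p_vals.shape
    for x in range(0, X - patch_size + 1, patch_size):
        for y in range(0, Y - patch_size + 1, patch_size):
            for z in range(0, Z - patch_size + 1, patch_size):
                patch_p = p_vals[x:x + patch_size, y:y + patch_size, z:z + patch_size]
                if np.any(patch_p < p_thresh):       # keep patches with significant voxels
                    candidates.append(((x, y, z), patch_p.mean()))

    # Greedy selection: most discriminative (smallest mean p-value) first.
    candidates.sort(key=lambda c: c[1])
    return [pos for pos, _ in candidates[:max_patches]]
```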

After the patches have been selected, the tissue densities of a MRI patch and the voxel intensities of a PET patch are used as observations that are then used to build a patch level feature learning model, which the authors call a MultiModal DBM (MM-DBM). This MM-DBM finds a shared feature representation from the paired patches of MRI and PET. However, instead of using the voxel intensities of the PET and the tissue densities of the MRI, a Gaussian Restricted Boltzmann Machine (RBM) is trained. This is used to transform the real valued observations into binary vectors, which then in turn act as inputs to the MM-DBM.
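
The following numpy sketch shows one plausible CD-1 implementation of such a Gaussian RBM with unit-variance visible units; the hyperparameters and data shapes are chosen for illustration only and are not the paper's settings.

```python
# Rough sketch of a Gaussian-Bernoulli RBM used to binarize real-valued
# patch observations before they enter the MM-DBM. Trained with CD-1;
# all sizes, the learning rate, and the data are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GaussianRBM:
    def __init__(self, n_visible, n_hidden, lr=1e-3):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        # P(h = 1 | v) for unit-variance Gaussian visible units.
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0):
        # One contrastive-divergence (CD-1) update on a mini-batch v0.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = h0_sample @ self.W.T + self.b_v        # mean-field Gaussian reconstruction
        h1 = self.hidden_probs(v1)

        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

    def transform(self, v):
        # Binary codes that would feed the MM-DBM's visible layer.
        return (self.hidden_probs(v) > 0.5).astype(float)

# Hypothetical usage: standardized voxel values of one modality's patches.
patches = rng.standard_normal((256, 1331))           # placeholder data, 11x11x11 patches
rbm = GaussianRBM(n_visible=1331, n_hidden=500)
for _ in range(10):
    rbm.train_step(patches)
binary_codes = rbm.transform(patches)
```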

After finding latent and shared feature representations of the paired MRI and PET patches using the trained MM-DBM, an image-level classifier is constructed by fusing multiple classifiers hierarchically, in three steps: patch-level classifier learning, mega-patch construction, and a final ensemble classification. The patch-level classification is performed with a linear SVM (Support Vector Machine), whose output is converted to a probability using a softmax function.
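
The patch-level step might look roughly like the sketch below, where a linear SVM is trained on the MM-DBM features of one patch location and its decision value is mapped to a pseudo-probability with a two-class softmax; the feature arrays and parameters are placeholders, not the paper's.

```python
# Hedged sketch of the patch-level classifier: one linear SVM per patch
# location, with its decision value turned into a probability via softmax.
import numpy as np
from sklearn.svm import LinearSVC

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def patch_level_probabilities(train_feats, train_labels, test_feats):
    # Linear SVM trained on the MM-DBM features of a single patch location.
    clf = LinearSVC(C=1.0).fit(train_feats, train_labels)
    d = clf.decision_function(test_feats)             # signed distance to the hyperplane
    # Two-class softmax over (-d, +d) maps the margin to a probability.
    return softmax(np.column_stack([-d, d]))[:, 1]

# Hypothetical usage with random stand-ins for MM-DBM features.
rng = np.random.default_rng(0)
X_tr, y_tr = rng.standard_normal((100, 50)), rng.integers(0, 2, 100)
X_te = rng.standard_normal((20, 50))
probs = patch_level_probabilities(X_tr, y_tr, X_te)   # AD probability per test patch
```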

A major limitation of the paper, the authors note, is the lack of interpretability of the resulting feature representations from a clinical perspective.
