Multi-Planar Spatial-ConvNet for Segmentation and Survival Prediction in Brain Cancer

Subhashis Banerjee

Kolkata, West Bengal


Project status: Published/In Market

Artificial Intelligence

Groups
DeepLearning

Intel Technologies
AI DevCloud / Xeon, Intel Python, Intel CPU, Intel Opt ML/DL Framework


Overview / Usage

A new deep learning method is introduced for the automatic delineation/segmentation of brain tumors from multi-sequence MR images. A Radiomic model for predicting the Overall Survival (OS) is designed, based on the features extracted from the segmented Volume of Interest (VOI). An encoder-decoder type ConvNet model is designed for pixel-wise segmentation of the tumor along three anatomical planes (axial, sagittal and coronal) at the slice level. These are then combined, using a consensus fusion strategy, to produce the final volumetric segmentation of the tumor and its sub-regions. Novel concepts such as spatial-pooling and unpooling are introduced to preserve the spatial locations of the edge pixels for reducing segmentation error around the boundaries. We also incorporate shortcut connections to copy and concatenate the receptive fields from the encoder to the decoder part, for helping the decoder network localize and recover the object details more effectively. These connections allow the network to simultaneously incorporate high-level features along with pixel-level details. A new aggregated loss function helps in effectively handling data imbalance. The integrated segmentation and OS prediction system is trained and validated on the BraTS 2018 dataset.

Methodology / Approach

MRI scans are volumetric and can be represented in three dimensions using a multi-planar representation along the axial (X-Z axes), coronal (Y-X axes), and sagittal (Y-Z axes) planes. Taking advantage of this multi-view property, we propose a deep learning based segmentation model that uses three separate ConvNets for segmenting the tumor along the three individual planes at the slice level. These are then combined using a consensus fusion strategy to produce the final volumetric segmentation of the tumor and its sub-regions. We observe that the integrated prediction from multiple planes is superior, in terms of accuracy and robustness, to the estimation based on any single plane, presumably because fusion exploits complementary information from the three views while minimizing the loss of information incurred along any one of them. One simple realization of the fusion step is sketched below.
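A minimal sketch of the consensus fusion step, assuming each planar ConvNet has already produced a full volumetric label map (by stacking its slice-wise predictions) and that the consensus is a voxel-wise majority vote; the exact fusion rule used in the paper may differ, and the function name is illustrative.

```python
import numpy as np

def fuse_planar_predictions(axial, sagittal, coronal, num_classes=4):
    """Voxel-wise majority vote over three volumetric label maps."""
    stacked = np.stack([axial, sagittal, coronal])        # (3, D, H, W)
    votes = np.stack([(stacked == c).sum(axis=0)          # vote count per class
                      for c in range(num_classes)])       # (C, D, H, W)
    return votes.argmax(axis=0).astype(np.uint8)          # ties -> lowest label
```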

The ConvNet architecture used for slice-wise segmentation along each plane is an encoder-decoder type of network. The encoder, or contracting path, uses pooling layers to downsample an image into a set of high-level features; the decoder, or expanding path, then uses this feature information to construct a pixel-wise segmentation mask. The main problem with this type of network is that the downsampling (pooling) operation discards spatial information, which upsampling in the decoder can only approximate through interpolation. This produces segmentation error around the boundary of the region-of-interest (ROI) or volume-of-interest (VOI), a major drawback in medical image segmentation, where accurate delineation is of utmost importance. A bare-bones version of this pattern is sketched below to make the problem concrete.
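A minimal Keras sketch of the plain encoder-decoder pattern, with illustrative (not our actual) layer widths: plain max-pooling discards the coordinates of the maxima, and UpSampling2D can only interpolate them back.

```python
from tensorflow.keras import layers, Model

def plain_encoder_decoder(input_shape=(128, 128, 4), num_classes=4):
    inp = layers.Input(shape=input_shape)
    # Encoder: each pooling step discards the locations of the maxima.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    # Decoder: upsampling only interpolates, blurring object boundaries.
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inp, out)
```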

In order to circumvent this problem we introduce an elitist spatial-max-pooling layer, which retains the locations of the maxima so that they can subsequently be used during unpooling through the spatial-max-unpooling layer; the procedure is illustrated in the accompanying figure. We also incorporate shortcut connections to copy and concatenate the receptive fields (after each convolution block) from the encoder to the decoder part, in order to help the decoder network localize and recover the object details more effectively. These connections allow the network to simultaneously incorporate high-level features with the pixel-level details. The entire segmentation model architecture is depicted in the accompanying figure.
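The pooling/unpooling pair can be illustrated in low-level TensorFlow, where tf.nn.max_pool_with_argmax records the flat index of each maximum. This is a minimal sketch assuming 2 × 2 non-overlapping windows and static shapes, not a drop-in copy of our layer.

```python
import numpy as np
import tensorflow as tf

def spatial_max_pool(x):
    # Pool while recording the flat index of each maximum.
    return tf.nn.max_pool_with_argmax(
        x, ksize=2, strides=2, padding="SAME", include_batch_in_index=True)

def spatial_max_unpool(pooled, argmax, output_shape):
    # Scatter each pooled value back to its recorded location; every other
    # position stays zero, so boundary pixels return to exact coordinates.
    total = int(np.prod(output_shape))
    flat = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                         tf.reshape(pooled, [-1]),
                         [total])
    return tf.reshape(flat, list(output_shape))

x = tf.random.normal([1, 128, 128, 32])
pooled, argmax = spatial_max_pool(x)                   # both (1, 64, 64, 32)
restored = spatial_max_unpool(pooled, argmax, (1, 128, 128, 32))
```

The shortcut connections themselves are plain feature-map concatenations (e.g., Keras layers.Concatenate) from each encoder stage to the matching decoder stage.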

Tumors are typically heterogeneous, depending on the cancer subtype, and contain a mixture of structural and patch-level variability. Applying a ConvNet directly to the entire slice has inherent drawbacks. Since each slice is of size 240 × 240, training the ConvNet on the whole image/slice would require a huge number of parameters. Moreover, very little difference is observable between adjacent MRI slices at the global level, whereas patches generated from the same slice often exhibit significant dissimilarity. Besides, the segmentation classes are highly imbalanced: approximately 98% of the voxels belong either to healthy tissue or to the black surrounding area, and the NCR/NET volumes are the smallest among the three tumor classes, as depicted in the accompanying figures.

Each ConvNet is trained on patches of size 128 × 128 × 4, extracted from all four MRI sequences corresponding to a particular plane. A randomized patch extraction algorithm, developed by us, is employed; patches are selected using an entropy based criterion (a sketch follows below). The three ConvNets (one per plane) are trained end-to-end/pixel-to-pixel on the patches extracted from the corresponding ground-truth images. During testing, the stack of slices is fed to the model to produce pixel-wise segmentation of the tumor along the three planes. Training performance is evaluated using the Dice overlap score for the three segmented sub-regions: whole tumor (WT), enhancing tumor (ET) and tumor core (TC). Since the dataset is highly imbalanced, the standard loss functions used in the literature are not suitable for training and optimizing the ConvNet, because most classifiers focus on learning the larger classes, resulting in poor classification accuracy for the smaller classes. Hence we propose a new loss function, an aggregation of two components: the Generalized Dice loss and a weighted cross-entropy, as sketched below.
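A minimal sketch of the entropy-based patch selection: draw random 128 × 128 patch locations and accept a patch only if the Shannon entropy of its intensity histogram exceeds a threshold, which filters out mostly-black background patches. The bin count, threshold and acceptance rule here are illustrative assumptions.

```python
import numpy as np

def entropy_ok(patch, bins=64, threshold=2.0):
    """Accept a patch only if its intensity histogram is informative enough."""
    hist, _ = np.histogram(patch, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return -(p * np.log2(p)).sum() > threshold

def random_patches(slice_4ch, n, size=128, rng=np.random.default_rng(0)):
    """slice_4ch: (H, W, 4) stack of the four MR sequences for one slice."""
    h, w, _ = slice_4ch.shape
    patches = []
    while len(patches) < n:                   # sketch only: no retry limit
        r = rng.integers(0, h - size)
        c = rng.integers(0, w - size)
        patch = slice_4ch[r:r + size, c:c + size, :]
        if entropy_ok(patch[..., 0]):         # test entropy on one sequence
            patches.append(patch)
    return np.stack(patches)
```

And a sketch of the aggregated loss, combining the Generalized Dice loss (after Sudre et al.) with a class-weighted cross-entropy; the weighting scheme and the mixing coefficient alpha are assumptions, not our published values.

```python
import tensorflow as tf

def aggregated_loss(y_true, y_pred, alpha=0.5, eps=1e-7):
    """y_true: one-hot, y_pred: softmax; both of shape (batch, H, W, C)."""
    axes = [0, 1, 2]                          # sum over batch and space
    # Per-class weights inversely proportional to squared class volume,
    # so small classes (e.g. NCR/NET) are not drowned out by large ones.
    w = 1.0 / (tf.reduce_sum(y_true, axis=axes) ** 2 + eps)

    # Generalized Dice loss.
    intersect = tf.reduce_sum(y_true * y_pred, axis=axes)
    denom = tf.reduce_sum(y_true + y_pred, axis=axes)
    gdl = 1.0 - 2.0 * tf.reduce_sum(w * intersect) / (tf.reduce_sum(w * denom) + eps)

    # Weighted cross-entropy with the same (normalized) per-class weights.
    wce = -tf.reduce_sum((w / tf.reduce_sum(w)) * y_true *
                         tf.math.log(y_pred + eps), axis=-1)
    return alpha * gdl + (1.0 - alpha) * tf.reduce_mean(wce)
```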

For the OS prediction task we extract two types of Radiomic features, viz. "semantic" and "agnostic" [2]. The former includes attributes like size, shape, location, vascularity, spiculation and necrosis; the latter attempts to capture lesion heterogeneity through quantitative descriptors like histogram and texture statistics. We extracted 33 semantic and 50 agnostic features from each segmented VOI. These are provided as input to a Multilayer Perceptron (MLP) with two hidden layers to predict the number of survival days, which in turn determines the survival class (short, mid or long); a sketch of this network appears below.
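A minimal Keras sketch of such an MLP: 83 Radiomic features (33 semantic + 50 agnostic) in, two hidden layers, and a single linear output for the number of survival days. The hidden-layer widths and optimizer are assumptions, as are the exact class cut-offs, which here approximate the 10- and 15-month thresholds of the BraTS survival task.

```python
from tensorflow.keras import layers, models

def build_os_mlp(num_features=83):            # 33 semantic + 50 agnostic
    model = models.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(64, activation="relu"),  # hidden-layer widths assumed
        layers.Dense(32, activation="relu"),
        layers.Dense(1),                      # predicted survival time in days
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def survival_class(days, short=10 * 30, long=15 * 30):
    # Cut-offs approximate the BraTS 10-/15-month class boundaries.
    return "short" if days < short else ("long" if days > long else "mid")
```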

Technologies Used

The ConvNet models were developed using TensorFlow with Keras, in Python. The experiments were performed on the Intel AI DevCloud platform, on a cluster of Intel Xeon Scalable processors. The code developed for our experiments will soon be made available. The proposed segmentation model is trained and validated on the corresponding training and validation datasets provided by the BraTS 2018 organizers.

Repository

https://link.springer.com/chapter/10.1007/978-3-030-11726-9_9
