Breast cancer detection using deep learning
Sayak Paul
Kolkata, West Bengal
Experiments to show the usage of deep learning to detect breast cancer from breast histopathology images.
Project status: Under Development
Intel Technologies
Other
Overview / Usage
Context:
Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.
About the dataset:
The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From those, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch’s file name has the format u_xX_yY_classC.png (for example, 10253_idx5_x1351_y1101_class0.png), where u is the patient ID (10253_idx5), X is the x-coordinate of where the patch was cropped from, Y is the y-coordinate of where the patch was cropped from, and C indicates the class, with 0 being non-IDC and 1 being IDC.
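The file-name convention above can be unpacked with a small regular expression. This is only a sketch: the names PatchInfo and parse_patch_name are hypothetical helpers introduced for illustration and are not taken from the project repository.

```python
import re
from typing import NamedTuple


class PatchInfo(NamedTuple):
    """Fields encoded in a patch file name (hypothetical helper type)."""
    patient_id: str
    x: int
    y: int
    label: int  # 0 = non-IDC, 1 = IDC


# Pattern for names of the form u_xX_yY_classC.png
_PATCH_RE = re.compile(
    r"^(?P<patient_id>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<label>[01])\.png$"
)


def parse_patch_name(file_name: str) -> PatchInfo:
    """Split a patch file name into patient ID, crop coordinates and class."""
    match = _PATCH_RE.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected patch file name: {file_name!r}")
    return PatchInfo(
        patient_id=match["patient_id"],
        x=int(match["x"]),
        y=int(match["y"]),
        label=int(match["label"]),
    )


info = parse_patch_name("10253_idx5_x1351_y1101_class0.png")
print(info)
```

Keeping the patient ID as a separate field makes it possible to split the data by patient rather than by patch, so that patches from one slide do not end up in both the training and the validation set.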
Inspiration:
Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is the most common form of breast cancer. Accurately identifying and categorizing breast cancer subtypes is an important clinical task, and automated methods can be used to save time and reduce error.
Adrian Rosebrock of PyImageSearch has a wonderful tutorial on this same topic as well. Be sure to check that out if you have not. I decided to use the fastai library and to see if I could improve the predictive performance by incorporating modern deep learning practices.
Methodology / Approach
The dataset used in this project is imbalanced, which called for careful adjustment of the loss function that the network optimizes. After working that out, I got a much better score than any of the other works done on this particular dataset.
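The exact change made to the loss function is not described here. One common way to handle this kind of imbalance is to weight each class inversely to its frequency inside the cross-entropy loss, and the sketch below shows that idea in plain Python using the patch counts quoted earlier. It is an assumption for illustration, not necessarily what the project does, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence

# Patch counts from the dataset description (0 = non-IDC, 1 = IDC).
CLASS_COUNTS: Dict[int, int] = {0: 198_738, 1: 78_786}


def inverse_frequency_weights(counts: Dict[int, int]) -> Dict[int, float]:
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Dict[int, float],
) -> float:
    """Weighted mean of the negative log-likelihood of the true class.

    probs[i][c] is the predicted probability of class c for sample i.
    The sum is normalised by the total weight of the targets.
    """
    weighted_nll = 0.0
    weight_sum = 0.0
    for p, t in zip(probs, targets):
        w = weights[t]
        weighted_nll += -w * math.log(p[t])
        weight_sum += w
    return weighted_nll / weight_sum


weights = inverse_frequency_weights(CLASS_COUNTS)
print(weights)  # the IDC-positive class receives the larger weight

# Toy batch: one confident negative, one poorly predicted positive.
batch_probs = [[0.9, 0.1], [0.6, 0.4]]
batch_targets = [0, 1]
print(weighted_cross_entropy(batch_probs, batch_targets, weights))
```

With these weights, a mistake on the rarer IDC-positive class costs more than a mistake on the IDC-negative class. In PyTorch, on which fastai is built, per-class weights of this kind can be supplied through the weight argument of torch.nn.CrossEntropyLoss.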
I used several modern deep learning practices, such as discriminative learning rates, mixed precision training and the 1cycle policy, to train the network faster.
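To make two of these practices concrete, here is a rough, library-free sketch of a 1cycle learning-rate schedule (warm up to a peak, then anneal to a very small value) and of discriminative learning rates (smaller rates for earlier layer groups). The function names, the cosine shape and the default values are illustrative assumptions, not the settings used in the project. In fastai itself this behaviour is exposed through Learner.fit_one_cycle.

```python
import math
from typing import List


def _cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_lr(
    step: int,
    total_steps: int,
    lr_max: float,
    pct_start: float = 0.25,   # fraction of steps spent warming up (assumed)
    div: float = 25.0,         # initial lr = lr_max / div (assumed)
    div_final: float = 1e5,    # final lr = lr_max / div_final (assumed)
) -> float:
    """Learning rate at a given step of a 1cycle-style schedule."""
    warmup_steps = max(1, int(total_steps * pct_start))
    if step < warmup_steps:
        return _cos_anneal(lr_max / div, lr_max, step / warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    pct = min(1.0, (step - warmup_steps) / decay_steps)
    return _cos_anneal(lr_max, lr_max / div_final, pct)


def discriminative_lrs(lr_low: float, lr_high: float, n_groups: int) -> List[float]:
    """Geometrically spaced rates: earliest layer group gets lr_low,
    the last group (the head) gets lr_high."""
    if n_groups == 1:
        return [lr_high]
    ratio = lr_high / lr_low
    return [lr_low * ratio ** (i / (n_groups - 1)) for i in range(n_groups)]


# Learning rate at a few points of a 100-step cycle.
for s in (0, 25, 60, 100):
    print(s, one_cycle_lr(s, total_steps=100, lr_max=1e-3))

# One rate per layer group, from early layers to the head.
print(discriminative_lrs(1e-5, 1e-3, n_groups=3))
```

The intuition is that early layers of a pretrained network already encode generic features and need only small updates, while the newly added head can tolerate a larger learning rate.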
Future work on this project includes employing super-resolution to enhance the quality of the histopathology images and devising histopathology-specific data augmentation techniques to achieve better performance.
Technologies Used
Python (language)
fastai, scikit-learn (libraries)
Google Colab (Environment)
Repository
https://github.com/sayakpaul/Breast-Cancer-Detection-using-Deep-Learning