Deep Learning Classifier with Patient’s Metadata of Dermoscopic Images in Malignant Melanoma Detection
Received 18 February 2021
Accepted for publication 25 March 2021
Published 21 April 2021 Volume 2021:14 Pages 877–885
Checked for plagiarism Yes
Review by Single anonymous peer review
Editor who approved publication: Dr Scott Fraser
Dina Nur Anggraini Ningrum,1,2 Sheng-Po Yuan,1,3 Woon-Man Kung,4 Chieh-Chen Wu,4 I-Shiang Tzeng,4–6 Chu-Ya Huang,7 Jack Yu-Chuan Li,1,8,9,* Yao-Chin Wang,10,11,*
1Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; 2Public Health Department, Universitas Negeri Semarang, Semarang City, Indonesia; 3Department of Otorhinolaryngology, Shuang-Ho Hospital, Taipei Medical University, New Taipei City, Taiwan; 4Department of Exercise and Health Promotion, College of Kinesiology and Health, Chinese Culture University, Taipei, Taiwan; 5Department of Research, Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, New Taipei City, Taiwan; 6Department of Statistics, National Taipei University, Taipei, Taiwan; 7Taiwan College of Healthcare Executives, Taipei, Taiwan; 8Department of Dermatology, Wan Fang Hospital, Taipei, Taiwan; 9Taipei Medical University Research Center of Cancer Translational Medicine, Taipei, Taiwan; 10Graduate Institute of Injury Prevention and Control, College of Public Health, Taipei Medical University, Taipei, Taiwan; 11Department of Emergency Medicine, Min-Sheng General Hospital, Taoyuan, Taiwan
*These authors contributed equally to this work
Correspondence: Jack Yu-Chuan Li
Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 15F, No. 172-1, Sec. 2, Keelung Road, Daan District, Taipei, 11031, Taiwan
Tel/Fax +886-2-2736-1661 Ext. 7600
Email [email protected]
Department of Emergency, Min-Sheng General Hospital, No. 168, ChingKuo Road, Taoyuan, 33044, Taiwan
Tel +886-3-317-9599 Ext. 8134
Email [email protected]
Background: The incidence of skin cancer, one of the global burdens of malignancy, increases each year, and melanoma is its deadliest form. Imaging-based automated skin cancer detection remains challenging owing to the variability of skin lesions and the limited availability of standard datasets. Recent research indicates the potential of deep convolutional neural networks (CNNs) to predict outcomes from both simple and highly complex images. However, their implementation requires high-end computational facilities, which are not feasible in low-resource and remote health-care settings. Combining images with patient's metadata is promising, but studies of this approach are still lacking.
Objective: To develop an artificial intelligence (AI) model for malignant melanoma detection, based on dermoscopic images and patient's metadata, that can run on low-resource devices.
Methods: We used the International Skin Imaging Collaboration (ISIC) Archive, an open-access dermatology repository consisting of 23,801 biopsy-proven dermoscopic images. We tested performance for the binary classification of malignant melanoma vs nonmalignant melanoma. From 1200 sample images, we split the data into training (72%), validation (18%), and testing (10%) sets. We compared a CNN using image data only (CNN model) with a CNN for image data combined with an artificial neural network (ANN) for patient's metadata (CNN+ANN model).
Results: The balanced accuracy of the CNN+ANN model (92.34%) was higher than that of the CNN model (73.69%). Combining patient's metadata through the ANN prevented the overfitting that occurred in the CNN model using dermoscopic images only. The small size (24 MB) of this model makes it possible to run on a mid-range computer without cloud computing, so it is suitable for deployment on devices with limited resources.
Conclusion: The CNN+ANN model can increase the accuracy of malignant melanoma classification even with limited data and is promising for development as a screening device in remote and low-resource health-care settings.
Keywords: skin cancer, convolutional neural network, artificial neural network, embedded artificial intelligence
Introduction
Skin cancer, of which malignant melanoma is the deadliest form, is a global burden, especially in white-skinned populations, and accounted for 0.11% of all deaths in 2017.1 Early detection is important: when skin cancer is detected early, more than 99% of patients survive for five years. Melanoma that is recognized and treated early is almost always curable. Advanced melanoma can invade other parts of the body, where it becomes difficult to treat and can be catastrophic. Although melanoma is not the most common skin cancer, it causes the most deaths.2
Dermoscopy is a skin imaging modality that has been reported to improve the diagnosis of skin cancer compared with pure visual inspection. However, clinicians must be sufficiently trained for these improvements to be realized. To make expertise more widely available, the International Skin Imaging Collaboration (ISIC) has developed an archive of dermoscopic images for research and clinical training, and it hosts the ISIC challenges to advance automated algorithmic analysis.3
Early melanoma can be detected by visual inspection of pigmented skin lesions and treated by simple excision of the malignant tumor. Nonetheless, dermatologists are scarce and the accuracy of visual inspection is variable, which leads patients to undergo a series of biopsies and complicates management.
Efforts toward early diagnosis of melanoma are therefore paramount.4 In general, there is evidence that the prevalence of in situ and invasive skin cancer increases after skin cancer screening is implemented, with increasing rates of thin melanomas and decreasing rates of thick melanomas.5 Previous studies of deep learning-based image analysis for skin cancer detection have drawn dermatologists' attention.6–8 The convolutional neural network (CNN) is a deep learning method that has shown potential for general and highly variable analysis tasks in dermoscopic images, but its use in combination with patient's metadata in the clinical setting is still lacking. Implementing a CNN in the clinical setting requires high-end computational facilities such as cloud computing, which seem unfeasible in low-resource and remote health-care settings.
Therefore, in this study, we aimed to develop an artificial intelligence (AI) model for malignant melanoma detection based on dermoscopic images and patient's metadata, using a relatively small dataset. Because we want this model to work on low-resource devices, the model file needs to be as small as possible.
Materials and Methods
We used data from the open-access dermatology repository ISIC archive (www.isic-archive.com). The ISIC archive consists of melanocytic lesions that are confirmed by biopsy and classified as either benign or malignant. It is the largest publicly accessible dermoscopic image dataset, containing about 23,906 images that are screened for both quality and privacy assurance. Each dermoscopic image is coupled with a definitive diagnosis, including nevus, melanoma, pigmented benign keratosis, basal cell carcinoma, seborrheic keratosis, and others.9
We selected the metadata and images of the “ISIC 2019: Training” data, which are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/).10 We used binary classification to separate melanoma from other lesions. Of about 9000 dermoscopic images from ISIC, 3807 images (both malignant and nonmalignant melanoma) were initially downloaded from the HAM10000,11 MSK-1, MSK-2, MSK-3, and MSK-4 datasets.12 The classes in this dataset are highly imbalanced: about 90% of the images are nonmelanoma and 10% are malignant. Thereafter, 1200 images were randomly selected.
The randomly selected images were cropped to remove any sticker, ruler, or hair to avoid interference with the model (Figure 1). The training data consisted of 900 images (281 malignant, 619 benign), which were further split into training (720) and validation (180) sets during model training with a validation split of 0.2. For testing, 300 images were used (93 malignant, 207 nonmalignant melanoma). Patient's metadata consist of age, gender, anatomic site, and location variables. To preprocess the metadata, we used one-hot encoding for the categorical variables (ie, anatomic site, location, and gender) and MinMaxScaler for the numerical variable (age).
Figure 1 Dermoscopic image data preprocessing. The original dermoscopic image (left) obtained from the International Skin Imaging Collaboration (ISIC) archive dermatology repository (www.isic-archive.com) was cropped to remove the ruler, sticker, or hair (right) before further analysis.
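The metadata preprocessing described above (one-hot encoding for categorical variables, min-max scaling for age) can be sketched in plain Python. This is a minimal illustration, not the authors' code, which was written in R. The field names `age`, `sex`, and `site` are hypothetical placeholders, not the ISIC schema, and encoding an unseen category as all zeros is our assumption. The encoders are fitted on the training rows only, so test-set statistics do not leak into the scaling.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]  # one patient's metadata record (hypothetical field names)


def fit_categories(rows: Sequence[Row], field: str) -> List[str]:
    """Sorted list of the categories observed for one field in the fitting rows."""
    return sorted({str(r[field]) for r in rows})


def one_hot(value: str, categories: List[str]) -> List[float]:
    """One indicator per known category; an unseen value becomes all zeros (assumed policy)."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(value: float, lo: float, hi: float) -> float:
    """Min-max scaling: maps [lo, hi] onto [0, 1]; values outside that range are not clipped."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def encode_metadata(train_rows: Sequence[Row], rows: Sequence[Row],
                    categorical: Sequence[str] = ("sex", "site")) -> List[List[float]]:
    """Fit the encoders on train_rows only, then encode each row as [age, one-hots...]."""
    categories = {f: fit_categories(train_rows, f) for f in categorical}
    ages = [float(r["age"]) for r in train_rows]
    lo, hi = min(ages), max(ages)
    encoded = []
    for r in rows:
        vec = [min_max(float(r["age"]), lo, hi)]
        for f in categorical:
            vec.extend(one_hot(str(r[f]), categories[f]))
        encoded.append(vec)
    return encoded
```

A fourth variable such as location would simply be added to `categorical`.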
We propose a framework that includes two models: a CNN, and a combination of a CNN with an ANN. The first stage of the framework is data preprocessing. The preprocessed data are then fed into the neural network, where the CNN and ANN classification branches are combined to reach the best decision. Figure 2 shows an overview of the proposed framework.
After image data preprocessing, the skin lesions occupy only a tiny percentage of the view, so downscaling the whole image may make the lesions too small to identify. To address this, the CNN model obtained 1200 rectangular image patches randomly from the middle of every image at various scales (1/5, 2/5, 3/5, and 4/5 of the original size) and then resized them to 300×300 pixels using bilinear interpolation. The CNN then applied on-the-fly data augmentation, including vertical and horizontal flips, random rotation (−10° to +10°), and zoom (90–110% of width and height), to enlarge the dataset.
The architecture, layers, and parameters of the CNN and CNN+ANN models are shown in Figure 3A and B, respectively, including the number of layers in each model. In the CNN model, the input image was passed to a convolution layer, which convolved the image and applied an activation function to the resulting matrix; pooling then reduced the dimensionality; the output was flattened and fed to the neural network layers, with dropout randomly setting inputs to 0 (zero) to reduce overfitting; and finally an activation function output the class of the image. In the CNN+ANN model, we concatenated the CNN model output with the ANN input, combined all the features in fully connected layers, and then output the class using an activation function, thereby classifying each case from both its image and the patient's metadata. Both models were implemented with the Keras application programming interface on a TensorFlow backend (version 1.14), using the R programming language.
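The patch extraction, bilinear resizing, and augmentation steps can be illustrated with a dependency-free sketch. It is a simplification under stated assumptions, not the authors' pipeline: images are grayscale lists of rows, the crop is taken exactly at the image center (the text describes random patches from the middle of each image), each flip is applied with probability 0.5, and rotation and zoom factors are drawn uniformly from the stated ranges. A real pipeline would apply these operations to image tensors with a library routine.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (simplification)

SCALES = (1 / 5, 2 / 5, 3 / 5, 4 / 5)  # patch sizes relative to the original image


def center_crop_box(width: int, height: int, scale: float) -> Tuple[int, int, int, int]:
    """Return (left, top, crop_w, crop_h) for a centered crop of `scale` x the original size."""
    crop_w = max(1, round(width * scale))
    crop_h = max(1, round(height * scale))
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    left, top, w, h = box
    return [row[left:left + w] for row in img[top:top + h]]


def bilinear_resize(img: Image, out_h: int, out_w: int) -> Image:
    """Bilinear interpolation that maps corner pixels of the input onto corner pixels of the output."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx        # blend along x, upper row
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx     # blend along x, lower row
            row.append(top * (1 - dy) + bottom * dy)              # blend the two rows along y
        out.append(row)
    return out


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one set of on-the-fly augmentation parameters within the stated ranges."""
    return {
        "hflip": rng.random() < 0.5,
        "vflip": rng.random() < 0.5,
        "rotation_deg": rng.uniform(-10.0, 10.0),
        "zoom": rng.uniform(0.9, 1.1),
    }
```

Because new parameters are drawn for every batch, the network rarely sees exactly the same image twice. This is the sense in which on-the-fly augmentation enlarges a small dataset without storing extra files.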
Figure 3 Model architecture. The CNN model architecture (A) and CNN+ANN model architecture (B) show the flow of data inside each model (printed from TensorFlow 1.14).
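The concatenation step of the CNN+ANN model can be illustrated with a toy forward pass. This is not the authors' network. The CNN feature vector is taken as given and joined directly to the metadata vector, following the description of concatenating the CNN output with the ANN input. The hidden width of 16 units is hypothetical, and the weights are random and untrained. The output therefore only demonstrates how image and metadata features flow into shared fully connected layers and a sigmoid output.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weights: one row per output unit, biases)


def dense(x: Vector, layer: Layer, activation: Optional[Callable[[float], float]] = None) -> Vector:
    """Fully connected layer: out[k] = sum_i W[k][i] * x[i] + b[k], then the activation."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (untrained, for illustration only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def fused_forward(image_features: Vector, metadata: Vector, rng: random.Random) -> float:
    """Concatenate CNN image features with the metadata vector, then classify."""
    fused = image_features + metadata                               # feature concatenation
    hidden = dense(fused, random_layer(len(fused), 16, rng), relu)  # fully connected layer
    return dense(hidden, random_layer(16, 1, rng), sigmoid)[0]      # P(malignant)
```

In Keras, the same pattern is usually written with the functional API: a `Concatenate` layer joins the image branch and the metadata input before the `Dense` layers.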
To assess our proposed deep learning models, we used the area under the receiver operating characteristic curve (AUROC), accuracy, balanced accuracy, sensitivity, specificity, precision, and F1 score as performance metrics, defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity (TPR, recall)} = \frac{TP}{TP + FN}$$
$$\text{Specificity (TNR)} = \frac{TN}{TN + FP}$$
$$\text{Balanced accuracy} = \frac{TPR + TNR}{2}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, and TPR and TNR stand for the true positive rate and true negative rate. The AUROC is the area under the ROC curve, which plots sensitivity against 1 − specificity across all classification thresholds; the optimal cutoff threshold was identified as the point at which sensitivity and specificity were both maximized. Accuracy is the proportion of all samples that are correctly predicted. Balanced accuracy evaluates a binary classifier by averaging TPR and TNR, which makes it especially useful when classes are imbalanced (one class appears more often than another). The F1 score is the harmonic mean of precision and recall; the highest possible score of 1.0 indicates perfect precision and recall.
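The metric definitions translate directly into code. The sketch below is ours, not the authors'. As a worked check, we back-calculated the counts TP = 81, FN = 12, FP = 5, TN = 202 from the reported test set (93 malignant, 207 nonmalignant) and the reported CNN+ANN recall (87.10%) and precision (94.19%). These counts reproduce the reported accuracy (94.33%) and balanced accuracy (92.34%), but they are our derivation, not figures quoted from Table 3.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # malignant lesions predicted malignant
    tn: int  # nonmalignant lesions predicted nonmalignant
    fp: int  # nonmalignant lesions predicted malignant
    fn: int  # malignant lesions predicted nonmalignant


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def metrics(c: Counts) -> dict:
    tpr = _ratio(c.tp, c.tp + c.fn)        # sensitivity / recall
    tnr = _ratio(c.tn, c.tn + c.fp)        # specificity
    precision = _ratio(c.tp, c.tp + c.fp)  # positive predictive value
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "sensitivity": tpr,
        "specificity": tnr,
        "balanced_accuracy": (tpr + tnr) / 2,
        "precision": precision,
        "f1": _ratio(2 * precision * tpr, precision + tpr),
    }


# Counts back-calculated (by us) from the reported CNN+ANN test-set percentages.
reported = metrics(Counts(tp=81, tn=202, fp=5, fn=12))
# A trivial classifier that labels every lesion nonmalignant on a 90%/10% split.
trivial = metrics(Counts(tp=0, tn=900, fp=0, fn=100))
```

The second example shows why balanced accuracy matters for imbalanced classes. A classifier that never predicts melanoma scores 90% accuracy on a 90%/10% split, but its balanced accuracy is only 50%.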
Results
The mean (SD) age in the malignant melanoma group was 62.6 (17.10) years, much higher than the mean (SD) age of 17.5 (11.20) years in the nonmalignant melanoma group. The age distributions of the malignant and nonmalignant melanoma groups differed significantly (p < 0.01) (Table 1). Male was the dominant gender in both the malignant and nonmalignant groups, with no significant difference between the groups.
Table 1 Distribution of Patient’s Metadata
Training loss, validation loss, training accuracy, validation accuracy, and the number of epochs are important parameters for diagnosing problems in model learning. Training loss is the error on the training set, whereas validation loss is the error obtained by running the validation set through the previously trained network. We therefore tracked these parameters for the CNN and CNN+ANN models and plotted them as learning curves (Figure 4). The networks were trained for 200 epochs because additional training produced no further decrease in validation loss. The CNN model (Figure 4A) showed overfitting, evident in the large gap between the training and validation curves for both loss and accuracy. The training loss decreased with every epoch, indicating that the CNN model learned to recognize specific images in the training set. However, the validation loss increased with every epoch, implying that what the model learned from the training data did not carry over to the validation data and that the model could not generalize satisfactorily to the validation set. In contrast, the CNN+ANN model (Figure 4B) showed decreasing training and validation loss as the number of epochs increased, with a minimal gap between training and validation in both loss and accuracy. These curves indicate that the CNN+ANN model overfits less than the CNN model, with improved learning optimization and performance.
The discriminative performance of the CNN and CNN+ANN models is measured by the AUROC, shown in Figure 5. The ROC curve plots the true positive rate (y-axis) against the false positive rate (x-axis) as the threshold for assigning observations to a class varies, thereby summarizing the classifier's performance across all possible thresholds. The CNN+ANN model shows a higher AUC (97.1%) than the CNN model (82.4%), indicating better discrimination.
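The stopping criterion above (stop once further epochs no longer reduce validation loss) can be made explicit with a patience rule. This is the logic behind the Keras `EarlyStopping` callback and its `monitor`, `patience`, and `min_delta` arguments. The paper does not state that such a callback was used. The pure-Python sketch below and its example patience value are assumptions. The helper `final_gap` gives one crude numerical reading of the train/validation gap used above to diagnose overfitting.

```python
import math
from typing import Sequence, Tuple


def patience_stop(val_losses: Sequence[float], patience: int,
                  min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based, for a 'no improvement for `patience` epochs' rule."""
    best_loss, best_epoch, wait = math.inf, 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:   # counts as an improvement
            best_loss, best_epoch, wait = loss, epoch, 0
        else:                              # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


def final_gap(train_losses: Sequence[float], val_losses: Sequence[float]) -> float:
    """Validation minus training loss at the last epoch; a large positive gap suggests overfitting."""
    return val_losses[-1] - train_losses[-1]
```

For example, with validation losses `[1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]` and a patience of 3, training stops at epoch 5 and the best weights are those from epoch 2.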
Figure 5 Area under the receiver operating characteristic curve (AUROC). The CNN+ANN model (red line) outperforms the CNN model (blue line).
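Because the ROC curve sweeps every threshold, the AUROC equals the probability that a randomly chosen malignant case receives a higher score than a randomly chosen nonmalignant case. Ties count one half, which is the Mann–Whitney U statistic divided by the number of positive–negative pairs. The sketch below computes the AUROC this way. It also computes the cutoff that maximizes sensitivity + specificity − 1 (Youden's J), one common way to pick the "optimal cutoff" mentioned in the Methods. The example scores are made up for illustration.

```python
from typing import Optional, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[Optional[float], float]:
    """Threshold maximizing TPR - FPR, predicting malignant when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

The pairwise loop is quadratic, which is fine for a 300-image test set. Library implementations sort the scores instead.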
The label distributions of the CNN and CNN+ANN models are shown in Figure 6, where the model outputs are visualized as two-dimensional t-distributed stochastic neighbor embedding (t-SNE) plots. In the CNN model predictions, the malignant melanomas are mostly located in the same region as the benign lesions. In contrast, the CNN+ANN predictions show fewer malignant melanomas in the benign region. This indicates that malignant melanoma tends to be classified better by the CNN+ANN model than by the CNN model.
A confusion matrix is a two-dimensional contingency table of actual versus predicted classes. Both dimensions contain the same set of classes, so each combination of actual and predicted class forms one cell of the table. The table visualizes algorithm performance and shows whether the system confuses one class with another. The confusion matrices of the CNN and CNN+ANN models are detailed in Tables 2 and 3, respectively. Table 2 shows many misclassifications of nonmalignant melanoma by the CNN model, most of which were corrected by the CNN+ANN model (Table 3).
Table 2 Confusion Matrix CNN
Table 3 Confusion Matrix CNN+ANN
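A confusion matrix like those in Tables 2 and 3 is a tally of (actual, predicted) pairs. The minimal sketch below builds one with rows as actual classes and columns as predicted classes. It also counts the off-diagonal misclassifications that the comparison above refers to. The class names and example labels are illustrative only.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes and columns are predicted classes, in the order of `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def misclassified(matrix: List[List[int]]) -> int:
    """Off-diagonal total: samples whose predicted class differs from the actual class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return total - correct
```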
According to Table 4, the CNN model, using image input only, yields an AUROC of 82.40%, whereas the CNN+ANN model, which combines image data and patient's metadata, reaches an AUROC of 97.10%. The CNN+ANN model also achieves a recall (sensitivity) of 87.10% at a precision (positive predictive value) of 94.19%, an accuracy of 94.33%, and a balanced accuracy of 92.34%. Overall, the CNN+ANN model outperforms the CNN model.
Table 4 Malignant Melanomas Detection Result Using CNN Model and CNN+ANN Model
The final model file size is 25.4 megabytes (MB), small enough to be loaded and run on almost any computer (with a Core i3-class CPU or lower, and without a discrete GPU, TPU, or neural engine). It can be reduced further with the TensorFlow optimization toolkit for on-device deployment as a fully offline classifier application.
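A back-of-the-envelope calculation shows how file size relates to parameter count and numeric precision. The sketch assumes that stored weights dominate the file, which ignores the architecture description and any optimizer state a saved Keras model may contain, and it uses decimal megabytes (10^6 bytes). Under those assumptions, a 25.4 MB file holds at most about 6.35 million 32-bit (4-byte) weights. Eight-bit post-training quantization, one of the techniques offered by TensorFlow's model optimization tooling and the TensorFlow Lite converter, stores weights in 1 byte each, roughly a 4× reduction for the quantized weights.

```python
BYTES_PER_MB = 1_000_000  # decimal megabytes (assumption; some tools report 2**20 bytes)


def weight_storage_mb(n_params: int, bytes_per_param: int) -> float:
    """Storage needed for raw weights only, ignoring architecture and optimizer metadata."""
    return n_params * bytes_per_param / BYTES_PER_MB


def max_params(file_bytes: int, bytes_per_param: int) -> int:
    """Upper bound on the parameter count of a file that stored nothing but weights."""
    return file_bytes // bytes_per_param
```

For example, a hypothetical 6-million-parameter model needs about 24 MB at float32 (4 bytes per weight) and about 6 MB at int8 (1 byte per weight).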
Discussion
Abnormality detection in medical images is the process of identifying diseases such as cancer. In the past, clinicians detected these abnormalities through time-consuming manual effort, so automated abnormality detection systems are urgently needed. Various methods have been proposed for abnormality detection in medical images. For example, brain magnetic resonance imaging (MRI) segmentation fusion (ie, with potential field segmentation) has been used to detect tumors.13
In general, computer-aided diagnosis (CAD) systems are widely used to assist radiologists and clinicians in diagnosing medical images. CAD systems are built on medical image processing, machine learning, and computer vision algorithms, and a typical CAD system involves the following stages: preprocessing, feature extraction, feature selection, and classification.14 Deep learning is a class of machine learning techniques that use multiple layers of linear and nonlinear processing units to learn abstract representations from data.15 Well-known deep learning techniques include autoencoders, stacked autoencoders, restricted Boltzmann machines (RBMs), deep belief networks (DBNs), and deep CNNs. Lately, CNNs have been widely applied in medical image analysis and vision systems.16–18 However, few studies have combined dermoscopic images and patient's metadata as training data for CNN-based methods. In this study, we demonstrate a CNN+ANN model that combines dermoscopic images and patient's metadata to detect malignant melanoma.
Despite the high performance of deep learning methods, several limitations restrain their implementation in clinical practice. Deep learning architectures need substantial training data and computing power. Limited computing power lengthens network training time, which also depends on the size of the training data. For example, CNNs usually require labeled data for supervised learning, and labeling medical images manually is a difficult task. These limitations may be overcome with stronger facilities, including increased computing power, enhanced data storage, upgraded digital storage of medical images, and improved deep network architectures. The application of deep learning to diagnostic imaging also faces the criticism of the black box problem in artificial intelligence: inputs and outputs are known, but the internal representations are not explained. Applications are also affected by noise and brightness issues inherent in medical images.19 We used preprocessing steps to remove noise and thereby improve performance. We also reduced the training data to hundreds rather than thousands of images and resized the training input to 64×64 pixels to lower the computational requirements, and finally we included the metadata in the training process to improve learning.
Compared with previous studies,20–25 our results show similarly high performance even though we used a smaller dataset, imbalanced data that are closer to clinical practice, and a smaller model that does not require high-end computational devices. This model will be useful for screening and early detection in low-resource and remote health-care settings. A limitation of our study is that it required manual cropping. In future research, we will further reduce the model size with the TensorFlow optimization toolkit if needed and develop an application covering everything from image cropping to prediction on mobile phones and other low-computation devices.
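A quick calculation shows how much the 64×64 input lowers the per-image load relative to the 300×300 patches described in the Methods. We assume three-channel color images, but the channel count cancels:

$$\frac{300 \times 300 \times 3}{64 \times 64 \times 3} = \frac{90\,000}{4\,096} \approx 22$$

For a convolution layer with fixed kernel size, stride, and channel counts, the number of multiply-accumulate operations scales with the number of spatial output positions. The early layers therefore do roughly 22 times less work per image at 64×64.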
Conclusion
The CNN+ANN model outperformed the CNN model that used only dermoscopic images. By combining dermoscopic images with patient's metadata, the CNN+ANN model can increase the accuracy of malignant melanoma classification even with limited data and prevents the overfitting that occurred in the CNN model with dermoscopic images only.
The first author thanks the Directorate General of Resources for Science, Technology and Higher Education, Ministry of Education and Culture, Republic of Indonesia, for sponsoring her doctoral study. The author is also grateful to Muhammad Solihuddin Muhtar, International Center for Health Information Technology, Taipei Medical University, who provided advice and assisted with evaluating model performance. He was not compensated for his contribution.
The authors report no conflicts of interest in this work.
1. Fitzmaurice C, Akinyemiju TF, Al Lami FH, et al. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to 2016: a Systematic Analysis for the Global Burden of Disease Study. JAMA Oncol. 2018;4(11):1553–1568. doi:10.1001/jamaoncol.2018.2706
2. Meyskens FL, Mukhtar H, Rock CL, et al. Cancer prevention: obstacles, challenges, and the road ahead framing the major issues. J Natl Cancer Inst. 2016;108(2):djv309.
3. ISIC Archive. Available from: https://www.isic-archive.com/#!/topWithHeader/tightContentTop/about/isicArchive.
4. Tromme I, Legrand C, Devleesschauwer B, et al. Melanoma burden by melanoma stage: assessment through a disease transition model. Eur J Cancer. 2016;53:33–41. doi:10.1016/j.ejca.2015.09.016
5. Brunssen A, Waldmann A, Eisemann N, Katalinic A. Impact of skin cancer screening and secondary prevention campaigns on skin cancer incidence and mortality: a systematic review. J Am Acad Dermatol. 2017;76(1):129–139.e10. doi:10.1016/j.jaad.2016.07.045
6. Dascalu A, David EO. Skin cancer detection by deep learning and sound analysis algorithms: a prospective clinical study of an elementary dermoscope. EBioMedicine. 2019;43:107–113. doi:10.1016/j.ebiom.2019.04.055
7. Li Y, Shen L. Skin lesion analysis towards melanoma detection using deep learning network. Sensors (Basel). 2018;18(2).
8. Hosseinzadeh Kassani S, Hosseinzadeh Kassani P. A comparative study of deep learning architectures on melanoma detection. Tissue Cell. 2019;58:76–83. doi:10.1016/j.tice.2019.04.009
9. Zhang J, Xie Y, Xia Y, et al. Attention residual learning for skin lesion classification. IEEE Trans Med Imaging. 2019;1.
11. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data. 2018;5(1):180161. doi:10.1038/sdata.2018.161
12. Codella NCF, Gutman D, Celebi ME, et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the international skin imaging collaboration (ISIC).
13. Cabria I, Gondra I. MRI segmentation fusion for brain tumor detection. Inf Fusion. 2017;36:1–9. doi:10.1016/j.inffus.2016.10.003
14. Mosquera-Lopez C, Agaian S, Velez-Hoyos A, Thompson I. Computer-aided prostate cancer diagnosis from digitized histopathology: a review on texture-based systems. IEEE Rev Biomed Eng. 2015;8:98–113. doi:10.1109/RBME.2014.2340401
15. Deng L, Yu D. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.
16. Premaladha J, Ravichandran K. Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. J Med Syst. 2016;40(4):96. doi:10.1007/s10916-016-0460-2
17. Kharazmi P, Zheng J, Lui H, Wang ZJ, Lee TK. A computer-aided decision support system for detection and localization of cutaneous vasculature in dermoscopy images via deep feature learning. J Med Syst. 2018;42(2):33. doi:10.1007/s10916-017-0885-2
18. Wang SH, Phillips P, Sui Y, Liu B, Yang M, Cheng H. Classification of Alzheimer’s disease based on eight-layer convolutional neural network with leaky rectified linear unit and max pooling. J Med Syst. 2018;42(5):85. doi:10.1007/s10916-018-0932-7
19. Hussain S, Anwar SM, Majid M. Segmentation of glioma tumors in brain using deep convolutional neural network. Neurocomputing. 2018;282:248–261. doi:10.1016/j.neucom.2017.12.032
20. Manzo M, Pellino S. Bucket of deep transfer learning features and classification models for melanoma detection. J Imaging. 2020;6(12):129. doi:10.3390/jimaging6120129
21. Winkler JK, Sies K, Fink C, et al. Melanoma recognition by a deep learning convolutional neural network-performance in different melanoma subtypes and localisations. Eur J Cancer. 2020;127:21–29. doi:10.1016/j.ejca.2019.11.020
22. Banerjee S, Singh SK, Chakraborty A, Das A, Bag R. Melanoma diagnosis using deep learning and fuzzy logic. Diagnostics. 2020;10(8):577. doi:10.3390/diagnostics10080577
23. Bajwa MN, Muta K, Malik MI, et al. Computer-aided diagnosis of skin diseases using deep neural networks. Appl Sci. 2020;10(7):2488. doi:10.3390/app10072488
24. Kassem MA, Hosny KM, Fouad MM. Skin lesions classification into eight classes for ISIC 2019 using deep convolutional neural network and transfer learning. IEEE Access. 2020;8:114822–114832. doi:10.1109/ACCESS.2020.3003890
25. Yap J, Yolland W, Tschandl P. Multimodal skin lesion classification using deep learning. Exp Dermatol. 2018;27(11):1261–1267. doi:10.1111/exd.13777
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.