Feasibility of computer-assisted diagnosis for breast ultrasound: the results of the diagnostic performance of S-detect from a single center in China
Authors Zhao C, Xiao M, Jiang Y, Liu H, Wang M, Wang H, Sun Q, Zhu Q
Received 15 October 2018
Accepted for publication 17 December 2018
Published 23 January 2019 Volume 2019:11 Pages 921–930
Checked for plagiarism Yes
Review by Single-blind
Peer reviewer comments 2
Editor who approved publication: Dr Chien-Feng Li
Chenyang Zhao,1,* Mengsu Xiao,1,* Yuxin Jiang,1 He Liu,1 Ming Wang,1 Hongyan Wang,1 Qiang Sun,2 Qingli Zhu1
1Department of Ultrasound, Chinese Academy of Medical Sciences and Peking Union Medical College Hospital, Beijing 100730, China; 2Department of Breast Surgery, Chinese Academy of Medical Sciences and Peking Union Medical College Hospital, Beijing 100730, China
*These authors contributed equally to this work
Objective: To investigate the feasibility of the CAD system S-detect on a database from a single Chinese medical center.
Materials and methods: An experienced radiologist performed breast US examinations and assessed 266 consecutive breast lesions in 227 patients. S-detect classified the lesions automatically in a dichotomous form. An in-training resident who was blinded to both the US diagnostic results and the histological results reviewed the images afterward. The final histological results were considered the diagnostic gold standard. The diagnostic performances and interrater agreements were analyzed.
Results: A total of 266 focal breast lesions (161 benign lesions and 105 malignant lesions) were assessed in this study. S-detect had a lower sensitivity (87.07%) and a higher specificity (72.27%) than the experienced radiologist (sensitivity 98.1% and specificity 65.43%). The sensitivity and specificity of S-detect were better than those of the resident (sensitivity 82.86% and specificity 68.94%). The AUC value of S-detect (0.807) was not significantly different from that of the experienced radiologist (0.817) and was higher than that of the resident (0.758). S-detect had moderate agreement with the experienced radiologist.
Conclusion: In this single-center study, S-detect showed a high level of diagnostic performance on 266 breast lesions of Chinese women. S-detect had almost equal diagnostic capacity to an experienced radiologist and performed better than a novice reader. S-detect was also notable for its high specificity. These results support the feasibility of S-detect in aiding the diagnosis of breast lesions on an independent database.
Keywords: ultrasonography, breast neoplasms, image interpretation, computer-assisted, diagnostic imaging
Plain language summary
S-detect is a sophisticated CAD system for breast US imaging based on deep learning algorithms. To investigate its feasibility in a Chinese population, we used the software in a single center in China. S-detect showed a high level of diagnostic performance in classifying breast lesions and was notable for its high specificity. According to this study, S-detect has great potential for further clinical application in diagnosing breast lesions.
Introduction
As the most common cancer worldwide, breast cancer poses a great threat to women’s health and has aroused intense concern in the medical community. It was reported that 246,000 females were newly diagnosed with breast cancer in the USA in 2016 and 268,000 in China in 2015.1,2 Early detection plays an essential role in the management of breast cancer.3 In addition to mammography, US is playing an increasingly important role in the diagnosis of breast lesions, because the interpretation of US features can help distinguish benign from malignant lesions.4 Currently, the BI-RADS lexicon is used as a standard protocol for the assessment of breast lesions by US imaging.5–7 Despite the use of the BI-RADS lexicon, operator dependence remains the main drawback of US.8
Recent advances in artificial intelligence techniques have accelerated the development of CAD systems in medical imaging. CAD systems have shown great potential as effective diagnostic aids in breast imaging, for example in mammography.9 By providing diagnostic decisions for breast lesions, CAD systems can act as a second reader to assist radiologists in diagnosis. CAD for breast US was also investigated in earlier years.10,11 The CAD workflow for breast US comprises image segmentation, feature extraction, and classification. Several newly developed segmentation techniques, including clustering-based, watershed-based, and neural network-based methods, have proved applicable to the detection and subsequent analysis of breast lesions.12 CAD systems based on support vector machines and deep learning methods have also shown good diagnostic performance in classifying breast lesions.10,13
Recently, a well-established CAD software package, S-detect, was developed by Samsung Medison, Co. Ltd., Seoul, Korea. S-detect performs autosegmentation and interprets US morphological descriptors, classifying breast lesions in a dichotomous form as a reference to assist radiologists with the final diagnosis. S-detect was developed using a deep learning algorithm, after extensive training on a large-scale database from Korea. The compact CAD system generates decisions automatically based on abstraction through multiple layers rather than relying on hand-crafted features. Several studies of S-detect have been conducted by Korean and Italian researchers, reporting high diagnostic efficiency of S-detect as an adjunct tool for breast lesion characterization.14–16 To further establish the feasibility of S-detect, studies with more validation sets from different populations are required.
In this study, to assess the feasibility of S-detect in aiding the diagnosis of breast lesions in a Chinese population and to explore its clinical application, 266 focal breast lesions were collected and classified by different radiologists as well as by S-detect in an independent Chinese medical center. The diagnostic performance of S-detect and the readers was compared. We further discuss the potential value of S-detect in improving the specificity and overall diagnostic accuracy of radiologists.
Materials and methods
This research was designed as a prospective study. It was approved by the Institutional Review Board of Peking Union Medical College Hospital. Informed consent was obtained from all the patients who underwent an examination.
From January 2018 to March 2018, female patients with focal breast lesions detected by US, who were referred to the hospital for biopsy, were recruited. All the focal lesions were evaluated by conventional US examination before hospitalization. We included category 3, 4, and 5 lesions according to the BI-RADS lexicon for US. When multiple lesions were found in one patient, the suspicious lesions or the largest lesions were enrolled.
The exclusion criteria were as follows:
- Patients who had received a biopsy of breast lesion before the US examination.
- Patients who were pregnant or lactating.
- Patients who were undergoing neoadjuvant treatment.
In total, 227 consecutive patients with 266 focal breast lesions were recruited in the study. The mean age of the patients was 45.7 years, and the median age was 45.0 (15–82) years. The study flow is presented in Figure 1.
Figure 1 The schematic representation of the study flow.
Abbreviation: US, ultrasound.
US examinations and imaging review
A 3–12 MHz linear transducer (RS80A with Prestige, Samsung Medison, Co. Ltd.) was used for the US examination. An expert radiologist with 10 years of experience in breast US imaging performed the bilateral breast US examinations using the routine scanning protocol. Both longitudinal and transverse sections were recorded for size measurement and further imaging evaluation. Informed of the patients’ clinical information and mammography results, the radiologist assessed the breast lesions according to the fifth edition of the BI-RADS lexicon after finishing the examination.
CAD examination with S-detect software (Samsung Healthcare, Seoul, South Korea) was conducted by the experienced radiologist immediately after classification of the lesions. The image that displayed the lesion at its maximum diameter was chosen as the input in the preliminary assessments. After the image was selected, the examination was conducted in the S-detect mode by clicking the S-detect option on the screen. The lesion was automatically contoured by the S-detect software within the region of interest. The outline was adjusted manually when the autocontour was considered unsatisfactory. Once satisfactory segmentation of the lesion was completed, S-detect provided a final assessment of the lesion and a detailed report of each US descriptor, including shape, orientation, margin, echo pattern, and posterior acoustic features. The final assessments of breast lesions by S-detect were in a dichotomous form: possibly benign or possibly malignant. The S-detect examination took less than 2 minutes to perform.
An in-training resident with 2 years of experience with breast US imaging reviewed the US images of the breast lesions 2 weeks later, blind to both the US diagnostic results and the histological results. Appropriate US descriptors were chosen by the resident to make the final assessments, based on the 2013 BI-RADS lexicon.
The results of the experienced radiologist and the resident, given as BI-RADS 1–5 classifications, were transformed into a dichotomized pattern. According to the BI-RADS lexicon, category 4a represents a low likelihood of malignancy, and biopsy is recommended for this subgroup. Therefore, the cutoff between benign and malignant was set at 4a. The possibly benign group included categories 2 and 3, and the possibly malignant group comprised categories 4a, 4b, 4c, and 5.
SPSS (SPSS 19.0, IBM) was used for data analysis. Sensitivity, specificity, PLR, NLR, PPV, and NPV were calculated from 2×2 contingency tables and compared using the chi-squared test. ROCs were also used for a more intuitive comparison of the results. The AUC was obtained using SPSS. A Z test was applied to compare the AUC values of the radiologists and the software. Cohen’s kappa values were calculated to evaluate the agreement between the experienced radiologist and S-detect and between the resident and S-detect.
The kappa values were interpreted as follows:
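The dichotomization and the 2×2 metrics described above can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure the study used. The function names are hypothetical, and the example counts (87 true positives, 18 false negatives, 50 false positives, 111 true negatives) are an assumption: they are back-calculated from the resident's reported percentages on 105 malignant and 161 benign lesions rather than taken from a published table.

```python
# Hypothetical helper names; the study itself used SPSS 19.0.
MALIGNANT_CATEGORIES = {"4a", "4b", "4c", "5"}  # cutoff at 4a, as in the text
BENIGN_CATEGORIES = {"2", "3"}


def dichotomize(birads_category: str) -> bool:
    """Return True for 'possibly malignant', False for 'possibly benign'."""
    if birads_category in MALIGNANT_CATEGORIES:
        return True
    if birads_category in BENIGN_CATEGORIES:
        return False
    raise ValueError(f"unexpected BI-RADS category: {birads_category!r}")


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard 2x2 contingency-table metrics, with histology as the reference.

    Note: PLR is undefined when specificity is 1, and NLR when it is 0.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sensitivity / (1 - specificity),
        "nlr": (1 - sensitivity) / specificity,
    }


# Counts back-calculated (an assumption) from the resident's reported results.
resident = diagnostic_metrics(tp=87, fn=18, fp=50, tn=111)
for name, value in resident.items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch gives the resident's reported sensitivity (82.86%), specificity (68.94%), PPV (63.5%), and NPV (86.05%) after rounding.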
- Poor agreement: κ<0
- Fair agreement: 0.20<κ<0.40
- Moderate agreement: 0.40<κ<0.60
- Substantial agreement: 0.60<κ<0.80
- Perfect agreement: 0.80<κ<1
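As a minimal sketch of how Cohen's kappa is obtained for two raters making dichotomous calls, and how a value maps onto the bands listed above: the function names and the example counts are hypothetical. Treating each band's upper bound as inclusive is an assumption, since the list uses strict inequalities, and values from 0 to 0.20 are reported as unlisted because the criteria above do not name that range.

```python
def cohens_kappa(both_pos: int, a_pos_b_neg: int, a_neg_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two raters with binary (malignant/benign) calls."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    a_pos = (both_pos + a_pos_b_neg) / n          # rater A positive rate
    b_pos = (both_pos + a_neg_b_pos) / n          # rater B positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map kappa onto the bands listed in the text (upper bounds inclusive: an assumption)."""
    if kappa < 0:
        return "poor"
    if 0.20 < kappa <= 0.40:
        return "fair"
    if 0.40 < kappa <= 0.60:
        return "moderate"
    if 0.60 < kappa <= 0.80:
        return "substantial"
    if 0.80 < kappa <= 1:
        return "perfect"
    return "not listed in the criteria above"  # 0 <= kappa <= 0.20


# Hypothetical counts for illustration only, not study data.
example_kappa = cohens_kappa(both_pos=45, a_pos_b_neg=10, a_neg_b_pos=10, both_neg=35)
print(round(example_kappa, 3), agreement_label(example_kappa))
```

Applied to the κ values reported in this study, `agreement_label(0.514)` returns "moderate" and `agreement_label(0.337)` returns "fair".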
For all the tests mentioned above, statistical significance was considered when the P-value was <0.05.
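For illustration, a Pearson chi-squared test on a 2×2 table can be computed with the standard library alone. This is a sketch under stated assumptions, not the SPSS procedure: the function name and the counts are hypothetical, no continuity correction is applied (the text does not say whether one was used), and the two rows are treated as independent samples.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: reader A calls 120 of 160 benign lesions correctly,
# reader B calls 100 of 160 correctly. Not study data.
statistic, p_value = chi_squared_2x2(120, 40, 100, 60)
print(f"chi2 = {statistic:.3f}, significant at 0.05: {p_value < 0.05}")
```

The comparison `p_value < 0.05` mirrors the significance threshold stated above.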
We confirm that this study was conducted in accordance with the Declaration of Helsinki and that written informed consent was obtained from all patients.
Results
A total of 266 focal breast lesions of 227 patients were assessed in this study, among which there were 161 benign lesions and 105 malignant lesions. The pathological types of the benign lesions and malignant ones are listed in Table 1.
Table 1 The pathological types of the 266 breast lesions
The diagnostic performances of the experienced radiologist, S-detect, and the in-training resident are shown in Table 2, including sensitivity, specificity, PLR, NLR, PPV, NPV, and the AUC. The ROCs for the three raters are shown in Figure 2.
Figure 2 The ROCs for the experienced radiologist, S-detect, and the in-training resident.
Abbreviation: ROCs, receiver operating characteristic curves.
In comparison with the experienced radiologist, S-detect had a lower sensitivity (87.07% vs 98.1%) and NPV (81.9% vs 98.15%), and a higher specificity (72.27% vs 65.43%) and PPV (79.5% vs 64.8%). The difference in specificity between S-detect and the radiologist was statistically significant (P<0.001). The difference in sensitivity was not statistically significant (P>0.05). A typical malignant breast lesion correctly diagnosed by S-detect is presented in Figure 3.
The NPV of the resident was slightly higher than that of S-detect (86.05% vs 81.9%). The sensitivity (82.86% vs 87.07%), specificity (68.94% vs 72.27%), and PPV (63.5% vs 79.5%) of the resident were lower than those of S-detect. The differences in both the sensitivity and the specificity were statistically significant (P<0.001).
The experienced radiologist had the highest AUC value (0.817) among the three raters. S-detect had an AUC value of 0.807, which was slightly lower than that of the radiologist, but the difference was not statistically significant (P>0.05). The AUC of S-detect was significantly higher than that of the resident with 2 years of experience (0.758; P<0.05). These results indicate that S-detect showed a high level of diagnostic performance: its AUC value did not differ from that of the experienced radiologist and was better than that of the less experienced resident.
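As a rough cross-check on how AUC relates to sensitivity and specificity, the following sketch assumes that a rater's ROC curve has a single operating point (the dichotomized decision). In that case the trapezoidal area reduces to the mean of sensitivity and specificity. The function name is hypothetical, and the SPSS-derived values reported above may have been computed differently, so this sketch need not reproduce them exactly.

```python
def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Trapezoidal AUC for an ROC curve with one operating point.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1),
    and the area under it simplifies to (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2


# Sensitivity and specificity as reported in the text, as fractions.
reported = {
    "experienced radiologist": (0.9810, 0.6543),
    "S-detect": (0.8707, 0.7227),
    "resident": (0.8286, 0.6894),
}
for rater, (sens, spec) in reported.items():
    print(f"{rater}: {single_point_auc(sens, spec):.3f}")
```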
S-detect had moderate agreement with the experienced radiologist, and the κ value was 0.514. The interrater agreement for S-detect and the resident was fair, with a κ value of 0.337.
The benign and malignant lesions in each category assigned by the experienced radiologist are summarized in Table 3. The numbers of lesions in each S-detect dichotomous group are also recorded for each category in the table. The proportions of malignant lesions in BI-RADS categories 3, 4a, 4b, 4c, and 5 were 1.87%, 13.0%, 52.0%, 90.91%, and 100%, respectively. There were statistically significant differences between the diagnostic accuracy of the radiologist and that of S-detect in categories 3 and 4a. The radiologist had better sensitivity than S-detect in both BI-RADS 3 and 4a lesions.
Table 3 The subcategorization of breast lesions by an experienced radiologist
Abbreviation: BI-RADS, Breast Imaging Report and Data System.
Typical cases of special pathological types in the study are presented in Figures 4–6. The inflammatory lesion was correctly classified by S-detect, while the two rare breast malignant lesions were misdiagnosed.
Discussion
The fifth edition of the BI-RADS lexicon, a data-driven management system for breast US imaging, was developed by the American College of Radiology as a standardized protocol to guide diagnosis, and it has been applied clinically worldwide after consecutive revisions.6,17 According to Lee et al, expert assessment of breast lesions with the BI-RADS-US lexicon had a sensitivity of 98% and a specificity of 58.6%, whereas the resident performed clearly worse than the expert (a sensitivity of 66% and a specificity of 52.9%).8 Solutions to enhance the diagnostic efficiency of radiologists of different levels are greatly needed. Several novel ultrasonic techniques, including elastography and contrast-enhanced ultrasonography, have shown potential for assisting the diagnosis of breast lesions as a supplement to the BI-RADS lexicon, although their value remains controversial.18,19
This decade has witnessed accelerated development of CAD in breast imaging along with advances in machine learning.9 CAD systems for breast US usually provide final assessments of breast lesions through a successive procedure of tumor segmentation, feature extraction, and diagnostic classification. Several recently reported CAD systems differed in sensitivity and specificity: some showed good sensitivity but low specificity, and others the opposite.20–22 S-detect is a dedicated CAD software package that is integrated into a high-end commercial US apparatus. Unlike traditional CAD systems that depend on manual feature design, S-detect was built on a deep convolutional neural network, which enables accurate decisions by extracting high-order statistics and optimizing the balance of input and output data through multiple hidden layers, after extensive training and iteration on large-scale databases. Of note, raw ultrasonic signals without image postprocessing were collected as the raw data for the development of S-detect. Consequently, S-detect is free from human intervention, as well as from ultrasonic artifacts and speckles, making it a more reliable CAD system for independent machine diagnosis of breast lesions. To elucidate the feasibility of S-detect and its potential use in helping radiologists improve diagnostic performance, several studies have been carried out in Korea and Italy.14–16 For further clinical promotion of S-detect, more clinical tests are required to validate its feasibility in databases from various sources, because the distribution of pathological types may vary between populations. This study aimed to provide a large database of breast lesions from Chinese women as a validation data set to evaluate the diagnostic performance of S-detect in a different population.
The results of our study accord with previous studies from Korea and Italy and support the feasibility of S-detect on a validation set from a Chinese medical center. In our study, S-detect was investigated on 266 cases from a Chinese medical center. The overall performance of S-detect was satisfactory, as its AUC value (0.807) did not differ from that of the experienced radiologist (0.817). This result was similar to those of previous studies. According to Choi et al, the specificity and AUC of experienced doctors for the recruited breast lesions increased significantly, from 76.6% to 80.3% and from 0.84 to 0.86, respectively, after using S-detect, whereas the sensitivity remained the same.23 Di Segni et al also verified that S-detect had a better specificity than radiologists and could help improve the specificity of radiologists.16 In the study by Cho et al, S-detect showed a higher specificity, PPV, and accuracy than the experienced radiologist, and a lower sensitivity and NPV.15 In our study, S-detect had a lower sensitivity (87.07%) and NPV (81.9%), and a better specificity (72.27%) and PPV (79.5%) than the experienced radiologist. The difference in specificity was statistically significant. These results, including those of our study, indicate that S-detect could be useful in improving the specificity of radiologists to avoid unnecessary biopsy or surgery.
For each subcategory assigned by the experienced radiologist, the proportions of benign lesions in BI-RADS categories 3, 4a, 4b, 4c, and 5 were 98.13%, 87.0%, 48.0%, 9.09%, and 0%, respectively. This distribution was nearly consistent with the BI-RADS lexicon, in which the risks of malignancy for categories 3, 4a, 4b, 4c, and 5 are approximately <2%, 3%–10%, 10%–49%, 50%–94%, and >95%, respectively. Category 4a is a controversial subtype of breast lesions that is more commonly made up of benign cases, as also mentioned in previous studies.14 In category 4a of our study, the proportion of malignancy exceeded 10%, higher than in the BI-RADS lexicon. This might be explained by some special histological types of malignant lesions included in our study. These malignant lesions, such as intraductal papilloma with local cancerous changes and triple-negative breast carcinoma, presented few malignant imaging features and were classified as category 4a by the readers. The proportion of such lesions was above the average level in this study, which might account for the malignancy rate above 10% in 4a lesions. As was observed, fewer benign lesions were defined by S-detect than by the experienced radiologist, with statistical significance, indicating that S-detect might be helpful in reducing false-positive cases and improving the specificity in the trade-off of 4a lesions. The better performance of S-detect in this rather confusing category of lesions might be due to latent diagnostic information in the raw US signals that is acquired by the deep learning algorithms of S-detect but is not visible to the human eye.
The moderate agreement between S-detect and the radiologist might be explained by the different diagnostic procedures of the two raters. For S-detect, imaging information on grayscale US was the only reference for the assessment of breast lesions. By contrast, the radiologist's classifications were based on a comprehensive evaluation of all information, including medical history and adjunct imaging methods such as Doppler US and elastography. The discrepancy between the radiologist and S-detect could be evident in some special cases (Figures 5 and 6). The radiologist could classify these cases as category 4 when informed of the clinical information as well as the results of Doppler US and elastography. In those cases, the use of S-detect could be limited, and radiologists should still play the main role in the diagnosis of breast lesions.
From the initial results in our medical center, S-detect performed well in aiding the diagnosis of breast lesions with US imaging, showing great potential for clinical application. Of note, S-detect is also a user-friendly and time-saving software system, and it is embedded in a high-end US machine. These advantages make it more suitable for commercial use than other homemade CAD systems. In conclusion, it can be beneficial for radiologists to use S-detect as a tool in diagnosing breast lesions based on US examination. We hope that this research can provide evidence for the clinical application of S-detect in a wider range of regions.
In this study, the best-quality static images obtained by a highly qualified radiologist might exaggerate the performance of S-detect. Handheld breast US is highly operator dependent, and we cannot guarantee the same level of performance in every real clinical environment. Additionally, the diagnostic performance of the resident might be underestimated. The resident made classifications based on the static pictures recorded by the experienced radiologist rather than on a comprehensive evaluation of the lesions in the conventional diagnostic procedure. There was also a slightly unequal distribution of benign and malignant lesions in the sample. The higher prevalence of benign lesions might favor S-detect, which seemed more effective than the radiologist in identifying benign lesions. More studies in other medical centers are warranted to further evaluate S-detect.
In this study, the feasibility of S-detect as dedicated CAD software to aid the diagnosis of breast lesions was elucidated based on 266 breast lesions from 227 patients in a Chinese medical center. S-detect had a higher specificity than the experienced radiologist. The AUC value of S-detect did not differ from that of the experienced radiologist. In addition to its good diagnostic performance, S-detect also has the remarkable advantages of convenience and speed for clinical adoption. Wide clinical application of S-detect can be expected.
CAD, computer-assisted diagnosis; AUC, area under the curve; BI-RADS, Breast Imaging Report and Data System; ROC, receiver operating characteristic curve; PLR, positive likelihood ratio; NLR, negative likelihood ratio; PPV, positive predictive value; NPV, negative predictive value; US, ultrasound.
This work was funded by CAMS Innovation Fund for Medical Sciences (2017-I2M-1-006) and the 2016 Peking Union Medical College education and teaching reform project (2016zlgc0113).
QZ had the role of funding source for this study. The authors report no other conflicts of interest in this work.
References
1. Miller KD, Siegel RL, Lin CC, et al. Cancer treatment and survivorship statistics, 2016. CA Cancer J Clin. 2016;66(4):271–289.
2. Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin. 2016;66(2):115–132.
3. Dubey AK, Gupta U, Jain S. Breast cancer statistics and prediction methodology: a systematic review and analysis. Asian Pac J Cancer Prev. 2015;16(10):4237–4245.
4. Brem RF, Lenihan MJ, Lieberman J, Torrente J. Screening breast ultrasound: past, present, and future. AJR Am J Roentgenol. 2015;204(2):234–240.
5. Rao AA, Feneis J, Lalonde C, Ojeda-Fournier H. A pictorial review of changes in the BI-RADS fifth edition. Radiographics. 2016;36(3):623–639.
6. D’Orsi CJ, Bassett LW, Berg WA, et al. Breast imaging reporting and data system: ACR BI-RADS®. 4th ed. Reston, VA, American College of Radiology; 2003.
7. D’Orsi CJ, Sickles EA, Mendelson EB, Morris EA, et al. ACR BI-RADS® Atlas, breast imaging reporting and data system. Reston, VA, American College of Radiology; 2013.
8. Lee YJ, Choi SY, Kim KS, Yang PS. Variability in observer performance between faculty members and residents using Breast Imaging Reporting and Data System (BI-RADS)-Ultrasound, Fifth Edition (2013). Iran J Radiol. 2016;13(3):e28281.
9. Dromain C, Boyer B, Ferré R, Canale S, Delaloge S, Balleyguier C. Computed-aided diagnosis (CAD) in the detection of breast cancer. Eur J Radiol. 2013;82(3):417–423.
10. Chang RF, Wu WJ, Moon WK, Chen DR. Improvement in breast tumor discrimination by support vector machines and speckle-emphasis texture analysis. Ultrasound Med Biol. 2003;29(5):679–686.
11. Chen CM, Chou YH, Han KC, et al. Breast lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and artificial neural networks. Radiology. 2003;226(2):504–514.
12. Huang Q, Luo Y, Zhang Q. Breast ultrasound image segmentation: a survey. Int J Comput Assist Radiol Surg. 2017;12(3):493–507.
13. Han S, Kang HK, Jeong JY, et al. A deep learning framework for supporting the classification of breast lesions in ultrasound images. Phys Med Biol. 2017;62(19):7714–7728.
14. Kim K, Song MK, Kim EK, Yoon JH. Clinical application of S-detect to breast masses on ultrasonography: a study evaluating the diagnostic performance and agreement with a dedicated breast radiologist. Ultrasonography. 2017;36(1):3–9.
15. Cho E, Kim EK, Song MK, Yoon JH. Application of computer-aided diagnosis on breast ultrasonography: evaluation of diagnostic performances and agreement of radiologists according to different levels of experience. J Ultrasound Med. 2018;37(1):209–216.
16. Di Segni M, de Soccio V, Cantisani V, et al. Automated classification of focal breast lesions according to S-detect: validation and role as a clinical and teaching tool. J Ultrasound. 2018;21(2):105–118.
17. Mendelson EB, Berg WA, Merritt CR. Toward a standardized breast ultrasound lexicon, BI-RADS: ultrasound. Semin Roentgenol. 2001;36(3):217–225.
18. Park CS, Kim SH, Jung NY, Choi JJ, Kang BJ, Jung HS. Interobserver variability of ultrasound elastography and the ultrasound BI-RADS lexicon of breast lesions. Breast Cancer. 2015;22(2):153–160.
19. Drudi FM, Cantisani V, Gnecchi M, Malpassini F, Di Leo N, de Felice C. Contrast-enhanced ultrasound examination of the breast: a literature review. Ultraschall Med. 2012;33(7):E1–E7.
20. Drukker K, Gruszauskas NP, Sennett CA, Giger ML. Breast US computer-aided diagnosis workstation: performance with a large clinical diagnostic population. Radiology. 2008;248(2):392–397.
21. Chabi ML, Borget I, Ardiles R, et al. Evaluation of the accuracy of a computer-aided diagnosis (CAD) system in breast ultrasound according to the radiologist’s experience. Acad Radiol. 2012;19(3):311–319.
22. Shen WC, Chang RF, Moon WK. Computer aided classification system for breast ultrasound based on Breast Imaging Reporting and Data System (BI-RADS). Ultrasound Med Biol. 2007;33(11):1688–1698.
23. Choi JH, Kang BJ, Baek JE, Lee HS, Kim SH. Application of computer-aided diagnosis in breast ultrasound interpretation: improvements in diagnostic performance according to reader experience. Ultrasonography. 2018;37(3):217–225.
This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.