Breast X-ray imaging is a primary method for early breast cancer screening. However, applying machine learning techniques to this task presents significant challenges, particularly in capturing fine-grained details from high-resolution medical images. This study explored multiple machine learning approaches for breast cancer detection, including image-level and patch-level CNN-based classification, diffusion model-based anomaly detection, and federated learning (FL) for training models across distributed datasets. We evaluated these methods on real-world breast cancer X-ray datasets. Our findings suggest that FL offers a promising balance between performance and privacy preservation, underscoring its potential for real-world medical AI applications.
Citation: Weitong Liao, Ligang He, Mohammed H. Alghamdi, Hammam M. Alghamdi. Exploring machine learning for breast cancer detection in X-ray imaging[J]. Big Data and Information Analytics, 2025, 9: 385-399. doi: 10.3934/bdia.2025019