Classification of Breast Cancer with Implementation of Principal Component Analysis Techniques

José León-Alarcón
https://orcid.org/0009-0004-6190-0990
Roly Cedeño-Menéndez
https://orcid.org/0009-0004-1571-9410

Abstract

Breast cancer is one of the leading causes of mortality in women worldwide, underscoring the importance of implementing accurate and efficient diagnostic tools. This study evaluated the performance of several machine learning algorithms for breast tumor classification using the Wisconsin Breast Cancer Dataset. Principal Component Analysis (PCA) was applied to reduce the dimensionality of the dataset, improving computational efficiency while maintaining critical information for classification.
The models evaluated were Logistic Regression, Support Vector Machines (SVM), and Neural Networks, which reached maximum AUC-ROC values of 0.96, 0.95, and 0.99, respectively. The results were compared with previous studies, providing evidence of the robustness and applicability of the proposed approach.
Although the findings are promising, the study acknowledges limitations, such as the use of a single dataset, and suggests integrating additional clinical features in future research. This work demonstrates the ability of machine learning to improve early diagnosis of breast cancer, with potential for applications in clinical settings.
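
The workflow summarized in the abstract (standardize the features, reduce them with PCA, then compare classifiers by AUC-ROC) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset, an 80/20 stratified split, and a 95% retained-variance threshold for PCA. The neural network here is scikit-learn's `MLPClassifier` with hypothetical layer sizes, swapped in for a Keras model. The function name `evaluate_models` and all hyperparameters are illustrative, so the scores will not necessarily match the values reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_models(variance_to_keep=0.95, seed=0):
    """Fit scaler -> PCA -> classifier pipelines and return test AUC-ROC per model."""
    # 569 samples, 30 numeric features; target 0 = malignant, 1 = benign.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Assumed hyperparameters, chosen only for illustration.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(kernel="rbf", random_state=seed),
        "neural_network": MLPClassifier(
            hidden_layer_sizes=(32, 16), max_iter=2000, random_state=seed
        ),
    }

    results = {}
    for name, clf in models.items():
        # A float n_components keeps the smallest number of components
        # whose cumulative explained variance reaches that fraction.
        pipe = make_pipeline(
            StandardScaler(), PCA(n_components=variance_to_keep), clf
        )
        pipe.fit(X_train, y_train)

        # AUC-ROC needs a continuous score: use the decision function when
        # the classifier has one, otherwise the probability of class 1.
        if hasattr(pipe, "decision_function"):
            scores = pipe.decision_function(X_test)
        else:
            scores = pipe.predict_proba(X_test)[:, 1]

        results[name] = {
            "auc": roc_auc_score(y_test, scores),
            "n_components": int(pipe.named_steps["pca"].n_components_),
        }
    return results


if __name__ == "__main__":
    for name, stats in evaluate_models().items():
        print(f"{name}: AUC-ROC={stats['auc']:.3f}, "
              f"components kept={stats['n_components']}")
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training split only, which keeps information from the test set out of the projection.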

Article Details

How to Cite
[1]
J. León-Alarcón and R. Cedeño-Menéndez, “Classification of Breast Cancer with Implementation of Principal Component Analysis Techniques”, INGENIO, vol. 9, no. 1, pp. 15–25, Jan. 2026.
Section
Original Research
Author Biographies

José León-Alarcón, Universidad Técnica de Manabí

Area of Expertise: Information Systems with a Concentration in Data Science

email: jose.leon@utm.edu.ec

Roly Cedeño-Menéndez, Universidad Técnica de Manabí

Area of Expertise: Information Systems with a Concentration in Data Science

email: Roly.cedeno@utm.edu.ec
