
MediSur

ISSN 1727-897X (Electronic)

2022, Number 2


Medisur 2022; 20 (2)

Classification of breast cancer using principal component analysis and kernel PCA techniques with support vector machine and logistic regression algorithms

Pirchio R

Language: Spanish
References: 7
Page: 199-209


Key words:

machine learning, artificial intelligence, data management.

ABSTRACT

Background: many computational tools exist for managing images and data sets; reducing the size of the data makes the information easier to manage.
Objective: to reduce the size of the data set for better information management.
Methods: the Breast Cancer Wisconsin data set (biopsy information on cell nuclei) and the Python Jupyter platform were used. Principal Component Analysis (PCA) and Kernel PCA (kPCA) were implemented to reduce the dimension to 2, 4 and 6 components. Cross-validation was used to select the best hyperparameters for the logistic regression and support vector machine algorithms. Classification was carried out with the original training set, the training set (PCA and kPCA) and the training set (data transformed from PCA and kPCA). Accuracy, precision, completeness, recovery, and area under the curve were analyzed.
Results: PCA with six components explained almost 90% of the variance. The best hyperparameters found for the support vector machine were a linear kernel and C = 100; for logistic regression they were C = 100, the newton-cg solver and the L2 penalty. The best metric results were obtained with PCA using 2 and 4 components (0.99, 0.99, 1, 0.99, 0.99). For the training set with original data they were 0.96, 0.95, 0.99, 0.97, 0.95. For logistic regression the best results were obtained with kPCA using 6 components, where all metrics were equal to 1. For the training set with original data, these values were 0.96, 0.95, 0.99, 0.97, 0.95.
Conclusions: the metric results improved when PCA and kPCA were used.
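
The workflow described in the Methods can be sketched roughly as follows with scikit-learn. This is only an illustrative sketch, not the author's code: the abstract does not state the train/test split, the number of cross-validation folds, the kPCA kernel or the hyperparameter grids, so the 80/20 split, 5 folds, RBF kernel and the grids below are assumptions, as is the use of the copy of the data set bundled with scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Breast Cancer Wisconsin (Diagnostic) data as shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Assumed 80/20 stratified split; the abstract does not give the proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Cumulative fraction of variance explained by the first six components.
scaled_train = StandardScaler().fit_transform(X_train)
cum_var = np.cumsum(PCA(n_components=6).fit(scaled_train).explained_variance_ratio_)

# Assumed hyperparameter grids (hypothetical; not taken from the paper).
classifiers = {
    "svm": (SVC(), {"clf__kernel": ["linear", "rbf"], "clf__C": [1, 10, 100]}),
    # LogisticRegression uses the L2 penalty by default.
    "logreg": (
        LogisticRegression(solver="newton-cg", max_iter=1000),
        {"clf__C": [1, 10, 100]},
    ),
}

results = {}
for n in (2, 4, 6):
    reducers = {
        "pca": PCA(n_components=n),
        # The RBF kernel is an assumption; the abstract does not name one.
        "kpca": KernelPCA(n_components=n, kernel="rbf"),
    }
    for red_name, reducer in reducers.items():
        for clf_name, (clf, grid) in classifiers.items():
            pipe = Pipeline(
                [("scale", StandardScaler()), ("reduce", reducer), ("clf", clf)]
            )
            # 5-fold cross-validation on the training set selects the hyperparameters.
            search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
            search.fit(X_train, y_train)
            results[(red_name, n, clf_name)] = {
                "best_params": search.best_params_,
                "test_accuracy": search.score(X_test, y_test),
            }

print("Cumulative explained variance (6 PCs):", cum_var.round(3))
for key, value in sorted(results.items()):
    print(key, value)
```

Placing the scaler and the reducer inside the pipeline means both are refitted on each training fold, so the cross-validated scores are not contaminated by the held-out fold. The figures this sketch prints need not match those reported in the abstract, since the split and grids are assumed.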





