A Comparative Performance Analysis of Classification Algorithms for Hypertension Diagnosis

Imannudin Akbar; Titan Parama Yoga; Acep Hendra; Arnold Ropen Sinaga

doi:10.37278/sisinfo.v8i1.1491

Authors

Imannudin Akbar, Information System, Faculty of Technology and Informatics, Universitas Informatika dan Bisnis Indonesia
Titan Parama Yoga, Information System, Faculty of Technology and Informatics, Universitas Informatika dan Bisnis Indonesia
Acep Hendra, Information System, Faculty of Technology and Informatics, Universitas Informatika dan Bisnis Indonesia
Arnold Ropen Sinaga, Information System, Faculty of Technology and Informatics, Universitas Informatika dan Bisnis Indonesia

Keywords:

Naïve Bayes, SVM, Random Forest, XGBoost, Hypertension

Abstract

Hypertension is a leading cause of cardiovascular disease, stroke, and kidney failure, and early diagnosis is critical for prevention. Traditional diagnostic methods are prone to human error and inconsistent measurements. Machine learning (ML) has been explored as a potential solution, but previous studies have focused mainly on accuracy and often neglected other important metrics such as precision, recall, and F1-score, especially on imbalanced datasets. The primary purpose of this research is to address this gap by comprehensively comparing the performance of four ML algorithms: Naïve Bayes, Support Vector Machine (SVM), Random Forest (RF), and XGBoost, in order to provide practical insights for hypertension screening. The dataset consists of 1,985 records with 10 predictor features, both categorical and continuous, and a binary target variable (Has_Hypertension: Yes/No) with 1,032 Yes and 953 No instances. Preprocessing includes categorical encoding and, for SVM, feature scaling. Models are evaluated with a balanced set of metrics: accuracy, precision, recall, and F1-score. The results show that RF and XGBoost perform best, achieving the highest F1-score and accuracy, while SVM and Naïve Bayes remain competitive alternatives.
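To illustrate why the abstract reports more than accuracy, the four metrics can be computed from confusion-matrix counts. The sketch below is not the authors' code; it assumes "Yes" (hypertensive) is the positive class, and the function name `binary_metrics` is hypothetical. It uses the standard definitions precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R), and sets a metric to 0.0 when its denominator is zero (an assumed convention).

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Yes"
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score for a binary label such as Has_Hypertension."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts, treating `positive` (assumed "Yes") as the positive class.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a denominator is zero
    # (e.g. a model that never predicts "Yes" has no precision to speak of).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical baseline (not a result from the study): a model that always predicts "No",
# evaluated on labels with the abstract's class counts (1,032 Yes, 953 No).
labels = ["Yes"] * 1032 + ["No"] * 953
always_no = ["No"] * len(labels)
print(binary_metrics(labels, always_no))
```

For this degenerate baseline, accuracy is about 0.48 (953/1,985), yet precision, recall, and F1-score are all 0.0, which makes explicit that not a single hypertensive case is detected. Accuracy alone summarizes that failure far less directly.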

