Comparative Analysis of the C4.5 and Random Forest Algorithms for the Prediction of Diarrheal Disease
Keywords:
diarrhea, toddlers, machine learning, C4.5 algorithm, random forest, disease prediction
Abstract
Diarrhea remains one of the leading causes of death among toddlers in Indonesia, especially in areas with limited access to healthcare. Environmental pollution and unhealthy lifestyles are the main drivers of its spread. This study compares the performance of the C4.5 and Random Forest algorithms in predicting diarrhea cases among toddlers in the working area of the Parlilitan Subdistrict Health Center, Humbahas Regency, North Sumatra Province. Secondary data were obtained from medical records and health center reports and analyzed using Python. Model performance was evaluated using Accuracy, Precision, Recall, F1-Score, Specificity, False Positive Rate (FPR), and True Positive Rate (TPR, which is equivalent to Recall). On the test data, C4.5 performed better, with an Accuracy of 0.92; Precision, Recall, and F1-Score of 0.875 each; Specificity of 0.9412; and FPR of 0.0588. Random Forest obtained an Accuracy of 0.88; Precision of 0.7778; Recall of 0.875; F1-Score of 0.8235; Specificity of 0.8824; and FPR of 0.1176. Both models reached the same Recall, so the difference lies in false positives: C4.5 misclassified fewer negative cases, giving it the better balance between precision and detection capability on this dataset.
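As an illustration of how the reported metrics relate to one another, the following minimal Python sketch derives them from confusion-matrix counts. The abstract does not report the counts themselves; the values used here (a test set of 25 cases, 8 positive and 17 negative) are an assumption, chosen because they reproduce the reported ratios exactly. The function name `classification_metrics` is hypothetical and is not taken from the study's code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation metrics named in the abstract from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # Recall is the same quantity as the True Positive Rate (TPR)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),  # FPR = 1 - specificity
    }


# ASSUMPTION: these counts are inferred from the reported ratios,
# not taken from the paper, which does not list them in the abstract.
c45 = classification_metrics(tp=7, fp=1, tn=16, fn=1)
rf = classification_metrics(tp=7, fp=2, tn=15, fn=1)

for name, metrics in (("C4.5", c45), ("Random Forest", rf)):
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Under these assumed counts, the two models differ by a single false positive (1 versus 2), which alone accounts for the gaps in Accuracy, Precision, F1-Score, Specificity, and FPR, while Recall is identical.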