Comparative Analysis of SMOTE-Based Random Forest and XGBoost Algorithms for Handling Imbalanced Datasets in Credit Card Fraud Detection
Keywords:
Credit Card Fraud Detection, Imbalanced Dataset, Random Forest, XGBoost, SMOTE, Machine Learning
Abstract
The rapid growth of digital payment systems has increased the complexity and risk of credit card fraud, and detection is made harder by the highly imbalanced nature of transaction data. This study compares the performance of the Random Forest and XGBoost algorithms, each combined with the Synthetic Minority Oversampling Technique (SMOTE), in detecting fraudulent credit card transactions. The approach aims to improve classification effectiveness by addressing class imbalance and reducing bias toward legitimate transactions. Data preprocessing includes normalization, stratified data splitting, and oversampling applied to the training dataset only. Model performance is evaluated using precision, recall, F-score, and the area under the receiver operating characteristic curve (ROC AUC), which are more appropriate than accuracy for imbalanced classification problems. The findings indicate that Random Forest delivers more stable and balanced performance than XGBoost, particularly in minimizing false fraud alerts while maintaining adequate fraud detection capability. These results suggest that Random Forest with oversampling provides a practical and reliable solution for real-world credit card fraud detection systems.
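The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy two-feature data, the function names (`stratified_split`, `smote`, `precision_recall_f1`), the 80/20 split, and `k=3` are all assumptions made for illustration, and normalization and model training are omitted. In practice, library implementations such as imbalanced-learn's `SMOTE` would typically be used instead. The key point shown is that synthetic fraud samples are generated from the training split only, so the test split keeps its original imbalance.

```python
import math
import random


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Split per class so train and test keep the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(X, y, minority=1, k=3, seed=0):
    """Simplified SMOTE for binary labels: add synthetic minority rows
    until both classes have the same size.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours (Euclidean distance).
    Assumes at least two minority rows are present.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 200 legitimate (0) and 10 fraudulent (1) rows.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 200 + [1] * 10

(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
X_bal, y_bal = smote(X_train, y_train)  # oversample the training split only

print("train before:", y_train.count(0), "legitimate,", y_train.count(1), "fraud")
print("train after: ", y_bal.count(0), "legitimate,", y_bal.count(1), "fraud")
print("test (untouched):", y_test.count(0), "legitimate,", y_test.count(1), "fraud")
```

A classifier such as Random Forest or XGBoost would then be fitted on `X_bal, y_bal` and scored on the untouched `X_test, y_test` with `precision_recall_f1`. Oversampling before the split would leak synthetic copies of test-set fraud cases into training and inflate the reported metrics.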