Comparative Analysis of SMOTE-Based Random Forest and XGBoost Algorithms for Handling Imbalanced Datasets in Credit Card Fraud Detection

Muhamad Lutfi Azizan; Yasin Kamil; Septiano Alvian Ismau; Dede Sandi; Ihsan Maulana; Ahmad Nursodiq

Authors

Muhamad Lutfi Azizan, Informatics Engineering Study Program, Pamulang University
Yasin Kamil, Informatics Engineering Study Program, Pamulang University
Septiano Alvian Ismau, Informatics Engineering Study Program, Pamulang University
Dede Sandi, Informatics Engineering Study Program, Pamulang University
Ihsan Maulana, Informatics Engineering Study Program, Pamulang University
Ahmad Nursodiq, Informatics Engineering Study Program, Pamulang University

Keywords:

Credit Card Fraud Detection, Imbalanced Dataset, Random Forest, XGBoost, SMOTE, Machine Learning

Abstract

The rapid growth of digital payment systems has increased the complexity and risk of credit card fraud, and detection is made harder by the highly imbalanced nature of transaction data, in which fraudulent transactions form a small minority. This study compares the performance of the Random Forest and XGBoost algorithms, each combined with the Synthetic Minority Over-sampling Technique (SMOTE), in detecting fraudulent credit card transactions. The approach aims to improve classification effectiveness by addressing class imbalance and reducing bias toward legitimate transactions. Data preprocessing includes normalization, stratified data splitting, and oversampling applied to the training dataset. Model performance is evaluated using precision, recall, F-score, and the area under the receiver operating characteristic curve (ROC AUC), which are more appropriate than accuracy alone for imbalanced classification problems. The findings indicate that Random Forest delivers more stable and balanced performance than XGBoost, particularly in minimizing false fraud alerts while maintaining adequate fraud detection capability. These results suggest that Random Forest with oversampling provides a practical and reliable solution for real-world credit card fraud detection systems.
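The oversampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `precision_recall_f1`, the toy coordinates, and the choice of `k` are all assumptions made for illustration, and a real experiment would normally rely on a maintained library rather than this standard-library version. The sketch shows the core idea of SMOTE (a synthetic minority sample is placed on the segment between a minority sample and one of its nearest minority neighbours) and how precision, recall, and F1 follow from prediction counts.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy TRAINING split (made-up values): 20 legitimate points, 4 fraudulent ones.
train_legit = [(float(i % 5), float(i // 5)) for i in range(20)]
train_fraud = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(train_fraud, n_new=len(train_legit) - len(train_fraud), k=3)
balanced_fraud = train_fraud + new_fraud

# Metrics on a tiny set of made-up labels and predictions (1 = fraud).
scores = precision_recall_f1([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

In this sketch only the training portion is oversampled, which matches the preprocessing order described in the abstract; evaluating on data that contains synthetic samples would not reflect the class ratio seen in production. The metric helper also shows why accuracy alone can mislead: a model that labels every transaction as legitimate scores zero recall on the fraud class regardless of how high its accuracy is.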




Jurnal Multidisiplin Sahombu
