Prediction Model Using Machine Learning: Analysis of Determinants of Customer Churn at PT XYZ
Keywords:
Customer Churn, Machine Learning, Random Forest, Logistic Regression, Purchase Retention
Abstract
This study aims to identify the factors that influence customer churn at PT XYZ, a business-to-business (B2B) application-based company that sells essential goods. Machine learning algorithms such as Random Forest and Logistic Regression were used to predict churn from demographic and behavioral variables, including age, membership duration, average monthly transactions, spending value, and product variety. Transaction data from January 2023 to August 2024 were analyzed to understand partner behavior patterns. The results indicate that Random Forest predicts churn more accurately than Logistic Regression, as measured by evaluation metrics such as accuracy, precision, recall, and ROC-AUC. The study offers PT XYZ strategic, data-driven insights for reducing churn and maintaining customer purchase retention.
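The abstract compares the two models on accuracy, precision, recall, and ROC-AUC. Below is a minimal, standard-library-only Python sketch of how those four metrics can be computed from a model's predicted churn probabilities. It assumes label 1 marks a churned partner and a 0.5 decision threshold; neither is stated in the study. The labels and scores are made-up toy values, not PT XYZ data and not the study's results, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as 'churned' (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Share of (churner, non-churner) pairs in which the churner gets the higher score.

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Score one model's churn probabilities; the 0.5 threshold is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision and recall fall back to 0.0 when their denominator is zero.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # ROC-AUC uses the raw scores, so it does not depend on the threshold.
        "roc_auc": roc_auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy, made-up data: 1 = partner churned, 0 = partner retained.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    # Hypothetical churn probabilities from two already-trained models.
    rf_scores = [0.9, 0.2, 0.8, 0.45, 0.4, 0.1, 0.7, 0.55]
    lr_scores = [0.7, 0.4, 0.6, 0.4, 0.6, 0.2, 0.5, 0.3]
    for name, scores in (("random_forest", rf_scores), ("logistic_regression", lr_scores)):
        print(name, evaluate(y_true, scores))
```

On these toy values both score lists reach 0.75 accuracy, precision, and recall at the 0.5 threshold, yet their ROC-AUC differs (0.9375 versus 0.8125). ROC-AUC compares the whole ranking of partners rather than a single cut-off, which is why it is usually reported alongside the threshold-based metrics. The pairwise loop is quadratic in the number of partners and is meant only to make the definition explicit; for real data, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.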