Comparison and Evaluation of Euclidean Distance and Divergence in Adaptive K-Means Algorithm for Clustering Human Development Index of Indonesia Province

Maria Claudia Purba; Zakarias Situmorang

doi:10.58471/jds.v3i2.6942

Authors

Maria Claudia Purba Universitas Katolik Santo Thomas, Indonesia
Zakarias Situmorang Universitas Katolik Santo Thomas, Indonesia

Keywords:

Adaptive K-Means, Divergence distance, Euclidean distance, Human Development Index, Clustering

Abstract

This research applies the Adaptive K-Means clustering algorithm to Human Development Index (HDI) data for 34 provinces in Indonesia and compares the performance of the Euclidean and Divergence distance metrics. The HDI indicators used include life expectancy, years of schooling, and per capita expenditure. Data processing was conducted both manually on sample data and automatically in Python on the complete dataset. The results show that the choice of distance metric significantly affects clustering effectiveness: Divergence outperformed Euclidean in the silhouette score evaluation, yielding more representative cluster separation. Scatter plot visualizations tracked the iterative clustering process. The study contributes to optimizing clustering techniques for socio-economic indicators such as HDI.
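
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not define the Divergence distance, so the form sum_j ((a_j - b_j) / (a_j + b_j))^2 is assumed here; the adaptive part of Adaptive K-Means is omitted in favor of plain k-means; and the data are synthetic values invented for illustration, not actual HDI figures. The function names are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    """Pairwise Euclidean distance: rows of a (n, d) vs rows of b (m, d) -> (n, m)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def divergence(a, b):
    """ASSUMED form: sum_j ((a_j - b_j) / (a_j + b_j))**2, for strictly positive features."""
    diff = a[:, None, :] - b[None, :, :]
    total = a[:, None, :] + b[None, :, :]
    return ((diff / total) ** 2).sum(axis=2)


def kmeans(x, k, dist, n_iter=100, seed=0):
    """Plain (non-adaptive) k-means; only the assignment step uses `dist`."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = dist(x, centers).argmin(axis=1)
        # Centroids are arithmetic means; an empty cluster keeps its old center.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return dist(x, centers).argmin(axis=1), centers


def silhouette(x, labels, dist):
    """Mean silhouette coefficient, computed under the given distance."""
    d = dist(x, x)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        a_i = d[i, same].sum() / (n_same - 1)  # d[i, i] == 0, so self is excluded
        b_i = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a_i, b_i) > 0:
            scores[i] = (b_i - a_i) / max(a_i, b_i)
    return scores.mean()


# Synthetic stand-in data (NOT real HDI values): columns mimic life expectancy,
# years of schooling, and per capita expenditure.
rng = np.random.default_rng(42)
low = rng.normal([65.0, 7.0, 8000.0], [1.0, 0.3, 300.0], size=(10, 3))
high = rng.normal([74.0, 10.0, 15000.0], [1.0, 0.3, 300.0], size=(10, 3))
x = np.vstack([low, high])

results = {}
for name, dist in {"euclidean": euclidean, "divergence": divergence}.items():
    labels, _ = kmeans(x, k=2, dist=dist)
    results[name] = silhouette(x, labels, dist)
    print(f"{name}: silhouette = {results[name]:.3f}")
```

Each silhouette score in this sketch is computed under the same distance that produced the clustering. That is one possible design choice, and the abstract does not say which convention the study followed. The assumed divergence form is scale-free per feature, whereas the Euclidean distance on unscaled data is dominated by the feature with the largest numeric range.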
