Comparison and Evaluation of Euclidean Distance and Divergence in Adaptive K-Means Algorithm for Clustering Human Development Index of Indonesia Province
DOI:
https://doi.org/10.58471/jds.v3i2.6942
Keywords:
K-Means Adaptive, Divergence distance, Euclidean distance, Human Development Index (Indeks Pembangunan Manusia), Clustering
Abstract
This study applies the Adaptive K-Means clustering algorithm to Human Development Index (HDI) data for the 34 provinces of Indonesia and compares the performance of the Euclidean and Divergence distance metrics. The HDI indicators used include life expectancy, years of schooling, and per capita expenditure. Clustering was carried out manually on a sample of the data and automatically in Python on the complete dataset. The results show that the choice of distance metric significantly affects clustering quality: in the silhouette score evaluation, the Divergence distance outperformed the Euclidean distance and produced more representative cluster separation. Scatter plots were used to visualize each iteration of the clustering process. The study contributes to optimizing clustering techniques for socio-economic indicators such as the HDI.
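The comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the Divergence distance is assumed to be 2 Σ (xᵢ − yᵢ)² / (xᵢ + yᵢ)², a form found in distance-measure surveys; the clustering loop is plain K-Means with a pluggable distance, not the adaptive variant the study uses; and the six rows in `provinces` are invented placeholder values, not real HDI data. All function and variable names are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def divergence(x, y):
    """ASSUMED form: 2 * sum((a - b)^2 / (a + b)^2). Needs positive features."""
    return 2.0 * sum((a - b) ** 2 / (a + b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, dist, iters=100, seed=0):
    """Plain K-Means with a pluggable distance (not the adaptive variant)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: arithmetic mean of the members. For a non-Euclidean
        # distance this is a simplification, not the exact minimizer.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids


def silhouette(points, labels, dist):
    """Mean silhouette coefficient under the given distance function."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        mean_to = {}
        for c in clusters:
            d = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_to[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_to:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean_to[own]                                    # cohesion
        b = min(v for c, v in mean_to.items() if c != own)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented placeholder rows, NOT real HDI data:
# (life expectancy, years of schooling, per capita expenditure)
provinces = [
    (65.1, 7.2, 7100.0), (66.0, 7.5, 7400.0), (65.5, 7.0, 6900.0),
    (73.9, 10.8, 15800.0), (74.5, 11.1, 18200.0), (73.2, 10.5, 16500.0),
]

results = {}
for name, dist in (("euclidean", euclidean), ("divergence", divergence)):
    labels, _ = kmeans(provinces, k=2, dist=dist)
    results[name] = (labels, silhouette(provinces, labels, dist))
    print(name, labels, round(results[name][1], 3))
```

Under the assumed formula, each Divergence term is normalized by the sum of the two values, so indicators on very different scales contribute comparably. The unscaled Euclidean distance, by contrast, is dominated by per capita expenditure. Scores produced on this toy data say nothing about the study's actual results.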