A Comparative Analysis of Machine Learning Algorithms for Identifying Cultural and Technological Groups in Archaeological Datasets through Clustering Analysis of Homogeneous Data
Mastrogiuseppe, Marco
2024-01-01
Abstract
Machine learning algorithms have revolutionized data analysis by uncovering hidden patterns and structures. Clustering algorithms play a crucial role in organizing data into coherent groups. We focused on K-Means, hierarchical, and Self-Organizing Map (SOM) clustering algorithms for analyzing homogeneous datasets based on archaeological finds from the middle phase of the Pre-Pottery Neolithic B in the Southern Levant (10,500–9,500 cal B.P.). We aimed to assess the repeatability of these algorithms in identifying patterns using quantitative and qualitative evaluation criteria. Experimentation and statistical analysis revealed the strengths and weaknesses of each algorithm, enabling us to determine their appropriateness for various clustering scenarios and data types. Preliminary results showed that traditional K-Means may not capture datasets’ intricate relationships and uncertainties. The hierarchical technique provided a more probabilistic approach, and SOM excelled at preserving the structure of high-dimensional data. Our research provides insights into balancing repeatability and interpretability in algorithm selection and helps practitioners identify suitable clustering solutions.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.