The Internet explosion and the massive diffusion of mobile devices lead to the creation of a worldwide collaborative system, daily used by millions of users through search engines and application interfaces. New paradigms permit to calculate the similarity of terms using only the statistical information returned by a query, or from additional features; also old algorithms and measures have been applied to new domains and scopes, to efficiently find words clusters from the Web. The problem of evaluating such techniques and algorithms in new domains emerges, and highlights a still open field of experimentation. In this paper, preliminary tests have been held on different semantic proximity measures (average confidence, NGD, PMI, chi(2), PMING Distance), and different clustering algorithms among the most used in literature have been compared (e.g. k-means, Expectation-Maximization, spectral clustering) for evaluating such measures. The suitability of the considered measures and methods to calculate the semantic proximity was verified at the state-of-art, and problems were identified, comparing the results of measurements to a ground truth provided by models of contextualized knowledge, clustering and human perception of semantic relations, which data are already studied in literature.

A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures

MILANI, Alfredo
2016-01-01

Abstract

The Internet explosion and the massive diffusion of mobile devices lead to the creation of a worldwide collaborative system, daily used by millions of users through search engines and application interfaces. New paradigms permit to calculate the similarity of terms using only the statistical information returned by a query, or from additional features; also old algorithms and measures have been applied to new domains and scopes, to efficiently find words clusters from the Web. The problem of evaluating such techniques and algorithms in new domains emerges, and highlights a still open field of experimentation. In this paper, preliminary tests have been held on different semantic proximity measures (average confidence, NGD, PMI, chi(2), PMING Distance), and different clustering algorithms among the most used in literature have been compared (e.g. k-means, Expectation-Maximization, spectral clustering) for evaluating such measures. The suitability of the considered measures and methods to calculate the semantic proximity was verified at the state-of-art, and problems were identified, comparing the results of measurements to a ground truth provided by models of contextualized knowledge, clustering and human perception of semantic relations, which data are already studied in literature.
2016
9783319420912
Data mining
Clustering
Semantic evaluation
Semantic similarity
Information retrieval
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14085/43006
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 18
  • ???jsp.display-item.citation.isi??? ND
social impact