TrAVaSI_VoDIM Corpus

Favaro M.
2022-01-01

Abstract

The TrAVaSI_VoDIM Corpus is a sample of the corpus built for the Vocabolario Dinamico Dell’Italiano Moderno (VoDIM; Marazzini and Maconi, 2018), which gathers Italian texts from 1861, the year of the Unification of Italy, to the present day. TrAVaSI_VoDIM is balanced and representative of different prose domains (art, gastronomy, law, newspapers, literature, popular fiction, science), for a total of about 21,000 tokens. The corpus is morpho-syntactically annotated and lemmatized. The annotation, which conforms to the Universal Dependencies standard (UD; De Marneffe et al., 2021), was carried out semi-automatically: TrAVaSI_VoDIM was first annotated automatically with the Stanza “combined” model for Italian, and the automatic annotation was then manually revised. The resulting corpus has also been used to retrain Stanza to handle historical varieties of Italian, with encouraging results.
2022
corpus
VoDIM
NLP

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14085/63117