April 2, 2024, 7:51 p.m. | Giuseppe G. A. Celano

cs.CL updates on arXiv.org

arXiv:2404.00739v1 Announce Type: new
Abstract: This article presents the beta version 0.1.0 of Opera Graeca Adnotata (OGA), the largest open-access multilayer corpus for Ancient Greek (AG). OGA consists of 1,687 literary works and more than 34M tokens drawn from the PerseusDL and OpenGreekAndLatin GitHub repositories, which host AG texts dating from about 800 BCE to about 250 CE. The texts have been enriched with seven annotation layers: (i) tokenization layer; (ii) sentence segmentation layer; (iii) lemmatization layer; (iv) morphological layer; …


