This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models with annotated datasets, consolidating potentially ambiguous entity types under a harmonized set of classes. We then tested their performance on out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model being slightly superior to the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.
This paper reviews the experience of the Ramses Project in constructing a richly annotated corpus of Late Egyptian that consists of 300 000 words in 2011 (and is expected to grow to more than 1 million words in the coming years). During the first five years of the project, this corpus has been encoded in hieroglyphic script, translated into French or English, and annotated with part-of-speech information, lemmatization, and morphological analysis. The methodology and working tools developed to build this corpus are discussed here, and future developments are presented.
This paper reports on the construction-based Treebank currently under development in the framework of the Ramses Project, which aims at building a multifaceted annotated corpus of Late Egyptian texts. We describe the specifications that have been implemented and we introduce the syntactic formalism and the related representation format that are used for the syntactic annotation. Furthermore, the annotation scheme is discussed with particular attention paid to its evolutionary nature. Finally, we explain the methods as well as the annotating tool, called SyntaxEditor; we conclude by addressing the question of forthcoming developments, especially the search engine and a context-sensitive parser.
This paper introduces Ramses, a database of Late Egyptian texts, currently under development at the University of Liège (Belgium). Ramses sets out to be a new and powerful research tool. Its main applications are linguistically and philologically orientated. After a general overview of the structure of the database, the search engines are described in some detail.
The Digital Archive for the Study of pre-Islamic Arabian Inscriptions (DASI, https://dasi.cnr.it/) currently provides open access to the digital editions of nearly 8800 ancient epigraphic texts from the Arabian Peninsula. After presenting an outline of the DASI ecosystem through its 25-year history, this paper focuses on the recent enrichment of its data model, carried out within a pilot project of the E-RIHS infrastructure under the H2IOSC programme. The aim was to optimise DASI as an up-to-date tool for the digital critical edition of a broad spectrum of epigraphic sources from ancient Arabia, including graffiti, instrumenta inscripta, coins, and inscribed sticks, alongside ‘monumental’ inscriptions. Most of the interventions targeted the description of the visual aspect of writing and related contextual information, enhancing the digital representation of the material dimension of written heritage, which is often overlooked in philological studies. Ongoing work is targeting the FAIRification of DASI data, which has so far resulted in the sharing of an extensive bibliography of 1800 records through Zotero.
Based on archaeological data and large epigraphic corpora (DASI, OCIANA), the project aims to develop three free online research instruments, adhering to Open Science and FAIR principles: 1/ Digital atlas of ancient Arabia; 2/ Gazetteer of ancient Arabia; 3/ Thematic Dictionary of Ancient Arabia (TDAA).
Through the annals of time, writing has slowly scrawled its way from the painted surfaces of stone walls to the grooves of inscriptions to the strokes of quill, pen, and ink. While we still inscribe stone (tombstones, monuments) and we continue to write on skin (tattoos abound), our quotidian method of writing on paper is increasingly abandoned in favor of the quick-to-generate digital text. And even though the stone-inscribed text of epigraphy offers demonstrably better permanence than that of writing on skin and paper—even better than that of the memory system of the modern computer (Bollacker in Am Sci 98:106, 2010)—this field of study has also made the digital leap. Today’s scholarly analyses of epigraphic content increasingly rely on high-tech approaches involving data science and computer models. This essay discusses how advances in a number of exciting technologies are enabling the digital analysis of epigraphic texts and accelerating the ability of scholars to preserve, renew, and reinvigorate the study of the inscriptions that remain from throughout history.
This article advances the thesis that three decades of investments by national and international funders, combined with those of scholars, technologists, librarians, archivists, and their institutions, have resulted in a digital infrastructure in the humanities that is now capable of supporting end-to-end research workflows. The article refers to key developments in the epigraphy and paleography of the premodern period. It draws primarily on work in classical studies but also highlights related work in the adjacent disciplines of Egyptology, ancient Near East studies, and medieval studies. The argument makes a case that much has been achieved but it does not declare “mission accomplished.” The capabilities of the infrastructure remain unevenly distributed within and across disciplines, institutions, and regions. Moreover, the components, including the links between steps in the workflow, are generally far from user-friendly and seamless in operation. Because further refinements and additional capacities are still much needed, the article concludes with a discussion of key priorities for future work.
First official presentation of the "Ramses Project", a richly annotated corpus of Late Egyptian [Paper submitted in 2008/2009]
This paper provides an overview of diverse applications of parallel corpora in ancient languages, particularly Ancient Greek. In the first part, we provide the fundamental principles of parallel corpora and a short overview of their applications in the study of ancient texts. In the second part, we illustrate how to leverage parallel corpora to perform various NLP tasks, including automatic translation alignment, dynamic lexica induction, and Named Entity Recognition. In the conclusions, we emphasize current limitations and future work.