Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model’s capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
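The kind of degradation such a pipeline applies can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `synthesize_letter`, the use of NumPy, and every numeric parameter (gray levels, patch sizes, lighting strength) are assumptions chosen for illustration. Starting from a clean binary glyph mask, it adds a noisy surface texture, a directional lighting ramp, and random damage patches that erase parts of the stroke.

```python
import numpy as np

def synthesize_letter(mask, rng):
    """Render a clean binary glyph mask (1 = carved stroke) as a degraded grayscale image.

    All constants below are illustrative assumptions, not values from the paper.
    """
    h, w = mask.shape

    # Damage: erase a few random rectangular patches of the stroke.
    damaged = mask.astype(np.float32).copy()
    for _ in range(int(rng.integers(1, 4))):
        ph = int(rng.integers(2, max(3, h // 3)))
        pw = int(rng.integers(2, max(3, w // 3)))
        y0 = int(rng.integers(0, h - ph + 1))
        x0 = int(rng.integers(0, w - pw + 1))
        damaged[y0:y0 + ph, x0:x0 + pw] = 0.0

    # Texture: a mid-gray surface with per-pixel Gaussian noise.
    texture = 0.6 + 0.08 * rng.standard_normal((h, w))

    # Lighting: a linear brightness ramp across the image at a random angle.
    angle = rng.uniform(0.0, 2.0 * np.pi)
    ys, xs = np.mgrid[0:h, 0:w]
    ramp = (np.cos(angle) * (xs / max(w - 1, 1) - 0.5)
            + np.sin(angle) * (ys / max(h - 1, 1) - 0.5))

    # Carved strokes appear darker than the surrounding surface.
    depth = rng.uniform(0.2, 0.4)
    image = texture + 0.3 * ramp - depth * damaged
    return np.clip(image, 0.0, 1.0).astype(np.float32)

# Usage: a vertical bar stands in for a letter glyph.
glyph = np.zeros((32, 32))
glyph[4:28, 14:18] = 1.0
sample = synthesize_letter(glyph, np.random.default_rng(0))
```

Calling the function repeatedly with different random states on the same glyph yields many distinct variants, which is how a handful of clean letter shapes could in principle be expanded into a large training set.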
Human history is born in writing. Inscriptions are among the earliest written forms, and offer direct insights into the thought, language and history of ancient civilizations. Historians capture these insights by identifying parallels—inscriptions with shared phrasing, function or cultural setting—to enable the contextualization of texts within broader historical frameworks, and perform key tasks such as restoration and geographical or chronological attribution. However, current digital methods are restricted to literal matches and narrow historical scopes. Here we introduce Aeneas, a generative neural network for contextualizing ancient texts. Aeneas retrieves textual and contextual parallels, leverages visual inputs, handles arbitrary-length text restoration, and advances the state of the art in key tasks. To evaluate its impact, we conduct a large study with historians using outputs from Aeneas as research starting points. The historians find the parallels retrieved by Aeneas to be useful research starting points in 90% of cases, improving their confidence in key tasks by 44%. Restoration and geographical attribution tasks yielded superior results when historians were paired with Aeneas, outperforming both humans and artificial intelligence alone. For dating, Aeneas achieved a 13-year distance from ground-truth ranges. We demonstrate Aeneas’ contribution to historical workflows through analysis of key traits in the renowned Roman inscription Res Gestae Divi Augusti, showing how integrating science and humanities can create transformative tools to assist historians and advance our understanding of the past.
Ancient history relies on disciplines such as epigraphy, the study of inscribed texts known as inscriptions, for evidence of the thought, language, society and history of past civilizations. However, over the centuries, many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. Here we present Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian’s workflow. The architecture of Ithaca focuses on collaboration, decision support and interpretability. While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history. This research shows how models such as Ithaca can unlock the cooperative potential between artificial intelligence and historians, transforming the way that we study and write about one of the most important periods in human history.
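The abstracts report dating performance as a distance in years from ground-truth ranges but do not define the metric. One plausible reading, sketched below as an assumption, is that a prediction falling inside the attested interval scores zero and a prediction outside it scores the number of years to the nearest endpoint. The function names and the convention of writing BCE years as negative integers are hypothetical.

```python
def distance_to_range(predicted_year, earliest, latest):
    """Years between a predicted date and a ground-truth interval (0 if inside it)."""
    if earliest > latest:
        raise ValueError("earliest must not exceed latest")
    if predicted_year < earliest:
        return earliest - predicted_year
    if predicted_year > latest:
        return predicted_year - latest
    return 0

def mean_distance(predictions, ranges):
    """Average interval distance over a set of dated inscriptions."""
    distances = [distance_to_range(p, lo, hi)
                 for p, (lo, hi) in zip(predictions, ranges)]
    return sum(distances) / len(distances)

# BCE years as negative integers (an assumed convention).
predictions = [-430, -445, -500]
ranges = [(-450, -440), (-450, -440), (-480, -470)]
print(mean_distance(predictions, ranges))  # prints 10.0
```

Under this reading, a model is not penalized for any guess within a wide attested range, so the reported figure measures how far predictions stray outside what historians already accept.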
The paper describes the main challenges faced and the solutions adopted in the frame of the project DASI - Digital Archive for the study of pre-Islamic Arabian inscriptions. In particular, it discusses the methodological and technological issues that emerged in the conversion from a domain-specific, text-based project of digital edition of an epigraphic corpus to an objective-driven archive for the study and dissemination of inscriptions in different languages and scripts. With a view to keeping pace with, and possibly fostering, reasoning on best practices in the community of digital epigraphers beyond each specific cultural/linguistic domain, special attention is devoted to: the modelling of data and encoding (XML annotation vs database approach; the conceptual model for the valorization of the material aspect of the epigraph; the textual encoding for critical editions); interoperability (pros and cons of compliance with standards; harmonization of metadata; openness; semantic interoperability); lexicography (tools for under-resourced languages; translations).
Taking the work on the graphemic and morphemic analysis of the cuneiform texts of Ebla as a starting point, the paper reviews the ‘grammatical’ criteria that make digital coding not only more efficient and dynamic, but also intellectually more in tune with the goal of establishing an argument and unfolding a narrative. This throws light on aspects of software application on the one hand (such as the semantic web) and of the digital humanities on the other, ranging from textual to archaeological data.
Cuneiform tablets remain founding cornerstones of more than two hundred collections in American academic institutions, having been acquired a century or more ago under dynamic ethical norms and global networks. To foster data sharing, this contribution incorporates empirical data from interactive ArcGIS and reusable OpenContext maps to encourage tandem dialogues about using the inscribed works and learning their collecting histories. Such provenance research aids, on their own, initiate the narration of objects’ journeys over time while cultivating the digital inclusion of expert local knowledge relevant to an object biography. The paper annotates several approaches that institutions are using or might consider using to expand upon current provenance information in ways that encourage visitors’ critical thinking and learning about global journeys, travel archives, and such dispositions as virtual reunification, reconstructions, or restitution made possible by the provenance research.
The Tesserae Project offers a free online intertextual search tool for ancient Greek, Latin, and English. Until now, Tesserae has allowed pairwise searching of literary texts in these languages for exact word or lemma similarities. This paper describes two new types of search now offered by Tesserae: by meaning (semantic search) and by sound.
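Pairwise lemma matching of the kind described can be sketched as follows. This is an illustrative simplification, not Tesserae's code: the function `shared_lemmas`, the toy lemma table, and the threshold of two shared lemmas per pair are all assumptions made for the example.

```python
def shared_lemmas(source_lines, target_lines, lemma_of, min_shared=2):
    """Return (source index, target index, shared lemmas) for every qualifying line pair.

    lemma_of maps an inflected form to its dictionary headword; forms not in the
    table are treated as their own lemma (an assumption of this sketch).
    """
    def lemmas(line):
        return {lemma_of.get(word, word) for word in line.lower().split()}

    target_sets = [lemmas(line) for line in target_lines]
    results = []
    for i, source in enumerate(source_lines):
        source_set = lemmas(source)
        for j, target_set in enumerate(target_sets):
            common = source_set & target_set
            if len(common) >= min_shared:
                results.append((i, j, sorted(common)))
    return results

# Hypothetical toy lemma table for a few Latin forms.
lemma_of = {"virum": "vir", "viri": "vir", "canebat": "cano"}
source = ["arma virum cano"]
target = ["arma viri canebat", "fortuna iuvat"]
print(shared_lemmas(source, target, lemma_of))
# prints [(0, 0, ['arma', 'cano', 'vir'])]
```

Matching on lemmas rather than surface forms is what lets "virum" and "viri" count as the same word. Semantic or sound-based search would replace the lemma table with a different notion of equivalence.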
At present, digital epigraphy seems limited to the digitization of inscriptions through the creation of databases. Unlike digital palaeography, which in the last few years has seen developments likely to produce very interesting results, digital epigraphy still does not have its own defined line of research, at least in current scholarship.
Annotated corpora, provided that they adopt international standards and expose data in open formats, are far more likely to be easily exploited and reused for different objectives than traditional, analogue corpora. This paper presents the results of the early adherence to best practices and principles, later codified as Open Science and FAIR principles, in projects concerned with digital textual corpora in a niche area of research, pre-Islamic Arabian epigraphy. The case study analysed in this paper is the Digital Archive for the Study of pre-Islamic Arabian inscriptions (DASI), an online annotated corpus of the textual sources from Ancient Arabia, which also exposes its records in standard formats (oai_dc, EpiDoc, EDM) in an OAI-PMH repository. The paper discusses the initiatives to reuse DASI open data in the frame of the recently ANR-funded project Maparabia (CNRS-CNR), focusing on the exploitation of DASI’s onomastic and geographic data in a new reference tool, the Gazetteer of Ancient Arabia. After introducing the DASI and Maparabia projects and highlighting the objectives of the Gazetteer, the paper describes the conceptual model of its database and the module importing data from DASI. The population of the Gazetteer, which also involves a data entry and manipulation phase, is exemplified by the case study of the Ancient South Arabian place ‘Barāqish/Yathill’. Based on this experience, the limitations and opportunities of data reuse and the synchronisation issues between systems are discussed.
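Harvesting records from an OAI-PMH repository in the oai_dc format follows a standard request pattern, sketched below. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the two XML namespaces come from the OAI-PMH and Dublin Core specifications. The base URL, the helper names, and the sample response are hypothetical and do not reflect DASI's actual endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: no other arguments besides verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_titles(xml_text):
    """Extract (identifier, dc:title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((identifier, title))
    return pairs

# Hypothetical response fragment, invented for illustration only.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample inscription</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://example.org/oai"))
print(parse_titles(SAMPLE))
```

A reusing system such as a gazetteer importer would fetch each URL, parse the records, and follow resumption tokens until the repository returns none. The synchronisation issues mentioned above arise when records change between two such harvests.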