Epigraphy is increasingly integrating artificial intelligence, notably its subfield of machine learning (ML), in tasks such as extracting insights from ancient inscriptions. However, the scarcity of labeled training data severely limits current ML techniques, especially for ancient scripts such as Old Aramaic. Our research introduces a new methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic datasets of Aramaic letters, incorporating texture, lighting, damage and further augmentations to mimic the diversity of real-world inscriptions. Despite having minimal real examples, we build a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes of the Aramaic alphabet. This corpus provides enough data to train a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model achieves 95% accuracy in classifying real images from the 8th-century BCE Hadad statue inscription. Additional experiments on varying materials and styles show that the model generalizes effectively. These results demonstrate the model's ability to handle diverse real-world scenarios, establish the viability of our synthetic data approach, and remove the dependence on scarce training data that has constrained epigraphic analysis. Our framework improves interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
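The abstract names texture, lighting, damage and augmentation as the ingredients of the synthetic pipeline but gives no implementation details. The following is a minimal sketch of how such degradations might be composed on a grayscale letter image, assuming NumPy. The function name `degrade_letter`, its parameters and their default values are hypothetical illustrations, not the authors' pipeline, and the ResNet classifier is not shown.

```python
import numpy as np

def degrade_letter(img, rng, noise_std=0.08, max_holes=4, hole_frac=0.15):
    """Apply illustrative degradations to a grayscale letter image in [0, 1].

    Hypothetical sketch: names and defaults are assumptions, not the paper's.
    """
    h, w = img.shape
    out = img.astype(np.float64).copy()

    # Lighting: scale by a linear brightness ramp in a random direction.
    angle = rng.uniform(0.0, 2.0 * np.pi)
    yy, xx = np.mgrid[0:h, 0:w]
    ramp = np.cos(angle) * xx / max(w - 1, 1) + np.sin(angle) * yy / max(h - 1, 1)
    ramp = (ramp - ramp.min()) / (ramp.max() - ramp.min() + 1e-9)  # rescale to [0, 1]
    out *= 0.6 + 0.4 * ramp

    # Texture: additive Gaussian noise as a crude stand-in for stone grain.
    out += rng.normal(0.0, noise_std, size=out.shape)

    # Damage: overwrite random rectangular patches with the image mean.
    for _ in range(rng.integers(0, max_holes + 1)):
        ph = max(1, int(h * hole_frac * rng.uniform(0.5, 1.5)))
        pw = max(1, int(w * hole_frac * rng.uniform(0.5, 1.5)))
        y0 = rng.integers(0, max(h - ph, 0) + 1)
        x0 = rng.integers(0, max(w - pw, 0) + 1)
        out[y0:y0 + ph, x0:x0 + pw] = out.mean()

    # Keep pixel values in the valid [0, 1] range.
    return np.clip(out, 0.0, 1.0)


# Usage: degrade a crude vertical stroke with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
letter = np.zeros((64, 64))
letter[16:48, 28:36] = 1.0
sample = degrade_letter(letter, rng)
```

Passing an explicit `np.random.Generator` keeps each synthetic sample reproducible, which helps when regenerating a large training set.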
Human history is born in writing. Inscriptions are among the earliest written forms and offer direct insights into the thought, language and history of ancient civilizations. Historians capture these insights by identifying parallels (inscriptions with shared phrasing, function or cultural setting), which allow them to contextualize texts within broader historical frameworks and to perform key tasks such as restoration and geographical or chronological attribution. However, current digital methods are restricted to literal matches and narrow historical scopes. Here we introduce Aeneas, a generative neural network for contextualizing ancient texts. Aeneas retrieves textual and contextual parallels, leverages visual inputs, handles arbitrary-length text restoration, and advances the state of the art in key tasks. To evaluate its impact, we conducted a large study with historians using outputs from Aeneas as research starting points. The historians found the parallels retrieved by Aeneas to be useful research starting points in 90% of cases, improving their confidence in key tasks by 44%. Restoration and geographical attribution tasks yielded superior results when historians were paired with Aeneas, outperforming both humans and artificial intelligence alone. For dating, Aeneas achieved a 13-year distance from ground-truth ranges. We demonstrate Aeneas' contribution to historical workflows through analysis of key traits in the renowned Roman inscription Res Gestae Divi Augusti, showing how integrating science and humanities can create transformative tools to assist historians and advance our understanding of the past.
Ancient history relies on disciplines such as epigraphy, the study of inscribed texts known as inscriptions, for evidence of the thought, language, society and history of past civilizations¹. However, over the centuries, many inscriptions have been damaged to the point of illegibility or transported far from their original location, and their date of writing is steeped in uncertainty. Here we present Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Ithaca is designed to assist and expand the historian's workflow. The architecture of Ithaca focuses on collaboration, decision support and interpretability. While Ithaca alone achieves 62% accuracy when restoring damaged texts, the use of Ithaca by historians improved their accuracy from 25% to 72%, confirming the synergistic effect of this research tool. Ithaca can attribute inscriptions to their original location with an accuracy of 71% and can date them to within 30 years of their ground-truth ranges, redating key texts of Classical Athens and contributing to topical debates in ancient history. This research shows how models such as Ithaca can unlock the cooperative potential between artificial intelligence and historians, transformationally impacting the way that we study and write about one of the most important periods in human history.
Cuneiform tablets remain founding cornerstones of more than two hundred collections in American academic institutions, having been acquired a century or more ago under shifting ethical norms and through global networks. To foster data sharing, this contribution incorporates empirical data from interactive ArcGIS and reusable OpenContext maps to encourage parallel discussions about how the inscribed works are used and about their collecting histories. On their own, such provenance research aids begin to narrate objects' journeys over time while fostering the digital inclusion of local expert knowledge relevant to each object's biography. The paper reviews several approaches that institutions are using, or might consider, to expand current provenance information in ways that encourage visitors' critical thinking and learning about global journeys, travel archives, and such outcomes as virtual reunification, reconstruction, or restitution made possible by provenance research.
Annotated corpora that adopt international standards and expose their data in open formats are far more likely to be easily exploited and reused for different purposes than traditional, analogue corpora. This paper presents the results of early adherence to best practices and principles later codified as Open Science and FAIR principles, within projects concerned with digital textual corpora in a niche research area: pre-Islamic Arabian epigraphy. The case study analysed in this paper is the Digital Archive for the Study of pre-Islamic Arabian inscriptions (DASI), an online annotated corpus of the textual sources from Ancient Arabia, which also exposes its records in standard formats (oai_dc, EpiDoc, EDM) through an OAI-PMH repository. The paper discusses the reuse of DASI open data within the recently ANR-funded project Maparabia (CNRS-CNR), focusing on the exploitation of DASI's onomastic and geographic data in a new reference tool, the Gazetteer of Ancient Arabia. After introducing the DASI and Maparabia projects and highlighting the objectives of the Gazetteer, the paper describes the conceptual model of its database and the module that imports data from DASI. Populating the Gazetteer, which also involves a data entry and manipulation phase, is illustrated by the case study of the Ancient South Arabian place 'Barāqish/Yathill'. Based on this experience, the paper discusses the limitations and opportunities of data reuse, as well as synchronisation issues between systems.
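Harvesting records from an OAI-PMH repository such as the one described above uses an HTTP GET request with `verb=ListRecords` and a `metadataPrefix` such as `oai_dc`. Responses are XML in the OAI-PMH and Dublin Core namespaces, and long result lists are paged with a `resumptionToken`. The following is a minimal standard-library sketch, assuming those namespaces. The base URL and the sample record are placeholders, not DASI's actual endpoint or data, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a placeholder)."""
    if resumption_token:
        # resumptionToken is exclusive: only 'verb' may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_titles(xml_text):
    """Return (identifier, dc:title) pairs and the next resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    # An absent or empty resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS) or None
    return records, token


# Usage with a made-up, single-record response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example inscription</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
records, token = parse_titles(SAMPLE)
url = list_records_url("https://example.org/oai")
```

A harvester would loop by passing each returned `token` back to `list_records_url` until it comes back as `None`.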
The two volumes of this Special Issue explore the intersections of digital libraries, epigraphy and paleography. Digital libraries research, practices and infrastructures have transformed the study of ancient inscriptions by providing organizing principles for collection building, defining interoperability requirements and developing innovative user tools and services. Yet linking collections and their contents to support advanced scholarly work in epigraphy and paleography tests the limits of current digital library applications. This is due, in part, to the magnitude and heterogeneity of works created over more than five millennia. The remarkable diversity ranges from the types of artifacts to the methods used in their production to the singularity of individual marks contained within them. Conversion of analogue collections to digital repositories is well underway, but most often not in a way that meets the basic requirements needed to support scholarly workflows. This is beginning to change as collections and content are being described more fully with rich annotations and metadata conforming to established standards. New uses of imaging technologies and computational approaches are remediating damaged works and revealing text that has, over time, become illegible or hidden. Transcription of handwritten text into machine-readable form is still primarily a manual process, but research into automated transcription is moving forward. Progress in digital libraries research and practices, coupled with the development of collections of ancient written works, suggests that epigraphy and paleography will gain new prominence in the Academy.
This article reports on the progress and current status of the World Historical Gazetteer (WHG, whgazetteer.org), with a focus on its value for organizing and recording digital and paleographic information. It summarizes the development and functionality of the WHG as a software platform for connecting specialist collections of historical place names. It also reviews the idea of places as entities (rather than simple objects with single labels), explains the utility of gazetteers in digital library infrastructure, and describes potential future developments.
The article presents the main tools and methods applied in creating the Telamon database of ancient Greek inscriptions from Bulgaria, encoded in TEI XML. It reports on the work done on the project so far, enumerates the modifications made to the existing services, and discusses some future perspectives.
In 2001, Rutgers University Libraries (RUL) accepted a substantial donation of Roman Republican coins. The work to catalog, house, digitize, describe, and present this collection online posed unique challenges for the institution. Coins are often seen as museum objects; however, they can serve pedagogical purposes within libraries. In the quest to innovate, RUL digitized coins from seven angles to provide a 180-degree view of each coin. However, this strategy had its drawbacks and had to be reassessed as the project continued. RUCore, RUL's digital repository, uses the Metadata Object Description Schema (MODS). Accordingly, it was necessary to adapt numismatic description to bibliographic metadata standards. With generous funding from the Loeb Foundation, the resulting digital collection of 1200 coins was added to RUCore from 2012 to 2018. Rutgers's Badian Roman Coins Collection serves as an exemplar of numismatics in a library environment and is freely available to all on the Web.
The Hesperia project is currently being developed at the Universidad Complutense de Madrid. It is a digitization project that aims to produce an electronic corpus of all the inscriptions in Greek and pre-Roman languages from ancient Hispania (Spain and Portugal). It also includes all the onomastic records in the pre-Roman languages of that area. This paper provides a general overview of the project, with some examples of the various types of files used in it. It also outlines future developments of the electronic corpus and directions of research.
Recent years have seen exponential growth in the number of research articles in the field of few-shot learning (+98% in 2022 relative to the previous year), which aims to train machine learning models with extremely limited data. Research interest in few-shot learning systems for Named Entity Recognition (NER) is growing accordingly. NER consists of identifying mentions of predefined entities in unstructured text, and serves as a fundamental step in many downstream tasks, such as the construction of Knowledge Graphs or Question Answering. A NER system that can be trained with few annotated examples is urgently needed in domains where annotation requires time, knowledge and expertise (e.g., healthcare, finance, legal), and in low-resource languages. In this survey, starting from a clear definition and description of the few-shot NER (FS-NER) problem, we take stock of the current state of the art and propose a taxonomy that divides algorithms into two macro-categories according to their underlying mechanisms: model-centric and data-centric. For each category, we present the works as a narrative to show how the field is moving toward new research directions. Finally, we analyze techniques, limitations, and key aspects in depth to facilitate future studies.
The Digital Archive for the Study of pre-Islamic Arabian Inscriptions (DASI, https://dasi.cnr.it/) currently provides open access to the digital editions of nearly 8800 ancient epigraphic texts from the Arabian Peninsula. After outlining the DASI ecosystem over its 25-year history, this paper focuses on the recent enrichment of its data model, carried out within a pilot project of the E-RIHS infrastructure under the H2IOSC programme. The aim was to optimise DASI as an up-to-date tool for the digital critical edition of a broad spectrum of epigraphic sources from ancient Arabia, including graffiti, instrumenta inscripta, coins, and inscribed sticks, alongside 'monumental' inscriptions. Most of the interventions targeted the description of the visual aspect of writing and related contextual information, enhancing the digital representation of the material dimension of written heritage, which is often overlooked in philological studies. Ongoing work targets the FAIRification of DASI data, which has so far resulted in the sharing of an extensive bibliography of 1800 records through Zotero.
Based on archaeological data and large epigraphic corpora (DASI, OCIANA), the project aims to develop three free online research instruments adhering to Open Science and FAIR principles: (1) a Digital Atlas of Ancient Arabia, (2) a Gazetteer of Ancient Arabia, and (3) a Thematic Dictionary of Ancient Arabia (TDAA).
Through the annals of time, writing has slowly scrawled its way from the painted surfaces of stone walls to the grooves of inscriptions to the strokes of quill, pen, and ink. While we still inscribe stone (tombstones, monuments) and we continue to write on skin (tattoos abound), our quotidian method of writing on paper is increasingly abandoned in favor of the quick-to-generate digital text. And even though the stone-inscribed text of epigraphy offers demonstrably better permanence than that of writing on skin and paper—even better than that of the memory system of the modern computer (Bollacker in Am Sci 98:106, 2010)—this field of study has also made the digital leap. Today’s scholarly analyses of epigraphic content increasingly rely on high-tech approaches involving data science and computer models. This essay discusses how advances in a number of exciting technologies are enabling the digital analysis of epigraphic texts and accelerating the ability of scholars to preserve, renew, and reinvigorate the study of the inscriptions that remain from throughout history.
This article advances the thesis that three decades of investment by national and international funders, combined with that of scholars, technologists, librarians, archivists, and their institutions, have resulted in a digital infrastructure in the humanities that is now capable of supporting end-to-end research workflows. The article refers to key developments in the epigraphy and paleography of the premodern period. It draws primarily on work in classical studies but also highlights related work in the adjacent disciplines of Egyptology, ancient Near East studies, and medieval studies. The article argues that much has been achieved, but it does not declare "mission accomplished." The capabilities of the infrastructure remain unevenly distributed within and across disciplines, institutions, and regions. Moreover, the components, including the links between steps in the workflow, are generally far from user-friendly and seamless in operation. Because further refinements and additional capabilities are still much needed, the article concludes with a discussion of key priorities for future work.