Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model’s capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.
This work presents a corpus of transliterated cuneiform tablets from the Electronic Babylonian Library (eBL) platform, including a public API endpoint to download the latest version of the data, and a Python library to parse the transliterations in ATF format. As of the time of writing, the constantly growing dataset contains around 25,000 tablets with over 350,000 lines of transliterated text. This dataset is a sizeable addition to open-source cuneiform data and a major milestone for research within the fields of cuneiform studies and NLP.
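As a rough illustration of what parsing ATF transliterations involves, the sketch below separates numbered text lines from structural lines. It does not use the eBL Python library or API, whose interfaces are not described here. The function name `parse_atf_lines`, the regular expression, and the sample text are assumptions made up for this example, written in a generic ATF-like style.

```python
import re

# Assumption: a text line starts with a line label such as "1." or "2'."
# followed by whitespace and the transliterated tokens.
LINE_RE = re.compile(r"^(?P<label>\d+'?)\.\s+(?P<text>.*)$")


def parse_atf_lines(atf):
    """Return (label, tokens) pairs for numbered text lines only."""
    parsed = []
    for raw in atf.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:  # structural lines (starting with @ or $) are skipped
            parsed.append((match.group("label"), match.group("text").split()))
    return parsed


# Made-up sample, not taken from any real tablet.
sample = "@obverse\n1. a-na {d}UTU\n2'. um-ma x\n$ broken"
parsed_lines = parse_atf_lines(sample)
print(parsed_lines)
```

A real parser would also need to handle damage markers, determinatives, and the other notations defined by the ATF dialect in use.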
A digital gazetteer records information associated with specific places. This lesson teaches you how to create a gazetteer from a historical text, using the Linked Places Delimited (LP-TSV) format.
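A minimal sketch of what writing a gazetteer in a tab-delimited layout can look like is shown below. The column names (`id`, `title`, `title_source`, `start`, `lon`, `lat`) are assumed from the LP-TSV convention and should be checked against the current specification. The helper `build_lp_tsv`, the source title, and the place records are hypothetical, and the coordinates are approximate.

```python
import csv
import io

# Assumed subset of LP-TSV columns; consult the specification for the full list.
FIELDS = ["id", "title", "title_source", "start", "lon", "lat"]


def build_lp_tsv(places):
    """Serialize place dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for place in places:
        # Missing values are written as empty cells.
        writer.writerow({field: place.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Hypothetical records, invented for illustration only.
places = [
    {"id": "p1", "title": "Alexandria",
     "title_source": "Example Itinerary (hypothetical)",
     "start": "1850", "lon": 29.92, "lat": 31.2},
    {"id": "p2", "title": "Memphis",
     "title_source": "Example Itinerary (hypothetical)",
     "start": "1850"},
]
lp_tsv = build_lp_tsv(places)
print(lp_tsv)
```

Leaving `lon` and `lat` empty for a place that cannot yet be located keeps the row usable while the geography is still being researched.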
The application of FAIR (Findable, Accessible, Interoperable, Reusable) principles can revolutionise the epigraphic discipline by facilitating quantitative and reproducible research. Despite the richness of Latin inscriptions, the lack of low-barrier tools for accessing and analysing these datasets has hindered large-scale studies and the uptake of FAIR and Open Science principles in ancient studies. The LatEpig v2.0 tool addresses this gap by enabling researchers to programmatically access the Epigraphic Database Clauss-Slaby and generate reproducible research following state-of-the-art standards. The main aim of LatEpig is to democratise data access and enhance research potential without requiring advanced technical skills. A case study on ‘viator’ inscriptions exemplifies the tool’s utility, illustrating spatial and temporal trends in inscriptions addressing messengers and travellers across the Roman Empire. LatEpig shows that the development of similar tools is crucial for advancing FAIR and Open Science practices in the Humanities, ensuring that the value of substantial investments in digital resources is fully realised.
This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models on annotated datasets, consolidating potentially ambiguous entity types under a harmonized set of classes. We then tested their performance on out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model being slightly superior to the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.
The study of Roman Sicily has a long tradition, with the two most authoritative epigraphic corpora, CIL X (1883) and IG XIV (1890), dating to the late 19th century. While I.Sicily was conceived to offer easy and up-to-date access to the ever-growing but increasingly scattered epigraphic evidence of Sicily, its digital nature also enables the adoption of new approaches and the pursuit of novel research questions. The open-access dataset has recently been expanded to include institutional annotations, which hold great promise for research, particularly in fields that rely on extensive and detailed datasets, such as administrative and onomastic history (prosopographic annotation will follow). This paper aims to demonstrate both the potential and the limitations of a digitally annotated dataset as a tool for historical research, through a preliminary case study on the practice of dedications to the Roman emperor in Sicily. Recent scholarship suggests that the notion of emperorship and the expectations around it were not only imposed from above but were also shaped by provincial subjects. The data-driven approach facilitated by an annotated corpus is well suited to this new bottom-up perspective, but it is not without methodological pitfalls, which will be highlighted in this paper.
The Maya hieroglyphic script, an indigenous graphic notation system in the Americas, presents a formidable decipherment challenge. Around 40 per cent of its roughly one thousand known signs remain undeciphered owing to limited comprehension of the Classic Mayan language. Spanning modern-day south-eastern Mexico, Guatemala, Belize, and western Honduras, the Classic Maya civilization left over ten thousand inscriptions, primarily detailing the lives of political elites. The ‘Text Database and Dictionary of Classic Mayan’ project endeavours to unveil the script’s mysteries via an online text database and dictionary at https://classicmayan.org. Collaborative digital humanities methodologies and tools provide insights into the Maya’s cultural and historical legacy. The project catalogues inscribed artefacts and images in the virtual research environment TextGrid and the ‘Maya Image Archive (MIA)’, enhancing accessibility and collaboration. It further converts Maya hieroglyphic texts into machine-readable XML/TEI format and employs a novel sign classification framework. A new linguistic tool facilitates linguistic analysis and translation, enriching our understanding of Classic Mayan language and culture. Furthermore, the project compiles a vast repository of digitized Maya culture-related images and textual data, accessible online. As of 2024, it focuses on hieroglyphic texts from specific regions, with ongoing transliteration, transcription, and linguistic analysis. This digital approach not only facilitates dynamic Maya script research but also offers a platform for comprehensive source material evaluation and publication.
The Open Digital Epigraphy Hub (EpiHub) is an open access digital platform developed to streamline accessibility and organization of resources in digital epigraphy. Created within the Humanities and Cultural Heritage Italian Open Science Cloud (H2IOSC), EpiHub addresses the fragmented landscape of digital epigraphic resources, which span disciplines like linguistics, philology, and archaeology. Offering a comprehensive catalogue of national and international resources – such as datasets, digital tools, geographical and chronological gazetteers, dictionaries, and text-processing software – EpiHub structures these assets through descriptive metadata to facilitate discoverability and usability for researchers and practitioners across diverse cultural and temporal scopes. The platform’s flexible back-end architecture supports efficient data management and real-time updates to enhance front-end accessibility, organizing resources by thematic collections and allowing advanced searches based on specific epigraphic needs, such as language, geographic region, or historical period. Emphasizing FAIR principles, EpiHub standardizes metadata and controlled vocabularies to foster broader interoperability and data reuse across research projects. Integrated with related H2IOSC resources, including H-SeTIS and DHeLO, EpiHub aims to become a central resource, continuously enriched to support collaboration and innovation within the digital epigraphy community.
The aim of this study is to identify and classify top-level terms in Ukrainian epigraphy in order to develop a SKOS vocabulary that will support the categorization, organization, and retrieval of epigraphic inscriptions. The study fills a significant gap in the digital humanities, where Ukrainian epigraphic heritage has been underrepresented. V. Karniienko’s ‘Corpus of Graffiti of Saint Sophia of Kyiv’, which presents Ukrainian epigraphic heritage in a structured format, was chosen as the basis for describing epigraphic artefacts in Ukrainian scholarship. The study relies on comparative methods and a detailed corpus-linguistic analysis of how terms are used in context in scholarly discourse. The Corpus of Graffiti of Saint Sophia of Kyiv is a well-founded choice because of its structured and systematic approach to describing epigraphic monuments. An important element of the study is the analysis of the structure and content of Karniienko’s works in order to develop a standardized SKOS vocabulary for Ukrainian epigraphic inscriptions. The study proposes a systematic approach to developing a SKOS vocabulary for Ukrainian epigraphy that is integrated with existing frameworks such as the EAGLE vocabulary for Greco-Roman artefacts. This is not only a technical but also a strategic step, one that ensures broader applicability and interoperability and allows Ukrainian inscriptions to be studied alongside those of other cultures. The results contribute to a more integrated and accessible digital representation of epigraphic heritage, which not only enriches the global landscape of the digital humanities but also ensures due attention to and scholarly recognition of Ukraine’s rich epigraphic heritage. The study emphasizes the importance of digital tools and corpus analysis for the development of the digital humanities, and of digital epigraphy in particular.
Developing a comprehensive SKOS vocabulary for Ukrainian epigraphy will make it possible to integrate it with existing frameworks such as the EAGLE vocabulary for Greco-Roman artefacts, ensuring broader applicability and interoperability and allowing Ukrainian inscriptions to be studied alongside inscriptions from other languages and cultures.
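To make the idea of top-level SKOS terms more concrete, the sketch below emits two concepts in Turtle syntax using plain string formatting. The base URI, the concept identifiers, the chosen labels, and the helper `skos_concept` are hypothetical and are not taken from the study or from the EAGLE vocabulary. Only the SKOS namespace and the properties `skos:prefLabel` and `skos:broader` come from the SKOS standard.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_concept(base, local_id, label_uk, label_en, broader=None):
    """Return one skos:Concept as a Turtle statement block."""
    lines = [
        f"<{base}{local_id}> a skos:Concept ;",
        f'    skos:prefLabel "{label_uk}"@uk, "{label_en}"@en',
    ]
    if broader is not None:
        # Link the narrower term to its parent concept.
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{base}{broader}>")
    lines[-1] += " ."
    return "\n".join(lines)


# Hypothetical base URI and terms, for illustration only.
BASE = "https://example.org/ukr-epigraphy/"
turtle = "\n\n".join([
    SKOS_PREFIX,
    skos_concept(BASE, "inscription", "напис", "inscription"),
    skos_concept(BASE, "graffito", "графіті", "graffito", broader="inscription"),
])
print(turtle)
```

Alignment with an external vocabulary would typically be expressed with SKOS mapping properties such as `skos:exactMatch` or `skos:closeMatch`, pointing at the corresponding external concept URIs.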