EpiHub - Open Digital Epigraphy Hub

Aioanei, Andrei C., Regine R. Hunziker-Rodewald, Konstantin M. Klein, and Dominik L. Michels. 2024. “Deep Aramaic: Towards a Synthetic Data Paradigm Enabling Machine Learning in Epigraphy.” PLOS ONE 19 (4): e0299297. doi:10.1371/journal.pone.0299297.

Epigraphy is witnessing a growing integration of artificial intelligence, notably through its subfield of machine learning (ML), especially in tasks like extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250 000 training and 25 000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded Aramaic letters. The ResNet model demonstrates 95% accuracy in classifying real images from the 8th century BCE Hadad statue inscription. Additional experiments validate performance on varying materials and styles, proving effective generalization. Our results validate the model’s capabilities in handling diverse real-world scenarios, proving the viability of our synthetic data approach and avoiding the dependence on scarce training data that has constrained epigraphic analysis. Our innovative framework elevates interpretation accuracy on damaged inscriptions, thus enhancing knowledge extraction from these historical resources.

Avanzini, Alessandra, Annamaria De Santis, and Irene Rossi. 2018. “Encoding, Interoperability, Lexicography: Digital Epigraphy through the Lens of DASI Experience.” In Crossing Experiences in Digital Epigraphy: From Practice to Discipline, edited by Annamaria De Santis and Irene Rossi, 1–17. Warsaw/Berlin: De Gruyter Open Poland. doi:10.1515/9783110607208-002.

The paper describes the main challenges faced and the solutions adopted in the DASI project (Digital Archive for the Study of pre-Islamic Arabian Inscriptions). In particular, it discusses the methodological and technological issues that emerged in the conversion from a domain-specific, text-based digital edition of an epigraphic corpus to an objective-driven archive for the study and dissemination of inscriptions in different languages and scripts. With a view to keeping pace with, and possibly fostering reasoning on, best practices in the community of digital epigraphers beyond each specific cultural/linguistic domain, special attention is devoted to: the modelling of data and encoding (XML annotation vs database approach; the conceptual model for the valorization of the material aspect of the epigraph; the textual encoding for critical editions); interoperability (pros and cons of compliance with standards; harmonization of metadata; openness; semantic interoperability); lexicography (tools for under-resourced languages; translations).

Coffee, Neil, Christopher Forstall, and James Gawley. 2017. The Tesserae Project. PDF. Università Ca’ Foscari Venezia, Italia. doi:10.14277/6969-182-9/ANT-14-14.

The Tesserae Project offers a free online intertextual search tool for ancient Greek, Latin, and English. Tesserae has in the past allowed for a pairwise searching of literary texts in these languages for exact word or lemma similarities. This paper describes two new types of search now offered by Tesserae, by meaning (semantic search) and by sound.

Grossner, Karl, Susan Grunewald, and Ruth Mostern. 2023. “Bringing Places from the Distant Past to the Present: A Report on the World Historical Gazetteer.” International Journal on Digital Libraries 24 (3): 159–62. doi:10.1007/s00799-022-00341-2.

This article is a report about the progress and current status of the World Historical Gazetteer (whgazetteer.org) (WHG) in the context of its value for helping to organize and record digital and paleographic information. It summarizes the development and functionality of the WHG as a software platform for connecting specialist collections of historical place names. It also reviews the idea of places as entities (rather than simple objects with single labels). It also explains the utility of gazetteers in digital library infrastructure and describes potential future developments.

Moscato, Vincenzo, Marco Postiglione, and Giancarlo Sperlí. 2023. “Few-Shot Named Entity Recognition: Definition, Taxonomy and Research Directions.” ACM Trans. Intell. Syst. Technol. 14 (5). Association for Computing Machinery. doi:10.1145/3609483.

Recent years have seen an exponential growth (+98% in 2022 w.r.t. the previous year) of the number of research articles in the few-shot learning field, which aims at training machine learning models with extremely limited available data. The research interest toward few-shot learning systems for Named Entity Recognition (NER) is thus at the same time increasing. NER consists in identifying mentions of pre-defined entities from unstructured text, and serves as a fundamental step in many downstream tasks, such as the construction of Knowledge Graphs, or Question Answering. The need for a NER system able to be trained with few-annotated examples comes in all its urgency in domains where the annotation process requires time, knowledge and expertise (e.g., healthcare, finance, legal), and in low-resource languages. In this survey, starting from a clear definition and description of the few-shot NER (FS-NER) problem, we take stock of the current state-of-the-art and propose a taxonomy which divides algorithms in two macro-categories according to the underlying mechanisms: model-centric and data-centric. For each category, we line-up works as a story to show how the field is moving toward new research directions. Eventually, techniques, limitations, and key aspects are deeply analyzed to facilitate future studies.

Multhoff, Anne. 2018. “A Methodological Framework for the Epigraphic South Arabian Lexicography. The Case of the Sabaic Online Dictionary.” In Crossing Experiences in Digital Epigraphy: From Practice to Discipline, edited by Annamaria De Santis and Irene Rossi, 118–32. Warsaw/Berlin: De Gruyter Open Poland. doi:10.1515/9783110607208-010.

Palladino, Chiara, and Tariq Yousef. 2024. “Development of Robust NER Models and Named Entity Tagsets for Ancient Greek.” In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, edited by Rachele Sprugnoli and Marco Passarotti, 89–97. Torino, Italia: ELRA and ICCL. https://aclanthology.org/2024.lt4hala-1.11/.

This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models with annotated datasets by consolidating potentially ambiguous entity types under a harmonized set of classes. Then, we tested their performance with out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model being slightly superior to the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.

Salomon, Corinna. 2024. “Lexicon Leponticum – Concept and Implementation.” In Cisalpine Celtic Literacy – Proceedings of the International Symposium Maynooth 23–24 June 2022, edited by Corinna Salomon and David Stifter. Hagen.

Sövegjártó, Szilvia, and Márton Vér, eds. 2024. Exploring Multilingualism and Multiscriptism in Written Artefacts. Studies in Manuscript Cultures 38. Berlin; Boston: De Gruyter. doi:10.1515/9783111380544.

Yousef, Tariq, Chiara Palladino, and Farnoosh Shamsian. 2023. “Classical Philology in the Time of AI: Exploring the Potential of Parallel Corpora in Ancient Language.” In Proceedings of the Ancient Language Processing Workshop, edited by Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, and Marco C. Passarotti, 179–92. Varna, Bulgaria: INCOMA Ltd., Shoumen, Bulgaria. https://aclanthology.org/2023.alp-1.21/.

This paper provides an overview of diverse applications of parallel corpora in ancient languages, particularly Ancient Greek. In the first part, we present the fundamental principles of parallel corpora and a short overview of their applications in the study of ancient texts. In the second part, we illustrate how to leverage parallel corpora to perform various NLP tasks, including automatic translation alignment, dynamic lexica induction, and Named Entity Recognition. In the conclusions, we emphasize current limitations and future work.
