The ERC project DASI is aimed at digitizing the overall epigraphic heritage of the ancient Arabian peninsula, in order to enhance knowledge of the pre-Islamic Arabian languages and cultures. This paper describes the challenges faced and the solutions proposed in the construction of a digital lexicon tool for under-resourced languages such as those attested in the epigraphic documentation of pre-Islamic Arabia.
DASI is an ERC-Advanced Grant project aimed at digitizing the pre-Islamic inscriptions from Arabia and fostering best practices for the digitization of the epigraphic heritage related to Semitic languages. This paper describes the content model and the standards chosen, and illustrates the vocabularies with a view to a possible harmonization of data pertaining to this specific domain. The architecture of the system and the tools for encoding and retrieving textual content are also illustrated.
The ERC-Advanced Grant project DASI has contributed to defining and fostering best practices in the digitization of pre-Islamic Arabian inscriptions. As one of the early attempts at digitizing the epigraphic heritage related to Semitic languages, it has faced specific challenges in support description and text encoding. This contribution describes the solutions chosen to encode and represent different kinds of phenomena, such as phonemes typical of the Semitic languages, onomastics, textual portions, symbols, and grammatical phenomena. Moreover, a digital lexicon tool for under-resourced languages, such as those attested in the epigraphic documentation of pre-Islamic Arabia, is illustrated.
Iconclass is an iconographic classification system from the domain of cultural heritage which is used to annotate subjects represented in the visual arts. In this work, we investigate the feasibility of automatically assigning Iconclass codes to visual artworks using a cross-modal retrieval set-up. We explore the text and image branches of the cross-modal network. In addition, we describe a multi-modal architecture that can jointly capitalize on multiple feature sources: textual features, coming from the titles of these artworks (in multiple languages), and visual features, extracted from photographic reproductions of the artworks. We utilize Iconclass definitions in English as matching labels. We evaluate our approach on a publicly available dataset of artworks (containing English and Dutch titles). Our results demonstrate that, in isolation, textual features strongly outperform visual features, although visual features can still offer a useful complement to purely linguistic features. Moreover, we show the cross-lingual (Dutch-English) strategy to be on par with the monolingual approach (English-English), which opens important perspectives for applications of this approach beyond resource-rich languages.