The ERC project DASI aims to digitize the entire epigraphic heritage of the ancient Arabian peninsula, in order to enhance knowledge of the pre-Islamic Arabian languages and cultures. This paper describes the challenges faced and the solutions proposed in the construction of a digital lexicon tool for under-resourced languages such as those attested in the epigraphic documentation of pre-Islamic Arabia.
DASI is an ERC-Advanced Grant project aimed at digitizing the pre-Islamic inscriptions from Arabia and fostering best practices for the digitization of the epigraphic heritage related to Semitic languages. This paper describes the content model, the standards chosen, and exemplifies the vocabularies in view of a possible harmonization of data pertaining to the specific domain. The architecture of the system and the tools for encoding and retrieving textual content are also illustrated.
The ERC-Advanced Grant project DASI has contributed to defining and fostering best practices in the digitization of pre-Islamic Arabian inscriptions. As one of the early attempts at digitizing the epigraphic heritage related to Semitic languages, it has faced specific challenges in support description and text encoding. This contribution describes the solutions chosen to encode and represent different kinds of phenomena, such as phonemes typical of the Semitic languages, onomastics, textual portions, symbols and grammatical phenomena. Moreover, a digital lexicon tool for under-resourced languages, such as those attested in the epigraphic documentation of pre-Islamic Arabia, is illustrated.
The paper describes the main challenges faced and the solutions adopted in the framework of the project DASI - Digital Archive for the study of pre-Islamic Arabian inscriptions. In particular, it discusses the methodological and technological issues that emerged in the conversion from a domain-specific, text-based digital edition of an epigraphic corpus to an objective-driven archive for the study and dissemination of inscriptions in different languages and scripts. With a view to keeping pace with, and possibly fostering reflection on, best practices in the community of digital epigraphers beyond each specific cultural/linguistic domain, special attention is devoted to: the modelling of data and encoding (XML annotation vs database approach; the conceptual model for the valorization of the material aspect of the epigraph; the textual encoding for critical editions); interoperability (pros and cons of compliance with standards; harmonization of metadata; openness; semantic interoperability); lexicography (tools for under-resourced languages; translations).
The Digital Archive for the Study of Pre-Islamic Arabian Inscriptions (DASI) is a five-year ERC project of the University of Pisa, directed by Prof. A. Avanzini. Started in May 2011, the project seeks to collect the whole corpus of pre-Islamic Arabian inscriptions in an open-access archive, with the aim of fostering studies and scientific publications on the epigraphic heritage of Arabia. The paper describes the main activities carried out in the first two years of the project: the IT research on methodologies for cataloguing the epigraphic material, the digitization of thousands of pre-Islamic Arabian inscriptions, and the setting up of the archive website for consulting the catalogued material, which opened in October 2013. The project also encourages the involvement of international partners and promotes interest in pre-Islamic Arabia through a series of related activities and projects, such as the IVIEDINA digital library and the IMTO archaeological database, which are promoted in the Arabia Antica portal of the University of Pisa.
EDV (Epigraphic Database Vernacular) is a database collecting the vernacular inscriptions produced in Italy from the late Medieval to the Early Modern Age, and is a part of the EAGLE and IDEA projects. The present contribution illustrates the criteria used for the description and indexing of all inscriptions that record public script in language(s) other than Latin. The material is very varied as regards language, script, provenance, support and function. The author discusses briefly the editorial criteria that may prove most appropriate for its publication.
Textual databases enable precise linguistic comparisons and the study of the chronological development of languages across geographic space, and help safeguard endangered world heritage. In this article, we describe an ongoing study on planning and designing a catalogue of 400 Phoenician‑Punic inscriptions and examine strategies of catalogue standardization and implementation, tagging and annotation systems, digital sustainability and cost‑effectiveness. The database will be searchable (across both metadata and textual data), linked, and openly available online.