<p>This edited volume collects peer-reviewed papers that originated as presentations at Digital Classicist seminars and conference panels.</p><p>This wide-ranging volume showcases exemplary applications of digital scholarship to the ancient world and critically examines the many challenges and opportunities afforded by such research. The chapters included here demonstrate innovative approaches that drive forward the research interests of both humanists and technologists while showing that rigorous scholarship is as central to digital research as it is to mainstream classical studies.</p><p>As with the earlier Digital Classicist publications, our aim is not to give a broad overview of the field of digital classics; rather, we present here a snapshot of some of the varied research of our members in order to engage with and contribute to the development of scholarship both in classical antiquity and in Digital Humanities more broadly.</p><p>This wide-ranging volume is a collection of papers that originated as presentations at Digital Classicist seminars and conference panels, and it showcases exemplary applications of digital scholarship to the ancient world. It critically examines the many challenges and opportunities afforded by such research and offers a snapshot of some of the varied work being undertaken by Digital Classicist members at the nexus of digital research and classical studies.</p>
This paper reviews the experience of the Ramses Project in constructing a richly annotated corpus of Late Egyptian, which comprised 300,000 words in 2011 and is expected to grow to more than 1 million words in the coming years. During the first five years of the project, this corpus was encoded in hieroglyphic script, translated into French or English, and annotated with part-of-speech information, lemmatization, and morphological analysis. The methodology and working tools developed to build this corpus are discussed here, and future developments are presented.
This paper reports on the construction-based Treebank currently under development in the framework of the Ramses Project, which aims at building a multifaceted annotated corpus of Late Egyptian texts. We describe the specifications that have been implemented and introduce the syntactic formalism and the related representation format used for the syntactic annotation. Furthermore, we discuss the annotation scheme, paying particular attention to its evolutionary nature. Finally, we explain the annotation methods and the annotation tool, called SyntaxEditor, and conclude by addressing forthcoming developments, especially the search engine and a context-sensitive parser.
The DASI archive was built during a five-year project directed by Alessandra Avanzini of the University of Pisa and funded by the European Community within the Seventh Framework Programme "Ideas", Specific Programme "ERC - Advanced Grant". DASI seeks to gather all known pre-Islamic Arabian epigraphic material into a comprehensive online database, with the aim of making available to specialists and to the broader public a wide array of documents that are often undervalued because they are difficult to access. Through a digitization process based on a hybrid data entry/XML system, developed by the Scuola Normale Superiore di Pisa according to international encoding standards, DASI currently gives access to more than 8,600 Ancient South Arabian inscriptions, plus a number of inscriptions from the Ancient North Arabian and Nabataean corpora. Since 2018, DASI has been maintained at the CNR.
<h3>Introduction</h3><br> <p>OntoNotes Release 5.0 is the final release of the OntoNotes project, a collaborative effort between <a href="http://www.bbn.com/" rel="nofollow">BBN Technologies</a>, the <a href="http://www.colorado.edu/" rel="nofollow">University of Colorado</a>, the <a href="http://www.upenn.edu/" rel="nofollow">University of Pennsylvania</a> and the <a href="http://www.isi.edu/home" rel="nofollow">University of Southern California's Information Sciences Institute</a>. The goal of the project was to annotate a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).</p><br> <p>OntoNotes Release 5.0 contains the content of earlier releases -- OntoNotes Release 1.0 <a href="http://catalog.ldc.upenn.edu/LDC2007T21" rel="nofollow">LDC2007T21</a>, OntoNotes Release 2.0 <a href="http://catalog.ldc.upenn.edu/LDC2008T04" rel="nofollow">LDC2008T04</a>, OntoNotes Release 3.0 <a href="http://catalog.ldc.upenn.edu/LDC2009T24" rel="nofollow">LDC2009T24</a> and OntoNotes Release 4.0 <a href="http://catalog.ldc.upenn.edu/LDC2011T03" rel="nofollow">LDC2011T03</a> -- and adds source data from, and/or additional annotations for, newswire (News), broadcast news (BN), broadcast conversation (BC), telephone conversation (Tele) and web data (Web) in English and Chinese and newswire data in Arabic. It also contains English pivot text (Old Testament and New Testament text). 
This cumulative publication consists of 2.9 million words with counts shown in the table below.</p><br> <table><br> <tbody><br> <tr><br> <td> </td><br> <td>Arabic</td><br> <td>English</td><br> <td>Chinese</td><br> </tr><br> <tr><br> <td>News</td><br> <td>300k</td><br> <td>625k</td><br> <td>250k</td><br> </tr><br> <tr><br> <td>BN</td><br> <td>n/a</td><br> <td>200k</td><br> <td>250k</td><br> </tr><br> <tr><br> <td>BC</td><br> <td>n/a</td><br> <td>200k</td><br> <td>150k</td><br> </tr><br> <tr><br> <td>Web</td><br> <td>n/a</td><br> <td>300k</td><br> <td>150k</td><br> </tr><br> <tr><br> <td>Tele</td><br> <td>n/a</td><br> <td>120k</td><br> <td>100k</td><br> </tr><br> <tr><br> <td>Pivot</td><br> <td>n/a</td><br> <td>n/a</td><br> <td>300k</td><br> </tr><br> </tbody><br> </table><br> <p> </p><br> <p>The OntoNotes project built on two time-tested resources, following the <a href="http://catalog.ldc.upenn.edu/LDC99T42" rel="nofollow">Penn Treebank</a> for syntax and the <a href="http://catalog.ldc.upenn.edu/LDC2004T14" rel="nofollow">Penn PropBank</a> for predicate-argument structure. Its semantic representation includes word sense disambiguation for nouns and verbs, with some word senses connected to an ontology, and coreference.</p><br> <h3>Data</h3><br> <p>Documents describing the annotation guidelines and the routines for deriving various views of the data from the database are included in the documentation directory of this release. The annotation is provided both in separate text files for each annotation layer (Treebank, PropBank, word sense, etc.) and in the form of an integrated relational database (ontonotes-v5.0.sql.gz) with a Python API to provide convenient cross-layer access.</p><br> <p>It is a known issue that this release contains some non-validating XML files. 
The included tools, however, use a non-validating XML parser to parse the .xml files and load the appropriate values.</p><br> <h3>Tools</h3><br> <p>This release includes OntoNotes DB Tool v0.999 beta, the tool used to assemble the database from the original annotation files. It can be found in the directory tools/ontonotes-db-tool-v0.999b. This tool can be used to derive various views of the data from the database, and it provides an API that can implement new queries or views. Licensing information for the OntoNotes DB Tool package is included in its source directory.</p><br> <h3>Samples</h3><br> <p>Please view these samples:</p><br> <ul><br> <li><a href="desc/addenda/LDC2013T19.cmn.jpg" rel="nofollow">Chinese</a></li><br> <li><a href="desc/addenda/LDC2013T19.ara.jpg" rel="nofollow">Arabic</a></li><br> <li><a href="desc/addenda/LDC2013T19.eng.jpg" rel="nofollow">English</a></li><br> </ul><br> <h3>Updates</h3><br> <p>Additional documentation was added on December 11, 2014 and is included in downloads after that date. </p><br> <h3>Acknowledgment</h3><br> <p>This work is supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-003. 
The content of this publication does not necessarily reflect the position or policy of the Government, and no official endorsement should be inferred.</p><br> Portions © 2006 Abu Dhabi TV, © 2006 Agence France Presse, © 2006 Al-Ahram, © 2006 Al Alam News Channel, © 2006 Al Arabiya, © 2006 Al Hayat, © 2006 Al Iraqiyah, © 2006 Al Quds-Al Arabi, © 2006 Anhui TV, © 2002, 2006 An Nahar, © 2006 Asharq-al-Awsat, © 2010 Bible League International, © 2005 Cable News Network, LP, LLLP, © 2000-2001 China Broadcasting System, © 2000-2001, 2005-2006 China Central TV, © 2006 China Military Online, © 2000-2001 China National Radio, © 2006 Chinanews.com, © 2000-2001 China Television System, © 1989 Dow Jones & Company, Inc., © 2006 Dubai TV, © 2006 Guangming Daily, © 2006 Kuwait TV, © 2005-2006 National Broadcasting Company, Inc., © 2006 New Tang Dynasty TV, © 2006 Nile TV, © 2006 Oman TV, © 2006 PAC Ltd, © 2006 People's Daily Online, © 2005-2006 Phoenix TV, © 2000-2001 Sinorama Magazine, © 2006 Syria TV, © 1996-1998, 2006 Xinhua News Agency, © 1996, 1997, 2005, 2007, 2008, 2009, 2011, 2013 Trustees of the University of Pennsylvania
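<p>As a quick cross-check of the word-count table in the Introduction, the per-genre figures can be tallied against the stated total of 2.9 million words. The sketch below is illustrative only: the dictionary layout and function names are hypothetical and are not part of the OntoNotes tools. It assumes the Pivot entry denotes 300k words, which is what the stated total implies, and it keeps that entry in the column where the table places it.</p>

```python
# Word counts in thousands, transcribed from the table in the Introduction.
# "n/a" cells are simply omitted.
# Assumption: the Pivot entry is read as 300k (thousands), kept in the
# column where the table places it.
COUNTS_K = {
    "News":  {"Arabic": 300, "English": 625, "Chinese": 250},
    "BN":    {"English": 200, "Chinese": 250},
    "BC":    {"English": 200, "Chinese": 150},
    "Web":   {"English": 300, "Chinese": 150},
    "Tele":  {"English": 120, "Chinese": 100},
    "Pivot": {"Chinese": 300},
}


def total_words(counts_k):
    """Return the overall word count (in words, not thousands)."""
    return 1000 * sum(sum(row.values()) for row in counts_k.values())


def per_language(counts_k):
    """Return a dict mapping each language to its total in thousands."""
    totals = {}
    for row in counts_k.values():
        for language, thousands in row.items():
            totals[language] = totals.get(language, 0) + thousands
    return totals


if __name__ == "__main__":
    print(f"Total words: {total_words(COUNTS_K):,}")
    for language, thousands in sorted(per_language(COUNTS_K).items()):
        print(f"{language}: {thousands}k")
```

<p>Under these assumptions the cells sum to 2,945k words, which is consistent with the rounded figure of 2.9 million given in the description.</p>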