VOICE CLARIAH


Abstract: 
The project integrates the Vienna-Oxford International Corpus of English into CLARIAH-AT and builds a new, enhanced web interface for VOICE Online to provide continued open access to this English as a lingua franca (ELF) corpus.
Body: 

The aim of the VOICE CLARIAH project is to ensure the long-term, user-friendly and open-access availability of the Vienna-Oxford International Corpus of English (VOICE), a digital one-million-word corpus of spoken English as a lingua franca (ELF) interactions. For this purpose, the project team is building a new, enhanced web interface for VOICE Online (to be released in summer 2021) and integrating VOICE into the CLARIAH-AT infrastructure. The project enhances the system architecture of VOICE Online and the quality of VOICE data, for instance by providing an updated TEI-XML format that merges VOICE XML and VOICE POS XML, combining both layers of annotation in a single XML file for each corpus text. The improved system infrastructure complements existing corpus applications and offers new search, filter and style functions that are implemented through an integration of XML, NoSketch Engine, HTML and JSON technologies. The new advanced back-end and front-end tools enable VOICE users to filter the corpus by selecting transcripts based on additional metadata categories that were previously unavailable as filters (such as number of speakers). Additional style options make it easier to customize the visualisation of VOICE transcripts. Enhanced search facilities support an extended range of queries, including searches for part-of-speech categories and for selected features of conversational mark-up. All of these functions are made available in the newly designed, intuitive and user-friendly VOICE Online web interface.

VOICE CLARIAH is an interdisciplinary collaboration of researchers from the Austrian Centre of Digital Humanities and Cultural Heritage (ACDH-CH) of the Austrian Academy of Sciences and the Department of English and American Studies of the University of Vienna. The project team combines applied, corpus and computational linguistic knowledge with IT expertise in software development, programming and web design and contributes to the international visibility of digital humanities research carried out in Austria.

The interplay of digital technologies and corpus linguistics works towards improved digital data processing for spoken corpora, analysis of interaction and multilingual data.

Principal Researcher: Priv.doz. Mag. Dr. Marie-Luise Pitzl, Austrian Centre of Digital Humanities and Cultural Heritage

Project Partners:

  • Mag. Daniel Schopper, Austrian Centre of Digital Humanities and Cultural Heritage
  • Univ.-Prof. Mag. Dr. Barbara Seidlhofer, University of Vienna

Project Team:

  • Hans Christian Breuer, University of Vienna
  • Mag. Dr. Ruth Osimk-Teasdale, University of Vienna
  • Mag. Hannes Pirker, Austrian Centre of Digital Humanities and Cultural Heritage
  • Mag. Stefanie Riegler, University of Vienna
  • Mag. Omar Siam, Austrian Centre of Digital Humanities and Cultural Heritage
Start date: 
2020
End date: 
2021
Publisher Person: 
Marie-Luise Pitzl
Accessibility: 
Open Access
Cover_image: 
Image: 
Comments: 

Despite attempts with smaller file sizes, the system did not allow us to upload an additional figure as an illustration within the body of the entry (repeated error message). If possible, we would like to submit the illustration later and ask to be contacted. Please send messages with CC to stefanie.riegler@univie.ac.at.

Project lead: 
Person name: 
Marie-Luise Pitzl
Is contact: 
API Output Type: