NERDPool: Data Pool for Named Entity Recognition


Abstract: 
The NERDPool project uses data from historical German text corpora and editing projects to develop gold standard training material for Named Entity Recognition models.
Body: 

Named Entity Recognition (NER) is the automatic identification and extraction of named entities, such as person and place names, from unstructured data. The topic is gaining increasing attention in the Digital Humanities and in digital scholarly editing. Hardly any resources exist, however, for training NER models on historical German-language texts. NERDPool tries to overcome this issue by publishing a collection of gold standard named entity annotation samples through a dedicated web application/web service (https://nerdpool-api.acdh-dev.oeaw.ac.at/).
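To make the idea of a gold standard annotation sample concrete, the following minimal Python sketch represents one annotated sentence as plain text plus character-offset entity spans. The sentence, the `EntitySpan` type, the `extract_entities` helper, and the `PER`/`LOC` labels are illustrative assumptions; they are not taken from the NERDPool data and do not describe the format the NERDPool service actually returns.

```python
from typing import List, NamedTuple, Tuple


class EntitySpan(NamedTuple):
    """One annotated entity (hypothetical structure, for illustration only)."""
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str  # entity type, e.g. "PER" for a person, "LOC" for a place


def extract_entities(text: str, spans: List[EntitySpan]) -> List[Tuple[str, str]]:
    """Return (surface form, label) pairs for every annotated span."""
    return [(text[s.start:s.end], s.label) for s in spans]


# Invented example sentence; not taken from the NERDPool data.
text = "Arthur Schnitzler schrieb aus Wien."
spans = [EntitySpan(0, 17, "PER"), EntitySpan(30, 34, "LOC")]

print(extract_entities(text, spans))
# [('Arthur Schnitzler', 'PER'), ('Wien', 'LOC')]
```

A collection of such text and span pairs, checked by human annotators, is what an NER model would typically be trained and evaluated against.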

The Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) at the Austrian Academy of Sciences, the Centre for Information Modelling (ZIM) of the University of Graz, and the University of Innsbruck (UIBK) provide online access to gold standard NER data samples derived from the following historical German text collections:

  • The project Imperial Diet of Regensburg, 1576 (ZIM) edits various documents recording the Imperial Diet of 1576. It provides a substantial amount of early modern textual data for training models to identify person and place names in texts such as protocols and official records of meetings of the Imperial Estates, correspondence, etc.
  • The project Reading in the Alps compiled rural, low-level administrative documents taken from so-called “Verfachbücher” from the second half of the 18th century.
  • Early Modern Newspapers; Das Wien[n]erische Diarium, 1700-1900 (ACDH-CH).
  • Minutes of the council of the ministers of Austria and of the Austro-Hungarian Monarchy and Minutes of the Austrian Academy of Sciences, 1850-1900 (ACDH-CH, IHB).
  • (private) correspondences; Schnitzler Briefwechsel, 1880-1940 (ACDH-CH).
Start date: 
2020
End date: 
2021
Publisher Person: 
Peter Andorfer
Roman Bleier
Matthias Schlögl
Michael Span
Accessibility: 
Open Access
Project lead: 
Person name: 
Andorfer, Peter
Person name: 
Bleier, Roman
Person name: 
Schlögl, Matthias
Person name: 
Span, Michael