HeidelTime temporal tagger


Publication Type:

Web Article

Authors:

Source:

(0)

URL:

http://dbs.ifi.uni-heidelberg.de/index.php?id=129

Keywords:

NLP, tagger, t_software

Abstract:

HeidelTime is a multilingual, cross-domain temporal tagger that extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard, which is part of the markup language TimeML (with focus on the "value" attribute).

Notes:

HeidelTime applies different normalization strategies depending on the domain of the documents to be processed: news, narratives (e.g., Wikipedia articles), colloquial text (e.g., SMS, tweets), and scientific text (e.g., biomedical studies). It is a rule-based system in which the source code is strictly separated from the resources (patterns, normalization information, and rules), so resources for additional languages can be developed using HeidelTime's well-defined rule syntax.
Currently, 13 languages are supported with manually developed resources: English, Spanish, French, German, Dutch, Italian, Arabic, Vietnamese, Chinese, Russian, Croatian, Portuguese and Estonian.
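The separation of resources from code described above can be illustrated with a minimal Python sketch. It is not HeidelTime's implementation and does not use HeidelTime's rule syntax: the names MONTHS, PATTERNS, RULES, and tag() are hypothetical, the resources are plain dictionaries, and the reference date passed to tag() is an assumption standing in for a domain-dependent strategy for resolving relative expressions. Only the output format, a TIMEX3 element with a normalized "value" attribute, follows the standard named in the abstract.

```python
import re
from datetime import date, timedelta

# --- Resources (data only; hypothetical format, not HeidelTime's) ---
# Normalization information: surface form -> normalized month number.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
# Reusable pattern fragments referenced by the rules.
PATTERNS = {
    "month": "|".join(MONTHS),
    "day": r"\d{1,2}",
    "year": r"\d{4}",
}
# Rules: an extraction pattern plus how to normalize the match.
RULES = [
    {"extract": r"\b(?P<month>{month}) (?P<day>{day}), (?P<year>{year})\b",
     "kind": "explicit"},
    {"extract": r"\byesterday\b", "kind": "relative", "offset_days": -1},
]


# --- Engine (code only; knows nothing about a specific language) ---
def tag(text, reference_date):
    """Wrap recognized date expressions in TIMEX3 tags.

    reference_date is the date against which relative expressions are
    resolved (an assumption of this sketch).
    """
    found = []
    for rule in RULES:
        regex = re.compile(rule["extract"].format(**PATTERNS), re.IGNORECASE)
        for m in regex.finditer(text):
            if rule["kind"] == "explicit":
                try:
                    value = date(int(m["year"]),
                                 MONTHS[m["month"].lower()],
                                 int(m["day"]))
                except ValueError:
                    continue  # e.g. "February 31, 2020": leave untagged
            else:
                value = reference_date + timedelta(days=rule["offset_days"])
            found.append((m.start(), m.end(), value.isoformat()))

    # Rebuild the text left to right, inserting one tag per match.
    found.sort()
    parts, pos = [], 0
    for tid, (start, end, value) in enumerate(found, 1):
        parts.append(text[pos:start])
        parts.append(f'<TIMEX3 tid="t{tid}" type="DATE" value="{value}">'
                     f'{text[start:end]}</TIMEX3>')
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


print(tag("The talk was yesterday, not on January 5, 2013.",
          date(2013, 3, 2)))
```

In this sketch, supporting another language would mean supplying different MONTHS, PATTERNS, and RULES entries while tag() stays unchanged, which is the design idea the notes attribute to HeidelTime's architecture.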
- Daniel Schopper