hgb-ner-v1 / README.md
metadata
license: mit
tags:
  - flair
  - token-classification
  - sequence-tagger-model
language: de
widget:
  - text: >-
      1536 Item Hannß Ulrich Fürfelder zinst jerlich zu fasnacht dem closter an
      den Steinen 1 ℔ 3ß vom hus zum Falckhen

Historisches Grundbuch der Stadt Basel Nested NER

A model for historical German, developed by Ismail Prada Ziegler as part of the project "Economies of Space. Practices, Discourses and Actors on the Basel Real Estate Market (1400-1700)" at the University of Basel in cooperation with the Digital Humanities Bern. This model was created to annotate nested document structures. It can also be used to annotate flat text (as in the example), but may perform slightly worse than models trained only for that task. You can annotate nested tags by using this script. You can find more info on this model here.
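For flat annotation, a minimal sketch with the Flair library could look like the following. The Hub ID in `MODEL_ID` is a placeholder (the namespace is an assumption and must be replaced with the real one), and the helper names `tag_flat` and `spans_to_rows` are hypothetical. This is a single flat pass only; it does not reproduce the nested annotation script mentioned above.

```python
# ASSUMPTION: placeholder ID; replace "<namespace>" with the model's real
# Hugging Face Hub namespace before running.
MODEL_ID = "<namespace>/hgb-ner-v1"


def tag_flat(text, model_id=MODEL_ID):
    """Run the tagger once over plain text and return the predicted Flair spans."""
    # Imported lazily so spans_to_rows() can be used without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_id)
    sentence = Sentence(text)
    tagger.predict(sentence)
    # label_type is the label name the tagger was trained with.
    return sentence.get_spans(tagger.label_type)


def spans_to_rows(spans):
    """Convert span-like objects (.text, .tag, .score) into plain tuples."""
    return [(span.text, span.tag, round(span.score, 2)) for span in spans]


# Example call (downloads the model, so it is left commented out):
# text = ("1536 Item Hannß Ulrich Fürfelder zinst jerlich zu fasnacht dem "
#         "closter an den Steinen 1 ℔ 3ß vom hus zum Falckhen")
# for row in spans_to_rows(tag_flat(text)):
#     print(row)
```

Each printed row would hold the span text, its predicted tag, and the rounded confidence score.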

Performance

When annotating recursively:

| Metric | PER | ORG | LOC |
|---|---|---|---|
| Precision | 86.30% | 82.69% | 82.79% |
| Recall | 85.82% | 74.14% | 78.46% |
| F1-Score | 86.06% | 78.18% | 80.57% |
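As a quick sanity check, the reported F1 scores can be recomputed from precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` is illustrative; the numbers are taken from the table above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per entity type, as listed above.
reported = {
    "PER": (86.30, 85.82, 86.06),
    "ORG": (82.69, 74.14, 78.18),
    "LOC": (82.79, 78.46, 80.57),
}

for label, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{label}: computed F1 = {computed:.2f}%, reported F1 = {f1:.2f}%")
```

Under that assumption, the recomputed values agree with the reported ones to two decimal places.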

Dataset

The dataset was created from the Historical Land Registry of the city of Basel (HLRB) and has not yet been published. Timeframe: 1400-1700. Language: Early New High German. Size: 661 documents in train, 83 in dev. The language model is based on the full HLRB corpus up to 1800, approximately 120k documents.

The documents were annotated according to the BeNASch annotation guidelines. For this model, a simplified tagset was used.

The training data was prepared in a special way to accommodate nested annotation. See the linked paper for more information.

Citation

If you publish works using this model, please cite:

Prada Ziegler, I. (2024, May 30). What's in an entity? Exploring Nested Named Entity Recognition in the Historical Land Register of Basel (1400-1700). DH Benelux 2024, Leuven, Belgium. Zenodo. https://doi.org/10.5281/zenodo.11394453