Evaluation Metrics of Named Entity Recognition

Here we briefly introduce some common evaluation metrics for NER tasks, which consider both the extracted entity boundaries and the entity types.

Scenarios that NER systems predict

Exact Match

  • 1) Surface string and type match (both the entity boundary and the type are correct)
  • 2) System hypothesizes an entity (the predicted entity does not exist in the ground truth)
  • 3) System misses an entity (the entity exists in the ground truth but is not predicted by the NER system)

Partial Match (Overlapping)

  • 4) Wrong entity type (the entity boundary is correct, but the types disagree)
  • 5) Wrong boundaries (the predicted and gold boundaries overlap but are not identical)
  • 6) Wrong boundaries and wrong entity type

Evaluation Metrics

CoNLL-2003: Conference on Computational Natural Language Learning
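
The CoNLL-2003 shared task scores entities by exact match: an entity counts as correct only when both its boundary and its type agree with the gold annotation, and micro-averaged precision, recall, and F1 are reported. As a quick illustration (the original post does not name a scoring library; seqeval and the BIO tags below are my own illustrative choices), exact-match F1 can be computed as:

```python
# Illustration only: seqeval is not mentioned in the original post.
from seqeval.metrics import classification_report, f1_score

# BIO-tagged gold and predicted sequences for two toy sentences (made-up labels).
y_true = [["B-MUSIC_NAME", "I-MUSIC_NAME", "O", "O"],
          ["O", "B-SINGER", "O"]]
y_pred = [["B-MUSIC_NAME", "I-MUSIC_NAME", "O", "B-SINGER"],
          ["O", "B-SINGER", "O"]]

# Exact-match micro F1: an entity is correct only if boundary and type both match.
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```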

Automatic Content Extraction (ACE)

Message Understanding Conference (MUC)

  • Considers both the entity boundary and the entity type
  • Correct (COR): the prediction and the gold annotation match
  • Incorrect (INC): the prediction and the gold annotation do not match
  • Partial (PAR): the predicted entity boundary overlaps with the gold annotation, but they are not the same
  • Missing (MIS): a gold annotation boundary is not identified (the gold label exists, but the prediction does not)
  • Spurious (SPU): a predicted entity boundary does not exist in the gold annotation (the prediction exists, but the gold label does not)
  • See MUC-5 Evaluation Metrics
  • A minimal Python sketch of these counts follows this list; the eval4ner package at the end of this post provides a fuller implementation
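
To make the five categories concrete, here is a minimal sketch (not the official MUC scorer) that tallies COR/INC/PAR/MIS/SPU for one sentence. The (type, start, end) tuple layout, the greedy matching, and the example entities are assumptions for illustration only.

```python
from collections import Counter
from typing import List, Tuple

# (entity type, start offset, end offset) -- an illustrative representation,
# not the input format of any particular scorer.
Entity = Tuple[str, int, int]

def muc_counts(gold: List[Entity], pred: List[Entity]) -> Counter:
    """Tally MUC-style COR/INC/PAR/MIS/SPU counts for one sentence (simplified, greedy)."""
    counts = Counter()
    matched_gold = set()
    for p_type, p_start, p_end in pred:
        best = None
        for i, (g_type, g_start, g_end) in enumerate(gold):
            if i in matched_gold:
                continue
            overlap = min(p_end, g_end) - max(p_start, g_start)
            if overlap <= 0:
                continue                      # no boundary overlap at all
            exact = (p_start, p_end) == (g_start, g_end)
            best = (i, g_type, exact)
            if exact:
                break                         # prefer an exact-boundary match
        if best is None:
            counts["SPU"] += 1                # prediction with no overlapping gold entity
            continue
        i, g_type, exact = best
        matched_gold.add(i)
        if exact and p_type == g_type:
            counts["COR"] += 1                # same boundary, same type
        elif exact:
            counts["INC"] += 1                # same boundary, wrong type
        else:
            counts["PAR"] += 1                # boundaries overlap but differ
    counts["MIS"] += len(gold) - len(matched_gold)  # gold entities never matched
    return counts

# Example: gold span (2, 6) vs. a wider predicted span (0, 6) -> partial overlap.
print(muc_counts([("MUSIC_NAME", 2, 6)], [("MUSIC_NAME", 0, 6)]))  # Counter({'PAR': 1})
```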

SemEval-2013

  • Strict: exact match (both the entity boundary and the type are correct)
  • Exact boundary matching: the predicted entity boundary is correct, regardless of the entity type
  • Partial boundary matching: the entity boundaries overlap, regardless of the entity type
  • Type matching: some overlap between the system-tagged entity and the gold annotation is required, and the entity types must agree

| Scenario | Gold entity type | Gold entity boundary (surface string) | Predicted entity type | Predicted entity boundary (surface string) | Type | Partial | Exact | Strict |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| III | MUSIC_NAME | 告白气球 |  |  | MIS | MIS | MIS | MIS |
| II  |  |  | MUSIC_NAME | 年轮 | SPU | SPU | SPU | SPU |
| V   | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC |
| IV  | MUSIC_NAME | 告白气球 | SINGER | 告白气球 | INC | COR | COR | INC |
| I   | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球 | COR | COR | COR | COR |
| VI  | MUSIC_NAME | 告白气球 | SINGER | 一首告白气球 | INC | PAR | INC | INC |
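
The per-scenario judgments in the table above can be reproduced with a small sketch. Boundary overlap is approximated here by substring containment (enough for these examples), and the function name and tuple layout are illustrative assumptions rather than any standard API.

```python
def semeval_labels(gold, pred):
    """Label one gold/prediction pair under the four SemEval-2013 schemes.

    gold / pred are (entity type, surface string) tuples, or None when the
    entity is absent on that side. Returns labels in the order
    (Type, Partial, Exact, Strict).
    """
    if gold is None:
        return ("SPU",) * 4              # prediction with no gold counterpart
    if pred is None:
        return ("MIS",) * 4              # gold entity the system never produced
    g_type, g_str = gold
    p_type, p_str = pred
    exact_boundary = g_str == p_str
    overlap = exact_boundary or g_str in p_str or p_str in g_str
    type_label    = "COR" if (overlap and g_type == p_type) else "INC"
    partial_label = "COR" if exact_boundary else ("PAR" if overlap else "INC")
    exact_label   = "COR" if exact_boundary else "INC"
    strict_label  = "COR" if (exact_boundary and g_type == p_type) else "INC"
    return (type_label, partial_label, exact_label, strict_label)

# The six scenarios from the table above, in the same order.
scenarios = {
    "III": (("MUSIC_NAME", "告白气球"), None),
    "II":  (None, ("MUSIC_NAME", "年轮")),
    "V":   (("MUSIC_NAME", "告白气球"), ("MUSIC_NAME", "一首告白气球")),
    "IV":  (("MUSIC_NAME", "告白气球"), ("SINGER", "告白气球")),
    "I":   (("MUSIC_NAME", "告白气球"), ("MUSIC_NAME", "告白气球")),
    "VI":  (("MUSIC_NAME", "告白气球"), ("SINGER", "一首告白气球")),
}
for name, (gold, pred) in scenarios.items():
    print(name, semeval_labels(gold, pred))
```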

Number of gold standard annotations (possible):

$$\mathrm{POSSIBLE} = \mathrm{COR} + \mathrm{INC} + \mathrm{PAR} + \mathrm{MIS}$$

Number of predicted entities (actual):

$$\mathrm{ACTUAL} = \mathrm{COR} + \mathrm{INC} + \mathrm{PAR} + \mathrm{SPU}$$

Exact match (i.e. Strict, Exact):

$$\mathrm{Precision} = \frac{\mathrm{COR}}{\mathrm{ACTUAL}}, \qquad \mathrm{Recall} = \frac{\mathrm{COR}}{\mathrm{POSSIBLE}}$$

Partial match (i.e. Partial, Type):

$$\mathrm{Precision} = \frac{\mathrm{COR} + 0.5\,\mathrm{PAR}}{\mathrm{ACTUAL}}, \qquad \mathrm{Recall} = \frac{\mathrm{COR} + 0.5\,\mathrm{PAR}}{\mathrm{POSSIBLE}}$$

F-measure:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

| Measure | Type | Partial | Exact | Strict |
| --- | --- | --- | --- | --- |
| Correct (COR) | 2 | 2 | 2 | 1 |
| Incorrect (INC) | 2 | 0 | 2 | 3 |
| Partial (PAR) | 0 | 2 | 0 | 0 |
| Missed (MIS) | 1 | 1 | 1 | 1 |
| Spurious (SPU) | 1 | 1 | 1 | 1 |
| Precision | 0.4 | 0.6 | 0.4 | 0.2 |
| Recall | 0.4 | 0.6 | 0.4 | 0.2 |
| F1 score | 0.4 | 0.6 | 0.4 | 0.2 |
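
Plugging the counts above into the formulas just given reproduces the precision, recall, and F1 values in the table. A short sketch (the function name and layout are my own) shows the arithmetic:

```python
def prf(cor, inc, par, mis, spu, partial_credit=False):
    """Precision/recall/F1 from COR/INC/PAR/MIS/SPU counts (SemEval-2013 style)."""
    possible = cor + inc + par + mis        # number of gold standard annotations
    actual   = cor + inc + par + spu        # number of predicted entities
    hit = cor + 0.5 * par if partial_credit else cor
    precision = hit / actual if actual else 0.0
    recall    = hit / possible if possible else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Counts from the six scenarios above; Type and Partial give half credit for PAR.
print("Type   ", prf(2, 2, 0, 1, 1, partial_credit=True))  # (0.4, 0.4, 0.4)
print("Partial", prf(2, 0, 2, 1, 1, partial_credit=True))  # (0.6, 0.6, 0.6)
print("Exact  ", prf(2, 2, 0, 1, 1))                        # (0.4, 0.4, 0.4)
print("Strict ", prf(1, 3, 0, 1, 1))                        # (0.2, 0.2, 0.2)
```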

PyPI library eval4ner installation: pip install -U eval4ner
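
As a usage sketch, eval4ner evaluates per-sentence lists of (type, surface string) pairs. The call below follows the project README as I recall it; treat the module path, entry point, and argument order as assumptions and check the package documentation.

```python
import eval4ner.muc as muc  # assumed module layout; verify against the eval4ner docs

ground_truths = [[("MUSIC_NAME", "告白气球")]]
predictions   = [[("MUSIC_NAME", "一首告白气球")]]
texts = ["播放一首告白气球"]  # "play the song 告白气球"

# Assumed entry point: prints MUC-style scores under the strict/exact/partial/type schemes.
muc.evaluate_all(predictions, ground_truths, texts, verbose=True)
```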

For attribution in academic contexts, please cite this work as:

```bibtex
@misc{chai2021NER-eval,
  author = {Chai, Yekun},
  title = {{Evaluation Metrics of Named Entity Recognition}},
  year = {2021},
  howpublished = {\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}
```

References