Here we briefly introduce common evaluation metrics for NER tasks, considering both the extracted entity boundaries and the entity types.
Scenarios that NER systems predict
Exact Match
- 1) Surface string and type match (both entity boundary and type are correct)
- 2) System hypothesizes an entity (predicts an entity that does not exist in the ground truth)
- 3) System misses an entity (the entity exists in the ground truth but is not predicted by the NER system)
Partial Match (Overlapping)
- 4) Wrong entity type (the entity boundary is correct, but the types disagree)
- 5) Wrong boundaries (the predicted and gold boundaries overlap but are not identical)
- 6) Wrong boundaries and wrong entity type
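
The six scenarios can be told apart by comparing the predicted span and type against the gold annotation. Below is a minimal sketch; the `(type, start, end)` entity representation and the helper names are my own assumptions, not part of any evaluation standard.

```python
from typing import Optional, Tuple

# An entity is a (type, start, end) triple over the input text;
# None stands for "no annotation on this side".
Entity = Tuple[str, int, int]

def spans_overlap(a: Entity, b: Entity) -> bool:
    """True if the two half-open character spans share at least one position."""
    return a[1] < b[2] and b[1] < a[2]

def scenario(gold: Optional[Entity], pred: Optional[Entity]) -> int:
    """Map a (gold, predicted) pair to one of the six scenarios above."""
    if gold is None:
        return 2                       # spurious: system hypothesized an entity
    if pred is None:
        return 3                       # missed: gold entity not predicted
    same_type = gold[0] == pred[0]
    same_span = gold[1:] == pred[1:]
    if same_span:
        return 1 if same_type else 4   # exact boundary: full match or wrong type
    if spans_overlap(gold, pred):
        return 5 if same_type else 6   # overlapping boundary
    # Disjoint spans degenerate into a missed gold entity (plus a spurious one).
    return 3
```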
Evaluation Metrics
CoNLL-2003: Conference on Computational Natural Language Learning
- Only considers scenarios 1), 2), and 3) above
- Exact match: precision, recall, and F1 measure (see the sketch below)
- See Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition for details.
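
As a minimal sketch of CoNLL-style exact-match scoring (the entity representation and function name are my own): a prediction counts as a true positive only when both its boundary and its type match a gold annotation exactly; everything else contributes a false positive or a false negative.

```python
from typing import List, Tuple

# An entity is a (type, start, end) triple; duplicates are ignored for brevity.
Entity = Tuple[str, int, int]

def conll_prf(gold: List[Entity], pred: List[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact boundary + type matching."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)    # scenario 1: exact matches
    fp = len(pred_set - gold_set)    # scenario 2: hypothesized entities
    fn = len(gold_set - pred_set)    # scenario 3: missed entities
    precision = tp / (tp + fp) if pred_set else 0.0
    recall = tp / (tp + fn) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```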
Automatic Content Extraction (ACE)
- Includes a weighting schema
- See Automatic Content Extraction 2008 Evaluation Plan (ACE08)
- See The Automatic Content Extraction (ACE) Program: Tasks, Data, and Evaluation
Message Understanding Conference (MUC)
- Considers both entity boundary and entity type
- Correct (COR): the prediction matches the gold annotation
- Incorrect (INC): the prediction does not match the gold annotation
- Partial (PAR): the predicted entity boundary overlaps with the gold annotation, but they are not identical
- Missing (MIS): the gold annotation boundary is not identified (present in the gold labels, but absent from the predictions)
- Spurious (SPU): the predicted entity boundary does not exist in the gold annotation (present in the predictions, but absent from the gold labels)
- See MUC-5 EVALUATION METRICS
- A Python implementation is available (see the eval4ner package at the end of this post)
SemEval'13
- Strict: exact match (both entity boundary and type are correct)
- Exact boundary matching: the predicted entity boundary is correct, regardless of the entity type
- Partial boundary matching: the entity boundaries overlap, regardless of the entity type
- Type matching: some overlap between the system-tagged entity and the gold annotation is required

The table below walks through six example scenarios under each of the four schemes; a small code sketch that reproduces it follows the table.
| Scenario | Gold Entity Type | Gold Entity Boundary (Surface String) | Predicted Entity Type | Predicted Entity Boundary (Surface String) | Type | Partial | Exact | Strict |
|---|---|---|---|---|---|---|---|---|
| III | MUSIC_NAME | 告白气球 |  |  | MIS | MIS | MIS | MIS |
| II |  |  | MUSIC_NAME | 年轮 | SPU | SPU | SPU | SPU |
| V | MUSIC_NAME | 告白气球 | MUSIC_NAME | 一首告白气球 | COR | PAR | INC | INC |
| IV | MUSIC_NAME | 告白气球 | SINGER | 告白气球 | INC | COR | COR | INC |
| I | MUSIC_NAME | 告白气球 | MUSIC_NAME | 告白气球 | COR | COR | COR | COR |
| VI | MUSIC_NAME | 告白气球 | SINGER | 一首告白气球 | INC | PAR | INC | INC |
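
The table can be reproduced with a few lines of Python. This is only a sketch: the gold/prediction pairing is given by hand, surface strings stand in for boundaries, and the containment-based `overlap` test as well as all names are my own assumptions.

```python
from typing import Optional, Tuple

# Each side of a comparison is either None or an (entity_type, surface_string)
# pair; the surface string stands in for the entity boundary.
Ann = Optional[Tuple[str, str]]

def overlap(a: str, b: str) -> bool:
    """Crude span-overlap test on surface strings: one contains the other."""
    return a in b or b in a

def label(gold: Ann, pred: Ann, scheme: str) -> str:
    """COR / INC / PAR / MIS / SPU for one comparison under one scheme."""
    if pred is None:
        return "MIS"
    if gold is None:
        return "SPU"
    same_type = gold[0] == pred[0]
    exact_span = gold[1] == pred[1]
    if scheme == "strict":
        return "COR" if exact_span and same_type else "INC"
    if scheme == "exact":
        return "COR" if exact_span else "INC"
    if scheme == "partial":
        return "COR" if exact_span else "PAR"
    if scheme == "type":                     # requires some span overlap
        return "COR" if same_type and overlap(gold[1], pred[1]) else "INC"
    raise ValueError(scheme)

pairs = {                                    # the six scenarios in the table
    "I":   (("MUSIC_NAME", "告白气球"), ("MUSIC_NAME", "告白气球")),
    "II":  (None,                        ("MUSIC_NAME", "年轮")),
    "III": (("MUSIC_NAME", "告白气球"), None),
    "IV":  (("MUSIC_NAME", "告白气球"), ("SINGER", "告白气球")),
    "V":   (("MUSIC_NAME", "告白气球"), ("MUSIC_NAME", "一首告白气球")),
    "VI":  (("MUSIC_NAME", "告白气球"), ("SINGER", "一首告白气球")),
}

for name, (gold, pred) in pairs.items():
    row = [label(gold, pred, s) for s in ("type", "partial", "exact", "strict")]
    print(name, *row)
```

Running it prints one row per scenario, matching the Type / Partial / Exact / Strict columns above.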
Number of gold-standard annotations (possible):

$$\text{POS} = \text{COR} + \text{INC} + \text{PAR} + \text{MIS}$$

Number of predicted annotations (actual):

$$\text{ACT} = \text{COR} + \text{INC} + \text{PAR} + \text{SPU}$$

Exact match (i.e. Strict, Exact):

$$\text{Precision} = \frac{\text{COR}}{\text{ACT}}, \qquad \text{Recall} = \frac{\text{COR}}{\text{POS}}$$

Partial match (i.e. Partial, Type):

$$\text{Precision} = \frac{\text{COR} + 0.5 \times \text{PAR}}{\text{ACT}}, \qquad \text{Recall} = \frac{\text{COR} + 0.5 \times \text{PAR}}{\text{POS}}$$

F-measure:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
| Measure | Type | Partial | Exact | Strict |
|---|---|---|---|---|
| Correct (COR) | 2 | 2 | 2 | 1 |
| Incorrect (INC) | 2 | 0 | 2 | 3 |
| Partial (PAR) | 0 | 2 | 0 | 0 |
| Missed (MIS) | 1 | 1 | 1 | 1 |
| Spurious (SPU) | 1 | 1 | 1 | 1 |
| Precision | 0.4 | 0.6 | 0.4 | 0.2 |
| Recall | 0.4 | 0.6 | 0.4 | 0.2 |
| F1 score | 0.4 | 0.6 | 0.4 | 0.2 |
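
The bottom three rows of the table follow directly from the counts and the formulas above. Here is a minimal sketch (the dictionary layout and variable names are my own); note that the partial-credit formula reduces to the exact-match one when PAR = 0, so it can be applied uniformly here.

```python
# COR / INC / PAR / MIS / SPU counts from the table above, per scheme.
counts = {
    "type":    dict(COR=2, INC=2, PAR=0, MIS=1, SPU=1),
    "partial": dict(COR=2, INC=0, PAR=2, MIS=1, SPU=1),
    "exact":   dict(COR=2, INC=2, PAR=0, MIS=1, SPU=1),
    "strict":  dict(COR=1, INC=3, PAR=0, MIS=1, SPU=1),
}

for scheme, c in counts.items():
    possible = c["COR"] + c["INC"] + c["PAR"] + c["MIS"]   # gold annotations (POS)
    actual = c["COR"] + c["INC"] + c["PAR"] + c["SPU"]     # predicted annotations (ACT)
    matched = c["COR"] + 0.5 * c["PAR"]                    # half credit for PAR
    precision = matched / actual
    recall = matched / possible
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{scheme:8s} P={precision:.1f} R={recall:.1f} F1={f1:.1f}")
# -> 0.4 for Type and Exact, 0.6 for Partial, 0.2 for Strict, as in the table.
```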
The PyPI library eval4ner can be installed with `pip install -U eval4ner`.
For attribution in academic contexts, please cite this work as:

```
@misc{chai2021NER-eval,
  author = {Chai, Yekun},
  title = {{Evaluation Metrics of Named Entity Recognition}},
  year = {2021},
  howpublished = {\url{https://cyk1337.github.io/notes/2018/11/21/NLP/NER/NER-Evaluation-Metrics/}},
}
```