Type: Summary

Summary English essay

Pages: 12
Published on: 27-04-2025
Written in: 2024/2025

Institution: Freshman / 9th Grade
Course: English language and composition










School, study and subject

School year: 1

Content preview

Do "English" Named Entity Recognizers Work Well on
Global Englishes?

Alexander Shan, John Bauer, Riley Carlson, Christopher Manning

arXiv (arXiv: 2404.13465v1)

Generated on April 27, 2025



Abstract
The vast majority of the popular English named entity recognition (NER) datasets contain American or
British English data, despite the existence of many global varieties of English. As such, it is unclear
whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset,
the Worldwide English NER Dataset, to analyze NER model performance on low-resource English
variants from around the world. We test widely used NER toolkits and transformer models, including
models built on the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: CoNLL 2003,
a commonly used British English newswire dataset; OntoNotes, a more American-focused dataset; and our
global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant
performance drops, over 10 F1 points in some cases, when tested on the Worldwide English dataset.
Upon examination of region-specific errors, we observe the greatest performance
drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance.
Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or
OntoNotes lost only 1-2 F1 on both test sets.
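
The abstract reports results as F1 scores and breaks errors down by region. The preview does not show the authors' scoring code, so the following is only a minimal sketch of how exact-match, entity-level F1 is commonly computed from BIO tags and grouped by region. The function names, the region labels, and the two toy sentences are hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict


def extract_spans(tags):
    """Turn a BIO tag sequence into a set of (start, end, label) spans."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray "I-" with no matching open span starts a new span.
            start, label = i, tag[2:]
    return spans


def span_counts(gold_tags, pred_tags):
    """Return (true positives, false positives, false negatives)."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp


def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_region(examples):
    """examples: iterable of (region, gold_tags, pred_tags) triples."""
    totals = defaultdict(lambda: [0, 0, 0])
    for region, gold_tags, pred_tags in examples:
        for k, count in enumerate(span_counts(gold_tags, pred_tags)):
            totals[region][k] += count
    return {region: f1_from_counts(*c) for region, c in totals.items()}


# Toy data only (hypothetical, not from the Worldwide English NER Dataset).
examples = [
    ("Asia", ["O", "B-ORG", "I-ORG", "O"], ["O", "B-ORG", "I-ORG", "O"]),
    ("Oceania", ["B-PER", "I-PER", "O", "B-LOC"], ["B-PER", "O", "O", "B-LOC"]),
]
scores = f1_by_region(examples)
print(scores)
```

In this sketch a predicted entity counts as correct only if both its boundaries and its label match the gold annotation, so the truncated person name in the second toy sentence is scored as one false positive and one false negative.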

Do “English” Named Entity Recognizers Work Well on Global Englishes?

Alexander Shan, John Bauer, Riley Carlson, and Christopher D. Manning
Department of Computer Science, Stanford University
Stanford, CA 94305-9030, U.S.A.
{azshan, horatio, rileydc, manning}@stanford.edu

Abstract

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset, the Worldwide English NER Dataset, to analyze NER model performance on “low-resource” English variants from around the world. We test widely used NER toolkits and transformer models, including RoBERTa and ELECTRA, on three datasets: a commonly used British English newswire dataset, CoNLL 2003, a more American-focused dataset, OntoNotes, and our global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant performance drops, over 10% F1 in some cases, when tested on the Worldwide English dataset. Upon examination of region-specific errors, we observe the greatest performance drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance. Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or OntoNotes lost only 1–2% F1 on both test sets.

1 Introduction

Most of English Named Entity Recognition (NER) uses American or British English data, with less attention paid to low-resource English contexts. Multiple problems may occur in low-resource NER settings; for example, named entities with region-specific meanings can be confused for common words. Indeed, the Japanese Diet is a governmental body, but NER models focused on US and British English may incorrectly interpret this entity as a medical term.

Among many NER datasets released in recent years,[1] the most widely used datasets are CoNLL 2003 (Tjong Kim Sang and De Meulder, 2003) and OntoNotes (Weischedel et al., 2013), which focus on British and American English, with significant European Parliament coverage. Other recently created NER datasets study the medical domain, such as the n2c2 challenges (Henry et al., 2019), historical English (Ehrmann et al., 2022), or music recommendation terminology (Epure and Hennequin, 2023), still using American and British English. The lack of regional variety in these datasets suggests that models trained on these datasets might not accurately recognize entities from more global contexts. Furthermore, the lack of test data for other regions makes it difficult to even measure this phenomenon.

[1] A collection of NER references is available at https://github.com/juand-r/entity-recognition-datasets

In this work, we evaluate the performance of a variety of NER tools, including Flair and SpaCy, on this dataset. We then retrain two commonly
