

Do "English" Named Entity Recognizers Work Well on
Global Englishes?

Alexander Shan, John Bauer, Riley Carlson, Christopher Manning

arXiv (arXiv: 2404.13465v1)

Generated on April 27, 2025



Abstract
The vast majority of the popular English named entity recognition (NER) datasets contain American or
British English data, despite the existence of many global varieties of English. As such, it is unclear
whether they generalize for analyzing use of English globally. To test this, we build a newswire dataset,
the Worldwide English NER Dataset, to analyze NER model performance on low-resource English
variants from around the world. We test widely used NER toolkits and transformer models, including
models using the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: a
commonly used British English newswire dataset, CoNLL 2003; a more American-focused dataset,
OntoNotes; and our global dataset. All models trained on the CoNLL or OntoNotes datasets
experienced significant performance drops (over 10 F1 in some cases) when tested on the Worldwide
English dataset. Upon examination of region-specific errors, we observe the greatest performance
drops for Oceania and Africa, while Asia and the Middle East had comparatively strong performance.
Lastly, we find that a combined model trained on the Worldwide dataset and either CoNLL or
OntoNotes lost only 1-2 F1 on both test sets.
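
The F1 figures quoted in the abstract are entity-level scores. The excerpt does not say which scoring script the authors used, so the following is only a minimal sketch, assuming exact-match span scoring over BIO tags in the style of the CoNLL shared-task evaluation. The function names (bio_to_spans, entity_f1) and the ORG/LOC/MISC labels are hypothetical and chosen for illustration.

```python
from typing import List, Set, Tuple

# (sentence index, start token, end token exclusive, entity label)
Span = Tuple[int, int, int, str]


def bio_to_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect labeled entity spans from BIO tag sequences (hypothetical helper)."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, label = None, None
        # The trailing "O" is a sentinel that closes a span ending the sentence.
        for i, tag in enumerate(tags + ["O"]):
            # An I- tag with the same label extends the currently open span.
            if tag.startswith("I-") and label == tag[2:]:
                continue
            # Anything else closes the open span, if there is one.
            if label is not None:
                spans.add((s_idx, start, i, label))
                start, label = None, None
            # B- opens a span; a stray I- is treated leniently as an opener too.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (assumed): "The Japanese Diet met" / "Tokyo hosted".
# The model truncates the ORG span and mislabels it, but finds the LOC span.
gold = [["O", "B-ORG", "I-ORG", "O"], ["B-LOC", "O"]]
pred = [["O", "B-MISC", "O", "O"], ["B-LOC", "O"]]
print(entity_f1(gold, pred))
```

Under exact-match scoring a partially correct span earns no credit, so the truncated and mislabeled "Japanese Diet" prediction counts as both a false positive and a false negative. Scores in this sketch fall between 0 and 1, so a drop of "10 F1" in the abstract corresponds to 0.10 on this scale.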

Alexander Shan, John Bauer, Riley Carlson and Christopher D. Manning
Department of Computer Science, Stanford University
Stanford, CA 94305-9030, U.S.A.
{azshan, horatio, rileydc, manning}@stanford.edu

1 Introduction

Most English Named Entity Recognition (NER) work uses American or British English data, with less
attention paid to low-resource English contexts. Multiple problems may occur in low-resource NER
settings; for example, named entities with region-specific meanings can be confused for common
words. Indeed, the Japanese Diet is a governmental body, but NER models focused on US and British
English may incorrectly interpret this entity as a medical term.

Among many NER datasets released in recent years,¹ the most widely used datasets are CoNLL 2003
(Tjong Kim Sang and De Meulder, 2003) and OntoNotes (Weischedel et al., 2013), which focus on
British and American English, with significant European Parliament coverage. Other recently created
NER datasets study the medical domain, such as the n2c2 challenges (Henry et al., 2019), historical
English (Ehrmann et al., 2022), or music recommendation terminology (Epure and Hennequin, 2023),
still using American and British English. The lack of regional variety in these datasets suggests that
models trained on them might not accurately recognize entities from more global contexts.
Furthermore, the lack of test data for other regions makes it difficult to even measure this
phenomenon.

¹ A collection of NER references is available at
https://github.com/juand-r/entity-recognition-datasets

In this work, we evaluate the performance of a variety of NER tools, including Flair and SpaCy, on
this dataset. We then retrain two commonly