


Document information

Uploaded on: April 27, 2025
Number of pages: 12
Written in: 2024/2025
Type: Summary


Do "English" Named Entity Recognizers Work Well on
Global Englishes?

Alexander Shan, John Bauer, Riley Carlson, Christopher Manning

arXiv (arXiv: 2404.13465v1)

Generated on April 27, 2025


Abstract
The vast majority of popular English named entity recognition (NER) datasets contain American or
British English data, despite the existence of many global varieties of English. As such, it is unclear
whether they generalize to the analysis of English as it is used globally. To test this, we build a newswire
dataset, the Worldwide English NER Dataset, to analyze NER model performance on low-resource English
variants from around the world. We test widely used NER toolkits and transformer models, including
models using the pre-trained contextual models RoBERTa and ELECTRA, on three datasets: CoNLL 2003,
a commonly used British English newswire dataset; OntoNotes, a more American-focused dataset; and
our global dataset. All models trained on the CoNLL or OntoNotes datasets experienced significant
performance drops, over 10 F1 in some cases, when tested on the Worldwide English dataset. Upon
examination of region-specific errors, we observe the greatest performance drops for Oceania and
Africa, while Asia and the Middle East had comparatively strong performance. Lastly, we find that a
combined model trained on the Worldwide dataset and either CoNLL or OntoNotes lost only 1-2 F1 on
both test sets.
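
The F1 figures quoted in the abstract can be made concrete with a small sketch. This excerpt does not say which scorer the authors used, so the code below assumes the common CoNLL-style convention: an entity counts as correct only when both its span and its type match exactly, and counts are micro-averaged over sentences. The function names, the BIO tag format, and the toy sentences are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))  # close the entity that just ended
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]  # a stray I- tag is treated as a new entity
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity-level F1 over a list of sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): the second sentence mislabels a region-specific entity.
gold = [
    ["B-PER", "O", "O", "B-ORG"],            # "Manning works at Stanford"
    ["O", "B-ORG", "I-ORG", "O", "O", "O"],  # "The Japanese Diet passed the bill"
]
pred = [
    ["B-PER", "O", "O", "B-ORG"],
    ["O", "B-MISC", "O", "O", "O", "O"],     # wrong span and wrong type
]
score = entity_f1(gold, pred)
print(f"entity-level F1: {score:.3f}")
```

Under this convention a partially overlapping prediction counts as both a false positive and a false negative. Running the same function separately on each region's sentences would give the kind of per-region breakdown the abstract describes.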

Alexander Shan, John Bauer, Riley Carlson, and Christopher D. Manning
Department of Computer Science, Stanford University
Stanford, CA 94305-9030, U.S.A.
{azshan, horatio, rileydc, manning}@stanford.edu

1 Introduction
Most English Named Entity Recognition (NER) work uses American or British English data, with less
attention paid to low-resource English contexts. Multiple problems may occur in low-resource NER
settings; for example, named entities with region-specific meanings can be confused for common
words. Indeed, the Japanese Diet is a governmental body, but NER models focused on US and British
English may incorrectly interpret this entity as a medical term. Among many NER datasets released in
recent years,[1] the most widely used datasets are CoNLL 2003 (Tjong Kim Sang and De Meulder, 2003)
and OntoNotes (Weischedel et al., 2013), which focus on British and American English, with significant
European Parliament coverage. Other recently created NER datasets study the medical domain, such
as the n2c2 challenges (Henry et al., 2019), historical English (Ehrmann et al., 2022), or music
recommendation terminology (Epure and Hennequin, 2023), still using American and British English. The
lack of regional variety in these datasets suggests that models trained on these datasets might not
accurately recognize entities from more global contexts. Furthermore, the lack of test data for other
regions makes it difficult to even measure this phenomenon. In this work, we evaluate the performance
of a variety of NER tools, including Flair and SpaCy, on this dataset. We then retrain two commonly

[1] A collection of NER references is available at https://github.com/juand-r/entity-recognition-datasets
