Summary (English)

Pages: 11
Uploaded on: 27-04-2025
Written in: 2024/2025

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Institution: Freshman / 9th Grade
Grade: English language and composition

Content preview

Malaysian English News Decoded: A Linguistic Resource
for Named Entity and Relation Extraction

Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

arXiv:2402.14521v1

Generated on April 27, 2025



Abstract
Standard English and Malaysian English exhibit notable differences, posing challenges for natural
language processing (NLP) tasks on Malaysian English. Unfortunately, most of the existing datasets
are mainly based on standard English and therefore inadequate for improving NLP tasks in Malaysian
English. An experiment using state-of-the-art Named Entity Recognition (NER) solutions on Malaysian
English news articles highlights that they cannot handle morphosyntactic variations in Malaysian
English. To the best of our knowledge, there is no annotated dataset available to improve such models.
To address these issues, we constructed a Malaysian English News (MEN) dataset, which contains
200 news articles that are manually annotated with entities and relations. We then fine-tuned the spaCy
NER tool and validated that having a dataset tailor-made for Malaysian English could improve the
performance of NER in Malaysian English significantly. This paper presents our effort in the data
acquisition, annotation methodology, and thorough analysis of the annotated dataset. To validate the
quality of the annotation, inter-annotator agreement was used, followed by adjudication of
disagreements by a subject matter expert. Upon completion of these tasks, we managed to develop a
dataset with 6,061 entities and 3,268 relation instances. Finally, we discuss the spaCy fine-tuning setup
and analyze the NER performance. This unique dataset will contribute significantly to the
advancement of NLP research in Malaysian English, allowing researchers to accelerate their progress,
particularly in NER and relation extraction. The dataset and annotation guidelines have been published on
GitHub.
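The abstract states that annotation quality was checked via inter-annotator agreement before expert adjudication. As an illustration, here is a minimal pure-Python sketch of Cohen's kappa, one common agreement measure; the helper function and the label sequences are hypothetical, not taken from the MEN dataset:

```python
from collections import Counter

def cohen_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(ann_a)
    # observed agreement: fraction of items both annotators labelled identically
    p_o = sum(x == y for x, y in zip(ann_a, ann_b)) / n
    # expected chance agreement, from each annotator's label marginals
    count_a, count_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(count_a[l] * count_b[l] for l in set(ann_a) | set(ann_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical entity labels assigned by two annotators to ten mentions
ann1 = ["PERSON", "ORG", "LOC", "PERSON", "ORG", "LOC", "PERSON", "ORG", "LOC", "PERSON"]
ann2 = ["PERSON", "ORG", "LOC", "PERSON", "LOC", "LOC", "PERSON", "ORG", "ORG", "PERSON"]
print(round(cohen_kappa(ann1, ann2), 3))  # → 0.697
```

A kappa near 0.7 is conventionally read as substantial agreement; the adjudication step described in the abstract would then resolve the remaining disagreements.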

Malaysian English News Decoded: A Linguistic Resource for Named Entity and Relation Extraction

Mohan Raj¹, Lay-Ki Soon¹, Ong Huey Fang¹, and Bhawani Selvaretnam²
¹School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Selangor, Malaysia
²Valiantlytix Sdn Bhd, Lorong Utara C, Pjs 52, 46200 Petaling Jaya, Selangor
¹{mohan.chanthran, soon.layki, ong.hueyfang}@monash.edu

Keywords: Annotated Dataset, Malaysian English, Named Entity Recognition, Relation Extraction, Low-Resource Language

1. Introduction

1.1. Overview

Relation Extraction (RE) is a natural language processing (NLP) task that involves identifying relations between a pair of entities mentioned in a text. This task requires

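The preview above describes fine-tuning spaCy's NER component on the manually annotated articles. spaCy's training examples pair raw text with character-offset entity spans; the sketch below builds that offset format from (surface string, label) pairs. The sentence, entity labels, and helper function are illustrative assumptions, not drawn from the MEN dataset:

```python
def to_spacy_example(text, entities):
    """Turn (surface_string, label) pairs into spaCy's offset format:
    (text, {"entities": [(start_char, end_char, label), ...]})."""
    spans = []
    for surface, label in entities:
        start = text.find(surface)
        if start == -1:
            continue  # skip mentions that do not occur verbatim in the text
        spans.append((start, start + len(surface), label))
    return (text, {"entities": spans})

# hypothetical Malaysian English news snippet with two entity mentions
example = to_spacy_example(
    "Tabung Haji announced new initiatives in Kuala Lumpur.",
    [("Tabung Haji", "ORGANIZATION"), ("Kuala Lumpur", "LOCATION")],
)
print(example)
```

In an actual fine-tuning run these tuples would be converted to `spacy.training.Example` objects and serialized to a `DocBin`; that step is omitted here to keep the sketch dependency-free.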