The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Malaysian English News Decoded: A Linguistic Resource
for Named Entity and Relation Extraction

Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

arXiv (arXiv: 2402.14521v1)

Generated on April 27, 2025


Abstract
Standard English and Malaysian English exhibit notable differences, posing challenges for natural
language processing (NLP) tasks on Malaysian English. Unfortunately, most of the existing datasets
are mainly based on standard English and therefore inadequate for improving NLP tasks in Malaysian
English. An experiment using state-of-the-art Named Entity Recognition (NER) solutions on Malaysian
English news articles highlights that they cannot handle morphosyntactic variations in Malaysian
English. To the best of our knowledge, there is no annotated dataset available to improve the model.
To address these issues, we constructed a Malaysian English News (MEN) dataset, which contains
200 news articles that are manually annotated with entities and relations. We then fine-tuned the spaCy
NER tool and validated that having a dataset tailor-made for Malaysian English could improve the
performance of NER in Malaysian English significantly. This paper presents our effort in the data
acquisition, annotation methodology, and thorough analysis of the annotated dataset. To validate the
quality of the annotation, inter-annotator agreement was used, followed by adjudication of
disagreements by a subject matter expert. Upon completion of these tasks, we managed to develop a
dataset with 6,061 entities and 3,268 relation instances. Finally, we discuss the spaCy fine-tuning setup
and analyze the resulting NER performance. This unique dataset will contribute significantly to the
advancement of NLP research in Malaysian English, allowing researchers to accelerate their progress,
particularly in NER and relation extraction. The dataset and annotation guidelines have been published on
GitHub.
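
The abstract says annotation quality was checked with inter-annotator agreement and that disagreements were then adjudicated by a subject matter expert, but it does not name the agreement metric. The sketch below is only an illustration of one common choice for span annotation, a pairwise F1 over exact (start, end, label) matches. It should not be read as the authors' procedure: the function name `pairwise_f1`, the character offsets, and the entity labels are all hypothetical.

```python
from typing import Set, Tuple

# A hypothetical entity annotation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def pairwise_f1(a: Set[Span], b: Set[Span]) -> float:
    """Agreement between two annotators, treating annotator `a` as reference.

    Only exact matches on offsets and label count as agreement
    (an assumption; partial-match schemes also exist).
    """
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Illustrative annotations for one article (made-up offsets and labels).
annotator_1 = {(0, 12, "PERSON"), (20, 32, "ORGANIZATION"), (40, 52, "LOCATION")}
annotator_2 = {(0, 12, "PERSON"), (20, 32, "LOCATION"), (40, 52, "LOCATION")}

score = pairwise_f1(annotator_1, annotator_2)

# Spans marked by only one annotator are the candidates for adjudication.
disagreements = sorted(annotator_1 ^ annotator_2)
```

Here the two annotators agree on two of three spans, so the score is 2/3, and the span at offsets 20 to 32 appears twice in `disagreements` with conflicting labels, which is the kind of case an adjudicator would resolve.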

Mohan Raj¹, Lay-Ki Soon¹, Ong Huey Fang¹, and Bhawani Selvaretnam²
¹School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Selangor, Malaysia
²Valiantlytix Sdn Bhd, Lorong Utara C, Pjs 52, 46200 Petaling Jaya, Selangor
¹{mohan.chanthran, soon.layki, ong.hueyfang}@monash.edu

Keywords: Annotated Dataset, Malaysian English, Named Entity Recognition, Relation Extraction, Low-Resource Language

1. Introduction

1.1. Overview

Relation Extraction (RE) is a natural language processing (NLP) task that involves identifying relations
between a pair of entities mentioned in a text. This task requires