The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Malaysian English News Decoded: A Linguistic Resource
for Named Entity and Relation Extraction

Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

arXiv (arXiv: 2402.14521v1)

Generated on April 27, 2025


Abstract
Standard English and Malaysian English exhibit notable differences, posing challenges for natural
language processing (NLP) tasks on Malaysian English. Unfortunately, most existing datasets are
based on standard English and are therefore inadequate for improving NLP tasks in Malaysian
English. An experiment using state-of-the-art Named Entity Recognition (NER) solutions on Malaysian
English news articles shows that they cannot handle morphosyntactic variations in Malaysian
English. To the best of our knowledge, there is no annotated dataset available to improve the model.
To address these issues, we constructed a Malaysian English News (MEN) dataset, which contains
200 news articles that are manually annotated with entities and relations. We then fine-tuned the spaCy
NER tool and validated that a dataset tailor-made for Malaysian English can significantly improve
NER performance on Malaysian English. This paper presents our data acquisition, annotation
methodology, and a thorough analysis of the annotated dataset. To validate the quality of the
annotation, we measured inter-annotator agreement, and a subject matter expert then adjudicated
the disagreements. Upon completion of these tasks, we had developed a dataset with 6,061 entities
and 3,268 relation instances. Finally, we discuss the spaCy fine-tuning setup and analyze the NER
performance. This unique dataset will contribute significantly to the advancement of NLP research in
Malaysian English, allowing researchers to accelerate their progress, particularly in NER and relation
extraction. The dataset and annotation guidelines have been published on GitHub.
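
As an illustration of the inter-annotator agreement step mentioned in the abstract, the sketch below computes Cohen's kappa over token-level entity labels from two annotators. This is an assumption made for illustration only: the abstract does not state which agreement metric the authors used, and the label sequences and the function name cohen_kappa are hypothetical, not taken from the MEN dataset or its guidelines.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: fraction of tokens both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical token-level labels from two annotators (not real MEN data).
annotator_1 = ["PERSON", "O", "O", "ORG", "ORG", "O", "LOC", "O"]
annotator_2 = ["PERSON", "O", "O", "ORG", "O", "O", "LOC", "O"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

In a workflow like the one described, the tokens where the two sequences differ (here, the fifth token) would be the ones passed to the subject matter expert for adjudication.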

Mohan Raj¹, Lay-Ki Soon¹, Ong Huey Fang¹, and Bhawani Selvaretnam²
¹ School of Information Technology, Monash University Malaysia, Jalan Lagoon Selatan, 47500 Selangor, Malaysia
² Valiantlytix Sdn Bhd, Lorong Utara C, Pjs 52, 46200 Petaling Jaya, Selangor
¹ {mohan.chanthran, soon.layki, ong.hueyfang}@monash.edu

Keywords: Annotated Dataset, Malaysian English, Named Entity Recognition, Relation Extraction, Low-Resource Language

1. Introduction

1.1. Overview

Relation Extraction (RE) is a natural language processing (NLP) task that involves identifying
relations between a pair of entities mentioned in a text. This task requires