Summary

Summary english test essay

Pages
8
Uploaded on
27-04-2025
Written in
2024/2025

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Institution
Freshman / 9th Grade
Course
English language and composition










School year
1

Type
Summary

Content preview

The Edinburgh International Accents of English Corpus:
Towards the Democratization of English ASR

Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

arXiv (arXiv: 2303.18110v1)

Generated on April 27, 2025



Abstract
English is the most widely spoken language in the world, used daily by millions of people as a first or
second language in many different contexts. As a result, there are many varieties of English. Despite
the great many advances in English automatic speech recognition (ASR) over the past decades, results
are usually reported based on test datasets which fail to represent the diversity of English as spoken
today around the globe. We present the first release of The Edinburgh International Accents of English
Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing
almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc
includes a wide range of first and second-language varieties of English and a linguistic background
profile of each speaker. Results on the latest public and commercial models show that EdAcc highlights
shortcomings of current English ASR models. The best performing model, trained on 680 thousand
hours of transcribed data, obtains an average word error rate (WER) of 19.7%, in contrast to the 2.7%
WER obtained when evaluated on US English clean read speech. Across all models, we observe a
drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic
backgrounds, data statement, and evaluation scripts are released on our website
(https://groups.inf.ed.ac.uk/edacc/) under a CC-BY-SA license.
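To make the word error rate figures in the abstract concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical. The sketch assumes plain whitespace tokenisation with no text normalisation, so the released EdAcc evaluation scripts may well differ in their details.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"{wer:.1%}")  # prints 33.3%
```

Under this definition, a WER of 19.7% corresponds to roughly one word-level error for every five reference words, against roughly one in 37 at 2.7%.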

THE EDINBURGH INTERNATIONAL ACCENTS OF ENGLISH CORPUS: TOWARDS THE DEMOCRATIZATION OF ENGLISH ASR

Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
School of Informatics, The University of Edinburgh

ABSTRACT
English is the most widely spoken language in the world, used daily by millions of people as a first or
second language in many different contexts. As a result, there are many varieties of English. Despite
the great many advances in English automatic speech recognition (ASR) over the past decades, results
are usually reported based on test datasets which fail to represent the diversity of English as spoken
today around the globe. We present the first release of The Edinburgh International Accents of English
Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing
almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc
includes a wide range of first and second-language varieties of English and a linguistic background
profile of each speaker. Results on the latest public and commercial models show that EdAcc highlights
shortcomings of current English ASR models. The best performing model, trained on 680 thousand
hours of transcribed data, obtains an average word error rate (WER) of 19.7%, in contrast to the 2.7%
WER obtained when evaluated on US English clean read speech. Across all models, we observe a
drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic
backgrounds, data statement, and evaluation scripts are released on our website under a CC-BY-SA
license. We hope that this work will encourage future research on a wider range of English varieties
to create more accessible speech technologies.

Index Terms: conversational speech, bias in speech recognition, English accents, automatic speech recognition

1. INTRODUCTION
English is a first language for more than 370 million people [1], having been spread through (settler)
colonialism over hundreds of years [2]. In recent decades, English has only gained power as a lingua
franca in global business, international politics, media and pop culture, and academia. As a result,
there are an estimated 1 billion people who speak English as a second language, and most of the
state-of-the-art language technology research caters to it. Even though language technologies work
better for English than for other languages, there are still vast performance differences between
English varieties, with higher performance for US and UK varieties [3, 4, 5]. There are hundreds of
varieties of English spoken by people in different geographical areas and social contexts [6]. Most of
these are poorly supported by