Type: Summary

Summary English test essay

Pages: 8
Uploaded on: 27-04-2025
Written in: 2024/2025

The vast majority of popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize to analyzing English as it is used globally.

Institution: Freshman / 9th Grade
Grade: English language and composition


The Edinburgh International Accents of English Corpus:
Towards the Democratization of English ASR

Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
School of Informatics, The University of Edinburgh

arXiv:2303.18110v1

Generated on April 27, 2025



Abstract
English is the most widely spoken language in the world, used daily by millions of people as a first or
second language in many different contexts. As a result, there are many varieties of English. Despite
the many advances in English automatic speech recognition (ASR) over the past decades, results
are usually reported on test datasets that fail to represent the diversity of English as spoken
around the globe today. We present the first release of The Edinburgh International Accents of English
Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing
almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc
includes a wide range of first- and second-language varieties of English and a linguistic background
profile of each speaker. Results on the latest public and commercial models show that EdAcc highlights
shortcomings of current English ASR models. The best-performing model, trained on 680 thousand
hours of transcribed data, obtains an average word error rate (WER) of 19.7%, in contrast to the 2.7%
WER it obtains when evaluated on US English clean read speech. Across all models, we observe a
drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic
backgrounds, a data statement, and evaluation scripts are released on our website
(https://groups.inf.ed.ac.uk/edacc/) under a CC-BY-SA license. We hope that this work will encourage
future research on a wider range of English varieties to create more accessible speech technologies.
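The abstract compares models by word error rate (WER). As a reference point, here is a minimal sketch of the standard definition: WER is the word-level edit distance between a reference transcript and a hypothesis, counting substitutions S, deletions D and insertions I, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and this is not the evaluation script released with EdAcc, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Hypothetical helper for illustration. Assumes whitespace tokenization
    and no normalization (case, punctuation, and numbers are compared as-is).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr

    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") and one deletion ("the") over 8 reference words.
    print(word_error_rate("we present the first release of the corpus",
                          "we present a first release of corpus"))  # → 0.25
```

Because insertions are counted, WER can exceed 100%. Read this way, 19.7% WER is roughly one error every five reference words, against roughly one in 37 at 2.7%. How the paper averages WER across speakers or utterances is not stated in this section.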

Index Terms: conversational speech, bias in speech recognition, English accents, automatic speech recognition

1. Introduction

English is a first language for more than 370 million people [1], having been spread through (settler)
colonialism over hundreds of years [2]. In recent decades, English has only gained influence as a lingua
franca in global business, international politics, media and pop culture, and academia. As a result, an
estimated 1 billion people speak English as a second language, and most state-of-the-art language
technology research caters to it. Even though language technologies work better for English than for
other languages, there are still vast performance differences between English varieties, with higher
performance for US and UK English [3, 4, 5]. There are hundreds of varieties of English spoken by people
in different geographical areas and social contexts [6]. Most of these are poorly supported by
