The Edinburgh International Accents of English Corpus:
Towards the Democratization of English ASR
Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell
School of Informatics, The University of Edinburgh
arXiv (arXiv: 2303.18110v1)
Generated on April 27, 2025
Abstract
English is the most widely spoken language in the world, used daily by millions of people as a first or
second language in many different contexts. As a result, there are many varieties of English. Despite
the many advances in English automatic speech recognition (ASR) over the past decades, results
are usually reported on test datasets that fail to represent the diversity of English as spoken
today around the globe. We present the first release of The Edinburgh International Accents of English
Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing
almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc
includes a wide range of first- and second-language varieties of English and a linguistic background
profile of each speaker. Results on the latest public and commercial models show that EdAcc highlights
shortcomings of current English ASR models. The best-performing model, trained on 680 thousand
hours of transcribed data, obtains an average word error rate (WER) of 19.7%, in contrast to the 2.7%
WER obtained when evaluated on US English clean read speech. Across all models, we observe a
drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic
backgrounds, data statement, and evaluation scripts are released on our website
(https://groups.inf.ed.ac.uk/edacc/) under a CC-BY-SA license.
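For readers less familiar with the metric: WER is conventionally computed as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-cost (Levenshtein) alignment of the ASR hypothesis against the reference transcript, and N is the number of reference words. The sketch below illustrates that standard definition only. It assumes whitespace tokenization and no text normalization, and its function names are hypothetical; it is not the released EdAcc evaluation script, which may normalize transcripts and aggregate scores differently.

```python
from typing import Iterable, Tuple


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions that
    align the hypothesis with the reference (Levenshtein distance over
    whitespace-separated tokens; no normalization is applied)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER over (reference, hypothesis) pairs:
    total word edits divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        errors += word_edit_distance(reference, hypothesis)
        ref_words += len(reference.split())
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words
```

For example, the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" has one substitution and one deletion, giving a WER of 2/6. Because insertions are counted, WER can exceed 100%. Pooling edits across utterances, as sketched here, weights longer utterances more heavily than a mean of per-utterance WERs would.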
We hope that this work will encourage future research on a wider range of English varieties to create
more accessible speech technologies.

Index Terms: conversational speech, bias in speech recognition, English accents, automatic speech recognition

1. INTRODUCTION

English is a
first language for more than 370 million people [1], having been spread through (settler) colonialism
over hundreds of years [2]. In recent decades, English has only gained in power as a lingua franca in
global business, international politics, media and pop culture, and academia. As a result, an
estimated 1 billion people speak English as a second language, and most state-of-the-art
language technology research caters to it. Even though language technologies work better for English
than for other languages, there are still vast performance differences between English varieties, with
higher performance for US and UK English [3, 4, 5]. There are hundreds of varieties of English spoken by
people in different geographical areas and social contexts [6]. Most of these are poorly supported by