Findings of the LoResMT 2021 Shared Task on COVID and
Sign Language for Low-resource Languages
Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen
arXiv (arXiv: 2108.06598v2)
Generated on April 27, 2025
Abstract
We present the findings of the LoResMT 2021 shared task, which focuses on machine translation (MT)
of COVID-19 data for both low-resource spoken and sign languages. The task was organized as part of the
fourth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT). Parallel
corpora are presented and made publicly available for the following directions:
English$\leftrightarrow$Irish, English$\leftrightarrow$Marathi, and Taiwanese Sign
Language$\leftrightarrow$Traditional Chinese. The training data consist of 8112, 20933, and 128608
segments, respectively. There are also additional monolingual data sets for Marathi and English, consisting
of 21901 segments. The results presented here are based on entries from a total of eight teams. Three
teams submitted systems for English$\leftrightarrow$Irish, while five teams submitted systems for
English$\leftrightarrow$Marathi. Unfortunately, there were no system submissions for the Taiwanese
Sign Language$\leftrightarrow$Traditional Chinese task. The best system performance, measured with BLEU,
was 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and
31.3 for Marathi--English.
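The scores above are reported in BLEU. This section does not state which scoring tool, tokenization, or smoothing the organizers used, so the following is only a minimal sketch of standard corpus-level BLEU under stated assumptions: one reference per segment, whitespace tokenization, clipped n-gram precisions up to 4-grams with uniform weights, a brevity penalty, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical and are not taken from the shared task's evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU sketch (assumption: one reference per segment).

    hypotheses, references: parallel lists of strings, tokenized on whitespace.
    Returns a score on the 0-100 scale used when reporting results.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or 0 in matches:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty applies when the hypotheses are not longer than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, a hypothesis identical to its reference (with at least four tokens) scores 100.0. Because tokenization and smoothing choices shift BLEU, scores are only comparable when computed with the same setup.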
1 Data Science Institute, NUIG, Galway
2 Panlingua Language Processing LLP, New Delhi
3 Potamu Research Ltd
4 University of Colorado at Boulder
5 New York University

1 Introduction

The Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT)1 is a yearly
workshop that focuses on scientific research topics and technological resources for machine translation
(MT) of low-resource languages. Building on the success of its three predecessors (Liu, 2018; Karakanta
et al., 2019, 2020), the fourth LoResMT workshop introduces a shared task based on COVID-19 and sign
language data as part of its research objectives. The aim is to provide translation assistance for
low-resource languages where it may be needed most during the COVID-19 pandemic.
1 https://sites.google.com/view/loresmt/
arXiv:2108.06598v2 [cs.CL] 18 Aug 2021
To trace the development of the LoResMT shared tasks, a summary of the previous editions follows.
The first LoResMT shared task (Karakanta et al., 2019) took place in 2019. There, monolingual and
parallel corpora for Bhojpuri, Magahi, Sindhi, and Latvian were provided as training data for two types
of machine translation systems: neural and statistical. As an extension to the first shared task, a