An Empirical Investigation of Multi-bridge Multilingual NMT models
Anoop Kunchukuttan
arXiv (arXiv: 2110.07304v1)
Generated on April 27, 2025
Abstract
In this paper, we present an extensive investigation of multi-bridge, many-to-many multilingual NMT
models (MB-M2M), i.e., models trained on non-English language pairs in addition to English-centric
language pairs. In addition to validating previous work showing that MB-M2M models can overcome
zero-shot translation problems, our analysis reveals the following results about multi-bridge models:
(1) it is possible to extract a reasonable amount of parallel data between low-resource non-English
languages; (2) with limited non-English-centric data, MB-M2M models are competitive with or outperform
pivot models; and (3) MB-M2M models can outperform English-Any models and perform on par with
Any-English models, so a single multilingual NMT system can serve all translation directions.
Anoop Kunchukuttan, Microsoft India, Hyderabad. arXiv:2110.07304v1 [cs.CL], 14 Oct 2021.
Introduction
Neural Machine Translation (NMT) has led to significant advances in MT quality in recent times
(Bahdanau, Cho, and Bengio 2015; Wu et al. 2016; Sennrich, Haddow, and Birch 2016b,a; Vaswani et al.
2017). MT research has devoted significant effort to translation between English and other languages,
driven in large part by the availability of English-centric parallel corpora.
In particular, multilingual NMT models trained on English-centric parallel corpora have shown significant
improvements for translation between English and low-resource languages (Firat, Cho, and Bengio 2016;
Johnson et al. 2017). Translation between non-English languages has received less attention, with the
default approach being pivot translation (Lakew et al. 2017). Pivot translation is a strong baseline,
but it requires multiple sequential decoding steps, which increases latency and lets errors cascade
from one step to the next.
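To make the cost of pivoting concrete, here is a minimal sketch of pivot translation as the composition of two single-step translators (source to English, then English to target). The word-by-word lexicon translators, the names `make_lexicon_translator` and `pivot_translate`, and the Spanish/English/German example are hypothetical stand-ins for trained NMT models. They serve only to show the two sequential decoding steps and how an error in the first step reaches the final output.

```python
from typing import Callable, Dict, Tuple

# A translator maps one sentence to one sentence. In a real system this would
# wrap a trained NMT model's decoder; here it is a hypothetical stand-in.
Translator = Callable[[str], str]


def make_lexicon_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator (hypothetical, illustration only).

    Words missing from the lexicon become "<unk>", mimicking an error that a
    real model might make.
    """
    def translate(sentence: str) -> str:
        return " ".join(lexicon.get(word, "<unk>") for word in sentence.split())

    return translate


def pivot_translate(
    sentence: str, src_to_pivot: Translator, pivot_to_tgt: Translator
) -> Tuple[str, str]:
    """Translate source -> pivot (e.g. English) -> target.

    Returns (pivot_output, target_output). The second decoding step can only
    start after the first finishes, and any error in the pivot output is fed
    unchanged into the second model (cascading errors).
    """
    pivot_output = src_to_pivot(sentence)        # decoding step 1
    target_output = pivot_to_tgt(pivot_output)   # decoding step 2
    return pivot_output, target_output


# Hypothetical toy "models": Spanish -> English and English -> German.
es_en = make_lexicon_translator({"el": "the", "gato": "cat", "duerme": "sleeps"})
en_de = make_lexicon_translator({"the": "die", "cat": "Katze", "sleeps": "schläft"})

if __name__ == "__main__":
    print(pivot_translate("el gato duerme", es_en, en_de))
    # ('the cat sleeps', 'die Katze schläft')
    print(pivot_translate("el perro duerme", es_en, en_de))
    # ('the <unk> sleeps', 'die <unk> schläft')
```

The second call consumes the first call's output, so the two decoding steps cannot overlap. The word the first model fails on ("perro") also survives as an error in the final German output. A direct model for the source-target pair would need only one decoding step.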
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Zero-shot translation using English-centric many-to-many multilingual models (EC-M2M) (Johnson et al.
2017) is promising, but suffers from spurious correlations between the input and output languages,
which can lead the model to generate output in the wrong language (Gu et al. 2019; Arivazhagan et al.
2019). Hence, vanilla zero-shot translation quality lags significantly behind pivot translation. Various
methods have been proposed to address these limitations, either by aligning encoder representations
(Arivazhagan et al. 2019) or by using a pseudo-parallel corpus between non-English languages during
training (Lakew et al. 2017). Recently, there has been interest in multi-bridge many-to-many
multilingual models (MB-M2M, referred to as multi-bridge models henceforth). These models are trained
on direct parallel corpora between non-English languages in addition to English-centric corpora (Rios,
Müller, and Sennrich 2020; Freitag and Firat 2020; Fan et al. 2020). Such corpora can either be mined
from monolingual corpora (Fan et al. 2020) using bitext mining approaches such as LASER (Artetxe and
Schwenk 2019) and LaBSE (Feng et al. 2020), or extracted from English-centric parallel corpora (Rios,
Müller, and Sennrich 2020; Freitag and Firat 2020). These works show that multi-bridge models can
overcome zero-shot translation problems and perform on par with or better than pivot approaches. In
addition, models using sep-