Pivot Through English: Reliably Answering Multilingual
Questions without Document Retrieval
Ivan Montero (University of Washington), Shayne Longpre, Ni Lao, Andrew J. Frank, Christopher DuBois (Apple Inc.)
arXiv: 2012.14094v2
Abstract
Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English. They not only suffer from the shortcomings of non-English document retrieval, but also rely on language-specific supervision for either the task or translation. We formulate a task setup that is more realistic given available resources and that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages. Assuming a strong English question answering model or database, we compare and analyze methods that pivot through English: mapping foreign queries to English and then English answers back to target-language answers. Within this task setup we propose Reranked Multilingual Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and 6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy over state-of-the-art alternatives in challenging settings: low-resource languages, extensive distractor data, and query distribution misalignment. Our analysis shows that, by circumventing retrieval, this approach offers rapid answer generation in almost any language off-the-shelf, without the need for any additional training data in the target language.
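
To make the English-pivot setup concrete, the following is a minimal Python sketch of the kind of pipeline the abstract describes: embed a low-resource-language query with a multilingual sentence encoder, retrieve the most similar English training questions by maximal inner product search, rerank the small candidate set with a more expensive scorer, and return the stored English answer mapped back to the target language. The encoder, reranker, and answer-translation functions below are illustrative placeholders rather than the paper's actual RM-MIPS components; any multilingual encoder, cross-lingual reranker, and answer translator could be substituted.

import numpy as np

# Sketch of an English-pivot QA pipeline in the spirit of RM-MIPS.
# All three model components are stand-ins, not the paper's models.

def encode(texts):
    # Placeholder multilingual sentence encoder returning unit-norm vectors.
    rng = np.random.default_rng(0)  # stand-in for a real embedding model
    vecs = rng.normal(size=(len(texts), 512)).astype(np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def rerank_score(lrl_query, en_question):
    # Placeholder cross-lingual reranker score for a (query, candidate) pair.
    return float(len(set(lrl_query.lower()) & set(en_question.lower())))

def translate_answer(en_answer, target_lang):
    # Placeholder mapping of the English answer into the target language.
    return en_answer  # named entities and numbers often pass through unchanged

# The English "database": training questions paired with known answers.
en_questions = ["Who wrote Hamlet?", "What is the capital of France?"]
en_answers = ["William Shakespeare", "Paris"]
index = encode(en_questions)  # dense index, precomputed once

def answer_lrl_query(lrl_query, target_lang, k=10):
    # 1) Maximal inner product search over the English training questions.
    q = encode([lrl_query])[0]
    scores = index @ q
    topk = np.argsort(-scores)[:k]
    # 2) Rerank the top-k candidates with a slower, more accurate scorer.
    best = max(topk, key=lambda i: rerank_score(lrl_query, en_questions[i]))
    # 3) Pivot back: return the stored English answer in the target language.
    return translate_answer(en_answers[best], target_lang)

print(answer_lrl_query("¿Quién escribió Hamlet?", target_lang="es"))

In practice the dense index would typically be served with an approximate MIPS library and the reranker would be a cross-lingual cross-encoder; the intended takeaway is only the control flow: retrieve over English questions, rerank, then return the paired answer in the target language.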
1 Introduction

Open-retrieval question answering (ORQA) has seen extensive progress in English, significantly outperforming systems in lower resource languages (LRLs). This advantage is largely driven by the scale of labelled data and open-source retrieval tools that exist predominantly for higher resource languages (HRLs), usually English.

Figure 1: Cross-Lingual Pivots (XLP). We introduce the "Cross-Lingual Pivots" task, formulated as a solution to multilingual question answering that circumvents document retrieval in low resource languages (LRLs). To answer LRL queries, approaches may leverage a question-answer system or database in a high resource language (HRL), such as English.

To remedy this discrepancy, recent work leverages English supervision to improve multilingual systems, either by simple translation or zero-shot transfer (Asai et al., 2018; Cui et al., 2019; Charlet et al., 2020). While these approaches have helped generalize reading comprehension models to new languages, they are of limited practical use without reliable information retrieval in the target language, which they often implicitly assume. In practice, we